A unified YOLO-based framework for multi-animal pose detection, tracking, and behavioural analysis of Caenorhabditis elegans.
Deep-Pose-Tracker: a unified model for behavioural analysis of Caenorhabditis elegans
Saha D., Chaudhary S., Vyas D., Roy A.G., Sharma R.
The model provides the following features:
- Pose detection of C. elegans with head-tail identification
- Quantification of eigenworms
- Worm tracking and speed (average and instantaneous) measurement
- Spatial exploration or trajectory extent measurement
- Detection of the orientation of motion
- Identification of forward and reverse movement
- Detection of omega turns
- Multi-class (worms and eggs) detection and counting
Note:
- All the code works for a single worm as well as for multiple worms.
- The code works on videos as well as images (wherever applicable).
- You can keep multiple files in a folder and give the folder path as input.
- All the analysis results are saved inside the `outputs` folder. For example, if you run the `pose.ipynb` file on a video, the outputs are saved in the `outputs/pose/run1` folder. If you run this file several times, the respective outputs are saved in `run2`, `run3`, ... inside the main `outputs/pose` folder.
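The run-folder numbering described above can be sketched as follows. This is a minimal illustration, not the repository's own code: the helper name `next_run_dir` is hypothetical, and it assumes that the next run number is one more than the highest existing `runN` folder.

```python
from pathlib import Path
import tempfile


def next_run_dir(base_out_dir):
    """Create and return base_out_dir/runN, where N is one more than the
    highest existing run number (hypothetical helper, not the repo's code)."""
    base = Path(base_out_dir)
    base.mkdir(parents=True, exist_ok=True)
    existing = []
    for entry in base.iterdir():
        suffix = entry.name[3:]
        if entry.is_dir() and entry.name.startswith("run") and suffix.isdigit():
            existing.append(int(suffix))
    run_dir = base / f"run{max(existing, default=0) + 1}"
    run_dir.mkdir()
    return run_dir


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so nothing real is touched.
    demo_base = Path(tempfile.mkdtemp()) / "outputs" / "pose"
    print(next_run_dir(demo_base).name)
    print(next_run_dir(demo_base).name)
```

Calling the helper repeatedly on the same base folder yields a fresh folder each time, so earlier results are never overwritten.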
The repository contains the following:
- All the codes for the analysis of different behavioural features.
- Pretrained weights for pose detection, detection of worms and eggs, detecting and counting C. elegans worms, and bacteria counting.
- Sample input videos for practice.
- Pose detection sample examples.
- Training dataset for pose estimation and worms and eggs detection.
- Spreadsheet containing video-level splitting of the training dataset for pose detection.
- Weight files for pose estimation and worms and eggs detection.
- Tracking metrics for evaluating MOTA, IDF1, etc.
- Optimization datasets for omega turn and reversal detection.
- Eigenworm dataset.
Here are the steps to install Deep-Pose-Tracker. We highly recommend installing the dependencies in a separate environment. The steps below use conda; a python3 virtual environment can be used instead.
- Clone the repository

      git clone https://github.com/cebpLab/Deep-Pose-Tracker.git
      cd Deep-Pose-Tracker

- Create the Conda environment

      conda env create -f environment.yml

- Activate the environment

      conda activate deep-pose-tracker

- Verify installation

      yolo help
      python -c "import motmetrics; print('motmetrics OK')"
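The same verification can be done from inside Python. The sketch below is an optional convenience and not part of the repository; the helper name `missing_packages` is hypothetical. It only checks whether the packages can be located, without importing them.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found
    in the currently active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["ultralytics", "motmetrics"])
    if missing:
        print(f"Missing packages: {missing}")
    else:
        print("All packages found")
```

An empty result means both `ultralytics` and `motmetrics` are visible to the interpreter of the active environment.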
Annotations were performed in Roboflow, which supports labelling images for object detection, pose detection, image segmentation, and classification tasks. Other platforms, such as CVAT, can also be used to label your images.
The complete training datasets for "pose detection" and "worms and eggs detection" are made available in this Google Drive link. Training images and labels are both provided.
The datasets are divided into three subsets:
- Training set: used for model training
- Validation set: used for evaluation
- Test set: used exclusively for final evaluations, like tracking and pose accuracies.
Deep-Pose-Tracker dataset:
- train: 7383 images
- valid: 400 images
- test: 157 images
Worms and eggs detection dataset:
- train: 963 images
- valid: 49 images
- test: 39 images
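For reference, the relative size of each subset follows directly from the counts listed above. The short sketch below only performs that arithmetic; the variable and function names are illustrative.

```python
def split_fractions(counts):
    """Return each subset's share of the total number of images."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Image counts as listed in this README.
pose_counts = {"train": 7383, "valid": 400, "test": 157}
detection_counts = {"train": 963, "valid": 49, "test": 39}

for label, counts in [("pose", pose_counts), ("worms and eggs", detection_counts)]:
    shares = ", ".join(f"{k}: {v:.1%}" for k, v in split_fractions(counts).items())
    print(f"{label} ({sum(counts.values())} images): {shares}")
```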
Deep-Pose-Tracker dataset:
- Each annotation contains: `class id, x, y, w, h, keypoints`
- Keypoints are stored as `(x, y, v)`, where the visibility flag `v` takes the following values:
  - `v = 0`: keypoint not labelled (out-of-frame or deleted)
  - `v = 1`: keypoint labelled but occluded
  - `v = 2`: keypoint visible
- There are 11 keypoints in total.

During evaluation, keypoints with `v = 1` and `v = 2` are considered. Keypoints with `v = 0` are excluded from evaluation.
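To make the annotation layout concrete, here is a minimal sketch of reading one pose label line and keeping only the keypoints that count during evaluation. It assumes the usual YOLO text format, in which the values are whitespace-separated on one line and the coordinates are normalized; the function names and the example line are made up for illustration.

```python
NUM_KEYPOINTS = 11  # as stated in the dataset description


def parse_pose_label(line, num_keypoints=NUM_KEYPOINTS):
    """Split one label line into class id, box (x, y, w, h) and
    a list of (x, y, v) keypoints."""
    values = line.split()
    expected = 5 + 3 * num_keypoints
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    class_id = int(values[0])
    box = tuple(float(v) for v in values[1:5])
    keypoints = []
    for i in range(num_keypoints):
        x, y, v = values[5 + 3 * i : 8 + 3 * i]
        keypoints.append((float(x), float(y), int(float(v))))
    return class_id, box, keypoints


def evaluated_keypoints(keypoints):
    """Keep keypoints with v = 1 or v = 2; drop those with v = 0."""
    return [kp for kp in keypoints if kp[2] > 0]


# Synthetic example: the first keypoint is unlabelled (v = 0), the rest visible.
example_kpts = " ".join(
    f"{0.40 + 0.02 * i:.2f} 0.50 {0 if i == 0 else 2}" for i in range(NUM_KEYPOINTS)
)
example_line = "0 0.5 0.5 0.2 0.4 " + example_kpts
class_id, box, kpts = parse_pose_label(example_line)
print(class_id, box, len(kpts), len(evaluated_keypoints(kpts)))
```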
Worms and eggs detection dataset:
- Each annotation contains: `class id, x, y, w, h`
The training follows the standard YOLO training procedure. We used YOLOv8 for all the analyses. The training was performed on a custom dataset of labelled images of C. elegans, with the following details:
- A total of 3018 images were taken for training different YOLOv8 architectures.
- We trained different YOLOv8 architectures (`medium`, `large`, and `extra`) for `pose` detection.
- Training was performed with an input image size of $1024 \times 1024$.
Here is the detailed training procedure. We assume that you have properly installed `ultralytics`.
1. Import YOLO.

       from ultralytics import YOLO

2. Define the model on which you want to perform the training. Here is the list of different YOLOv8 models for pose detection.

       model = YOLO("yolov8m-pose.pt")

3. Once the model is defined, you can now start the training.

       model.train(data="/path/to/data.yaml", epochs=100, save=True, batch=32, val=True, device=[0,1], imgsz=640)

A snapshot of the training is shown below:
That's it. Now let's unpack each of the parameters defined here.
- The `data.yaml` file contains the details of the training dataset: the names of the individual classes with their class indices, and the paths of the folders containing the training, validation and test datasets. In our case, there is only one class, `worm`, with class index `0`.
- `epochs` is the number of complete passes over the training dataset. We typically start with a small number (such as 100 or 200) and increase it as required.
- The `save` flag is used to save the training outputs.
- `batch` is the number of images the model is trained on at once. The usable value depends on the hardware configuration (GPU memory), the network architecture and the input image size.
- `val` is set to `True` so that the model is evaluated on the validation dataset during training, which helps to check that it is not overfitting.
- `device` specifies where the training is performed. If you are training on a single GPU, define `device = 0`. If you are training on multiple GPUs, define `device = [0, 1, 2, 3]` (depending on the number of GPUs available). It is highly recommended to train on GPUs only. Make sure that `CUDA` and a compatible `PyTorch` are properly installed, as both are necessary for GPU training.
- `imgsz` defines the image size on which you want to train the network.
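As an illustration of what such a `data.yaml` might contain for the pose dataset, the sketch below builds the file contents as text. The folder layout (`train/images`, `valid/images`, `test/images`) and the dataset root are assumptions and should be adapted to wherever your dataset lives; `kpt_shape` is the key Ultralytics uses for pose datasets, set here to 11 keypoints with `(x, y, v)` each, and the single class `worm` has index `0` as described above.

```python
def build_data_yaml(dataset_root, num_keypoints=11):
    """Return the text of a pose-dataset data.yaml.
    The sub-folder names are assumptions, not the repository's actual layout."""
    lines = [
        f"path: {dataset_root}",   # dataset root folder
        "train: train/images",     # training images, relative to path
        "val: valid/images",       # validation images
        "test: test/images",       # test images
        f"kpt_shape: [{num_keypoints}, 3]",  # keypoints per worm, (x, y, v) each
        "names:",
        "  0: worm",               # the only class
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_data_yaml("/path/to/dataset"))
```

The resulting text can be saved as `data.yaml` and its path passed to the `data` argument of `model.train`.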
In point 2, we defined a model, which is a YOLO architecture for pose detection with pretrained weights. Here, we chose `yolov8m-pose.pt` because we are interested in training YOLO on posture data. If you are working on a detection problem, you would choose `yolov8m.pt` instead. You can choose any architecture (small, medium, large, etc.) according to your requirements.
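The pretrained weight names follow a simple pattern: a size letter, plus a `-pose` suffix for pose models. The helper below is a hypothetical convenience for building these names, covering only the detection and pose tasks discussed here.

```python
# Size letters used in YOLOv8 weight file names.
SIZES = {"nano": "n", "small": "s", "medium": "m", "large": "l", "extra": "x"}
# Suffix added to the file name for each task covered in this README.
TASK_SUFFIX = {"detect": "", "pose": "-pose"}


def pretrained_weight_name(size, task):
    """Build a YOLOv8 pretrained weight file name, e.g. yolov8m-pose.pt."""
    return f"yolov8{SIZES[size]}{TASK_SUFFIX[task]}.pt"


if __name__ == "__main__":
    print(pretrained_weight_name("medium", "pose"))
    print(pretrained_weight_name("medium", "detect"))
```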
- Resize the images (by cropping) before annotation. In Roboflow, you can resize images after annotation; however, avoid squeezing the images to fit the given dimensions. Squeezing distorts the spatial features in the training data, resulting in poor performance during predictions.
- Resize images into square shapes, because the YOLO architectures perform best when the inputs are square.
- Training images should contain as much variability as possible, so that the model learns the differences that can appear in an image. If you are working with pose detection, include images with complex shapes as well as normal sinusoidal shapes. This helps in predicting complex shapes and enhances accuracy.
- Use augmentations to increase the number of training images through various transformations. This also increases the variability in the images, which makes the model more robust.
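Cropping to a square, rather than squeezing, can be sketched as computing the largest centred square inside the image. This is only one possible choice and assumes the worms lie near the centre of the frame; the helper name is hypothetical. The returned box uses the `(left, top, right, bottom)` order that Pillow's `Image.crop` expects.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centred square
    that fits inside an image of the given size."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


if __name__ == "__main__":
    # A 1920 x 1080 frame is cropped to its central 1080 x 1080 region.
    print(center_square_box(1920, 1080))
```

Because the crop keeps the original pixel aspect ratio, the worm shapes are not distorted, unlike a resize that forces a rectangular image into a square.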
The training outputs are saved in the `runs` folder by default. Every time you run the training, a new folder (`train1`, `train2`, ...) is created, which contains all the training outputs, including the weights (inside the `weights` folder). The `weights` folder contains two weight files: `best.pt` (the weights with the best performance on the validation data) and `last.pt` (the weights from the last training epoch). For convenience, we used the `best.pt` file and renamed it to `yolov8x-832.pt`, according to the network architecture and the training image size.
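The renaming step can be sketched as a small copy helper. The function name, the destination folder and the `<architecture>-<image size>.pt` pattern are assumptions inferred from the `yolov8x-832.pt` example; the helper copies the file instead of moving it, so the original training run stays intact.

```python
from pathlib import Path
import shutil
import tempfile


def export_best_weights(run_dir, weights_dir, arch, imgsz):
    """Copy <run_dir>/weights/best.pt to <weights_dir>/<arch>-<imgsz>.pt
    and return the new path (hypothetical helper)."""
    src = Path(run_dir) / "weights" / "best.pt"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst_dir = Path(weights_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / f"{arch}-{imgsz}.pt"
    shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    # Demonstrate with a placeholder file in a throwaway directory.
    root = Path(tempfile.mkdtemp())
    (root / "runs" / "train1" / "weights").mkdir(parents=True)
    (root / "runs" / "train1" / "weights" / "best.pt").write_bytes(b"placeholder")
    print(export_best_weights(root / "runs" / "train1", root / "weights" / "pose", "yolov8x", 832).name)
```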
This is the most important step, where you apply the different algorithms to real experimental data (videos or images) from various assays. Here we show how to use the code and how to read the outputs. We use the pose detection code `pose.ipynb` for demonstration. The same procedure applies to all the other code files.
The first few lines of the code are shown here. Once the libraries are imported, we do the following:
- First, define the model:

      model = YOLO("/path/to/yolov8x-832.pt")

  This is the YOLO architecture itself, with weights trained on our own custom data for posture detection of C. elegans. The weights can be found in the `weights` folder. Choose the appropriate weights for your work. Here we are working with pose detection, which is why we chose the pose weights in the `weights/pose` folder.

- Define the `source` on which you want to run the program. You can define the input in several ways.

  - You can run on a video file. Videos with `.avi`, `.mp4`, `.mov` and `.mkv` extensions are supported.

        source = "/path/to/video.mp4"

  - You can run on an image file. The following image formats are supported: `.jpg`, `.jpeg`, `.png` and `.bmp`.

        source = "/path/to/image.jpg"

  - To run on multiple input files in a single run, provide the path of the folder that contains all the files. It may contain images as well as videos.

        source = "/path/to/the/folder/"

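Before starting a long run, it can help to check which files in a folder will actually be picked up. The sketch below classifies paths by the extensions listed above; the helper names are hypothetical, and treating extensions case-insensitively is an assumption rather than documented behaviour of the notebooks.

```python
from pathlib import Path

VIDEO_EXTS = {".avi", ".mp4", ".mov", ".mkv"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def source_kind(path):
    """Classify a file path as 'video', 'image' or 'unsupported'
    based on its extension (case-insensitive by assumption)."""
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTS:
        return "video"
    if suffix in IMAGE_EXTS:
        return "image"
    return "unsupported"


def list_inputs(folder):
    """Return the supported files directly inside a folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and source_kind(p) != "unsupported"
    )


if __name__ == "__main__":
    for name in ["/path/to/video.mp4", "/path/to/image.jpg", "/path/to/notes.txt"]:
        print(name, "->", source_kind(name))
```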
With these changes, you can now run the program. The outputs are saved in the `pose` subfolder of the `outputs` folder, as defined by the `base_out_dir` variable. You can change the folder names, but that is optional.
The tracking metrics are evaluated to test the tracking performance of the model when worms overlap. Standard tracking metrics, such as MOTA, MOTP and IDF1, are measured. These metrics are calculated by comparing the model predictions with the manually annotated ground truths, following the MOTChallenge evaluation as implemented in the `motmetrics` library.
Ground truth and predicted bounding boxes are matched in each frame of the video using the intersection-over-union (IoU). An IoU threshold of 0.5 is used; predictions with an IoU below this threshold are excluded from evaluation. The one-to-one assignment between predictions and ground truths is computed with the Hungarian algorithm, as implemented in `motmetrics`. The evaluation code and the necessary datasets are provided here, ensuring reproducibility.
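For intuition, the two building blocks of this evaluation can be written out by hand. The sketch below is not the `motmetrics` implementation and omits the Hungarian assignment; it only shows how the IoU of two boxes is computed, how the 0.5 threshold gates a candidate pair, and the standard MOTA definition, $1 - (\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}) / \mathrm{GT}$. The `(x_min, y_min, x_max, y_max)` box format is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_candidate_match(gt_box, pred_box, threshold=0.5):
    """A ground-truth/prediction pair is a candidate only if IoU >= threshold."""
    return iou(gt_box, pred_box) >= threshold


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (5, 0, 15, 10)  # overlaps half of the ground-truth box
    print(iou(gt, pred), is_candidate_match(gt, pred))
```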
Some of the code has parameters that need to be tuned before use to obtain good accuracy. For example:
- The omega turn detection algorithm contains two adjustable parameters, proximity distance and bending angle, which need to be tuned on a ground truth dataset.
- The script for forward-reversal detection has three parameters: the angle threshold ($H$), body index ($\alpha$), and buffer, which should be optimized on a ground truth dataset before use.

This is necessary because these parameters are not fixed and can vary from one experimental condition to another.
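One simple way to tune such parameters is a grid search against annotated frames. The sketch below is purely illustrative: the detection rule (an omega turn is flagged when the head-tail distance is at most `proximity` and the bending angle is at least `min_angle`), the per-frame features, the F1 criterion and the toy data are all assumptions, not the repository's algorithm.

```python
from itertools import product


def detect_omega(frames, proximity, min_angle):
    """Hypothetical rule: flag a frame when head-tail distance <= proximity
    and bending angle >= min_angle. Each frame is (distance, angle)."""
    return [d <= proximity and a >= min_angle for d, a in frames]


def f1_score(predicted, truth):
    """F1 score of boolean predictions against boolean ground truth."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def grid_search(frames, truth, proximities, angles):
    """Return (best_f1, proximity, min_angle); the first best combination wins ties."""
    best = None
    for proximity, min_angle in product(proximities, angles):
        score = f1_score(detect_omega(frames, proximity, min_angle), truth)
        if best is None or score > best[0]:
            best = (score, proximity, min_angle)
    return best


# Toy data: (head-tail distance, bending angle) per frame, with manual labels.
toy_frames = [(5.0, 150.0), (40.0, 30.0), (8.0, 140.0), (35.0, 160.0), (6.0, 20.0)]
toy_truth = [True, False, True, False, False]
print(grid_search(toy_frames, toy_truth, [5, 10, 20], [90, 120, 170]))
```

The same pattern extends to the three forward-reversal parameters by adding a third grid to the product.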
- If you are using DPT and find an issue that needs to be solved, please feel free to email us with the details.
- If you want to use DPT but have difficulties installing the packages or understanding the workflow (annotation, training and prediction), please do not hesitate to reach out to us. We will be happy to help you.
- Ultralytics, for making YOLO, a state-of-the-art computer vision model, open-source.
- Roboflow and CVAT, for their annotation tools.
If you find this work useful, please cite it with the following:
@article {Saha2026DPT,
author = {Saha, Debasish and Chaudhary, Shivam and Vyas, Dhyey and Ghosh-Roy, Anindya and Sharma, Rati},
title = {Deep-Pose-Tracker: a unified model for behavioural analysis of Caenorhabditis elegans},
elocation-id = {2025.11.23.689997},
year = {2026},
doi = {10.1101/2025.11.23.689997},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2026/05/24/2025.11.23.689997},
eprint = {https://www.biorxiv.org/content/early/2026/05/24/2025.11.23.689997.full.pdf},
journal = {bioRxiv}
}
Contact us through this email address: rati@iiserb.ac.in