A computer vision project for object detection and annotation management using YOLOv8, SAHI, and FiftyOne, with the primary aim of counting seals (German: Robben) in large aerial images.
This repository provides a complete MLOps pipeline for:
- Data Preparation: Converting raw CVAT annotations (XML) and large images into a tiled, YOLO-compatible dataset.
- Automated Experiments: Systematically training and tuning YOLOv8 models.
- Tiled Inference: Running optimized inference (SAHI) on large, high-resolution images for object counting.
- Evaluation: Assessing model performance for both detection (mAP) and counting (MAE, RMSE, R²).
- Visualization: Analyzing datasets and model predictions interactively with FiftyOne.
Pretrained model weights are available on Hugging Face: https://huggingface.co/ki-ideenwerkstatt-23/robbenblick/
The project is designed to follow a clear, sequential workflow:
- Prepare Data (create_dataset.py): Organize your raw images and CVAT annotations.xml in data/raw/ as shown below.

      data/raw/
      ├── dataset_01/
      │   ├── annotations.xml
      │   └── images/
      └── dataset_02/
          ...

  Run the script to generate a tiled, YOLO-formatted dataset in data/processed/ and ground truth count CSVs.
- Tune Model (run_experiments.py): Define a set of hyperparameters (e.g., models, freeze layers, augmentation) in configs/base_iter_config.yaml. Run the script to train a model for every combination and find the best performer.
- Validate Model (yolo.py): Take the run_id of your best experiment and run validation on the hold-out test set to get detection metrics (mAP).
- Infer & Count (predict_tiled.py): Use the best run_id to run sliced inference on new, large images. This script generates final counts and visual outputs.
- Evaluate Counts (evaluate_counts.py): Compare the detection_counts.csv from inference against the ground_truth_counts.csv to get counting metrics (MAE, RMSE).
- Visualize (run_fiftyone.py): Visually inspect your ground truth dataset or your model's predictions at any stage.
This project uses two separate configuration files, managed by robbenblick.utils.load_config.
- configs/base_config.yaml
  - Purpose: The single source of truth for single runs.
  - Used By: create_dataset.py, predict_tiled.py, run_fiftyone.py, and yolo.py (for validation/single-predict).
  - Content: Defines static parameters like data paths (dataset_output_dir), model (model), and inference settings (confidence_thresh).
- configs/base_iter_config.yaml
  - Purpose: The configuration file for experiments and tuning.
  - Used By: run_experiments.py.
  - Content: Any parameter defined as a YAML list (e.g., model: [yolov8n.pt, yolov8s.pt]) will be iterated over. run_experiments.py will test every possible combination of all lists.
- Clone the repository:

      git clone git@github.com:ki-iw/robbenblick.git
      cd robbenblick

- Create the Conda environment:

      conda env create --file environment.yml
      conda activate RobbenBlick

- (Optional) Install pre-commit hooks:

      pre-commit install
- Purpose (create_dataset.py): Converts raw CVAT-annotated images and XML files into a YOLO-compatible dataset, including tiling and label conversion.
- How it works:
  - Loads configuration from a config file.
  - Scans data/raw/ for dataset subfolders.
  - Parses CVAT XML annotations and extracts polygons.
  - Tiles large images into smaller crops based on imgsz and tile_overlap from the config.
  - Converts polygon annotations to YOLO bounding box format for each tile.
  - Splits data into train, val, and test sets and writes them to data/processed/dataset_yolo.
  - Saves a ground_truth_counts.csv file in each raw dataset subfolder, providing a baseline for counting evaluation.
- Run:

      # Do a 'dry run' to see statistics without writing files
      python -m robbenblick.create_dataset --dry-run --config configs/base_config.yaml

      # Create the dataset, holding out dataset #4 as the test set
      python -m robbenblick.create_dataset --config configs/base_config.yaml --test-dir-index 4

- Key Arguments:
  - --config: Path to the base_config.yaml file.
  - --dry-run: Run in statistics-only mode.
  - --test-dir-index: 1-based index of the dataset subfolder to use as a hold-out test set.
  - --val-ratio: Ratio of the remaining data to use for validation.
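The tiling and label-conversion steps described above can be illustrated with a minimal sketch. It is not the code from create_dataset.py: the function names `tile_origins` and `polygon_to_yolo` are hypothetical, and the choices made here (a last tile flush with the image border, polygons reduced to their axis-aligned bounding box and clipped to the tile) are assumptions. The parameters `tile` and `overlap` stand in for the imgsz and tile_overlap config values.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def tile_origins(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets of tiles along one image axis.

    Assumption: tiles advance by tile * (1 - overlap) pixels and the last
    tile is shifted back so that it ends exactly at the image border.
    """
    if length <= tile:
        return [0]
    stride = max(1, round(tile * (1.0 - overlap)))
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)
    return origins


def polygon_to_yolo(
    polygon: Sequence[Point], tile_x: int, tile_y: int, tile: int
) -> Optional[Tuple[float, float, float, float]]:
    """Convert a polygon (full-image pixels) to a YOLO box for one square tile.

    Returns (cx, cy, w, h) normalized to the tile size, or None when the
    polygon's bounding box does not intersect the tile.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    # Bounding box of the polygon, clipped to the tile borders.
    x1, y1 = max(min(xs), tile_x), max(min(ys), tile_y)
    x2, y2 = min(max(xs), tile_x + tile), min(max(ys), tile_y + tile)
    if x2 <= x1 or y2 <= y1:
        return None
    cx = ((x1 + x2) / 2 - tile_x) / tile
    cy = ((y1 + y2) / 2 - tile_y) / tile
    return (cx, cy, (x2 - x1) / tile, (y2 - y1) / tile)


# A 1000 px wide image cut into 640 px tiles with 20 % overlap.
print(tile_origins(1000, 640, 0.2))  # [0, 360]
```

A real pipeline would typically also drop boxes that are mostly cut off by the tile border; whether and how create_dataset.py does this is not stated here.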
- Purpose (run_experiments.py): This is the main training script. It automates hyperparameter tuning by iterating over parameters defined in base_iter_config.yaml.
- How it works:
  - Finds all parameters in the config file that are lists (e.g., freeze: [None, 10]).
  - Generates a "variant" for every possible combination of these parameters.
  - For each variant, it calls yolo.py --mode train as a subprocess with a unique run_id.
  - After all runs are complete, it reads the results.csv from each run directory, sorts them by mAP50, and prints a final ranking table.
- Run:

      # Start the experiment run defined in the iteration config
      python -m robbenblick.run_experiments --config configs/base_iter_config.yaml

      # Run experiments and only show the top 5 results
      python -m robbenblick.run_experiments --config configs/base_iter_config.yaml --top-n 5
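To make the "every possible combination" behaviour concrete, here is a minimal sketch of how list-valued parameters in a (possibly nested) config could be expanded into variants with `itertools.product`. It is an illustration, not the code of run_experiments.py: the helper names `flatten` and `expand_variants` are hypothetical, and the `epochs` key in the usage example is invented to show that scalar values are copied unchanged.

```python
import itertools
from typing import Any, Dict, Iterator, List, Tuple

Path = Tuple[str, ...]


def flatten(config: Dict[str, Any], prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (key_path, value) pairs, descending into nested dicts."""
    for key, value in config.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def expand_variants(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one config per combination of all list-valued parameters."""
    items = list(flatten(config))
    list_paths = [path for path, value in items if isinstance(value, list)]
    list_values = [value for _, value in items if isinstance(value, list)]
    variants = []
    for combo in itertools.product(*list_values):
        chosen = dict(zip(list_paths, combo))
        variant: Dict[str, Any] = {}
        for path, value in items:
            node = variant
            for key in path[:-1]:
                node = node.setdefault(key, {})
            # List parameters get the chosen element, scalars are copied as-is.
            node[path[-1]] = chosen[path] if path in chosen else value
        variants.append(variant)
    return variants


config = {
    "model": ["yolov8s.pt", "yolov8m.pt"],
    "freeze": [None, 10],
    "epochs": 50,  # hypothetical scalar parameter
    "yolo_hyperparams": {"scale": [0.3, 0.5]},
}
print(len(expand_variants(config)))  # 8
```

The number of training runs grows multiplicatively with every list added to the config, so two models, two freeze settings, and two scale values already produce eight runs.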
- Purpose (predict_tiled.py): This is the main inference script. It runs a trained YOLOv8 model on new, full-sized images using Slicing Aided Hyper Inference (SAHI).
- How it works:
  - Loads a trained best.pt model specified by the --run_id argument.
  - Loads inference parameters (like confidence_thresh, tile_overlap) from the base_config.yaml.
  - Uses get_sliced_prediction from SAHI to perform tiled inference on each image.
  - Saves outputs, including visualized images (if --save-visuals), YOLO .txt labels (if --save-yolo), and a detection_counts.csv file.
- Run:

      # Run inference on a folder of new images and save the visual results
      python -m robbenblick.predict_tiled \
        --config configs/base_config.yaml \
        --run_id "best_run_from_experiments" \
        --source "data/new_images_to_count/" \
        --output-dir "data/inference_results/" \
        --save-visuals
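Because tiles overlap, an animal lying in the overlap region can be detected twice, which would inflate the count. SAHI handles this internally with its own postprocessing; the sketch below only illustrates the general idea using plain greedy non-maximum suppression. The function names and the IoU threshold of 0.5 are assumptions chosen for the example and do not reflect SAHI's implementation or this project's settings.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, score) in pixels
Box = Tuple[float, float, float, float, float]


def to_image_coords(boxes: Sequence[Box], tile_x: float, tile_y: float) -> List[Box]:
    """Shift tile-local boxes by the tile origin into full-image coordinates."""
    return [(x1 + tile_x, y1 + tile_y, x2 + tile_x, y2 + tile_y, s)
            for x1, y1, x2, y2, s in boxes]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Keep the highest-scoring box of every group of overlapping boxes."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, other) < iou_thresh for other in kept):
            kept.append(box)
    return kept


# Two tiles whose overlap region contains the same object.
tile_a = to_image_coords([(500, 100, 560, 160, 0.9)], 0, 0)
tile_b = to_image_coords([(0, 100, 48, 160, 0.8), (100, 300, 150, 350, 0.7)], 512, 0)
print(len(greedy_nms(tile_a + tile_b)))  # 2
```

The per-image count written to detection_counts.csv would then correspond to the number of boxes that remain after merging.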
- Purpose (evaluate_counts.py): Evaluates the counting performance of a model by comparing its predicted counts against the ground truth counts.
- How it works:
  - Loads the ground_truth_counts.csv generated by create_dataset.py.
  - Loads the detection_counts.csv generated by predict_tiled.py.
  - Merges them by image_name.
  - Calculates and prints key regression metrics (MAE, RMSE, R²) to assess the accuracy of the object counting.
- Run:

      # Evaluate the counts from a specific run
      python -m robbenblick.evaluate_counts \
        --gt-csv "data/raw/dataset_02/ground_truth_counts.csv" \
        --pred-csv "data/inference_results/detection_counts.csv"
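The three counting metrics can be written out in a few lines. The sketch below is a simplified stand-in for the evaluation step, not the code of evaluate_counts.py: it takes two plain dictionaries keyed by image_name instead of CSV files, and it assumes that only images present in both inputs are compared. The function name `counting_metrics` is hypothetical.

```python
import math
from typing import Dict, Tuple


def counting_metrics(gt: Dict[str, float], pred: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, R^2) over the images present in both inputs."""
    names = sorted(set(gt) & set(pred))
    if not names:
        raise ValueError("no common image names to compare")
    n = len(names)
    errors = [pred[name] - gt[name] for name in names]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_gt = sum(gt[name] for name in names) / n
    ss_tot = sum((gt[name] - mean_gt) ** 2 for name in names)
    # R^2 is undefined when every ground truth count is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


gt_counts = {"a.jpg": 10, "b.jpg": 20, "c.jpg": 30}
pred_counts = {"a.jpg": 12, "b.jpg": 18, "c.jpg": 30, "d.jpg": 5}
mae, rmse, r2 = counting_metrics(gt_counts, pred_counts)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.2f}")  # MAE=1.33 RMSE=1.63 R2=0.96
```

MAE reports the average miscount per image, RMSE penalizes large misses more strongly, and R² indicates how much of the variation in true counts the predictions explain.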
- Purpose (yolo.py): The core engine for training, validation, and standard prediction. This script is called by run_experiments.py for training. You can use it directly for validation.
- How it works:
  - --mode train: Loads a base model (yolov8s.pt) and trains it on the dataset specified in the config.
  - --mode validate: Loads a trained model (best.pt from a run directory) and validates it against the test split defined in dataset.yaml. This provides detection metrics (mAP).
  - --mode predict: Runs standard (non-tiled) YOLO prediction on a folder.
- Run:

      # Validate the 'test' set performance of a completed run
      python -m robbenblick.yolo \
        --config configs/base_config.yaml \
        --mode validate \
        --run_id "best_run_from_experiments"
- Purpose (run_fiftyone.py): Visualizes datasets and predictions using FiftyOne.
- How it works:
  - --dataset groundtruth: Loads the processed YOLO dataset (images and ground truth labels) from data/processed/.
  - --dataset predictions: Loads images, runs a specified model (--run_id) on them, and displays the model's predictions.
- Run:

      # View the ground truth annotations for the 'val' split
      python -m robbenblick.run_fiftyone \
        --config configs/base_config.yaml \
        --dataset groundtruth \
        --split val \
        --recreate

      # View the predictions from 'my_best_run' on the 'test' split
      python -m robbenblick.run_fiftyone \
        --config configs/base_config.yaml \
        --dataset predictions \
        --split test \
        --run_id "my_best_run" \
        --recreate
- Purpose (streamlit_app.py): Quick test runs with a trained model of your choice, counting the seals in uploaded images and visualizing the predictions.
- How it works:
  - Loads the selected YOLO model from runs/detect/.
  - Lets you upload images, runs the model on them, and displays the counts together with the model's predictions drawn on each image.
- Run:

      # Launch the Streamlit app from the repository root
      export PYTHONPATH=$PWD && streamlit run robbenblick/streamlit_app.py
1. Add Raw Data:
   - Place your first set of images and annotations in data/raw/dataset_01/images/ and data/raw/dataset_01/annotations.xml.
   - Place your second set (e.g., from a different location) in data/raw/dataset_02/images/ and data/raw/dataset_02/annotations.xml.
2. Create Dataset:
   - Run python -m robbenblick.create_dataset --dry-run to see your dataset statistics. Note the indices of your datasets.
   - Let's say dataset_02 is a good hold-out set. Run:

         python -m robbenblick.create_dataset --config configs/base_config.yaml --test-dir-index 2

   - This creates data/raw/dataset_02/ground_truth_counts.csv for later.
3. Find Best Model:
   - Edit configs/base_iter_config.yaml. Define your experiments.

         # Example: Test two models and two freeze strategies
         model: ['yolov8s.pt', 'yolov8m.pt']
         freeze: [None, 10]
         yolo_hyperparams:
           scale: [0.3, 0.5]

   - Run the experiments: python -m robbenblick.run_experiments.
   - Note the run_id of the top-ranked model, e.g., iter_run_model_yolov8m.pt_freeze_10_scale_0.3.
4. Validate on Test Set (Detection mAP):
   - Check your best model's performance on the unseen test data:

         python -m robbenblick.yolo --mode validate --run_id "iter_run_model_yolov8m.pt_freeze_10_scale_0.3" --config configs/base_config.yaml

   - This tells you how well it detects objects (mAP).
5. Apply Model for Counting:
   - Get a new folder of large, un-annotated images (e.g., data/to_be_counted/).
   - Run predict_tiled.py:

         python -m robbenblick.predict_tiled --run_id "iter_run_model_yolov8m.pt_freeze_10_scale_0.3" --source "data/to_be_counted/" --output-dir "data/final_counts/" --save-visuals

   - This creates data/final_counts/detection_counts.csv.
6. Evaluate Counting Performance (MAE, RMSE):
   - Now, compare the predicted counts (Step 5) with the ground truth counts (Step 2). Let's assume your "to_be_counted" folder was your dataset_02.

         python -m robbenblick.evaluate_counts --gt-csv "data/raw/dataset_02/ground_truth_counts.csv" --pred-csv "data/final_counts/detection_counts.csv"

   - This gives you the final MAE, RMSE, and R² metrics for your counting task.
This repository contains only the source code of the project. The training data and the fine-tuned model weights are not included or published.
The repository is currently not being actively maintained. Future updates are not planned at this time.
For transparency, please note that the underlying model used throughout this project is based on YOLOv8 by Ultralytics.
Copyright (c) 2025 Birds on Mars.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This aligns with the license of the underlying YOLOv8 model architecture used in this project.
Please note: Training data and fine-tuned model weights are not part of the licensed materials and are not included in this repository.
For full details, see the LICENSE file.
Try using the --recreate flag to force FiftyOne to reload the dataset:

    python robbenblick/run_fiftyone.py --dataset groundtruth --split val --recreate

If you get:

    fiftyone.core.service.ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port

Try killing any lingering fiftyone or mongod processes:

    pkill -f fiftyone
    pkill -f mongod

Then rerun your script.

The code for this project has been developed through a collaborative effort between WWF Büro Ostsee and KI-Ideenwerkstatt, with technical implementation by Birds on Mars.