Automate model compatibility checks #907

Open
juhoinkinen wants to merge 40 commits into main from issue906-automate-model-compatibility-checks

@juhoinkinen juhoinkinen commented Oct 30, 2025

Member

This pull request introduces automated model compatibility and reproducibility checks for the backends, ensuring that changes to the codebase do not introduce significant metric regressions.

Key changes include:

Continuous Integration and Automation:

  • Added a new GitHub Actions workflow (.github/workflows/model-compatibility.yml) that runs the model compatibility and reproducibility checks on the workflow_dispatch trigger by executing the tests/check_models_compatability_consistency.py script with the --ci option.

Testing Infrastructure and Scripts:

The script functions as follows in the two check modes:

  1. Download existing models and metrics from a Hugging Face Hub repository which is set via a repository GH Actions secret.
  2. Depending on the mode:
    • In compatibility mode/subcommand:
      • evaluate the downloaded models with the current Annif code and compare to previous evaluation metrics.
    • In consistency mode/subcommand:
      • train new models with the current Annif code
      • evaluate the trained models and compare to previous evaluation metrics
  3. Flag all significant differences found in the comparison. The default threshold on the relative difference (= abs(prev_value - new_value) / abs(prev_value)) is 0.01 for compatibility and 0.03 for consistency (the larger value allows for non-determinism in training).
  • When running with the --ci option and detecting differences, the script exits with code 1 failing the GH Action job.
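
As an illustration of step 3 and the --ci behaviour, the comparison could be sketched roughly as below. This is not the code from the script: the function names, the metric names, the sample values, the strict `>` comparison, and the handling of a zero baseline are all assumptions made for the example. Only the formula and the two default thresholds come from the description above.

```python
# Default thresholds quoted in the description above.
COMPATIBILITY_THRESHOLD = 0.01
CONSISTENCY_THRESHOLD = 0.03


def relative_difference(prev_value: float, new_value: float) -> float:
    """Compute abs(prev_value - new_value) / abs(prev_value)."""
    if prev_value == 0:
        # Assumption: the description does not cover a zero baseline;
        # here any change away from zero counts as an infinite difference.
        return 0.0 if new_value == 0 else float("inf")
    return abs(prev_value - new_value) / abs(prev_value)


def find_significant_differences(prev_metrics, new_metrics, threshold):
    """Return {metric: (prev, new, rel_diff)} for metrics above the threshold."""
    flagged = {}
    for name, prev_value in prev_metrics.items():
        if name not in new_metrics:
            continue  # assumption: metrics absent from the new run are skipped
        new_value = new_metrics[name]
        diff = relative_difference(prev_value, new_value)
        if diff > threshold:  # assumption: strict comparison
            flagged[name] = (prev_value, new_value, diff)
    return flagged


def exit_code(flagged, ci: bool) -> int:
    """With --ci, any flagged difference maps to exit code 1; otherwise 0."""
    return 1 if ci and flagged else 0


# Made-up metric values, only to show the two thresholds in action.
previous = {"F1@5": 0.500, "NDCG": 0.400}
current = {"F1@5": 0.490, "NDCG": 0.401}

for mode, threshold in (("compatibility", COMPATIBILITY_THRESHOLD),
                        ("consistency", CONSISTENCY_THRESHOLD)):
    flagged = find_significant_differences(previous, current, threshold)
    print(mode, sorted(flagged), "exit code with --ci:", exit_code(flagged, ci=True))
```

With these sample values the F1@5 change is a relative difference of about 0.02, so it is flagged under the compatibility threshold but tolerated under the consistency threshold.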

The upload subcommand of the script uploads the newly trained models and their evaluation metrics to the HFH repo, thus "resetting" the state:

python tests/check_models_compatibility_consistency.py upload --hf_repo <repo-id-to-upload>

In the above command, upload can be changed to compatibility or consistency for running in those modes.
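
For readers unfamiliar with this kind of CLI, a minimal sketch of a three-subcommand layout is shown here. It is hypothetical: the real script's argument handling may differ, and in particular whether --hf_repo is required and where --ci is accepted are assumptions. The repository id in the example call is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with the three subcommands named in the description."""
    parser = argparse.ArgumentParser(
        prog="check_models_compatibility_consistency.py"
    )
    subparsers = parser.add_subparsers(dest="mode", required=True)
    for mode in ("compatibility", "consistency", "upload"):
        sub = subparsers.add_parser(mode)
        # Assumption: every mode needs to know which Hugging Face Hub repo to use.
        sub.add_argument("--hf_repo", required=True, help="Hugging Face Hub repo id")
        if mode != "upload":
            # Assumption: --ci only makes sense for the two check modes.
            sub.add_argument("--ci", action="store_true",
                             help="exit with code 1 when differences are found")
    return parser


# Placeholder repo id, not a real repository.
args = build_parser().parse_args(
    ["compatibility", "--hf_repo", "example-org/example-models", "--ci"]
)
print(args.mode, args.hf_repo, args.ci)
# prints: compatibility example-org/example-models True
```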

Configuration for Model Checks:

  • Added tests/projects-compatibility.cfg and tests/projects-consistency.cfg configuration files, which define the set of Annif projects (models) to be checked for compatibility and consistency, respectively. The first configuration is for projects of non-trainable backends.
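
To show what "defining the set of projects" amounts to, here is a small sketch that lists project ids and backends from an INI-style project configuration. The sample content is invented for illustration and is not copied from the files in this PR; only the project ids yake-fi and ensemble-fi are mentioned elsewhere in this conversation, and the keys shown are assumed.

```python
import configparser

# Invented sample content; the real .cfg files in the PR are not reproduced here.
SAMPLE_CFG = """
[yake-fi]
name=YAKE Finnish
language=fi
backend=yake

[ensemble-fi]
name=Ensemble Finnish
language=fi
backend=ensemble
"""


def list_projects(cfg_text: str) -> dict:
    """Map each project id (section name) to its configured backend."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {section: parser[section]["backend"] for section in parser.sections()}


print(list_projects(SAMPLE_CFG))
# prints: {'yake-fi': 'yake', 'ensemble-fi': 'ensemble'}
```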

These checks are probably best run via the workflow dispatch trigger on the GH Actions workflow page, which also allows checking the status: Model Compatibility Check

TODO:

  • Remove trigger on pushes to main or the feature branch.

@juhoinkinen juhoinkinen added this to the 1.5 milestone Oct 30, 2025
@juhoinkinen juhoinkinen added the maintenance and github_actions (Pull requests that update GitHub Actions code) labels Oct 30, 2025
codecov Bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 0% with 191 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.37%. Comparing base (27e4ac7) to head (5c3aff6).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tests/check_models_reproducibility.py | 0.00% | 191 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
- Coverage   99.63%   97.37%   -2.26%     
==========================================
  Files         103      104       +1     
  Lines        8238     8429     +191     
==========================================
  Hits         8208     8208              
- Misses         30      221     +191     

@juhoinkinen juhoinkinen force-pushed the issue906-automate-model-compatibility-checks branch from f633916 to 924e812 Compare October 31, 2025 10:09
@juhoinkinen juhoinkinen force-pushed the issue906-automate-model-compatibility-checks branch 2 times, most recently from 9b1ea17 to bd74716 Compare October 31, 2025 15:34
@juhoinkinen juhoinkinen force-pushed the issue906-automate-model-compatibility-checks branch from bd74716 to d08a04b Compare November 12, 2025 13:11

Copilot AI left a comment

Pull Request Overview

This PR introduces automated model compatibility and reproducibility checks for Annif models through a new GitHub Actions workflow. The implementation enables systematic verification that code changes don't break model backward compatibility or training reproducibility.

Key changes:

  • New GitHub Actions workflow (model-compatibility.yml) that runs compatibility and consistency checks on workflow dispatch or push events
  • Python script (check_models_compatability_consistency.py) that downloads models from Hugging Face Hub, evaluates them, compares metrics against baselines, and reports significant differences
  • Two configuration files defining project setups for compatibility testing (8 projects including ensemble backends) and consistency testing (8 projects focusing on base backends)

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.

| File | Description |
|---|---|
| .github/workflows/model-compatibility.yml | GitHub Actions workflow orchestrating the compatibility checks with steps for environment setup and running both compatibility and consistency tests |
| tests/check_models_compatability_consistency.py | Python script implementing the core logic for downloading models/metrics, training, evaluation, comparison, and uploading results to Hugging Face Hub |
| tests/projects-compatibility.cfg | Configuration defining 8 projects (including yake-fi and ensemble-fi) for backward compatibility testing against existing trained models |
| tests/projects-consistency.cfg | Configuration defining 8 projects for reproducibility testing through retraining and metric comparison |

Comment thread .github/workflows/model-compatibility.yml Outdated
Comment thread .github/workflows/model-compatibility.yml
Comment thread tests/check_models_reproducibility.py
Comment thread tests/check_models_compatibility_consistency.py Outdated
Comment thread .github/workflows/model-compatibility.yml Outdated
Comment thread tests/check_models_compatibility_consistency.py Outdated
Comment thread tests/check_models_reproducibility.py
Comment thread .github/workflows/model-compatibility.yml Outdated
Comment thread tests/check_models_reproducibility.py
Comment thread tests/projects-consistency.cfg Outdated
juhoinkinen and others added 8 commits November 13, 2025 15:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@juhoinkinen juhoinkinen marked this pull request as ready for review December 9, 2025 13:50

@juhoinkinen

Member Author

Checks for model size and runtime performance could also be useful, but they are better implemented separately from this PR.

Comment thread tests/check_models_reproducibility.py Fixed
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
sonarqubecloud Bot commented Feb 6, 2026

Labels

github_actions (Pull requests that update GitHub Actions code), maintenance

3 participants