Automate model compatibility checks #907
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #907 +/- ##
==========================================
- Coverage 99.63% 97.37% -2.26%
==========================================
Files 103 104 +1
Lines 8238 8429 +191
==========================================
Hits 8208 8208
- Misses 30 221 +191
f633916 to 924e812
9b1ea17 to bd74716
Projects for compat includes also non-trainable backends
bd74716 to d08a04b
Pull Request Overview
This PR introduces automated model compatibility and reproducibility checks for Annif models through a new GitHub Actions workflow. The implementation enables systematic verification that code changes don't break model backward compatibility or training reproducibility.
Key changes:
- New GitHub Actions workflow (`model-compatibility.yml`) that runs compatibility and consistency checks on workflow dispatch or push events
- Python script (`check_models_compatability_consistency.py`) that downloads models from Hugging Face Hub, evaluates them, compares metrics against baselines, and reports significant differences
- Two configuration files defining project setups for compatibility testing (8 projects including ensemble backends) and consistency testing (8 projects focusing on base backends)
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| `.github/workflows/model-compatibility.yml` | GitHub Actions workflow orchestrating the compatibility checks with steps for environment setup and running both compatibility and consistency tests |
| `tests/check_models_compatability_consistency.py` | Python script implementing the core logic for downloading models/metrics, training, evaluation, comparison, and uploading results to Hugging Face Hub |
| `tests/projects-compatibility.cfg` | Configuration defining 8 projects (including yake-fi and ensemble-fi) for backward compatibility testing against existing trained models |
| `tests/projects-consistency.cfg` | Configuration defining 8 projects for reproducibility testing through retraining and metric comparison |
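As a rough illustration of what such a project configuration might look like and how it could be enumerated, here is a minimal sketch. The contents of the two `.cfg` files are not shown in this PR page, so the sample text below is hypothetical: only the project IDs `yake-fi` and `ensemble-fi` come from the table above, while the field values (names, vocabulary, sources) and the `list_projects` helper are assumptions for illustration. It relies on Annif project files being INI-style, which Python's standard `configparser` can read.

```python
import configparser

# Hypothetical sample; the real tests/projects-compatibility.cfg is not shown here.
SAMPLE_CFG = """
[yake-fi]
name=YAKE Finnish
language=fi
backend=yake
vocab=yso

[ensemble-fi]
name=Ensemble Finnish
language=fi
backend=ensemble
sources=yake-fi
vocab=yso
"""


def list_projects(cfg_text):
    """Return a mapping of project ID to backend name (illustrative helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Each section header is a project ID; 'backend' names the algorithm.
    return {section: parser[section]["backend"] for section in parser.sections()}


projects = list_projects(SAMPLE_CFG)
for project_id, backend in projects.items():
    print(f"{project_id}: {backend}")
```

A listing like this could be used to decide which projects need retraining (trainable backends) and which only need evaluation, though how the actual script makes that distinction is not stated here.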
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Checks for model size and (time) performance could also be useful, but they are better implemented separately from this PR.
Co-authored-by: aider (openai/gpt-5.2-chat) <aider@aider.chat>
Co-authored-by: aider (openai/gpt-5.2) <aider@aider.chat>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>



This pull request introduces automated model compatibility and reproducibility checks for the backends, ensuring that changes to the codebase do not introduce significant metric regressions.
Key changes include:
Continuous Integration and Automation:
- A GitHub Actions workflow (`.github/workflows/model-compatibility.yml`) that runs model compatibility and reproducibility checks on the `workflow_dispatch` trigger, executing the `tests/check_models_compatability_consistency.py` script with the `--ci` option.
Testing Infrastructure and Scripts:
The script functions as follows in the two check modes:
- `compatibility` mode/subcommand:
- `consistency` mode/subcommand:
- … (`abs(prev_value - new_value) / abs(prev_value)`) for compatibility, and 0.03 for consistency (the larger value allows for non-determinism in training).
- When run with the `--ci` option and differences are detected, the script exits with code 1, failing the GH Actions job.
The `upload` subcommand of the script uploads the newly trained models and their evaluation metrics to the HFH repo, thus "resetting" the state:
In the above command, `upload` can be changed to `compatibility` or `consistency` for running in those modes.
Configuration for Model Checks:
- The `tests/projects-compatibility.cfg` and `tests/projects-consistency.cfg` configuration files, which define the set of Annif projects (models) to be checked for compatibility and consistency, respectively. The first configuration is for projects of non-trainable backends.
This testing is probably best used via the workflow dispatch trigger from the GH Actions workflow page, which also allows checking the status:
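The relative-difference check and the `--ci` exit behaviour described in this PR can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the metric names, the sample values, the strict `>` comparison, and the handling of a zero baseline are all assumptions. Only the formula `abs(prev_value - new_value) / abs(prev_value)`, the 0.03 consistency threshold, and exit code 1 come from the description.

```python
CONSISTENCY_THRESHOLD = 0.03  # value stated in the PR description


def relative_difference(prev_value, new_value):
    """abs(prev - new) / abs(prev); the zero-baseline handling is an assumption."""
    if prev_value == 0:
        return 0.0 if new_value == 0 else float("inf")
    return abs(prev_value - new_value) / abs(prev_value)


def find_significant_differences(baseline, current, threshold):
    """Return metrics whose relative difference exceeds the threshold."""
    diffs = {}
    for name, prev_value in baseline.items():
        rel = relative_difference(prev_value, current[name])
        if rel > threshold:  # strict comparison is an assumption
            diffs[name] = rel
    return diffs


def exit_code(differences, ci):
    """Exit code 1 only in CI mode when differences were found."""
    return 1 if ci and differences else 0


# Hypothetical metric names and values, for illustration only.
baseline = {"F1@5": 0.50, "NDCG": 0.60}
current = {"F1@5": 0.49, "NDCG": 0.57}

diffs = find_significant_differences(baseline, current, CONSISTENCY_THRESHOLD)
for name, rel in diffs.items():
    print(f"{name}: relative difference {rel:.3f} exceeds {CONSISTENCY_THRESHOLD}")
# A real script would pass this value to sys.exit().
print("exit code:", exit_code(diffs, ci=True))
```

With these sample values, the first metric changes by about 2% and passes, while the second changes by about 5% and would fail the job when `--ci` is given.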
TODO: