WIP LLM ranking/scoring backend #859
Co-authored-by: Copilot
Co-authored-by: Osma Suominen <osma.suominen@helsinki.fi>
This is mainly a workaround for the differing model names in Ollama and HFH, which complicate tokenizer selection.
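One way to handle the naming mismatch is an explicit override table consulted before loading a tokenizer. The sketch below is only an illustration: `TOKENIZER_OVERRIDES` and `resolve_tokenizer_name` are hypothetical names, and the single mapping entry is an assumed example, not configuration taken from this PR.

```python
# Hypothetical mapping from Ollama-style model names to Hugging Face Hub
# repository IDs; the entry below is an illustrative assumption.
TOKENIZER_OVERRIDES: dict[str, str] = {
    "llama3.1": "meta-llama/Llama-3.1-8B-Instruct",
}


def resolve_tokenizer_name(model_name: str,
                           overrides: dict[str, str] = TOKENIZER_OVERRIDES) -> str:
    """Return the name to load a tokenizer with for the given model.

    An exact override wins; otherwise the Ollama tag after ':' (e.g. ':8b')
    is dropped and the base name is looked up. If nothing matches, the name
    is returned unchanged, assuming it is already a Hub repository ID.
    """
    if model_name in overrides:
        return overrides[model_name]
    base = model_name.split(":", 1)[0]
    return overrides.get(base, model_name)
```

With this approach, `"llama3.1:8b"` resolves to the Hub ID in the table, while a name such as `"Qwen/Qwen2.5-7B-Instruct"` passes through untouched.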
Codecov Report
❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##             main     #859      +/-   ##
==========================================
- Coverage   99.64%   97.56%   -2.09%
==========================================
  Files          99      100       +1
  Lines        7349     7509     +160
==========================================
+ Hits         7323     7326       +3
- Misses         26      183     +157
```
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Pull Request Overview
This PR adds exponentiated weighted averaging to suggestions and implements an LLM-based ensemble backend for ranking and scoring.
- Extend `SuggestionBatch.from_averaged` to accept an optional `exponents` parameter for score exponentiation.
- Introduce `BaseLLMBackend` and `LLMEnsembleBackend` with OpenAI/AzureOpenAI integration and parallel prompt processing.
- Register the new `llm_ensemble` backend in the backend factory.
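Parallel prompt processing for I/O-bound API calls is commonly done with a thread pool. The following is a minimal sketch under that assumption, not the backend's actual code: `run_prompts_in_parallel` and `fake_llm_call` are hypothetical, and the stand-in function replaces a real OpenAI/AzureOpenAI request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_prompts_in_parallel(prompts: list[str],
                            call_llm: Callable[[str], str],
                            max_workers: int = 4) -> list[str]:
    """Send each prompt concurrently and return the responses in input order."""
    # Threads fit this workload because each call mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input prompts.
        return list(executor.map(call_llm, prompts))


def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion request."""
    return prompt.upper()
```

Keeping the results in input order matters here, since each response has to be matched back to the document whose suggestions it re-scores.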
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `annif/suggestion.py` | Added `exponents` parameter and updated averaging logic/docstring. |
| `annif/backend/llm_ensemble.py` | New LLM ensemble backend: API calls, prompt handling, ensemble logic. |
| `annif/backend/__init__.py` | Registered `llm_ensemble` backend. |
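The "exponentiated weighted averaging" can be illustrated with plain NumPy arrays. This sketch assumes each source's scores are raised to that source's exponent before the usual weighted mean; `exponentiated_weighted_average` is a hypothetical helper and may differ from how `SuggestionBatch.from_averaged` actually applies `exponents`.

```python
from __future__ import annotations

import numpy as np


def exponentiated_weighted_average(score_arrays: list[np.ndarray],
                                   weights: list[float],
                                   exponents: list[float] | None = None) -> np.ndarray:
    """Weighted average of score arrays, each first raised to its own exponent.

    With exponents=None every exponent is 1, which reduces to a plain
    weighted average.
    """
    if exponents is None:
        exponents = [1.0] * len(score_arrays)
    total = sum(w * np.power(s, e)
                for s, w, e in zip(score_arrays, weights, exponents))
    return total / sum(weights)
```

For scores in [0, 1], an exponent above 1 shrinks low scores proportionally more than high ones (0.5 becomes 0.25, 0.9 becomes 0.81), sharpening that source's contribution, while an exponent below 1 flattens it.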
Comments suppressed due to low confidence (2)
annif/suggestion.py:125
- [nitpick] Update the docstring for `from_averaged` to include a description of the new `exponents` parameter and its default behavior.
"""Create a new SuggestionBatch where the subject scores are the
annif/backend/llm_ensemble.py:263
- [nitpick] Add a brief docstring to `_get_labels_batch` to clarify its behavior and inputs, improving code readability.
def _get_labels_batch(self, suggestion_batch: SuggestionBatch) -> list[list[str]]:
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Reminder for implementation: we could check whether querying the LLM multiple times and averaging the results gives an improvement (this could work when using a non-zero temperature). This approach is used in the https://kata.rara.ee/ service.
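The idea in the reminder can be sketched as repeated sampling followed by a per-subject mean. Here `average_repeated_queries` is hypothetical, `query` stands in for one LLM scoring call made at a non-zero temperature, and treating a subject missing from a run as a score of 0 is an assumption; whether any of this improves results is exactly what would need checking.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


def average_repeated_queries(query: Callable[[], dict[str, float]],
                             n_runs: int = 3) -> dict[str, float]:
    """Call the (hypothetical) LLM scorer n_runs times and average each subject's score.

    A subject absent from a run contributes 0 for that run.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    totals: defaultdict[str, float] = defaultdict(float)
    for _ in range(n_runs):
        for subject, score in query().items():
            totals[subject] += score
    return {subject: total / n_runs for subject, total in totals.items()}
```

Counting absences as 0 penalizes subjects the model proposes only occasionally, which is one reasonable way to reward consistency across samples.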



Closes #856.