[Model]: LateOn #641

@tekumara

Which model would you like to support?

https://huggingface.co/lightonai/LateOn

What are the main advantages of this model?

Beats every existing ColBERT model, including those 4× its size (Jina ColBERT v2 at 559M, Arctic Embed L v2 at 568M).
Holds up under decontamination: when training-overlap samples are stripped from the BEIR corpora, LateOn climbs to 60.36 nDCG@10 on the 12-dataset decontaminated split, placing first overall.
Uses fully open data for both pre-training and fine-tuning, with all signals released as metadata so you can rebuild, extend, or replace any filter.

https://x.com/antoine_chaffin/status/2060845436378783970?s=20
