Hi, thanks for releasing MBench and maintaining the leaderboard.
We would like to add our model results to the MBench leaderboard. I have finished running the evaluation, and each metric directory contains units.jsonl, items.jsonl, and summary.json. I would like to confirm which level of results should be included in the leaderboard submission JSON.
Should the submission only contain the aggregated scores from each metric's summary.json plus total_m_score, or should it also include item-level results from items.jsonl / units.jsonl for verification?
Could you also provide an example leaderboard submission JSON, including the format of total_m_score?
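To make the question concrete, here is a minimal sketch of how we could assemble the aggregated-only variant. It assumes one subdirectory per metric, each holding a summary.json. The function name `build_submission` and the top-level keys `model` and `metrics` are our own placeholders, not the official MBench format, and `total_m_score` is left empty because its format is exactly what we are asking about.

```python
import json
from pathlib import Path


def build_submission(results_dir, model_name):
    """Collect each metric's summary.json into one dict.

    The output layout is hypothetical; it is not the official MBench schema.
    """
    results_dir = Path(results_dir)
    metrics = {}
    # One subdirectory per metric is assumed, e.g. <results_dir>/<metric>/summary.json
    for metric_dir in sorted(p for p in results_dir.iterdir() if p.is_dir()):
        summary_path = metric_dir / "summary.json"
        if not summary_path.is_file():
            continue  # skip directories that have no aggregated summary
        with summary_path.open(encoding="utf-8") as f:
            metrics[metric_dir.name] = json.load(f)
    return {
        "model": model_name,    # hypothetical key
        "metrics": metrics,     # hypothetical key
        "total_m_score": None,  # placeholder: expected format unknown
    }
```

If item-level results are also required, we could attach the contents of items.jsonl and units.jsonl in the same way, but we would rather follow whatever layout you prefer.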
Thanks!