Add new HF QA tasks and enhance existing benchmarks by doxav · Pull Request #22 · AgentOpt/Trace-Bench

doxav · 2026-06-11T19:32:49Z

- Updated `hf_tasks.yaml` to include new QA tasks: MuSiQue, GSM8K, ARC-Challenge, QASC, DROP, QuALITY, and Qasper with detailed configurations.
- Implemented `mcqa.py` for handling multiple-choice questions, including task conversion and feedback scoring.
- Created `musique.py` for MuSiQue task handling, focusing on alias scoring and context formatting.
- Developed `qasper.py` for Qasper task processing, including context formatting and answer extraction.
- Enhanced `hotpot_qa.py` to improve feedback mechanisms and response handling.
- Added comprehensive tests in `test_hf_qa_ext.py` to validate new tasks and their functionalities.
- Introduced `test_noop_trainer.py` to ensure NoOpTrainer operates correctly within the Trace-Bench runner.
- Updated `trace_bench/runner.py` to support new model token handling and optimizer integration.
- Refined `noop_trainer.py` to improve logging and parameter handling during training.
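
For readers unfamiliar with the two scoring styles mentioned above (alias scoring for MuSiQue and feedback scoring for multiple-choice tasks), here is a minimal sketch of how such scorers might look. It is not the code from `musique.py` or `mcqa.py`: the function names `normalize`, `alias_score`, and `mcqa_score`, the SQuAD-style normalization, the A-H label range, and the feedback strings are all assumptions made for illustration.

```python
import re
import string
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def alias_score(
    prediction: str, answer: str, aliases: Iterable[str] = ()
) -> Tuple[float, str]:
    """Hypothetical alias scorer: exact match against the answer or any alias."""
    gold = {normalize(answer), *(normalize(a) for a in aliases)}
    gold.discard("")  # ignore aliases that normalize to nothing
    if normalize(prediction) in gold:
        return 1.0, "Correct."
    return 0.0, f"Incorrect. Accepted answers: {sorted(gold)}"


def mcqa_score(prediction: str, correct_label: str) -> Tuple[float, str]:
    """Hypothetical multiple-choice scorer: compare the first standalone label.

    Assumes labels are single uppercase letters A-H. A response that starts
    with the article "A" would be misread as label A; a real implementation
    would need a stricter extraction rule.
    """
    match = re.search(r"\b([A-H])\b", prediction)
    if match is None:
        return 0.0, "No answer label found in the response."
    chosen = match.group(1)
    if chosen == correct_label:
        return 1.0, "Correct."
    return 0.0, f"Incorrect. You chose {chosen}; the expected label is {correct_label}."


if __name__ == "__main__":
    print(alias_score("The Beatles", "Beatles", ["The Fab Four"]))
    print(mcqa_score("The answer is B.", "B"))
```

Returning a `(score, feedback)` pair keeps the numeric metric and the textual feedback together, which is one plausible shape for a benchmark that passes feedback to an optimizer; the actual return types in this PR may differ.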

doxav merged commit cd2648e into AgentOpt:main Jun 11, 2026
1 check passed

doxav merged 1 commit into AgentOpt:main from doxav:new_hf_datasets
