Add new HF QA tasks and enhance existing benchmarks #22

Merged
doxav merged 1 commit into AgentOpt:main from doxav:new_hf_datasets on Jun 11, 2026

@doxav doxav commented Jun 11, 2026

Collaborator
- Updated `hf_tasks.yaml` to include new QA tasks: MuSiQue, GSM8K, ARC-Challenge, QASC, DROP, QuALITY, and Qasper with detailed configurations.
- Implemented `mcqa.py` for handling multiple-choice questions, including task conversion and feedback scoring.
- Created `musique.py` for MuSiQue task handling, focusing on alias scoring and context formatting.
- Developed `qasper.py` for Qasper task processing, including context formatting and answer extraction.
- Enhanced `hotpot_qa.py` to improve feedback mechanisms and response handling.
- Added tests in `test_hf_qa_ext.py` to validate the new tasks and their behavior.
- Introduced `test_noop_trainer.py` to ensure NoOpTrainer operates correctly within the Trace-Bench runner.
- Updated `trace_bench/runner.py` to support new model token handling and optimizer integration.
- Refined `noop_trainer.py` to improve logging and parameter handling during training.
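
For readers unfamiliar with the two scoring ideas named above (alias scoring for MuSiQue and feedback scoring for multiple-choice tasks), here is a minimal sketch of how they are commonly done. It is not the code from `musique.py` or `mcqa.py`: every function name, signature, and feedback string below is hypothetical, and the normalization is the usual SQuAD-style one, assumed here for illustration.

```python
import re
import string
from typing import List, Optional, Tuple


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def alias_match(prediction: str, answer: str, aliases: List[str]) -> bool:
    """Hypothetical alias scoring: exact match against the answer or any alias."""
    target = normalize_answer(prediction)
    return any(target == normalize_answer(c) for c in [answer, *aliases])


def extract_choice(response: str, labels: List[str]) -> Optional[str]:
    """Return the last standalone choice label (e.g. "B") found in the response."""
    pattern = r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    matches = re.findall(pattern, response)
    return matches[-1] if matches else None


def mcqa_feedback(response: str, gold: str, labels: List[str]) -> Tuple[float, str]:
    """Hypothetical MCQA feedback: a score plus a short textual hint."""
    choice = extract_choice(response, labels)
    if choice is None:
        return 0.0, "No choice label found; answer with one of: " + ", ".join(labels)
    if choice == gold:
        return 1.0, "Correct."
    return 0.0, f"Incorrect: chose {choice}."
```

Taking the last label found, rather than the first, is a design choice that favors responses that reason first and state the final answer at the end; it will misread a response that mentions another label after its answer.
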
@doxav doxav merged commit cd2648e into AgentOpt:main Jun 11, 2026
1 check passed