Implementation of GAAR (Generalized Automatic Argument Reconstruction) and the Arguinas (Argument reconstruction) dataset, as presented in our paper:
Argument Reconstruction as Supervision for Critical Thinking in LLMs
by Hyun Ryu*¹,², Gyouk Chu*², Gregor Betz³, Eunho Yang², Carolyn Rosé†¹, and Sean Welleck†¹

¹Language Technologies Institute, Carnegie Mellon University
²Graduate School of AI, Korea Advanced Institute of Science & Technology
³Department of Philosophy, Karlsruhe Institute of Technology
*Equal Contribution, †Equal Advising
- [✔] (26.05.21) Models fine-tuned from Qwen3-4B-Base/Instruct and Qwen3-8B-Base have been released.
- [✔] (26.04.21) The code implementation of GAAR and the Arguinas dataset are out.
- [✔] (26.03.18) The paper is out on arXiv (arXiv:2603.17432)!
All released models and datasets are gathered in our HuggingFace collection: ChuGyouk/Arguinas.
Follow the steps below to set up your environment:
- Create a Python virtual environment, e.g., with Conda:

conda create -n arguinas python=3.12 && conda activate arguinas

- Install dependencies:

pip install -r requirements.txt

- Configure API keys. Copy .env.example to .env and fill in your keys:

cp .env.example .env

Then edit .env:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...
You only need to set the key(s) for the model family you intend to run (Anthropic for Claude models, OpenAI for GPT models).
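As a rough illustration of the rule above, the sketch below parses a `.env`-style file with the standard library only and reports which key a given model name needs. The helper names (`parse_env`, `required_key`, `check_key`) and the prefix matching on `claude` / `gpt` are assumptions made for illustration; run_GAAR.py may load its keys differently.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def required_key(model_name: str) -> str:
    """Map a model name to the API key it needs (assumed name-prefix convention)."""
    name = model_name.lower()
    if name.startswith("claude"):
        return "ANTHROPIC_API_KEY"
    if name.startswith("gpt"):
        return "OPENAI_API_KEY"
    raise ValueError(f"Unknown model family: {model_name}")


def check_key(env_path: str, model_name: str) -> bool:
    """Return True if the .env file sets a non-empty key for this model's family."""
    env = parse_env(Path(env_path).read_text(encoding="utf-8"))
    return bool(env.get(required_key(model_name)))
```

For example, `check_key(".env", "claude-sonnet-4-5-20250929")` returns True once ANTHROPIC_API_KEY is filled in, even if OPENAI_API_KEY is left empty.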
Run the pipeline with default arguments:
python run_GAAR.py

This is equivalent to:
python run_GAAR.py \
--data_path ./data/Sample \
--data_filename sample.json \
--use_general_reconstruction True \
--use_specific_reconstruction False \
--save_path ./output \
--prompt_path ./prompts/GAAR \
--subset sample \
--model_name claude-sonnet-4-5-20250929 \
--max_num_recon 10 \
--max_num_debug 5 \
--max_attempts 5

Outputs are written to ./output/reconstruction_<subset>_<model_name>.json.
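If you launch the pipeline from another Python script (for example, to sweep over several settings), a sketch like the following rebuilds the command line from the defaults shown above and derives the expected output file. It assumes that --save_path is the output directory and that boolean flags are passed as the strings True / False, as in the default command; the helper names are hypothetical.

```python
import subprocess
import sys

# Defaults copied from the equivalent command above.
DEFAULTS = {
    "data_path": "./data/Sample",
    "data_filename": "sample.json",
    "use_general_reconstruction": True,
    "use_specific_reconstruction": False,
    "save_path": "./output",
    "prompt_path": "./prompts/GAAR",
    "subset": "sample",
    "model_name": "claude-sonnet-4-5-20250929",
    "max_num_recon": 10,
    "max_num_debug": 5,
    "max_attempts": 5,
}


def build_command(**overrides) -> list[str]:
    """Build the argv list for run_GAAR.py, overriding any default (hypothetical helper)."""
    args = {**DEFAULTS, **overrides}
    cmd = [sys.executable, "run_GAAR.py"]
    for key, value in args.items():
        # str(True) == "True", matching the flag style used in the command above.
        cmd += [f"--{key}", str(value)]
    return cmd


def output_path(save_path: str, subset: str, model_name: str) -> str:
    """Expected result file: <save_path>/reconstruction_<subset>_<model_name>.json."""
    return f"{save_path.rstrip('/')}/reconstruction_{subset}_{model_name}.json"


def run_pipeline(**overrides) -> str:
    """Run run_GAAR.py with the given overrides and return the expected output path."""
    subprocess.run(build_command(**overrides), check=True)
    args = {**DEFAULTS, **overrides}
    return output_path(args["save_path"], args["subset"], args["model_name"])
```

For instance, `run_pipeline(max_num_recon=5)` would run from the repository root with fewer reconstruction attempts and return ./output/reconstruction_sample_claude-sonnet-4-5-20250929.json.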
Our train and test Arguinas datasets live in data/. See data/README.md for the full data format (top-level columns, fallacy_info, sections, etc.).
run_GAAR.py only reads three fields from each entry in the input JSON:
| Field | Type | Description |
|---|---|---|
| title | string | The debate topic. |
| background | string | Background context ("None" if absent). |
| argument | string | The raw argument text to reconstruct. |
See data/Sample/sample.json for a minimal working example, and output/reconstruction_sample_claude-sonnet-4-5-20250929.json for a corresponding sample output produced by the pipeline.
To run on your own data, place a JSON file with the same schema under any directory and point --data_path / --data_filename to it.
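A minimal sketch for preparing your own input file, assuming the input JSON is a top-level list of objects (check data/Sample/sample.json to confirm the layout before relying on this). It fills a missing background with "None", as the table above describes, and checks only the three fields run_GAAR.py reads; the function names are hypothetical.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("title", "background", "argument")


def validate_entries(entries: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every entry has the three string fields."""
    problems = []
    for i, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"entry {i}: missing '{field}'")
            elif not isinstance(entry[field], str):
                problems.append(f"entry {i}: '{field}' must be a string")
    return problems


def write_dataset(entries: list[dict], data_path: str, data_filename: str) -> Path:
    """Default an absent background to "None", validate, and write the JSON file."""
    prepared = [{**e, "background": e.get("background") or "None"} for e in entries]
    problems = validate_entries(prepared)
    if problems:
        raise ValueError("; ".join(problems))
    out = Path(data_path) / data_filename
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(prepared, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```

After `write_dataset(my_entries, "./data/Mine", "mine.json")`, you would pass --data_path ./data/Mine --data_filename mine.json to run_GAAR.py.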
All prompt templates used by each stage of the pipeline (fallacy detection, reconstruction, validity checking, streamlining, faithfulness checking, program debugging) live under prompts/GAAR/. Refer to these files to see or modify the instructions given to the LLM at each step.
Two reconstruction variants are provided:
- General (reconstruction_general_*.txt): classifies reasoning into 4 broad types (deductive / inductive / analogical / abductive).
- Specific (reconstruction_60_types_*.txt): classifies reasoning into 60 fine-grained Walton-style argumentation schemes.
Toggle between them with the --use_general_reconstruction / --use_specific_reconstruction flags.
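To check which template files each variant draws on, a sketch like the following globs prompts/GAAR/ with the two filename patterns listed above. The mapping from flags to patterns, and returning both variants when both flags are on, are assumptions; run_GAAR.py's actual selection logic may differ.

```python
from pathlib import Path

# Filename patterns for the two reconstruction variants, taken from the list above.
VARIANT_PATTERNS = {
    "general": "reconstruction_general_*.txt",
    "specific": "reconstruction_60_types_*.txt",
}


def selected_templates(prompt_path: str, use_general: bool, use_specific: bool) -> dict[str, list[str]]:
    """Map each enabled variant to the sorted template filenames found under prompt_path."""
    enabled = {"general": use_general, "specific": use_specific}
    root = Path(prompt_path)
    return {
        variant: sorted(p.name for p in root.glob(pattern))
        for variant, pattern in VARIANT_PATTERNS.items()
        if enabled[variant]
    }
```

Calling `selected_templates("./prompts/GAAR", True, False)` from the repository root lists the general-variant templates that the default settings are expected to use.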
If you find this repo useful for your research, please consider citing us:
@article{ryu2026argument,
title={Argument Reconstruction as Supervision for Critical Thinking in LLMs},
author={Ryu, Hyun and Chu, Gyouk and Betz, Gregor and Yang, Eunho and Ros{\'e}, Carolyn and Welleck, Sean},
journal={arXiv preprint arXiv:2603.17432},
year={2026}
}
If you have any questions or feedback, feel free to reach out:
- Hyun Ryu: ryuhyun1905@kaist.ac.kr
- Gyouk Chu: kyouwook@kaist.ac.kr

