TARA: Tool-Augmented Retrieval Agents for Self-Corrective RAG

Overview

TARA replaces fixed-loop self-corrective RAG with a ReAct-based agent equipped with six specialized tools for autonomous retrieval refinement. The entire pipeline is implemented as a DSPy declarative program, enabling automatic prompt optimization.
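To make the ReAct pattern (reason, act, observe, repeat) concrete, here is a framework-free sketch. It is not TARA's code. The real agent is a DSPy program driven by an LLM. Here a hypothetical rule-based `choose_action` stands in for the model, the tool names only echo those under agentic_rag/tools/, and the stub tools return fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool type: takes the question, returns an observation string.
Tool = Callable[[str], str]

@dataclass
class Step:
    thought: str
    action: str
    observation: str

@dataclass
class Trajectory:
    question: str
    steps: list[Step] = field(default_factory=list)

def choose_action(traj: Trajectory) -> tuple[str, str]:
    """Stand-in for the LLM policy: returns (thought, tool name or 'finish')."""
    used = [s.action for s in traj.steps]
    if "decompose" not in used:
        return "Split the question into sub-questions.", "decompose"
    if "search" not in used:
        return "Retrieve passages for the sub-questions.", "search"
    if "evaluate" not in used:
        return "Check whether the evidence is sufficient.", "evaluate"
    return "Evidence judged sufficient; stop.", "finish"

def react_loop(question: str, tools: dict[str, Tool], max_iters: int = 6) -> Trajectory:
    traj = Trajectory(question)
    for _ in range(max_iters):
        thought, action = choose_action(traj)   # reason
        if action == "finish":
            break
        observation = tools[action](question)   # act, then observe
        traj.steps.append(Step(thought, action, observation))
    return traj

# Stub tools with fixed outputs, for illustration only.
tools = {
    "decompose": lambda q: f"sub-questions for: {q}",
    "search": lambda q: "3 passages retrieved",
    "evaluate": lambda q: "sufficient",
}
traj = react_loop("Where was the director of Film X born?", tools)
print([s.action for s in traj.steps])  # ['decompose', 'search', 'evaluate']
```

In TARA the policy is learned from prompts rather than hard-coded, so the agent can revisit tools (for example, search again after a failed evaluation) within its iteration budget.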

Target Journal: Knowledge-Based Systems (Elsevier, SCIE Q1, IF 7.6)

Key Contributions

Tool-Augmented Agentic Refinement: a ReAct agent with 4 core and 2 domain-adaptive tools autonomously chooses its retrieval strategy
Multi-dimensional Quality Assessment: 4D evaluation (Relevance, Coverage, Specificity, Sufficiency) exposed to the agent as a tool
Structure-Aware Retrieval: document section browsing and terminology mapping for enterprise documents
DSPy Declarative Pipeline: typed Signatures with BootstrapFewShot/MIPROv2 prompt optimization
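As a rough illustration of how a 4D assessment could be returned to the agent as a tool observation, the sketch below combines four scores into a verdict. The 0 to 1 scale, the unweighted mean, and the 0.6 threshold are assumptions made for the example. The README does not specify how TARA scores or combines the dimensions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityAssessment:
    """Hypothetical 4D score; each dimension is assumed to lie in [0, 1]."""
    relevance: float
    coverage: float
    specificity: float
    sufficiency: float

    def overall(self) -> float:
        # Assumption: unweighted mean of the four dimensions.
        return (self.relevance + self.coverage + self.specificity + self.sufficiency) / 4

    def weakest(self) -> str:
        # The lowest dimension hints at the next action
        # (e.g. low coverage -> decompose and search again).
        scores = {
            "relevance": self.relevance,
            "coverage": self.coverage,
            "specificity": self.specificity,
            "sufficiency": self.sufficiency,
        }
        return min(scores, key=scores.get)

def evaluate_tool(a: QualityAssessment, threshold: float = 0.6) -> str:
    """Observation string handed back to the agent (threshold is assumed)."""
    if a.overall() >= threshold:
        return f"SUFFICIENT (overall={a.overall():.2f})"
    return f"INSUFFICIENT (overall={a.overall():.2f}, weakest={a.weakest()})"

print(evaluate_tool(QualityAssessment(0.8, 0.4, 0.6, 0.2)))
# INSUFFICIENT (overall=0.50, weakest=sufficiency)
```

Returning the weakest dimension, rather than a single pass/fail bit, is one way a multi-dimensional score can steer which tool the agent tries next.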

Main Results

Dataset	TARA (F1)	Loop (F1)	Delta	p-value
2WikiMultiHopQA	.584	.495	+.089	<.001
MuSiQue	.438	.399	+.039	.161
HotpotQA	.658	.636	+.022	.772
FinanceBench	.386	.400	-.014	.394

Gemini Flash Lite, n=200, paired bootstrap significance test (Bonferroni-corrected)
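For readers unfamiliar with the test, here is a minimal sketch of a paired bootstrap over per-question score differences, followed by a Bonferroni adjustment across datasets. The p-value uses the common percentile approximation 2 · min(P(d* ≤ 0), P(d* ≥ 0)). The resample count and the toy scores are assumptions. The exact procedure behind the table is described in the paper.

```python
import random

def paired_bootstrap_p(a: list[float], b: list[float], n_boot: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for mean(a) - mean(b) using paired resampling.

    Questions are resampled with replacement; each (a_i, b_i) pair stays together.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    at_most_zero = at_least_zero = 0
    for _ in range(n_boot):
        d = sum(rng.choices(diffs, k=n)) / n  # mean difference on one resample
        if d <= 0:
            at_most_zero += 1
        if d >= 0:
            at_least_zero += 1
    return min(1.0, 2 * min(at_most_zero, at_least_zero) / n_boot)

def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Toy data only: a system that beats the baseline on every question.
rng = random.Random(1)
baseline = [rng.random() for _ in range(200)]
system = [min(1.0, s + 0.1) for s in baseline]
p = paired_bootstrap_p(system, baseline, n_boot=2_000)
print(p)  # 0.0
```

Pairing matters: because both systems answer the same 200 questions, resampling question indices removes between-question variance that an unpaired test would count as noise.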

Key finding: the agentic advantage is complexity-dependent. It is largest on 4-hop questions (+0.305 F1) and shrinks on simpler tasks.
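The hop-level comparison rests on per-question token F1 grouped by hop count. The sketch below uses SQuAD-style normalization (lowercase, strip punctuation and articles). Whether TARA's F1 applies exactly this normalization is an assumption, and the `hops` field on each record is hypothetical.

```python
import re
import string
from collections import Counter, defaultdict

def normalize(text: str) -> list[str]:
    """SQuAD-style: lowercase, drop punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def f1_by_hops(records: list[dict]) -> dict[int, float]:
    """Mean F1 per hop count; each record needs 'hops', 'prediction', 'gold'."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for r in records:
        buckets[r["hops"]].append(token_f1(r["prediction"], r["gold"]))
    return {h: sum(v) / len(v) for h, v in sorted(buckets.items())}

records = [  # toy examples, not from any benchmark
    {"hops": 2, "prediction": "The Eiffel Tower", "gold": "Eiffel Tower"},
    {"hops": 4, "prediction": "Paris, France", "gold": "Lyon, France"},
]
print(f1_by_hops(records))  # {2: 1.0, 4: 0.5}
```

Computing the TARA minus Loop difference per bucket, rather than over the whole dataset, is what exposes an effect that is concentrated in the hardest questions.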

Architecture

The pipeline consists of four stages: (1) query preprocessing, (2) ReAct agentic refinement with six specialized tools, (3) passage merging, and (4) answer generation. A 3-way fallback router (Clarification / DomainExpert / Fallback) is available for edge cases where no passages are retrieved, though it is rarely triggered in practice.
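The control flow above can be sketched with plain functions. Every stage here is a hypothetical stub (in TARA the stages are DSPy modules backed by an LLM), and the rule for choosing among Clarification, DomainExpert, and Fallback is invented for illustration. The README only states that the router handles the case where no passages are retrieved.

```python
from enum import Enum

class Route(Enum):
    CLARIFICATION = "clarification"
    DOMAIN_EXPERT = "domain_expert"
    FALLBACK = "fallback"

def preprocess(question: str) -> str:
    return " ".join(question.split())  # stub: whitespace normalization only

def agentic_refine(query: str) -> list[list[str]]:
    # Stub for the ReAct stage: each tool call may return a batch of passage ids.
    return [] if "unknown" in query else [["p1", "p2"], ["p2", "p3"]]

def merge_passages(batches: list[list[str]]) -> list[str]:
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(p for batch in batches for p in batch))

def route_empty(query: str) -> Route:
    # Hypothetical heuristic; TARA's real routing criteria are not given here.
    if len(query.split()) < 3:
        return Route.CLARIFICATION
    if "policy" in query.lower():
        return Route.DOMAIN_EXPERT
    return Route.FALLBACK

def answer(question: str) -> str:
    query = preprocess(question)                      # (1) preprocessing
    passages = merge_passages(agentic_refine(query))  # (2) refinement, (3) merging
    if not passages:                                  # rare edge case -> router
        return f"routed to {route_empty(query).value}"
    return f"answer from {len(passages)} passages"    # (4) generation (stub)

print(answer("Who founded  the company that acquired X?"))  # answer from 3 passages
print(answer("unknown term"))                               # routed to clarification
```

Keeping the router outside the agent loop means the common path (passages found) never pays for it, which matches the README's note that it rarely fires.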

Quick Start

1. Setup

```bash
git clone https://github.com/comsa33/self-corrective-rag.git
cd self-corrective-rag
cp .env.example .env    # Edit with your API keys
uv sync                 # Install dependencies
```

2. Configure `.env`

```bash
GEMINI_API_KEY=your-key-here
PREPROCESS_MODEL=gemini/gemini-3.1-flash-lite-preview
EVALUATE_MODEL=gemini/gemini-3.1-flash-lite-preview
GENERATE_MODEL=gemini/gemini-3.1-flash-lite-preview
AGENT_MODEL=gemini/gemini-3.1-flash-lite-preview
EMBEDDING_MODEL=all-MiniLM-L6-v2
```
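If you want to read these variables from your own scripts, a dependency-free sketch follows. It is not the project's loader (that lives in agentic_rag/config/). The parsing rules (KEY=VALUE lines, `#` comments, no quoting or variable expansion) and the hypothetical `Settings` class with only two fields are simplifications for the example.

```python
import os
from dataclasses import dataclass
from pathlib import Path

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

@dataclass(frozen=True)
class Settings:
    """Hypothetical container for two of the variables shown above."""
    agent_model: str
    embedding_model: str

    @classmethod
    def from_env(cls, env: dict[str, str]) -> "Settings":
        # Variables already set in the process environment override the file.
        def get(key: str, default: str = "") -> str:
            return os.environ.get(key, env.get(key, default))

        return cls(
            agent_model=get("AGENT_MODEL"),
            embedding_model=get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        )
```

Usage, from the repository root after step 1: `settings = Settings.from_env(load_env_file())`.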

3. Prepare Data

```bash
uv run python scripts/prepare_datasets.py --sample 500
uv run python scripts/build_index.py --dataset all
```

4. Run Experiments

```bash
# Single RQ
uv run python experiments/run.py --config configs/experiment/rq1.yaml --sample 20

# All experiments
uv run python experiments/run.py --all --sample 200 --delay 0
```
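The runner is config-driven, with configs/base.yaml holding shared defaults and per-pipeline and per-RQ files on top. A common way to layer such files is a recursive dictionary merge where later files win. The sketch below shows that idea on plain dicts (YAML parsing omitted). Whether experiments/run.py merges in exactly this order is an assumption, and the keys are made up.

```python
from copy import deepcopy
from typing import Any

def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict; nested dicts are merged, other values are replaced."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result

def layer(*configs: dict[str, Any]) -> dict[str, Any]:
    """Apply configs left to right: base, then pipeline, then experiment."""
    merged: dict[str, Any] = {}
    for cfg in configs:
        merged = deep_merge(merged, cfg)
    return merged

# Hypothetical keys, for illustration only.
base = {"retriever": {"top_k": 5, "hybrid": True}, "sample": 500}
pipeline = {"pipeline": "agentic", "agent": {"max_iters": 6}}
experiment = {"retriever": {"top_k": 10}, "sample": 20}
print(layer(base, pipeline, experiment))
# {'retriever': {'top_k': 10, 'hybrid': True}, 'sample': 20, 'pipeline': 'agentic', 'agent': {'max_iters': 6}}
```

Merging recursively (rather than with a flat `dict.update`) lets an experiment file override one nested value, such as `top_k`, without discarding its siblings.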

Repository Structure

agentic_rag/
  config/          Settings, prompts, YAML config loader
  retriever/       FAISS + BM25 hybrid retrieval, section/term indices
  signatures/      DSPy signatures (preprocess, evaluate, generate, agent)
  tools/           6 agent tools (search, decompose, evaluate, inspect, structure, terminology)
  pipeline/        Pipeline implementations (naive, crag, loop, agentic)
  evaluation/      Metrics (EM, F1, LLM-as-Judge, ROUGE-L), cost tracker
  optimization/    BootstrapFewShot, MIPROv2 wrappers

configs/
  base.yaml                  Shared defaults
  pipeline/*.yaml            Per-pipeline configs
  experiment/rq1..rq5.yaml   Per-RQ experiment configs
  ablation/*.yaml            8 tool-level ablation configs

experiments/
  run.py           Unified config-driven experiment runner
  common.py        Shared utilities
  analysis/        Trajectory, tool usage, score progression analysis

paper/
  main.tex                   Paper source (Elsevier elsarticle)
  sections/                  Per-section .tex files
  references.bib             77 references
  supplementary/             12 CSV files for reviewer verification

tests/                       pytest test suite
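agentic_rag/retriever/ combines FAISS (dense) and BM25 (sparse) retrieval. The README does not say how the two rankings are fused. Reciprocal rank fusion (RRF) is one standard option and is sketched below under that assumption. The constant k=60 is the value commonly used with RRF, not a setting taken from TARA.

```python
from collections import defaultdict
from typing import Optional

def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: Optional[int] = None
) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores, key=lambda d: (-scores[d], d))  # ties broken by id
    return fused[:top_n] if top_n is not None else fused

dense = ["d3", "d1", "d7"]   # e.g. nearest neighbours from a vector index
sparse = ["d1", "d9", "d3"]  # e.g. top BM25 hits
print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd9', 'd7']
```

RRF uses only ranks, so it sidesteps the fact that cosine similarities and BM25 scores live on incomparable scales.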

Research Questions

RQ	Question	Finding
RQ1	Agentic vs. baselines?	+0.089 F1 on 2Wiki (p<.001); complexity-dependent
RQ2	Tool usage patterns?	Converges on decompose→search→evaluate; 5-6 tools per question
RQ3	4D vs. 1D evaluation?	1D ≈ 4D > no evaluation; evaluation acts as a quality gate
RQ4	Structure-aware tools?	Dataset-dependent: helpful on MuSiQue/FinanceBench, not on Wikipedia
RQ5	DSPy optimization?	+0.071–0.164 F1 from Signatures; BootstrapFewShot best on all datasets
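RQ2 summarizes agent trajectories. A minimal sketch of that kind of summary, assuming each trajectory is logged as an ordered list of tool names (a hypothetical format), computes tool calls per question and the most frequent consecutive tool pairs. The logs below are made up and are not TARA data.

```python
from collections import Counter
from statistics import mean

def summarize(trajectories: list[list[str]]) -> tuple[float, list]:
    """Return mean tool calls per question and the three most common tool bigrams."""
    calls_per_q = mean(len(t) for t in trajectories)
    bigrams = Counter(pair for t in trajectories for pair in zip(t, t[1:]))
    return calls_per_q, bigrams.most_common(3)

logs = [  # toy trajectories
    ["decompose", "search", "evaluate", "search", "evaluate"],
    ["decompose", "search", "evaluate", "inspect", "search", "evaluate"],
    ["search", "evaluate", "terminology", "search", "evaluate"],
]
avg, top = summarize(logs)
print(round(avg, 2), top[:2])
# 5.33 [(('search', 'evaluate'), 6), (('decompose', 'search'), 2)]
```

Bigram counts show which transitions the agent repeats; a dominant search→evaluate pair is what a "search, then check" convergence would look like in such logs.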

Supplementary Materials

Pre-computed results for reviewer verification are in paper/supplementary/:

Bootstrap CI and pairwise significance tests
Refusal rate analysis across models
Hop-level F1 breakdown
2×2 factorial synergy analysis
LLM-as-Judge results
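The 2×2 factorial synergy analysis reduces to simple arithmetic. For two binary factors A and B, the interaction effect is a difference of differences: how much more B helps when A is present. Which factors the supplementary file crosses is not stated here, and the scores below are placeholders, not results from the paper.

```python
def factorial_2x2(f00: float, f10: float, f01: float, f11: float) -> dict[str, float]:
    """Effects from mean scores of a 2x2 design.

    f00: neither factor, f10: only A, f01: only B, f11: both.
    """
    main_a = ((f10 - f00) + (f11 - f01)) / 2  # A's effect, averaged over B
    main_b = ((f01 - f00) + (f11 - f10)) / 2  # B's effect, averaged over A
    # Positive interaction = synergy: B helps more when A is present.
    interaction = (f11 - f10) - (f01 - f00)
    return {"main_A": main_a, "main_B": main_b, "interaction": interaction}

effects = factorial_2x2(f00=0.40, f10=0.45, f01=0.42, f11=0.55)
print({k: round(v, 3) for k, v in effects.items()})
# {'main_A': 0.09, 'main_B': 0.06, 'interaction': 0.08}
```

The interaction term is symmetric in A and B, since it equals f11 - f10 - f01 + f00.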

Citation

```bibtex
@article{lee2026tara,
  title={TARA: Tool-Augmented Retrieval Agents for Self-Corrective RAG},
  author={Lee, Ruo},
  journal={Knowledge-Based Systems},
  year={2026},
  note={Under review}
}
```

License

This project is licensed under the MIT License — see LICENSE for details.
