What ARC-AGI is, where it came from, and what the ARC Prize 2026 actually asks for — a sourced explainer.
ARC-AGI is, to my eye, the most intellectually honest benchmark in AI right now: it measures whether a system can solve novel problems it was never trained on, rather than how much of the internet it has memorised. This repo is an accessible, sourced explainer of what ARC-AGI is, where it came from, how solvers have evolved, and what the ARC Prize 2026 competition asks for — a clean front door to the topic for anyone trying to make sense of it.
Dating discipline. Competition dates, prize amounts and leaderboard numbers move. Every time-sensitive claim below is dated and linked in Sources, and was re-verified on 2026-06-20. Re-check
arcprize.org and Kaggle before relying on any figure.
- What ARC-AGI is
- A short history (2019 → today)
- How solvers have evolved
- ARC Prize 2026 — the three tracks
- Why ARC-AGI-3 is a real departure
- My work
- Sources
Deeper dives live in docs/: a fuller history, an
annotated tour of solver approaches, and a side-by-side of
the ARC-AGI-2 and ARC-AGI-3 tracks.
ARC — the Abstraction and Reasoning Corpus — is a benchmark of small visual grid puzzles. Each task gives you a handful of input→output examples; you infer the transformation rule and apply it to a new input. The grids are deliberately simple (coloured cells on a grid a child can read), but every task embodies a different rule, and the test tasks are novel — there is no shared "skill" you can drill and reuse across them.
That design is the whole point. ARC-AGI was introduced by François Chollet in "On the Measure of Intelligence" (2019), which argued that we had been measuring the wrong thing. Most benchmarks reward skill — performance at a specific task — but skill can be bought with data and compute, so a high score tells you little about intelligence. Chollet's alternative is to measure skill-acquisition efficiency: how well a system turns a small amount of experience with a novel problem into competence at it. That is closer to what psychologists call fluid intelligence, and it is what ARC-AGI is built to probe.
Two consequences follow, and they explain why ARC has stayed hard:
- Memorisation doesn't transfer. Because each task has its own rule and the evaluation tasks are held out, a system that has merely seen a lot of data has no edge. You have to generalise to a problem you have never encountered.
- Humans find it easy; machines don't. ARC tasks are calibrated to be solvable by people. The gap between human and machine performance is therefore a fairly clean read on the kind of generalisation machines still lack.
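To make the task shape described above concrete, here is a minimal sketch in Python. It assumes the public ARC JSON layout (a `train` and a `test` list of `input`/`output` grids, each grid a list of rows of integers 0-9). The toy task and the `mirror_left_right` rule are made up for illustration and are not taken from the real dataset.

```python
# A toy task in the ARC JSON layout. The grids are invented for illustration.
task = {
    "train": [
        {"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [{"input": [[3, 0, 0], [0, 0, 4]]}],
}


def mirror_left_right(grid):
    """Hypothetical candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule, task):
    """A rule is only credible if it reproduces every training output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Apply the rule to the unseen test input only if it explains all the examples.
prediction = (
    mirror_left_right(task["test"][0]["input"])
    if fits_all_examples(mirror_left_right, task)
    else None
)
```

The hard part of ARC is not applying the rule but finding it: here the rule is handed over, whereas a solver has to infer it from two or three examples, and the next task will need a different one.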
The ARC Prize Foundation — a non-profit co-founded in 2024 by Mike Knoop and François Chollet — stewards the benchmark and runs the annual competition. Its framing is consistent: an unsolved ARC-AGI is evidence that something important about general intelligence is still missing, and closing it efficiently (small, self-contained systems rather than ever-larger models) is the interesting prize.
For the conceptual roots, Chollet's On the Measure of Intelligence (arXiv:1911.01547) is the primary source.
A compressed version of the story (full detail, with sources, in
docs/history.md):
- 2019 — The idea. Chollet publishes On the Measure of Intelligence and releases ARC: a definition of intelligence as skill-acquisition efficiency, and a benchmark built to resist memorisation.
- 2020 — First Kaggle competition. A $20K contest drew 914 teams. The winner reached roughly 20% using brute-force program synthesis over a hand-built domain-specific language (DSL) — a style that would dominate for years.
- 2022–2023 — ARCathon (Lab42). Two $100K editions hosted by Lab42 kept the benchmark alive and broadened international participation between the big Kaggle years.
- 2024 — Deep learning arrives. ARC Prize 2024 (a $1.1M pool, ~1,430 teams) saw the private-eval state of the art jump from 33% → 55.5%, driven by test-time training and LLM-guided program synthesis. The eligible winner, "the ARChitects," scored 53.5%; MindsAI posted the top 55.5% but did not open-source and was therefore ineligible. The 85% Grand Prize went unclaimed.
- 2025 — A harder benchmark. ARC-AGI-2 was introduced, redesigned to resist 2024-era recipes. A tiny-model paper, TRM (Tiny Recursive Models), won the ARC Prize 2025 Paper Award.
- 2026 — Three tracks, ~$2M. ARC-AGI-2 (static), ARC-AGI-3 (interactive / agentic), and a Paper Prize — the current competition (§4).
The interesting thing about ARC's history is that no single approach has won
cleanly — each era's best method ran into the benchmark's resistance to
memorisation. A sketch (annotated in full in docs/approaches.md):
| Era | Dominant approach | The catch |
|---|---|---|
| 2020–2022 | Brute-force program synthesis over a hand-built DSL | Search explodes; the DSL caps what's expressible |
| 2022–2023 | Augmentation + ensembling of synthesis solvers | Incremental; still brittle on novel rules |
| 2023–2024 | Test-time training (TTT) — fine-tune on the task's own examples at inference | Strong gains, but compute-heavy and fiddly |
| 2024 | LLM-guided program search — let a language model propose programs | Works, but needs large models and careful scaffolding |
| 2024–2025 | Deep-learning solvers + TTT ensembles | The 55.5% breakthrough — yet still far from 85% |
| 2025– | Tiny recursive models (TRM/HRM) — small nets that iterate | Promising and sandbox-friendly; see docs/approaches.md |
The throughline: progress has come less from raw scale than from giving the system a way to adapt to the specific task in front of it — whether by searching for a program, fine-tuning at test time, or recursing on a latent scratchpad.
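The first row of the table can be sketched in a few lines. This is a toy illustration of brute-force program synthesis, assuming a hypothetical three-primitive DSL (`flip_h`, `flip_v`, `transpose`); real entries used much larger hand-built DSLs, and nothing here reproduces any specific winning solution.

```python
from itertools import product


# Hypothetical DSL primitives: each maps a grid (list of rows) to a new grid.
def flip_h(grid):
    return [row[::-1] for row in grid]


def flip_v(grid):
    return [row[:] for row in grid[::-1]]


def transpose(grid):
    return [list(col) for col in zip(*grid)]


PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}


def run(program, grid):
    """Apply a sequence of primitive names to a grid, left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def synthesise(train_pairs, max_depth=3):
    """Return the shortest program consistent with every training pair, or None."""
    for depth in range(max_depth + 1):
        # len(PRIMITIVES) ** depth candidates at each depth: this is the blow-up.
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in train_pairs):
                return program
    return None


# Toy task: the hidden rule is "rotate 90 degrees clockwise".
train_pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[1, 2, 3]], [[1], [2], [3]]),
]
program = synthesise(train_pairs)
```

On this toy task the search finds a two-step program. Both catches from the table are visible: the candidate count grows exponentially with depth, and a rule that cannot be written as a composition of the primitives will never be found, however long the search runs.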
ARC Prize 2026 runs on Kaggle with a pool of over $2M across three tracks. Headline mechanics, verified 2026-06-20 (re-check before relying on them):
- Opens: March 25, 2026 · Submission deadline: November 2, 2026 · Winners announced: December 4, 2026 · Papers due: November 8, 2026.
- Sandboxed evaluation. No internet access during Kaggle scoring — i.e. no hosted-API systems (GPT/Claude/etc.). Solutions run self-contained within Kaggle's compute and time limits.
- Open-source requirement. To be prize-eligible, code and methods must be open-sourced under a permissive licence (this is why prize-eligible ARC solutions are typically CC0 / MIT-0), attached to a Solution Writeup within seven days of the deadline.
The three tracks:
- ARC-AGI-2 — the static track: classic input→output grid tasks, scored under a two-attempts-per-task rule, targeting 85% on the private eval within efficiency limits. 2026 is the final year ARC-AGI-2 runs as an official Kaggle competition.
- ARC-AGI-3 — the interactive / agentic track: agents act inside novel environments rather than mapping a grid to a grid (see §5).
- Paper Prize — awards for work that advances understanding of ARC-AGI performance, not just leaderboard scores. (TRM won the 2025 edition.)
ARC-AGI-3 also runs milestone prizes — checkpoints on June 30, 2026 and September 30, 2026 (each: 1st $25K · 2nd $10K · 3rd $2.5K) — plus a public community leaderboard for harness research.
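The two-attempts-per-task rule on the static track can be sketched as a scoring function. This is an illustration under assumptions the text above does not spell out: that a prediction counts only on an exact grid match, and that a task with several test inputs is scored as the fraction solved. `task_score` is a hypothetical name, not the official scorer.

```python
def task_score(attempts_per_test, solutions):
    """Fraction of a task's test outputs matched exactly by one of two attempts.

    attempts_per_test: one list of candidate grids per test input.
    solutions: the hidden ground-truth grid for each test input.
    """
    solved = 0
    for attempts, solution in zip(attempts_per_test, solutions):
        # Only the first two attempts count; partial matches earn nothing.
        if any(attempt == solution for attempt in attempts[:2]):
            solved += 1
    return solved / len(solutions)


# Two test inputs: the second attempt rescues the first, both miss the second.
attempts = [
    [[[1]], [[2]]],
    [[[3]], [[4]]],
]
solutions = [[[2]], [[5]]]
score = task_score(attempts, solutions)
```

Exact-match scoring is why ensembles matter so much in practice: a solver that can rank its candidates well enough to put the right grid in its top two gets full credit, and one that is off by a single cell gets none.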
ARC-AGI-1 and -2 are static: you see input→output examples and produce an output. ARC-AGI-3 is interactive. An agent is dropped into a novel, turn-based environment with no instructions and must, on its own:
- explore — act to gather information about how the world works;
- model — build an internal theory of the environment's dynamics; and
- set goals — infer what "success" even means, then plan toward it.
This is much closer to how a person handles a game they've never played, and it is brutally hard for current systems. At the March 2026 launch, humans solved 100% of the environments while frontier LLMs scored below 1% (e.g. Gemini 3.1 Pro ~0.37%, Claude Opus 4.6 ~0.2%). The top preview agent reached ~12.6% — and it was a purpose-built agent, not a frontier language model used directly. That single fact is one of the more interesting signals in the whole 2026 cycle: the lead on the genuinely agentic task did not belong to the biggest LLM.
More on the static-vs-interactive distinction in
docs/arc-agi-2-vs-3.md.
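The explore, model, set-goals loop can be sketched on a toy problem. Everything below is hypothetical: `ToyCorridor` is a made-up stand-in, not the ARC-AGI-3 API, and it simplifies heavily by handing the agent a "solved" flag, whereas the real environments leave even that to be inferred.

```python
from collections import deque


class ToyCorridor:
    """Hypothetical environment: cells 0..size-1 with a hidden goal cell."""

    def __init__(self, size=5, start=2, goal=4):
        self.size, self.start, self.goal = size, start, goal
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        # Action names are opaque to the agent: "A" moves left, "B" moves right.
        delta = {"A": -1, "B": 1}[action]
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        return self.pos, self.pos == self.goal  # (observation, solved flag)


def shortest_path(model, start, is_target):
    """Breadth-first search over learned transitions; returns actions or None."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if is_target(state):
            return path
        for (src, action), nxt in model.items():
            if src == state and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None


def explore(env, actions, budget=100):
    """Explore: head for the nearest untried (state, action) pair and try it."""
    model, goals = {}, set()  # learned dynamics, and states that ended an episode
    state = env.reset()

    def has_untried(s):
        return s not in goals and any((s, a) not in model for a in actions)

    for _ in range(budget):
        path = shortest_path(model, state, has_untried)
        if path is None:
            break  # nothing reachable is left to learn
        action = path[0] if path else next(a for a in actions if (state, a) not in model)
        nxt, solved = env.step(action)
        model[(state, action)] = nxt  # model: record what the action did
        if solved:
            goals.add(nxt)  # set goals: remember what success looked like
            state = env.reset()
        else:
            state = nxt
    return model, goals


env = ToyCorridor()
model, goals = explore(env, actions=["A", "B"])
plan = shortest_path(model, env.reset(), lambda s: s in goals)
for action in plan:
    observation, solved = env.step(action)
```

The agent never reads the environment's internals: it learns what "A" and "B" do by trying them, then plans over its own transition table. Real ARC-AGI-3 environments are far richer, which is where learned world models come in, but the division of labour is the same.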
This hub is the explainer; the hands-on work lives in three sibling repos. I'm entering both benchmark tracks (ARC-AGI-2 and ARC-AGI-3) and researching recursive models. The directions below are what I'm exploring, not claimed results.
- 🧩 arc-agi-2 — my static-track workspace. The approach I'm exploring is a two-branch ensemble, both branches adapted to ARC-AGI-2: an LLM branch (a Qwen model with test-time training) and a TRM branch (a tiny recursive model). Both stay inside the no-internet sandbox.
- 🕹️ arc-agi-3 — my interactive-track workspace. Here I'm exploring an object-centric world model, adapting LeCun's JEPA (Joint Embedding Predictive Architecture) — predicting in latent space to model an environment and plan — to the ARC-AGI-3 environments.
- 🔁 recursive-reasoning-models — a cross-cutting research thread (TRM and recursive reasoning) that feeds the TRM branch above and fits the sandbox rules well.
A note on honesty (it matters for a public repo). Any score, rank, or result attributed to someone else — TRM's reported numbers, the leaderboard figures above — is cited to its source and never presented as mine.
Re-verified 2026-06-20. Competition details and leaderboard numbers change — re-check before reuse.
Primary
- ARC Prize 2026 — competition overview — https://arcprize.org/competitions/2026
- ARC Prize 2026 — ARC-AGI-2 track — https://arcprize.org/competitions/2026/arc-agi-2
- ARC Prize 2026 — ARC-AGI-3 track — https://arcprize.org/competitions/2026/arc-agi-3
- ARC Prize 2026 — Paper Prize — https://arcprize.org/competitions/2026/paper
- F. Chollet, On the Measure of Intelligence (2019) — https://arxiv.org/abs/1911.01547
- ARC Prize 2024: Technical Report — https://arxiv.org/abs/2412.04604
- ARC Prize 2024 winners & report (blog) — https://arcprize.org/blog/arc-prize-2024-winners-technical-report
- ARC Prize 2025: Technical Report — https://arxiv.org/abs/2601.10904
- ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence — https://arxiv.org/abs/2603.24621
- ARC-AGI-3 docs / quickstart — https://docs.arcprize.org/
- Less is More: Recursive Reasoning with Tiny Networks (TRM) — https://arxiv.org/abs/2510.04871
- Y. LeCun, A Path Towards Autonomous Machine Intelligence (2022) — https://openreview.net/forum?id=BZ5a1r-kVsf
Prose and figures in this repo are © 2026 Antonio Rodriguez-Moral, licensed CC BY 4.0; code is MIT.
🌐 arodmor.me · 💻 github.com/arodmor · ✉️ antonio.rodriguez.moral@pm.me
Part of a series: AI/ML Lab · voice-ai-landscape · arc-agi · recursive-reasoning-models