LARS — Live Adaptive Reasoning System

The first LLM runtime that preserves reasoning state across user interruption.

The headline result

LARS is the only method in our benchmark that achieves high reasoning preservation (RPR) while also incorporating the user's interrupt:

Method	RPR ↑	Cost ↓	Used interrupt?	Win?
`no_interrupt`	1.000	62.4	0%	✗ ignores user
`restart_from_scratch`	0.000	67.7	100%	✗ loses reasoning
`langgraph_checkpoint`	0.000	67.7	100%	✗ appends, no merge
`lars`	1.000	62.4	100%	✓ preserves + adapts

See examples/run_benchmark.py to reproduce.

What is LARS?

LARS (Live Adaptive Reasoning System) reframes LLM interaction from a stateless request-response loop into a continuous state-transition process:

S(t + 1) = f(S(t), ∆U(t))

S(t) — the structured reasoning state (goal, steps, assumptions, decisions)
∆U(t) — the user's interrupt, classified into one of 9 typed intents
f — a weighted merge with α + β + γ = 1, α ≥ 0.5
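The state-transition view above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `State`, `Interrupt`, `check_weights`, and `transition` are hypothetical names, not the API of `lars/state.py` or `lars/merger.py`. The sketch only checks the two stated weight constraints and shows the shape of a pure `S(t) -> S(t+1)` step; it does not model how the weights enter the real merge.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for S(t): goal, steps, assumptions, decisions."""
    goal: str
    steps: Tuple[str, ...] = ()
    assumptions: Tuple[str, ...] = ()
    decisions: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Interrupt:
    """Hypothetical stand-in for a classified interrupt, Delta-U(t)."""
    intent: str  # one of the 9 intent names, e.g. "ADD"
    text: str


def check_weights(alpha: float, beta: float, gamma: float, tol: float = 1e-9):
    """Validate the two constraints stated above: sum to 1 and alpha >= 0.5."""
    if abs(alpha + beta + gamma - 1.0) > tol:
        raise ValueError("alpha + beta + gamma must equal 1")
    if alpha < 0.5:
        raise ValueError("alpha must be at least 0.5")
    return alpha, beta, gamma


def transition(state: State, delta_u: Interrupt) -> State:
    """Toy f: return a new state and leave the old one untouched."""
    if delta_u.intent == "ADD":
        # Append the user's request as a new step.
        return replace(state, steps=state.steps + (delta_u.text,))
    # Intents this toy does not handle leave the state unchanged.
    return state


s0 = State(goal="plan a campaign", steps=("research audience",))
s1 = transition(s0, Interrupt(intent="ADD", text="also include TikTok"))
```

Keeping the state immutable means every `S(t)` stays available for comparison after the merge, which is what a preservation metric needs.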

Paper: LARS: Live Adaptive Reasoning System for Continuous-State Interactive AI (Salah, 2026, DOI: 10.5281/zenodo.20618761) — v3 with real-LLM validation on gpt-4o-mini and a 3-layer defense pipeline (see lars_v3_paper.md in this repo).

🆕 What's new in v0.5.1 (3-layer defense pipeline)

The merger is now wrapped in a 3-layer defense pipeline that keeps the state consistent with a real LLM (e.g., gpt-4o-mini), even when the LLM produces generic step descriptions:

USER: "use Twitter instead of Facebook"
   │
   ├─ Layer 1: CoT-aware merger  ──►  rewrites the latest CoT of each step
   ├─ Layer 2: Pending-step refresh  ─►  re-states future steps with new keywords
   └─ Layer 3: Active override inject  ►  feeds user text into the next LLM prompt

Validated end-to-end on openai/gpt-4o-mini via OpenRouter. The interrupts "focus on Cairo only" and "use Twitter instead of Facebook" both produce mod=1 in the MergeTrace; in v0.4.9 they produced mod=0.
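As a rough illustration of how the three layers could cooperate on a REPLACE interrupt, here is a toy sketch. The field names `latest_cot` and `active_overrides` come from this README; the classes `Step` and `ToyState`, the three `layer*` functions, and the prompt wording are hypothetical and are not taken from `lars/agent.py` or `lars/merger.py`. The example assumes a generic step description ("Draft ad copy") whose chain of thought is the only place the old token appears.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    description: str
    latest_cot: str = ""
    done: bool = False


@dataclass
class ToyState:
    steps: List[Step]
    active_overrides: List[str] = field(default_factory=list)


def layer1_cot_aware_replace(state: ToyState, old: str, new: str) -> int:
    """Swap old -> new in each step's description and latest CoT.

    Returns the number of steps that actually changed.
    """
    modified = 0
    for step in state.steps:
        before = (step.description, step.latest_cot)
        step.description = step.description.replace(old, new)
        step.latest_cot = step.latest_cot.replace(old, new)
        if (step.description, step.latest_cot) != before:
            modified += 1
    return modified


def layer2_refresh_pending(state: ToyState, keyword: str) -> None:
    """Re-state pending steps so that they mention the new keyword."""
    for step in state.steps:
        if not step.done and keyword.lower() not in step.description.lower():
            step.description = f"{step.description} (using {keyword})"


def layer3_build_prompt(state: ToyState, base_prompt: str) -> str:
    """Append the active overrides to the next LLM prompt."""
    if not state.active_overrides:
        return base_prompt
    overrides = "\n".join(f"- {o}" for o in state.active_overrides)
    return f"{base_prompt}\n\nUser overrides to respect:\n{overrides}"


user_text = "use Twitter instead of Facebook"
state = ToyState(steps=[
    Step("Draft ad copy", latest_cot="Facebook ads suit this audience", done=True),
    Step("Schedule posts"),
])

mod = layer1_cot_aware_replace(state, "Facebook", "Twitter")
layer2_refresh_pending(state, "Twitter")
state.active_overrides.append(user_text)
prompt = layer3_build_prompt(state, "What is the next step?")
```

In this toy run a description-only merger would find nothing to change, while the CoT-aware pass modifies one step, which is the kind of difference the mod counts above describe.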

⚡ 30-second quick start

git clone https://github.com/Skyhosteg/LARS.git
cd LARS
pip install -r requirements.txt

# Optional: for LangGraph deployment
# pip install langgraph

# Run the live interactive demo (mock LLM)
python examples/demo_live.py

Enter any goal. LARS plans, executes, and pauses after each step. Type to interrupt — the system parses your input, applies the merge, and continues from the new state.

With a real LLM (OpenRouter / OpenAI)

# PowerShell
$env:OPENROUTER_API_KEY = "sk-or-v1-..."
$env:OPENROUTER_MODEL = "openai/gpt-4o-mini"
python examples/demo_live.py

# bash / zsh
export OPENAI_API_KEY=sk-...
python examples/demo_live.py

With LangGraph (optional)

pip install langgraph
python -m lars.langgraph_integration

The LangGraph wiring is a 50-line graph (lars/langgraph_integration.py) that exposes interrupt_before=["execute_step"] for production deployment with persistence and time-travel.

To run the full benchmark (mock by default, real LLM with a key):

python examples/run_benchmark.py

Architecture (v0.5.1)

                       user text
                           │
                           ▼
                    ┌──────────────┐
                    │  ∆U Parser   │   → UpdateIntent (9 types)
                    └──────┬───────┘
                           │
        S(t)  ─────────────┼────────►  ┌──────────────────┐
       (state)             │          │ StateMerger      │  → S(t+1) + MergeTrace
              ┌────────────┘          │  f_merge(S, ∆U)  │     α+β+γ weights
              │                       │  (CoT-aware)     │
              │                       └──────┬───────────┘
              │                              │
              ▼                              ▼
   ┌──────────────────┐            ┌──────────────────┐
   │ refresh_pending  │            │  active_overrides│  → injected into next LLM prompt
   │ (re-state future)│            │  Ω(t)            │
   └──────────────────┘            └──────────────────┘
                           │
                           ▼
                   ┌──────────────┐
                   │  RPR metric  │   → 0..1 (semantic similarity)
                   └──────────────┘
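The diagram only says that RPR is a 0..1 score based on semantic similarity. One way such a score could be computed is sketched below, purely as an assumption: each step of the state before the interrupt is matched to its most similar step after the merge, and the best-match similarities are averaged. The bag-of-words `embed` function is a crude stand-in for a real embedder, and `toy_rpr` is a hypothetical name, not the `rpr()` or `rpr_semantic()` function in `lars/metrics.py`.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Crude bag-of-words embedding (stand-in for a real embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors, clamped to [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return min(1.0, dot / (norm_a * norm_b))


def toy_rpr(before: List[str], after: List[str]) -> float:
    """Mean best-match similarity of the old steps against the new steps."""
    if not before:
        return 1.0  # nothing to preserve
    if not after:
        return 0.0  # everything was lost
    after_vecs = [embed(s) for s in after]
    total = 0.0
    for step in before:
        vec = embed(step)
        total += max(cosine(vec, other) for other in after_vecs)
    return total / len(before)


old_steps = ["research target audience", "draft ad copy"]
kept_all = toy_rpr(old_steps, old_steps + ["add TikTok channel"])
kept_half = toy_rpr(old_steps, ["research target audience"])
kept_none = toy_rpr(old_steps, [])
```

Under this definition a restart that discards every prior step scores 0 and a merge that keeps every prior step scores 1, which mirrors the two extremes in the benchmark table.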

The 9 intent types

Intent	Example	Handler
`SCOPE_NARROW`	"focus on Cairo only"	rewrites broad refs (description + CoT)
`SCOPE_EXPAND`	"also include the Gulf"	inserts pending step
`CORRECTION`	"actually use blue"	modifies last step
`REPLACE`	"use Twitter instead of Facebook"	swaps token (description + CoT)
`ADD`	"also include TikTok"	appends pending step
`REMOVE`	"drop the influencer budget"	drops matching (description + CoT)
`REPRIORITIZE`	"do budget first"	note only in v1; design ready in v3
`CLARIFY`	"what do you mean by young?"	no-op + log
`ABORT`	"stop, restart"	clears state

See docs/ARCHITECTURE.md for the full design.
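To make the taxonomy concrete, here is a small keyword-based classifier over the nine intent names. It is a sketch under stated assumptions: the `Intent` enum mirrors the table, but the regular expressions, their ordering, and the `classify` function are invented for illustration and are not the heuristic in `lars/delta_u.py`. Note one deliberate gap: "also include ..." is always mapped to ADD, because keywords alone cannot tell ADD from SCOPE_EXPAND.

```python
import re
from enum import Enum
from typing import Optional


class Intent(Enum):
    SCOPE_NARROW = "SCOPE_NARROW"
    SCOPE_EXPAND = "SCOPE_EXPAND"
    CORRECTION = "CORRECTION"
    REPLACE = "REPLACE"
    ADD = "ADD"
    REMOVE = "REMOVE"
    REPRIORITIZE = "REPRIORITIZE"
    CLARIFY = "CLARIFY"
    ABORT = "ABORT"


# Hypothetical rules, checked in order; the first match wins.
_RULES = [
    (Intent.ABORT, r"\b(stop|abort|restart)\b"),
    (Intent.REPLACE, r"\binstead of\b"),
    (Intent.SCOPE_NARROW, r"\b(focus on|only)\b"),
    (Intent.REMOVE, r"\b(drop|remove)\b"),
    (Intent.REPRIORITIZE, r"\bfirst\b"),
    (Intent.CLARIFY, r"\?\s*$|\bwhat do you mean\b"),
    (Intent.CORRECTION, r"\bactually\b"),
    (Intent.ADD, r"\balso include\b"),
]


def classify(text: str) -> Optional[Intent]:
    """Return the first intent whose pattern matches, or None if none does."""
    for intent, pattern in _RULES:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return intent
    return None


replace_intent = classify("use Twitter instead of Facebook")
narrow_intent = classify("focus on Cairo only")
```

Ordering matters in a first-match design like this one, so the most destructive or most specific intents are checked first.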

What's in the box

File	Purpose
`lars/state.py`	`StateVector` Pydantic schema (S(t)) — incl. `latest_cot` and `active_overrides`
`lars/extractor.py`	CoT → `StateVector`; includes `refresh_pending()`
`lars/delta_u.py`	User text → `UpdateIntent` (LLM + heuristic)
`lars/merger.py`	The `f` function + `MergeTrace` (CoT-aware handlers)
`lars/metrics.py`	`rpr()`, `rpr_semantic()`, latency, cost
`lars/embeddings.py`	Pluggable embedder (Hash + OpenAI)
`lars/agent.py`	`LiveAgent` runtime with interrupts and 3-layer pipeline
`lars/llm.py`	`OpenAILLM`, `OpenRouterLLM`, `MockLLM`
`lars/langgraph_integration.py`	LangGraph wiring (G1)
`lars/baselines.py`	3 baselines + LARS method
`lars/tasks.py`	12 benchmark tasks
`lars/benchmark.py`	The benchmark harness
`examples/`	4 runnable demos
`tests/`	33 tests across 4 suites
`lars_v3_paper.md`	v3 paper (real-LLM validated)

Running the tests

python tests/test_extractor.py    # 7 tests
python tests/test_merge.py        # 13 tests
python tests/test_agent.py        # 7 tests
python tests/test_benchmark.py    # 6 tests

All 33 tests should pass.

Known limitations (v0.5.1)

Rule-based merger is still literal — but the 3-layer pipeline (CoT-aware merge, pending refresh, override injection) makes it robust to real LLM output. A learned $f_\theta$ remains the highest-priority extension.
Mock ∆U parser is English-only. Use DeltaUParserLLM for other languages.
REPRIORITIZE is a no-op. Graph re-ranking is designed, but its implementation is pending.
Single-LLM validation — v0.5.1 validates on gpt-4o-mini. Cross-model replication (Claude, Llama, Gemini) is future work.
No user study yet — CHI-style evaluation is future work.

Citation

@misc{salah2026lars,
  title  = {LARS: Live Adaptive Reasoning System for Continuous-State Interactive AI},
  author = {Salah, Mohamed},
  year   = {2026},
  month  = jun,
  doi    = {10.5281/zenodo.20618761},
  url    = {https://zenodo.org/records/20618761},
  note   = {v3 with real-LLM validation; see also the lars_v3_paper.md in this repo for the 3-layer defense pipeline},
}

Contributing

See CONTRIBUTING.md. The 9-intent taxonomy is an evolving research artifact — propose new types, add benchmark tasks, or implement f_llm.

License

Code: MIT — see LICENSE
Paper: CC-BY-4.0 — see Zenodo
