arianXdev/constitution-sim


Constitution-Sim

Stress-test constitutions with AI-powered agentic politicians before trying them out on a real nation!

constitution-sim is a research-grade multi-agent AI simulator. You give it a constitution and a scenario; it spins up an LLM-powered agent for each political role (Executive, Legislature, Judiciary, Media, Bureaucracy) and lets them act under the rules you wrote, turn by turn. Every action is checked by a rules engine, every event is logged, and every run is reproducible from a seed.

Why

Politicians are not utility-maximisers reading from a spec — they deliberate, bargain, posture, and reach for legitimacy. The interesting question is how the rules of a constitution shape that behaviour. So the agents here are LLMs (OpenAI / Anthropic) instructed with a role-specific persona, the constitution they live under, their own goals and utility weights, and a memory of their own recent decisions. They never get to mutate the world directly — every move passes the typed rules engine first.

A deterministic heuristic agent is still available as a no-LLM fallback, so the project also runs offline / in CI / with zero API keys.

Features

  • AI cognition is the default. When OPENAI_API_KEY or ANTHROPIC_API_KEY is in the environment, constitution-sim run uses LLM-powered agents out of the box. With no key, it falls back to a deterministic heuristic — same CLI, same outputs, no setup required.
  • Role-specific personas. Each role (Executive, Legislature, Judiciary, Media, Bureaucracy) gets its own LLM system prompt. The Executive is ambitious; the Judiciary is reactive; the Media chases a narrative; the Bureaucracy implements steadily.
  • Agent memory & shared history. Each agent remembers its own recent decisions and can see a public history of what other actors just did (if the constitution allows).
  • Inter-agent deliberation. Each turn features a deliberation phase where agents can negotiate, threaten, or signal intent by sending messages to each other's inboxes.
  • Schema-driven constitutions. Strict Pydantic v2 models; YAML in, typed objects out. Constitutions can enforce communication limits (e.g. authoritarian gag orders).
  • Rules engine is source of truth. Agents propose typed actions; the engine accepts or rejects with a reason. The LLM cannot mutate state directly.
  • Partial observability. Each role gets a state view filtered by its observation_limits.
  • Institutional metrics. Power concentration, deadlock, trust volatility, legitimacy, corruption pressure, emergency-power drift.
  • Repeated-run evaluation harness. Multi-seed runs with pandas / matplotlib output.
  • Deterministic when seeded (heuristic mode is byte-for-byte reproducible; LLM mode is reproducible up to provider variance).
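The partial-observability bullet above can be pictured with a small sketch. It assumes, purely for illustration, that the world state is a flat dictionary and that a role's observation_limits boil down to a set of hidden field names; the real Pydantic models may be shaped differently, and `state_view`, `hidden_fields`, and the sample field names are hypothetical.

```python
from typing import Any


def state_view(world: dict[str, Any], hidden_fields: set[str]) -> dict[str, Any]:
    """Return a shallow copy of the world state without the fields a role may not see."""
    return {key: value for key, value in world.items() if key not in hidden_fields}


# Hypothetical world state; the field names are illustrative only.
world = {"public_trust": 0.61, "pending_bills": ["B-1", "B-2"], "active_laws": ["L-1"]}

# Hypothetical limits: this role is not shown pending bills.
bureaucracy_view = state_view(world, hidden_fields={"pending_bills"})
```

The agent only ever receives the filtered copy, so nothing it does with the view can reach the canonical state's top-level fields.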
[Figure: per-metric plots. Panels: num_pending_bills, corruption_proxy, num_active_laws, power_concentration, public_trust, emergency_turns, trust_volatility.]

Requirements

  • Python 3.10+ (target: 3.14)
  • pydantic >= 2, PyYAML, pandas, matplotlib, seaborn
  • For AI cognition: openai (and/or anthropic)

Install

git clone https://github.com/arianXdev/constitution-sim.git
cd constitution-sim
pip install -e ".[dev,llm]"     # core + tests + LLM SDKs (recommended)
# or, no-LLM-only install:
pip install -e ".[dev]"

This exposes a constitution-sim console entry point.

Quickstart (AI-powered)

export OPENAI_API_KEY=sk-...
constitution-sim run \
  --constitution constitutions/advanced_constitution.yaml \
  --scenario     constitutions/scenario.yaml \
  --turns 20 --seed 42 \
  --log         /tmp/cs/events.jsonl \
  --metrics-out /tmp/cs/metrics.csv

That's it. The default --agent-type auto notices the key, spins up LLM-powered Executive / Legislature / Judiciary / Media / Bureaucracy agents, and runs the simulation. You'll see a one-liner telling you which provider was picked.
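As a rough picture of what "auto" means here, the sketch below resolves an agent type from the environment. The function name `pick_agent_type` is hypothetical, and checking OpenAI before Anthropic when both keys are present is an assumption of this sketch, not something the README states.

```python
import os
from collections.abc import Mapping


def pick_agent_type(env: Mapping[str, str] = os.environ) -> str:
    """Resolve an 'auto' agent type from the API keys present in env.

    Assumption: OpenAI is checked before Anthropic. The actual order
    used by the CLI is not documented here.
    """
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # No key at all: fall back to the deterministic heuristic agent.
    return "heuristic"
```

Passing the environment in as a mapping keeps the function easy to test without touching real credentials.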

Want to force a provider explicitly?

constitution-sim run --agent-type openai    --model gpt-4o-mini       ...
constitution-sim run --agent-type anthropic --model claude-sonnet-4-5 ...

Want deterministic, no-API runs (for tests / reproducibility)?

constitution-sim run --agent-type heuristic ...

The four CLI subcommands

# 1. Validate a constitution YAML against the schema.
constitution-sim validate --constitution constitutions/advanced_constitution.yaml

# 2. Run a simulation (single seed or multi-seed evaluation).
constitution-sim run \
  --constitution constitutions/advanced_constitution.yaml \
  --scenario     constitutions/scenario.yaml \
  --turns 30 --runs 5 --seed 42 \
  --log         /tmp/cs/events.jsonl \
  --metrics-out /tmp/cs/metrics.csv \
  --plot-dir    /tmp/cs/plots

# 3. Replay a recorded event log (structured summary, not re-execution).
constitution-sim replay --log /tmp/cs/eval_logs/run_0_events.jsonl --show-first 5

# 4. Compare two evaluations (e.g. two constitutions).
constitution-sim compare --a /tmp/cs/metrics_A.csv --b /tmp/cs/metrics_B.csv
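If you want to inspect an event log outside the CLI, a JSONL file is easy to summarise by hand. The sketch below is not the replay subcommand; it only counts accepted and rejected attempts per actor, and the field names (`actor`, `accepted`, `reason`) are assumptions about the log format that you should check against a real events.jsonl.

```python
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> Counter:
    """Count accepted and rejected action attempts per actor."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        outcome = "accepted" if event.get("accepted") else "rejected"
        counts[(event.get("actor", "unknown"), outcome)] += 1
    return counts


# Hypothetical sample events; a real log may use different keys.
sample = [
    '{"turn": 1, "actor": "Executive", "action": "declare_emergency", "accepted": false, "reason": "not allowed"}',
    '{"turn": 1, "actor": "Legislature", "action": "propose_bill", "accepted": true}',
    '{"turn": 2, "actor": "Executive", "action": "propose_bill", "accepted": true}',
]
summary = summarize(sample)
```

For a file on disk, pass the open file object directly, since iterating it yields one line at a time.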

What the LLM sees

For each turn, the LLM agent is prompted with:

  • A role-specific persona (Executive / Legislature / …).
  • The constitution's name, description, and the list of other roles.
  • Its own declared goals and utility weights (from the YAML).
  • A partial state view filtered by its observation_limits.
  • Public political history: recent public actions taken by all actors.
  • Inbox messages: any negotiation/signals received during the turn's deliberation phase.
  • A short memory of its own recent decisions (and whether they were legal).
  • The exact set of typed actions it's allowed to return.

It replies with one JSON object describing a single action. If the LLM returns malformed JSON or an action outside its permission set, the agent silently falls back to the deterministic heuristic policy, so a bad reply never aborts a run.
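The fallback described above might look roughly like this. It is a minimal sketch: `parse_reply`, `heuristic_action`, the `"action"` key, and the contents of `ALLOWED_ACTIONS` are all hypothetical stand-ins, not the project's actual names.

```python
import json

ALLOWED_ACTIONS = {"propose_bill", "vote", "pass"}  # hypothetical permission set


def heuristic_action() -> dict:
    """Stand-in for the deterministic fallback policy."""
    return {"action": "pass"}


def parse_reply(raw: str, allowed: set[str] = ALLOWED_ACTIONS) -> dict:
    """Return the LLM's action if it is well-formed and permitted, else the fallback."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return heuristic_action()  # malformed JSON
    if not isinstance(reply, dict):
        return heuristic_action()  # valid JSON, but not an object
    action = reply.get("action")
    if not isinstance(action, str) or action not in allowed:
        return heuristic_action()  # missing or out-of-permission action
    return reply
```

Note that an action passing this filter is still only a proposal; the rules engine decides whether it is legal.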

Project structure

src/constitution_sim/
  models/        Pydantic schemas: Constitution, Role, Rule, WorldState, actions
  core/          SimulationEngine, RulesEngine, Scheduler, EventLogger
  agents/        BaseAgent, DeterministicHeuristicAgent, LLMAgent, providers
  scenarios/     Shock model + ScenarioEngine
  analysis/      MetricsCollector, Evaluator, plot
  app/           CLI (validate / run / replay / compare)
constitutions/
  simple_constitution.yaml
  advanced_constitution.yaml
  strong_executive_constitution.yaml
  scenario.yaml
docs/
  architecture.md
  tutorial.md
tests/

Tests

pytest -q

All tests should pass. tests/test_determinism.py explicitly asserts that two heuristic-mode runs with the same seed produce byte-identical event logs. tests/test_llm_agent.py::test_live_openai_smoke runs a real LLM round-trip when OPENAI_API_KEY is set, and is automatically skipped otherwise.
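The idea behind the determinism check can be shown with a toy stand-in. This is not the code in tests/test_determinism.py; `run` is a hypothetical miniature simulation that draws all randomness from one seeded generator and serialises its events in a stable way.

```python
import json
import random


def run(seed: int, turns: int = 5) -> bytes:
    """Toy simulation: every random choice comes from one seeded generator."""
    rng = random.Random(seed)
    events = []
    for turn in range(turns):
        events.append({
            "turn": turn,
            "actor": rng.choice(["Executive", "Legislature"]),
            "trust_delta": round(rng.uniform(-0.1, 0.1), 6),
        })
    # sort_keys and fixed separators keep the serialised bytes stable.
    lines = (json.dumps(e, sort_keys=True, separators=(",", ":")) for e in events)
    return "\n".join(lines).encode("utf-8")


def test_same_seed_same_bytes() -> None:
    assert run(42) == run(42)
```

The pattern only holds if nothing in the run reads the global random state, wall-clock time, or unordered collections when writing the log.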

Headline experiment

Compare a balanced constitution against a strong-executive one (3 runs × 12 turns, seed 11). Relative to the balanced constitution, the strong-executive YAML raises power_concentration from ~0.47 to ~0.92 and adds illegal-action attempts to the log: laws are written by a single actor, and the judiciary is unable to push back. That's the framework working as intended; see docs/tutorial.md for a walkthrough.

Design highlights

  • WorldState is the single canonical truth; agents only ever see a StateView.
  • Every action attempt is recorded in the JSONL event log, including the rules-engine reason for any rejection.
  • Role.observation_limits lets the constitution define what each role can see (e.g. the Bureaucracy doesn't see pending bills in advanced_constitution.yaml).
  • Role.utility_weights drives heuristic voting and is surfaced to LLM agents in their prompt as part of the persona.
  • RulesEngine does both permission checks AND state-level legality checks (you can't vote on a non-existent bill, you can't declare emergency powers if the constitution doesn't allow them).

See docs/architecture.md for the full design and docs/tutorial.md for an end-to-end "use it like I'm 10" walkthrough.

Out of scope (intentional)

This is an MVP, not a finished research instrument. The following are explicit non-goals at this stage:

  • Persistent economic/demographic simulation (state variables are scalars, not vector economies).
  • Fine-tuned LLMs or RL self-play.
