Code release for Dissociating Decodability and Causal Use in Bracket-Sequence Transformers.
.
├── training/                        Transformer + probe + intervention code (Dyck pipeline)
├── data_generation/                 Dyck string sampling (Hewitt-style generator)
├── data/                            Pre-generated Dyck sequences used by the base model
├── tree_utils/                      Shared distance-probe utilities
├── paper_data/                      Figure scripts and small JSONs behind the paper's plots
├── figures/                         Final figure PDFs + teaser PNG (paper-referenced only)
├── results/                         Aggregate probe metrics (Pearson / R² / attention mass)
├── results_dyck_full_top_ablation/  Per-model top-of-stack ablation results
├── analysis/                        Post-hoc summary CSVs and an artifact audit
└── experiment_results/              Probe-vs-position and corruption-tracking CSVs
All experiments use 2-layer, 1-head transformers with embedding dimension d ∈ {16, 32, 64} trained on Dyck-(k, m) sequences.
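For readers unfamiliar with the notation, the sketch below illustrates what a Dyck-(k, m) sequence is, assuming the usual convention that k is the number of bracket types and m is the maximum nesting depth. It is not the repository's generator in data_generation/; the function names, the token format (`(3` opens type 3, `3)` closes it), and the `p_open` parameter are hypothetical choices made for illustration.

```python
import random


def sample_dyck(k, m, length, rng, p_open=0.5):
    """Sample a balanced bracket string with k types, depth <= m, and `length` tokens."""
    if length % 2 != 0 or m < 1 or k < 1:
        raise ValueError("need even length, m >= 1, k >= 1")
    stack, tokens = [], []
    for i in range(length):
        remaining = length - i  # tokens still to emit, including this one
        depth = len(stack)
        # Opening is allowed only if the depth bound holds and there is
        # still room to close every open bracket afterwards.
        can_open = depth < m and depth < remaining
        can_close = depth > 0
        if can_open and (not can_close or rng.random() < p_open):
            t = rng.randrange(k)
            stack.append(t)
            tokens.append(f"({t}")
        else:
            t = stack.pop()
            tokens.append(f"{t})")
    return tokens


def is_dyck(tokens, k, m):
    """Check that tokens form a balanced string with valid types and depth <= m."""
    stack = []
    for tok in tokens:
        if tok.startswith("("):
            t = int(tok[1:])
            if not 0 <= t < k:
                return False
            stack.append(t)
            if len(stack) > m:
                return False
        else:
            t = int(tok[:-1])
            if not stack or stack.pop() != t:
                return False
    return not stack


rng = random.Random(0)
seq = sample_dyck(k=20, m=10, length=40, rng=rng)
print(" ".join(seq), is_dyck(seq, k=20, m=10))
```

The sampler forces a close whenever the remaining length equals the current depth, so every sample ends balanced; the actual length and depth distribution used for the paper's data may differ.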
- Train base models (k=20, m=10): training/run_base_multiseed_replication.sh
- Train harder variants ((k, m) ∈ {(20,18), (30,14), (40,10), (40,18)}): training/run_complexity_scaling.sh
- Activation patching sweep: training/dyck_attention_edge_ablation.py, training/dyck_causal_ablation.py
- Subspace ablation (depth / distance): training/dyck_causal_ablation.py, training/run_subspace_ablation_sweep.sh
- Attention analysis: training/dyck_attention_stack_analysis.py, training/run_attention_analysis.sh
- Probe fitting + OOD evaluation: paper_data/run_h_probes_ood.py, paper_data/run_proper_distance_probe_ood.py, paper_data/run_depth_correlation_ood.py
- Query/key-space interventions: training/dyck_qk_subspace_analysis.py, training/run_qk_probe_overlap.sh
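As a rough illustration of the subspace ablation step listed above, one common approach is to project each hidden state onto the orthogonal complement of the directions a probe reads from. The plain-Python sketch below shows that projection under this assumption; it is not taken from training/dyck_causal_ablation.py, and `dot`, `orthonormalize`, and `ablate_subspace` are hypothetical names.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already in the span
            basis.append([wi / norm for wi in w])
    return basis


def ablate_subspace(h, probe_directions):
    """Remove from h its component inside span(probe_directions)."""
    out = [float(x) for x in h]
    for b in orthonormalize(probe_directions):
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


# Toy example: ablate a 2-D probe subspace from a 3-D hidden state.
h = [1.0, 2.0, 3.0]
directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(ablate_subspace(h, directions))
```

After ablation the hidden state has zero inner product with every probe direction, so a linear probe restricted to that subspace can no longer read anything from it; whether the model's behavior changes is the separate causal question the ablation is meant to test.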
Figure scripts in paper_data/ read from paper_data/*.json, results/, results_dyck_full_top_ablation/, and experiment_results/ and write PDFs to figures/. Run python paper_data/make_all.py to regenerate every figure.
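Before regenerating figures, it can help to confirm that the input locations named above are present. The following is a minimal sketch of such a check, not part of the release; `missing_figure_inputs` and `REQUIRED_DIRS` are hypothetical names, and the only paths used are the ones listed in this README.

```python
from pathlib import Path

# Input directories the figure scripts are described as reading from.
REQUIRED_DIRS = ["results", "results_dyck_full_top_ablation", "experiment_results"]


def missing_figure_inputs(repo_root):
    """Return the expected figure inputs that are absent under repo_root."""
    root = Path(repo_root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    paper_data = root / "paper_data"
    # paper_data/ must exist and hold at least one JSON file.
    if not paper_data.is_dir() or not any(paper_data.glob("*.json")):
        missing.append("paper_data/*.json")
    return missing


if __name__ == "__main__":
    problems = missing_figure_inputs(".")
    print("all inputs found" if not problems else f"missing: {problems}")
```

Run from the repository root, this only reports which inputs are absent; it does not validate the contents of any file.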
Dependencies are listed in requirements.txt. Tested with Python 3.11+, PyTorch 2.2+, and matplotlib 3.8+.