QALF: Quantum Associative Language Field

QALF is a small proof-of-concept language model that uses complex Hilbert-space states, density-matrix context, entangled relation operators, and Born-style decoding. It is intentionally not a transformer, RNN, SSM, or wrapper around pretrained model weights.
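
As a rough illustration of what Born-style decoding means, the sketch below scores each candidate token by the squared magnitude of its overlap with a complex context state, then normalizes the scores into a distribution. The helper name born_probabilities, the toy token vectors, and the use of a pure state instead of a density matrix are assumptions made for illustration; this is not QALF's actual decoding code.

```python
from typing import Dict, List


def born_probabilities(state: List[complex],
                       token_vectors: Dict[str, List[complex]]) -> Dict[str, float]:
    """Return p(token) proportional to |<token|state>|^2 (hypothetical helper)."""
    scores = {}
    for token, vec in token_vectors.items():
        # Inner product <token|state>: conjugate the token vector.
        amplitude = sum(v.conjugate() * s for v, s in zip(vec, state))
        scores[token] = abs(amplitude) ** 2
    total = sum(scores.values())
    if total == 0.0:
        # No overlap with any token: fall back to a uniform distribution.
        return {token: 1.0 / len(scores) for token in scores}
    return {token: score / total for token, score in scores.items()}


# Toy 2-dimensional example: an equal superposition with a relative phase.
state = [complex(2 ** -0.5, 0.0), complex(0.0, 2 ** -0.5)]
tokens = {"cat": [1 + 0j, 0j], "dog": [0j, 1 + 0j]}
probs = born_probabilities(state, tokens)
```

In this toy case both tokens receive the same probability, because the relative phase between the two amplitudes does not change their magnitudes.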

The current target is conceptual evidence: a model that can train locally, generate short coherent replies, and provide enough diagnostics to support a paper about the idea.

Environment

The Conda environment is named EXPLLM.

conda run -n EXPLLM python -m qalf.train --data data/seed_corpus.jsonl --out runs/qalf_poc
conda run -n EXPLLM python -m qalf.eval --checkpoint runs/qalf_poc/model.pt
conda run -n EXPLLM python -m qalf.chat --checkpoint runs/qalf_poc/model.pt
conda run -n EXPLLM python -m unittest discover -s tests

QALF uses CUDA automatically when torch.cuda.is_available() returns True and falls back to CPU otherwise.
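
The fallback described above can be sketched as follows. The helper name resolve_device is hypothetical, and raising an error when cuda is requested but unavailable is an assumption; only the auto, cuda, and cpu values of --device appear in the commands in this README.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map a --device value to 'cuda' or 'cpu' (hypothetical helper)."""
    if requested not in ("auto", "cuda", "cpu"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "cpu":
        return "cpu"
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
        cuda_ok = bool(torch.cuda.is_available())
    except ImportError:
        cuda_ok = False
    if requested == "cuda" and not cuda_ok:
        # Assumed behaviour: fail loudly rather than silently train on CPU.
        raise RuntimeError("CUDA was requested but is not available")
    return "cuda" if cuda_ok else "cpu"


device = resolve_device("auto")
```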

Larger Dataset

TinyStories is a useful next corpus because it was designed for very small language models and simple coherent English.

conda run -n EXPLLM python -m qalf.prepare_text --source tinystories-valid --out data/tinystories_qalf.jsonl --max-examples 2000
conda run -n EXPLLM python -m qalf.train --data data/tinystories_qalf.jsonl --out runs/qalf_tinystories_small --device auto --dimension 48 --context-size 32 --vocab-size 2500 --epochs 15 --batch-size 256 --lr 0.012 --max-windows 30000 --attractor-limit 300 --log-every 3
conda run -n EXPLLM python -m qalf.eval --checkpoint runs/qalf_tinystories_small/model.pt --data data/tinystories_qalf.jsonl --out runs/qalf_tinystories_small/eval_compact.json --attractor-data data/tinystories_qalf.jsonl
conda run -n EXPLLM python -m qalf.chat --checkpoint runs/qalf_tinystories_small/model.pt --attractor-data data/tinystories_qalf.jsonl

Verified CUDA run on the GTX 1660:

conda run -n EXPLLM python -m qalf.train --data data/tinystories_qalf.jsonl --out runs/qalf_tinystories --device cuda --dimension 96 --context-size 48 --vocab-size 8000 --epochs 40 --batch-size 256 --lr 0.01 --max-windows 200000
conda run -n EXPLLM python -m qalf.eval --checkpoint runs/qalf_tinystories/model.pt --data data/tinystories_qalf.jsonl --out runs/qalf_tinystories/eval.json --attractor-data data/tinystories_qalf.jsonl --eval-batch-size 256

If the normal sandbox runner cannot mount GPU devices, run these commands from the CUDA-visible shell or with approved unsandboxed execution.

DGX Spark Higher-Order Run

This run enables sparse trigram associative memory and writes train/eval job logs as JSONL so results can be shared back into Codex.

conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_qalf.jsonl \
  --out runs/qalf_dgx_trigram \
  --device cuda \
  --dimension 192 \
  --context-size 96 \
  --vocab-size 16000 \
  --epochs 60 \
  --batch-size 512 \
  --lr 0.006 \
  --max-windows 1000000 \
  --relations 8 \
  --trigram-top-k 64 \
  --trigram-min-count 2 \
  --trigram-strength 0.9 \
  --attractor-limit 1000 \
  --log-every 5 \
  --log-file runs/qalf_dgx_trigram/train.jsonl

conda run -n EXPLLM python -m qalf.eval \
  --checkpoint runs/qalf_dgx_trigram/model.pt \
  --data data/tinystories_qalf.jsonl \
  --out runs/qalf_dgx_trigram/eval.json \
  --attractor-data data/tinystories_qalf.jsonl \
  --eval-batch-size 1024 \
  --log-file runs/qalf_dgx_trigram/eval.jsonl

conda run -n EXPLLM python -m qalf.eval \
  --checkpoint runs/qalf_dgx_trigram/model.pt \
  --data data/tinystories_qalf.jsonl \
  --out runs/qalf_dgx_trigram/eval_raw.json \
  --no-attractor \
  --eval-batch-size 1024 \
  --log-file runs/qalf_dgx_trigram/eval_raw.jsonl

Share train.jsonl, eval.json, and eval_raw.json from runs/qalf_dgx_trigram/ after the run. If memory is tight, reduce --batch-size first, then --max-windows. If training is too fast and underuses the DGX, increase --dimension to 256 and raise --max-windows until it covers all available windows.
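
To make the --trigram-top-k and --trigram-min-count options more concrete, here is a minimal sketch of a sparse trigram table: it counts next tokens per two-token context, drops rare continuations, and keeps only the most frequent ones. The function name build_sparse_trigrams and the exact filtering order are assumptions; QALF's real memory may be built differently.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Context = Tuple[int, int]


def build_sparse_trigrams(tokens: List[int], top_k: int,
                          min_count: int) -> Dict[Context, List[Tuple[int, int]]]:
    """Map each (t-2, t-1) context to its top_k most frequent next tokens."""
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    table: Dict[Context, List[Tuple[int, int]]] = {}
    for context, nexts in counts.items():
        # Drop continuations seen fewer than min_count times, then truncate.
        kept = [(tok, n) for tok, n in nexts.most_common() if n >= min_count]
        if kept:
            table[context] = kept[:top_k]
    return table


toy = [1, 2, 3, 1, 2, 3, 1, 2, 4]
table = build_sparse_trigrams(toy, top_k=1, min_count=2)
```

In the toy sequence the context (1, 2) is followed by 3 twice and by 4 once, so with a minimum count of 2 only the continuation 3 survives.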

DGX Spark Heavy Data Run

The first DGX run used little RAM because data/tinystories_qalf.jsonl contains only 2,000 examples, which produced only about 202k windows. To make use of the DGX Spark's 128 GB of RAM, prepare a much larger TinyStories-train subset first. The updated trainer logs estimated memory for window tensors, bigram memory, and trigram memory.

Start here:

conda run -n EXPLLM python -m qalf.prepare_text \
  --source tinystories-train \
  --out data/tinystories_train_100k_qalf.jsonl \
  --max-examples 100000 \
  --prompt-tokens 24 \
  --reply-tokens 128 \
  --log-file runs/qalf_dgx_heavy/prep.jsonl

conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_train_100k_qalf.jsonl \
  --out runs/qalf_dgx_heavy \
  --device cuda \
  --dimension 384 \
  --context-size 128 \
  --vocab-size 32000 \
  --epochs 30 \
  --batch-size 1024 \
  --lr 0.008 \
  --lr-schedule warmup-cosine \
  --warmup-fraction 0.03 \
  --min-lr-ratio 0.15 \
  --max-windows 5000000 \
  --relations 12 \
  --trigram-top-k 96 \
  --trigram-min-count 2 \
  --trigram-strength 1.0 \
  --bigram-strength 0.35 \
  --entropy-weight 0.02 \
  --component-diversity-weight 0.1 \
  --component-diversity-target 0.05 \
  --component-temperature 2.0 \
  --component-min-weight 0.08 \
  --attractor-limit 2000 \
  --log-every 2 \
  --log-file runs/qalf_dgx_heavy/train.jsonl

conda run -n EXPLLM python -m qalf.eval \
  --checkpoint runs/qalf_dgx_heavy/model.pt \
  --data data/tinystories_train_100k_qalf.jsonl \
  --out runs/qalf_dgx_heavy/eval_raw.json \
  --no-attractor \
  --eval-batch-size 2048 \
  --log-file runs/qalf_dgx_heavy/eval_raw.jsonl

Expected memory pressure is mostly from contexts: roughly max_windows * context_size * 8 bytes before smaller side tensors. With 5,000,000 windows and context 128, the context tensor alone is about 4.8 GiB; Python overhead is now reduced by preallocating tensors directly. The dense bigram prior at vocab 32k is about 4 GiB. This should still leave plenty of room on a 128 GB DGX Spark. If memory usage remains low and training is stable, raise --max-windows to 10000000, then raise --dimension to 512.
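
The arithmetic behind these estimates can be checked with a short sketch. The 8 bytes per context position comes from the formula above; treating the dense bigram prior as a vocab x vocab table with 4-byte entries is an assumption, and both function names are hypothetical.

```python
GIB = 1024 ** 3


def context_tensor_bytes(max_windows: int, context_size: int,
                         bytes_per_position: int = 8) -> int:
    """Context storage: max_windows * context_size * 8 bytes, as estimated above."""
    return max_windows * context_size * bytes_per_position


def dense_bigram_bytes(vocab_size: int, bytes_per_entry: int = 4) -> int:
    """Dense vocab x vocab table; 4 bytes per entry (float32) is an assumption."""
    return vocab_size * vocab_size * bytes_per_entry


ctx_gib = context_tensor_bytes(5_000_000, 128) / GIB
ctx_10m_gib = context_tensor_bytes(10_000_000, 128) / GIB
bigram_gib = dense_bigram_bytes(32_000) / GIB
print(f"context tensor, 5M windows:  {ctx_gib:.2f} GiB")
print(f"context tensor, 10M windows: {ctx_10m_gib:.2f} GiB")
print(f"dense bigram prior:          {bigram_gib:.2f} GiB")
```

Under these assumptions the context tensor comes to about 4.77 GiB at 5,000,000 windows and about 9.54 GiB at 10,000,000, and the bigram table to about 3.81 GiB, which is consistent with the rounded figures quoted above.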

Resuming Training

Training now supports periodic resumable checkpoints. Add --save-every N to a long run; every N epochs this writes a checkpoint_epoch_<epoch>.pt file containing model weights, optimizer state, and the completed epoch. The final model.pt also includes training state.

Example long run options:

--save-every 2 \
--log-file runs/qalf_dgx_heavy/train.jsonl

Resume from the latest checkpoint and set --epochs to the final target epoch, not the number of additional epochs:

conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_train_100k_qalf.jsonl \
  --out runs/qalf_dgx_heavy \
  --resume runs/qalf_dgx_heavy/checkpoint_epoch_10.pt \
  --device cuda \
  --epochs 30 \
  --batch-size 1024 \
  --lr 0.004 \
  --save-every 2 \
  --log-file runs/qalf_dgx_heavy/train_resume.jsonl

If a run was started without --save-every, it can resume only after final model.pt has been written. An interrupted run with no checkpoint cannot be resumed.
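
A small helper can pick the newest periodic checkpoint in a run directory and show how many epochs remain given that --epochs is the final target. This is a sketch under the assumption that checkpoint files are named checkpoint_epoch_<epoch>.pt, as in the resume command above; the function names are hypothetical and the checkpoint contents are not inspected.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

CHECKPOINT_RE = re.compile(r"^checkpoint_epoch_(\d+)\.pt$")


def latest_checkpoint(run_dir: str) -> Optional[Tuple[Path, int]]:
    """Return (path, epoch) of the highest-numbered periodic checkpoint, if any."""
    best: Optional[Tuple[Path, int]] = None
    for path in Path(run_dir).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match:
            epoch = int(match.group(1))
            # Compare numerically so epoch 10 sorts after epoch 2.
            if best is None or epoch > best[1]:
                best = (path, epoch)
    return best


def remaining_epochs(completed_epoch: int, target_epochs: int) -> int:
    """--epochs is the final target, so the work left is target minus completed."""
    return max(0, target_epochs - completed_epoch)
```

For example, resuming from checkpoint_epoch_10.pt with --epochs 30 leaves 20 epochs to train, not 30.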

Learning Rate Scheduling

Training defaults to a static learning rate for compatibility, but long DGX runs should use warmup plus cosine decay. The schedule is step-based, logs lr at each epoch record, and stores global_step in checkpoints for resume.

Recommended starting point for the heavy run:

--lr 0.008 \
--lr-schedule warmup-cosine \
--warmup-fraction 0.03 \
--min-lr-ratio 0.15

If the first two epochs are unstable or the loss spikes, lower --lr to 0.006. If the loss plateaus early and VRAM is healthy, try --batch-size 2048 with --lr 0.01. For exact resume behavior, keep the dataset, --max-windows, --batch-size, and --epochs consistent with the original scheduled run.
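
The shape of a step-based warmup plus cosine schedule can be sketched as below. The exact formula QALF uses is not stated here, so the linear warmup, the cosine decay to base_lr * min_lr_ratio, and the ceil-based step count are assumptions, and both function names are hypothetical. The sketch also shows why the dataset size, --batch-size, and --epochs need to stay fixed across a resume: they determine the total step count that the curve is stretched over.

```python
import math


def warmup_cosine_lr(step: int, total_steps: int, base_lr: float,
                     warmup_fraction: float, min_lr_ratio: float) -> float:
    """Learning rate at a 0-indexed optimizer step (illustrative formula)."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to base_lr * min_lr_ratio.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    floor = base_lr * min_lr_ratio
    return floor + (base_lr - floor) * cosine


def total_steps(num_windows: int, batch_size: int, epochs: int) -> int:
    """Steps per epoch times epochs; changing any input shifts the whole curve."""
    return math.ceil(num_windows / batch_size) * epochs


steps = total_steps(num_windows=5_000_000, batch_size=1024, epochs=30)
peak = warmup_cosine_lr(int(steps * 0.03), steps, 0.008, 0.03, 0.15)
final = warmup_cosine_lr(steps, steps, 0.008, 0.03, 0.15)
```

With the recommended values, this sketch peaks at 0.008 once warmup ends and decays to 0.15 * 0.008 = 0.0012 by the last step.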

QALF-Mixed DGX Run

QALF-Mixed replaces the previous rank-one context with a mixture of several phase-weighted context components. In logs, early purity_mean should be below 1.0, but the stronger sanity check is that component_overlap_mean stays low and density_effective_rank does not collapse back to 1.0. If purity rises toward 1.0 while component_entropy remains high, the components have aligned and the run needs either stronger regularisation or a mixture floor such as --component-min-weight 0.08.
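
The relationship between purity, component overlap, and component entropy can be illustrated with a small sketch for a mixed state rho = sum_i w_i |psi_i><psi_i| built from unit-norm components. The formulas below are standard, but whether QALF computes its logged diagnostics this way is an assumption; in particular, effective rank is taken here as the participation ratio 1 / Tr(rho^2), which is only one common definition.

```python
import math
from typing import Dict, List

Vector = List[complex]


def overlap_sq(a: Vector, b: Vector) -> float:
    """|<a|b>|^2 for two unit-norm complex vectors."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2


def mixture_diagnostics(weights: List[float],
                        components: List[Vector]) -> Dict[str, float]:
    """Diagnostics for rho = sum_i w_i |psi_i><psi_i| with unit-norm psi_i."""
    n = len(components)
    # Tr(rho^2) = sum_ij w_i w_j |<psi_i|psi_j>|^2
    purity = sum(weights[i] * weights[j] * overlap_sq(components[i], components[j])
                 for i in range(n) for j in range(n))
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    overlap_mean = (sum(overlap_sq(components[i], components[j]) for i, j in pairs)
                    / len(pairs)) if pairs else 0.0
    # Shannon entropy of the mixture weights (natural log).
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return {
        "purity": purity,
        "component_overlap_mean": overlap_mean,
        "component_entropy": entropy,
        "effective_rank": 1.0 / purity,  # assumed definition: participation ratio
    }


healthy = mixture_diagnostics([0.5, 0.5], [[1 + 0j, 0j], [0j, 1 + 0j]])
aligned = mixture_diagnostics([0.5, 0.5], [[1 + 0j, 0j], [1 + 0j, 0j]])
```

Both toy mixtures have the same component entropy (ln 2), but the aligned one has purity 1.0 and effective rank 1.0 while the orthogonal one has purity 0.5 and effective rank 2.0. This is the failure pattern described above: high entropy alone does not show that the components are distinct.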

Short comparison run first:

conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_train_100k_qalf.jsonl \
  --out runs/qalf_mixed_compare \
  --device cuda \
  --dimension 512 \
  --context-size 128 \
  --components 6 \
  --vocab-size 32000 \
  --epochs 8 \
  --batch-size 2048 \
  --lr 0.004 \
  --lr-schedule constant \
  --max-windows 10000000 \
  --relations 12 \
  --trigram-top-k 96 \
  --trigram-min-count 2 \
  --trigram-strength 1.0 \
  --bigram-strength 0.35 \
  --entropy-weight 0.02 \
  --component-diversity-weight 0.1 \
  --component-diversity-target 0.05 \
  --component-temperature 2.0 \
  --component-min-weight 0.08 \
  --attractor-limit 2000 \
  --save-every 2 \
  --log-every 1 \
  --log-file runs/qalf_mixed_compare/train.jsonl

If the comparison run beats the rank-one curve or gives better raw samples, run a longer version:

conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_train_100k_qalf.jsonl \
  --out runs/qalf_mixed_dgx \
  --device cuda \
  --dimension 768 \
  --context-size 160 \
  --components 8 \
  --vocab-size 32000 \
  --epochs 20 \
  --batch-size 2048 \
  --lr 0.004 \
  --lr-schedule constant \
  --max-windows 10000000 \
  --relations 16 \
  --trigram-top-k 128 \
  --trigram-min-count 2 \
  --trigram-strength 1.0 \
  --attractor-limit 2000 \
  --save-every 2 \
  --log-every 1 \
  --log-file runs/qalf_mixed_dgx/train.jsonl

Evaluate raw generation first; the main goal is not lower attractor-backed loss, but better --no-attractor samples:

conda run -n EXPLLM python -m qalf.eval \
  --checkpoint runs/qalf_mixed_compare/model.pt \
  --data data/tinystories_train_100k_qalf.jsonl \
  --out runs/qalf_mixed_compare/eval_raw.json \
  --no-attractor \
  --eval-batch-size 1024 \
  --log-file runs/qalf_mixed_compare/eval_raw.jsonl

QALF-Entangling Window

--attention-mode entangling replaces the phase-weighted component collapse with a unitary window circuit over the position x Hilbert-feature register. The circuit uses fixed local-plus-log-stride Givens rotations, factorized phase gates, and position readout projectors. --memory-mode unitary also replaces the learned linear relation bank with a norm-preserving feature circuit, so the forward path stays unitary until readout measurement.
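
The norm-preserving property of Givens rotations can be shown with a minimal sketch. The pairing scheme below (strides 1, 2, 4, ...) is only one assumed reading of "local-plus-log-stride", the angles are arbitrary, and the function names are hypothetical; the sketch does not reproduce QALF's circuit, only the reason its norm should not drift.

```python
import math
from typing import List, Tuple


def givens_rotate(state: List[complex], i: int, j: int, theta: float) -> None:
    """Rotate amplitudes i and j in place by angle theta (a 2x2 unitary)."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = state[i], state[j]
    state[i] = c * a - s * b
    state[j] = s * a + c * b


def stride_pairs(size: int) -> List[Tuple[int, int]]:
    """Index pairs at strides 1, 2, 4, ... (an assumed reading of 'log-stride')."""
    pairs: List[Tuple[int, int]] = []
    stride = 1
    while stride < size:
        pairs.extend((i, i + stride) for i in range(size - stride))
        stride *= 2
    return pairs


def norm(state: List[complex]) -> float:
    """Euclidean norm of a complex amplitude vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in state))


state = [complex(1, 0), complex(0, 1), complex(0.5, -0.5), complex(0, 0)]
before = norm(state)
for k, (i, j) in enumerate(stride_pairs(len(state))):
    givens_rotate(state, i, j, theta=0.1 * (k + 1))
drift = abs(norm(state) - before)
```

Because each rotation is unitary, the drift should stay at floating-point rounding level no matter how many rotations are composed.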

CPU smoke run with count priors disabled:

conda run -n EXPLLM python -m qalf.train \
  --data data/seed_corpus.jsonl \
  --out runs/qalf_entangling_smoke \
  --device cpu \
  --dimension 32 \
  --context-size 12 \
  --components 3 \
  --relations 2 \
  --epochs 2 \
  --batch-size 64 \
  --lr 0.01 \
  --max-windows 512 \
  --attention-mode entangling \
  --memory-mode unitary \
  --attention-layers 1 \
  --attention-phase-rank 2 \
  --bigram-strength 0 \
  --trigram-strength 0 \
  --log-every 1 \
  --log-file runs/qalf_entangling_smoke/train.jsonl

conda run -n EXPLLM python -m qalf.eval \
  --checkpoint runs/qalf_entangling_smoke/model.pt \
  --data data/seed_corpus.jsonl \
  --out runs/qalf_entangling_smoke/eval_raw.json \
  --no-attractor \
  --eval-batch-size 128 \
  --log-file runs/qalf_entangling_smoke/eval_raw.jsonl

Matched TinyStories comparison runs should keep the state budget close and make classical priors non-central. Start with a no-prior QALF-Mixed baseline, then run entangling attention with linear and unitary memory:

# Baseline: current component mixer, no count priors.
conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_qalf.jsonl \
  --out runs/qalf_component_noprior_compare \
  --device cuda \
  --dimension 192 \
  --context-size 96 \
  --components 6 \
  --relations 8 \
  --epochs 8 \
  --batch-size 512 \
  --lr 0.004 \
  --max-windows 1000000 \
  --bigram-strength 0 \
  --trigram-strength 0 \
  --entropy-weight 0.02 \
  --component-diversity-weight 0.1 \
  --component-diversity-target 0.05 \
  --log-every 1 \
  --log-file runs/qalf_component_noprior_compare/train.jsonl

# Entangling window with legacy linear relation memory.
conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_qalf.jsonl \
  --out runs/qalf_entangling_linear_compare \
  --device cuda \
  --dimension 192 \
  --context-size 96 \
  --components 6 \
  --relations 8 \
  --epochs 8 \
  --batch-size 512 \
  --lr 0.004 \
  --max-windows 1000000 \
  --attention-mode entangling \
  --memory-mode linear \
  --attention-layers 2 \
  --attention-phase-rank 4 \
  --bigram-strength 0 \
  --trigram-strength 0 \
  --log-every 1 \
  --log-file runs/qalf_entangling_linear_compare/train.jsonl

# Entangling window with unitary memory.
conda run -n EXPLLM python -m qalf.train \
  --data data/tinystories_qalf.jsonl \
  --out runs/qalf_entangling_unitary_compare \
  --device cuda \
  --dimension 192 \
  --context-size 96 \
  --components 6 \
  --relations 8 \
  --epochs 8 \
  --batch-size 512 \
  --lr 0.004 \
  --max-windows 1000000 \
  --attention-mode entangling \
  --memory-mode unitary \
  --attention-layers 2 \
  --attention-phase-rank 4 \
  --bigram-strength 0 \
  --trigram-strength 0 \
  --log-every 1 \
  --log-file runs/qalf_entangling_unitary_compare/train.jsonl

Train all three comparison models without count priors as shown above, then pass --no-attractor only to qalf.eval when evaluating each checkpoint. Training commands do not accept an attractor flag. The key diagnostics are window_norm_drift_max, raw perplexity, and whether the entangling/unitary run improves fixed-prompt samples without count priors carrying the result.
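
A short script can pull a diagnostic out of each run's JSONL log for side-by-side comparison. This sketch assumes that window_norm_drift_max appears as a numeric top-level key in train.jsonl records, which this README does not confirm; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def read_jsonl(path: str) -> List[dict]:
    """Load one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def max_field(records: List[dict], field: str) -> Optional[float]:
    """Largest value of `field` across records that contain it, else None."""
    values = [r[field] for r in records if isinstance(r.get(field), (int, float))]
    return max(values) if values else None


def summarize_runs(run_dirs: List[str],
                   field: str = "window_norm_drift_max") -> Dict[str, Optional[float]]:
    """Worst-case value of `field` per run, skipping runs with no log file."""
    summary: Dict[str, Optional[float]] = {}
    for run_dir in run_dirs:
        log_path = Path(run_dir) / "train.jsonl"
        if log_path.exists():
            summary[run_dir] = max_field(read_jsonl(str(log_path)), field)
    return summary
```

Calling summarize_runs with the three comparison run directories gives one worst-case drift value per run; a run whose log lacks the field maps to None.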
