
Integrate the ten-step expansion wave (9 of 10 branches) #16

Merged
Cuuper22 merged 18 commits into main from claude/eager-cerf-kvmqta on Jun 11, 2026

@Cuuper22

Owner

Integration: the ten-step expansion wave

Merges nine of the ten parallel work branches (#6 #7 #8 #9 #10 #11 #12 #13 #15) plus integration reconciliation. The tenth (#14, SEMF + quark decomposition) remains a draft pending its test reconciliation and will land separately.

What lands here

  • First end-to-end real-scenario cost (N1): sourced DGX H100 power BOM + assumption-labeled full-TCO pack → econ.cost.per_token = 3.738e-9 at the EIA 2024 industrial tariff, missing=0, 75 trace steps. The sourced-only pack keeps its 33 missing roots visible by design.
  • Resolver capabilities (N3): opt-in --fallback-on-violated-validity and --solve-systems (2–3 variable cycles) with trace explanations; defaults inert.
  • Cited scenario inventory (N4): sourced Pythia-160M + EIA commercial-tariff packs → 8 sourced packs total.
  • Metadata tail closed (N5): variable units 1428→1493 (100%), variable references 1324→1493 (100%), equation references 878→959 (100%), unit checks 799→893.
  • Uncertainty propagation (N6): seeded Monte Carlo layer over the existing evaluate path.
  • Live cone browser (N7): export-graph-json CLI + 706-node/1011-edge registry slice + CuperOS-styled browser pane.
  • Docs-stats gate (N8): the full verify profile is now five gates; on its first integration run it caught all ten stale README/site coverage numbers, which were refreshed from live output.
  • CI & release path (N9), ledger archive (N10).

Integration reconciliation

  • One merge conflict total (scenarios.py N1↔N4 registration), resolved as the union.
  • docs/data/registry-cone.json regenerated against the merged registry.
  • README/site honesty claims updated: simultaneous solving and validity fallback now exist as opt-in flags, so the blanket claim "does not solve simultaneous systems" was made precise.
  • Planning-ledger metadata-gap claims refreshed; integration wave recorded in CHANGELOG/SESSION_STATE.

Verification (observed on this tree)

  • Full pytest: 841 passed in 256.30s (one expected RuntimeWarning from the uncertainty failure-count test)
  • Full verifier: 5/5 gates passed in 260.02s; read-only: 5/5 in 262.07s
  • Audit gate: PASS; docs-stats: OK
  • scenario-audit: 8 packs, 99 issues kept visible by design (three open sourced cost frontiers × ~33 missing economics roots each; the closure pack resolves 4/4)
  • impeccable detect on docs/: only the known CLI-flag em-dash false positive
  • Source-clean: cache_dirs=0 pyc=0

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ


Generated by Claude Code

claude added 18 commits June 10, 2026 17:07
CI workflow:
- Add Python 3.13 to the test matrix and enable pip caching.
- Add a verify job that runs the project's own gate suite
  (verify --profile full --gate-timeout 0), gated on the test matrix via
  needs, so failing commits do not consume runners.
- Add a build job (python -m build, twine check, artifact upload),
  also gated on the test matrix.

Release path:
- New tag-triggered release.yml: build, twine check, and a PyPI
  trusted-publishing job pinned to an immutable action SHA
  (pypa/gh-action-pypi-publish v1.14.0), gated on a 'pypi' GitHub
  environment so nothing can publish before one-time setup.
- New RELEASING.md documenting the one-time PyPI trusted-publisher
  setup and the release procedure.
- pyproject: add 3.13 classifier and a 'release' optional dependency
  group (build, twine), keeping dev lean.

Validated locally: both workflows YAML-parse; python -m build succeeds;
twine check PASSED for sdist and wheel; the built wheel installed into
a fresh venv serves gpu-stack stats correctly.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…ing docs

- Moved AGENT_DIARY.md, AGENT_WORKLOG.md, AGENT_GITLOG.md, CODEX 5-5 START HERE.md,
  AGENT_REST_BREAKS/, and rest_breaks/ to archive/ via git mv; added archive/README.md
  explaining provenance. Root now holds only the 9 canonical operational ledgers.
- Updated all references: README.md Project Status Docs links now point to archive/;
  ROADMAP.md, IMPROVEMENT_MAP.md, SESSION_STATE.md, HANDOFF.md, VISIBLE_BACKLOG.md,
  CHANGELOG.md, and docs/readme_fragments/readme_qa_checklist.md updated.
- Refreshed ROADMAP.md: new status timestamp (June 10, 2026); a new Latest Verified Wave
  entry for the portfolio form-and-deliverable polish wave with PR #5 facts (670 tests, 4/4
  verifier gates, audit PASS with large_project_files=0); and live next-work compass evidence
  from the 2026-06-10 run (Pythia cost_per_token: 33 missing; lithography.medium weight 3014
  across 15 roots; metadata gaps 65/169/81/160 for variable units, variable references,
  equation references, and unit checks).
- Refreshed IMPROVEMENT_MAP.md: updated snapshot date, test count 639 to 670, large
  project files 7 to 0, verification-surface row and file-cohesion row, AGENT_GITLOG
  reference updated to archive path, new Latest Verified Wave block.
- Fixed docs/app.js null guard: renderTrace accesses traceMeterLabel and traceMeterFoot
  but both were absent from the guard condition; added them so the check is complete.
- All 670 tests green; full verifier 4/4 gates passed; audit PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…tainty)

Adds `gpu_stack/uncertainty.py` with UncertainAssignment, three distribution
types (uniform, normal, lognormal), and `propagate_uncertainty` that resolves
targets over n_samples draws. Uses a SymPy lambdify fast path for vectorised
evaluation (200 samples in <1 ms, versus ~14 s on the per-sample path), falling
back to the per-sample resolver. Returns structured TargetUncertaintyStats with mean,
sample std, p5/p50/p95, failure count, and echoed input specs. 35 tests cover
determinism, quantile ordering, analytic correctness, failure counting, and
all three distribution types.
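
A minimal, standard-library sketch of seeded Monte Carlo propagation, for readers who want the shape of the idea. UncertainInput, Stats and propagate are hypothetical names, not the gpu_stack/uncertainty.py API: the real module resolves registry targets (with the lambdify fast path), whereas here the target is a plain Python callable evaluated once per draw, which corresponds to the slow per-sample path.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class UncertainInput:
    """Hypothetical stand-in for an UncertainAssignment: one input and its distribution."""
    name: str
    kind: str   # "uniform", "normal", or "lognormal"
    a: float    # uniform: low;  normal/lognormal: mu
    b: float    # uniform: high; normal/lognormal: sigma


@dataclass(frozen=True)
class Stats:
    mean: float
    std: float      # sample standard deviation (n - 1 denominator)
    p5: float
    p50: float
    p95: float
    failures: int   # draws whose evaluation raised or was non-finite


def _draw(rng: random.Random, spec: UncertainInput) -> float:
    if spec.kind == "uniform":
        return rng.uniform(spec.a, spec.b)
    if spec.kind == "normal":
        return rng.gauss(spec.a, spec.b)
    if spec.kind == "lognormal":
        # a and b parameterize the underlying normal, as in random.lognormvariate
        return rng.lognormvariate(spec.a, spec.b)
    raise ValueError(f"unknown distribution {spec.kind!r}")


def propagate(target: Callable[..., float], inputs: List[UncertainInput],
              n_samples: int = 200, seed: int = 0) -> Stats:
    rng = random.Random(seed)  # private seeded generator: same seed, same draws
    values: List[float] = []
    failures = 0
    for _ in range(n_samples):
        sample = {spec.name: _draw(rng, spec) for spec in inputs}
        try:
            y = target(**sample)
        except (ValueError, ZeroDivisionError, OverflowError):
            failures += 1
            continue
        if math.isfinite(y):
            values.append(y)
        else:
            failures += 1
    # 99 cut points; cuts[k - 1] is the k-th percentile. Needs at least two successes.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return Stats(statistics.fmean(values), statistics.stdev(values),
                 cuts[4], cuts[49], cuts[94], failures)
```

Because every draw comes from one seeded random.Random, rerunning with the same seed reproduces the statistics exactly, which is the determinism property the commit's tests describe.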

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
Implements the site's #1 "Next work" item: a real dependency-cone browser.

(1) New CLI subcommand `export-graph-json` (gpu_stack/cli_export_graph.py)
    walks the registry and writes a bounded JSON slice for chosen target
    variables. Default targets: econ.cost.per_token, training.tokens_per_sec,
    thermal.dc.pue. Per-node fields: name, units, scope, description (trimmed
    to 160 chars), is_root_input, is_constant, defining_equations. Edges are
    value-defining dependency links. Deterministic ordering (sorted keys and
    edges) for stable diffs. 706 nodes, 1011 edges, 377 KB payload.

(2) Generated docs/data/registry-cone.json with that command and committed
    as a build artifact. Regeneration command noted in
    docs/readme_fragments/data_pipeline.md.

(3) New section-window on the portfolio page (docs/index.html) with a nav
    tree-button. Vanilla JS cone browser in docs/cone-browser.js: fetches
    the JSON, renders the selected target's upstream cone as expandable
    OS-styled rows (depth-indented inset rows, gold "root" badge for root
    inputs, muted "const" badge for constants). Keyboard-operable buttons,
    aria-live status bar, graceful failure with informative notice when JSON
    cannot be loaded (e.g. file:// protocol). Bugs from code review fixed:
    stale openNodes entries purged recursively on collapse; aria-expanded
    omitted entirely on non-expandable leaf nodes.

(4) Design verification: impeccable detect docs/ reports only the known
    false positive (7 em-dashes = CLI --flag tokens in console sample).
    Zero new findings.
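
The export in item (1) amounts to a bounded upstream traversal with deterministic serialization. A minimal sketch, assuming the registry can be reduced to a plain dict mapping each variable to the variables its defining equation reads; upstream_cone, to_json, the max_nodes bound and the example variable names are hypothetical, and the real per-node fields (units, scope, description, and so on) are omitted.

```python
import json
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def upstream_cone(deps: Dict[str, List[str]], targets: Iterable[str],
                  max_nodes: int = 1000) -> dict:
    """Breadth-first walk from the targets to everything they depend on, up to max_nodes."""
    seen: Set[str] = set()
    queue = deque(sorted(targets))
    edges: Set[Tuple[str, str]] = set()
    while queue and len(seen) < max_nodes:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for dep in deps.get(node, []):
            edges.add((dep, node))  # edge runs from an input to the value it defines
            if dep not in seen:
                queue.append(dep)
    # Keep only edges whose endpoints both made it into the bounded slice.
    kept = sorted((a, b) for a, b in edges if a in seen and b in seen)
    return {
        "nodes": [{"name": n, "is_root_input": not deps.get(n)} for n in sorted(seen)],
        "edges": [{"from": a, "to": b} for a, b in kept],
    }


def to_json(slice_: dict) -> str:
    # Sorted node/edge lists plus sort_keys give byte-identical output for identical input.
    return json.dumps(slice_, sort_keys=True, indent=2)
```

Sorting at output time, rather than relying on traversal order, is what keeps regenerated files diff-stable even if the registry's iteration order changes.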

Tests: 21 new tests in tests/test_cli_export_graph.py covering subcommand
wiring, JSON schema shape, determinism, bounds, file output, and error paths.
Full pytest: 691 passed. Verify --profile full: 4/4 gates passed.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…labeled TCO closure

New gpu_stack/presets/dgx_h100_tco.py:
- dgx_h100_node_power_bom: sourced DGX H100 component power facts
  (CPU, NIC, RAM, misc) from public NVIDIA system documentation.
- pythia_70m_dgx_h100_run_closure_assumption: every non-sourced root
  needed by the cost rollup, explicitly labeled as an assumption with
  per-field rationale, never silent defaults.

New combined pack pythia_70m_dgx_h100_us_2024_industrial_full_tco_assumption
resolves all four advertised targets end to end. Observed:
  tokens_per_second = 1268976.3 (ok, 21 trace steps)
  job_dc_power      = 10200.0 W (ok, matches DGX H100 system spec)
  run_power_cost    = 54.44 (ok, 30 trace steps)
  cost_per_token    = 3.738e-9 (ok, 75 trace steps, missing=0)
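
As a quick cross-check of the observed figures, multiplying cost per token by throughput gives the implied all-in cost rate of the run. This assumes econ.cost.per_token is a cost rate divided by tokens per second (the usual definition, not confirmed in this PR); the currency unit is not stated here and is left implicit.

```python
# Observed values reported for the full-TCO pack.
cost_per_token = 3.738e-9        # econ.cost.per_token
tokens_per_second = 1268976.3    # tokens_per_second

# cost/token x tokens/s = cost/s (assumes cost_per_token = cost_rate / throughput)
cost_per_second = cost_per_token * tokens_per_second
cost_per_hour = cost_per_second * 3600.0
print(f"{cost_per_second:.4e} per second, {cost_per_hour:.2f} per hour")
```

This works out to roughly 4.74e-3 per second, or about 17.08 per hour, in the pack's cost units.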

The original sourced-only energy-floor pack keeps its 33 missing inputs
visible by design; the closure lives in a separate assumption-labeled
pack so sourced facts stay distinct from assumptions.
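
The separation between sourced facts and labeled assumptions can be enforced mechanically when packs are combined. A minimal sketch; Fact and combine_packs are hypothetical names, not the gpu_stack/presets API. Every sourced entry must carry a source, every closure entry must be an assumption with a rationale, and an assumption may never overwrite a sourced fact.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class Fact:
    value: float
    provenance: str   # "sourced" or "assumption"
    note: str         # public source for sourced facts, rationale for assumptions


def combine_packs(sourced: Mapping[str, Fact], closure: Mapping[str, Fact]) -> Dict[str, Fact]:
    for name, fact in sourced.items():
        if fact.provenance != "sourced" or not fact.note:
            raise ValueError(f"{name}: sourced pack entry lacks a source")
    for name, fact in closure.items():
        if fact.provenance != "assumption" or not fact.note:
            raise ValueError(f"{name}: closure entry must be a labeled assumption with rationale")
    overlap = sourced.keys() & closure.keys()
    if overlap:
        # An assumption must never silently replace a sourced fact.
        raise ValueError(f"closure overrides sourced facts: {sorted(overlap)}")
    return {**sourced, **closure}
```

Keeping provenance on each value, rather than only on the pack, lets downstream traces say which individual roots were assumed.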

Full pytest: 707 passed. Audit gate: PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…lanations

New gpu_stack/core/resolver_advanced.py behind two opt-in flags:
- --fallback-on-violated-validity: when a selected Approximation's
  validity check is numerically violated and an alternative defining
  relation exists, retry with the alternative and record an explicit
  trace entry naming the switch. Default behavior unchanged.
- --solve-systems: when resolution stalls on a 2- or 3-variable cycle
  of mutually defined variables, solve the subsystem symbolically and
  accept only unique real solutions consistent with symbol assumptions.
  Larger systems stay missing as before.
- Trace steps now carry selection reasons (sole identity, variant
  choice, fallback, system solve) and unresolved inputs can name
  not-selectable alternatives.
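
The validity-fallback rule above can be sketched in a few lines. Relation, TraceStep and resolve are hypothetical names, and the small-angle example is purely illustrative; the sketch only shows the control flow: with the flag off the first relation is always used (unchanged behavior), and with it on a violated validity check switches to the first valid alternative and records why in the trace.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Relation:
    name: str
    compute: Callable[[Dict[str, float]], float]
    # Validity predicate for an Approximation; None means always valid.
    valid: Optional[Callable[[Dict[str, float]], bool]] = None


@dataclass
class TraceStep:
    target: str
    relation: str
    reason: str


def resolve(target: str, relations: List[Relation], env: Dict[str, float],
            fallback_on_violated_validity: bool = False,
            trace: Optional[List[TraceStep]] = None) -> float:
    if trace is None:
        trace = []
    chosen = relations[0]
    reason = "sole identity" if len(relations) == 1 else "variant choice"
    violated = chosen.valid is not None and not chosen.valid(env)
    if violated and fallback_on_violated_validity:
        for alt in relations[1:]:
            if alt.valid is None or alt.valid(env):
                reason = f"fallback: {chosen.name} validity violated"
                chosen = alt
                break
    trace.append(TraceStep(target, chosen.name, reason))
    return chosen.compute(env)


# Illustrative only: a small-angle Approximation with an exact alternative.
SIN_RELATIONS = [
    Relation("small_angle", lambda e: e["x"], valid=lambda e: abs(e["x"]) < 0.1),
    Relation("exact_sine", lambda e: math.sin(e["x"])),
]
```

With x = 1.0 the default call returns the (invalid) small-angle value 1.0, while the opt-in call returns math.sin(1.0) and logs a "fallback" trace reason.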

Defaults are inert: existing invocations produce identical traces with
no new fields populated, asserted by regression tests.
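
The commit describes --solve-systems as a symbolic solve that accepts only unique real solutions. As a much narrower illustration, the sketch below handles only a linear two-variable cycle, using Cramer's rule instead of a symbolic solver; solve_two_cycle and its (a, b, c) equation encoding are hypothetical. The solve_systems parameter defaults to False so that, as with the real flag, nothing changes unless it is requested.

```python
from typing import Optional, Tuple

Equation = Tuple[float, float, float]  # (a, b, c) meaning a*x + b*y = c


def solve_two_cycle(eq1: Equation, eq2: Equation,
                    solve_systems: bool = False,
                    tol: float = 1e-12) -> Optional[Tuple[float, float]]:
    if not solve_systems:
        return None  # default: the cycle stays unresolved, exactly as before
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    det = a1 * b2 - a2 * b1
    if abs(det) <= tol:
        return None  # no solution or infinitely many: not unique, so stay missing
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y
```

For x + y = 3 and x - y = 1, the opt-in call returns (2.0, 1.0); a dependent pair such as x + y = 3 and 2x + 2y = 6 is rejected as non-unique.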

25 new tests in tests/test_resolver_advanced.py. Full pytest: 695 passed.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
New gpu_stack/docs_stats_check.py compares the claim surfaces against
live registry truth computed at runtime:
- README.md Current Snapshot table rows and the stats code block
- docs/index.html stat-grid values
- docs/app.js embedded fact-string literals

Mismatches report file, claim, expected (live), and found (document),
with a nonzero exit. Wired into the full verify profile as a fifth gate
(docs-stats); because it only reads files, it is also safe in read-only
verification runs. Runnable directly via
python -m gpu_stack.docs_stats_check (currently: docs-stats: OK).

Tests derive expectations from the live Registry at test time and
exercise planted-drift failures on tmp copies, never the real files.
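
The core comparison can be sketched as: extract numeric claims from a document, compare each against the live value, and report mismatches with a nonzero exit. This is a hypothetical simplification: it parses only a markdown table row of the form "| claim | number |", whereas the real gate also reads docs/index.html stat-grid values and docs/app.js string literals. Mismatch, check_claims and exit_code are illustrative names, and the planted-drift check runs on in-memory text rather than tmp copies.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mismatch:
    file: str
    claim: str
    expected: str   # live value
    found: str      # value written in the document


# Hypothetical claim format: a markdown table row "| <claim> | <number> |".
ROW = re.compile(r"^\|\s*([^|]+?)\s*\|\s*([\d,]+)\s*\|\s*$")


def check_claims(path: str, text: str, live: Dict[str, int]) -> List[Mismatch]:
    found: Dict[str, int] = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            found[m.group(1)] = int(m.group(2).replace(",", ""))
    out = []
    for claim, expected in sorted(live.items()):
        got = found.get(claim)
        if got != expected:
            out.append(Mismatch(path, claim, str(expected),
                                "missing" if got is None else str(got)))
    return out


def exit_code(mismatches: List[Mismatch]) -> int:
    for mm in mismatches:
        print(f"{mm.file}: {mm.claim}: expected {mm.expected}, found {mm.found}")
    return 1 if mismatches else 0
```

Treating a claim that disappeared from the document as a mismatch ("missing") keeps the gate from passing silently when a row is deleted.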

Full pytest: 684 passed.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
New gpu_stack/presets/scenarios_cited_2026.py with two pack families,
every numeric carrying a public source string:
- Pythia-160M on one DGX H100 node (EleutherAI Pythia repository and
  Hugging Face config.json for n_layers=12, d_model=768, n_heads=12,
  vocab=50304, seq_len=2048, 2M-token batches, ~300B total tokens),
  reusing the existing sourced DGX H100 hardware and EIA industrial
  tariff presets, with closure assumptions in separate named packs.
- Pythia-70M commercial-tariff variant substituting the EIA 2024 U.S.
  commercial average retail electricity price for the industrial rate,
  making tariff sensitivity explicit.

Packs register through SOURCED_SCENARIO_PACKS and SCENARIO_TARGET_SETS;
statuses stay honest (open frontiers keep reporting missing inputs).
Test helper hardened to exact-name pack matching so adding variants
cannot make marker-based selection ambiguous.
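
The test-helper hardening is easy to see in miniature: substring ("marker") selection works until a second pack shares the marker, while exact-name selection is unaffected by new variants. Both helper names are hypothetical, and so is the second pack name; only the full_tco pack name appears in this PR.

```python
from typing import Iterable, List


def select_pack_by_marker(names: Iterable[str], marker: str) -> str:
    """Fragile: substring matching breaks once two packs share the marker."""
    hits: List[str] = [n for n in names if marker in n]
    if len(hits) != 1:
        raise LookupError(f"marker {marker!r} matched {len(hits)} packs: {hits}")
    return hits[0]


def select_pack_exact(names: Iterable[str], name: str) -> str:
    """Robust: exact-name matching; adding variants cannot create ambiguity."""
    if name not in set(names):
        raise LookupError(f"no pack named {name!r}")
    return name


PACKS = [
    "pythia_70m_dgx_h100_us_2024_industrial_full_tco_assumption",
    "pythia_70m_commercial_hypothetical",  # hypothetical variant name
]
```

Selecting by the marker "pythia_70m" raises LookupError once both packs exist, whereas exact-name lookup keeps returning the intended pack.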

Full pytest: 709 passed. Audit gate: PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
Metadata coverage across 33 non-lithography modules in 14 scopes (memory,
cluster, economics, training, kernel, architecture, interconnect,
precision, optimizer, parallelism, collective, gpu, thermal, noise):

Observed coverage before -> after:
  with_sp_units              1428 -> 1493 (every non-constant variable)
  with_references            1324 -> 1493 (every non-constant variable)
  equations_with_references   878 -> 959  (every equation)
  equations_with_unit_check   799 -> 893
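
The raw counts are easier to compare as fractions. The denominators below are inferred from the "(every ...)" annotations: 1493 non-constant variables and 959 equations.

```python
# Denominators inferred from the annotations: every non-constant variable, every equation.
N_VARIABLES = 1493
N_EQUATIONS = 959

coverage = {
    # metric: (before, after, denominator)
    "with_sp_units": (1428, 1493, N_VARIABLES),
    "with_references": (1324, 1493, N_VARIABLES),
    "equations_with_references": (878, 959, N_EQUATIONS),
    "equations_with_unit_check": (799, 893, N_EQUATIONS),
}

for metric, (before, after, total) in coverage.items():
    print(f"{metric:28s} {before / total:7.1%} -> {after / total:7.1%}")
```

This gives roughly 95.6% -> 100.0%, 88.7% -> 100.0%, 91.6% -> 100.0%, and 83.3% -> 93.1%, so unit checks are the only metric still below full coverage.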

References are real public documents (vendor datasheets, JEDEC specs,
IEEE 754, standard texts); no fabricated provenance. Unit checks were
enabled only where the dimensional check passes.

Reconciliation applied on top of the interrupted agent's work:
- Registry snapshot test updated to the new coverage truth.
- Curated unchecked-equation ledgers shrunk (cluster set now empty;
  kernel matmul/attention FLOP equations now checked; opex checked
  set gains run_power_cost and water_cost_rate).
- Reverted check_units on the three optimizer schedule ordering
  inequalities: existing tests deliberately assert those feasibility
  relations stay unit-check-free, and overriding design tests is
  beyond a metadata pass.

Full pytest: 670 passed. Audit gate: PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…o claude/eager-cerf-kvmqta

# Conflicts:
#	gpu_stack/presets/scenarios.py
Merges nine branches (n1, n3, n4, n5, n6, n7, n8, n9, n10) and applies
integration reconciliation:
- Resolve the scenarios.py registration conflict between the full-TCO
  pack (n1) and the cited-2026 packs (n4) as the union: the target-set
  mapping keeps the MappingProxyType wrapper, the full_tco target set,
  and the 2026 spread.
- Regenerate docs/data/registry-cone.json against the merged registry.
- Refresh all ten README/docs coverage numbers flagged by the new
  docs-stats gate (it caught every one of them on its first
  integration run).
- Update the README and site honesty claims about simultaneous-system
  solving and validity fallback, which now exist as opt-in flags.
- Refresh planning-ledger metadata-gap claims closed by the wave and
  record the integration wave in CHANGELOG and SESSION_STATE.

Observed on this tree: full pytest 841 passed in 256.30s (one expected
RuntimeWarning from the uncertainty failure-count test); full verifier
5/5 gates passed in 260.02s; read-only full verifier 5/5 gates passed
in 262.07s; audit gate PASS; scenario-audit spans 8 sourced packs with
99 issues kept visible by design across three open cost frontiers;
impeccable detect on docs/ reports only the known CLI-flag em-dash
false positive.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
@Cuuper22 merged commit 5b2b4c7 into main on Jun 11, 2026
4 of 12 checks passed