Integrate the ten-step expansion wave (9 of 10 branches) #16
Merged
CI workflow:
- Add Python 3.13 to the test matrix and enable pip caching.
- Add a verify job that runs the project's own gate suite (verify --profile full --gate-timeout 0), gated on the matrix via needs so red commits do not consume runners.
- Add a build job (python -m build, twine check, artifact upload), also gated on the test matrix.

Release path:
- New tag-triggered release.yml: build, twine check, and a PyPI trusted-publishing job pinned to an immutable action SHA (pypa/gh-action-pypi-publish v1.14.0), gated on a 'pypi' GitHub environment so nothing can publish before one-time setup.
- New RELEASING.md documenting the one-time PyPI trusted-publisher setup and the release procedure.
- pyproject: add 3.13 classifier and a 'release' optional dependency group (build, twine), keeping dev lean.

Validated locally: both workflows YAML-parse; python -m build succeeds; twine check PASSED for sdist and wheel; the built wheel installed into a fresh venv serves gpu-stack stats correctly.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…ing docs

- Moved AGENT_DIARY.md, AGENT_WORKLOG.md, AGENT_GITLOG.md, CODEX 5-5 START HERE.md, AGENT_REST_BREAKS/, and rest_breaks/ to archive/ via git mv; added archive/README.md explaining provenance. Root now holds only the 9 canonical operational ledgers.
- Updated all references: README.md Project Status Docs links now point to archive/; ROADMAP.md, IMPROVEMENT_MAP.md, SESSION_STATE.md, HANDOFF.md, VISIBLE_BACKLOG.md, CHANGELOG.md, and docs/readme_fragments/readme_qa_checklist.md updated.
- Refreshed ROADMAP.md: new status timestamp (June 10, 2026), new Latest Verified Wave entry for portfolio form-and-deliverable polish wave with PR #5 facts (670 tests, 4/4 verifier gates, audit PASS large_project_files=0), live next-work compass evidence from 2026-06-10 run (Pythia cost_per_token 33 missing, lithography.medium weight 3014 across 15 roots, metadata gaps 65/169/81/160).
- Refreshed IMPROVEMENT_MAP.md: updated snapshot date, test count 639 to 670, large project files 7 to 0, verification-surface row and file-cohesion row, AGENT_GITLOG reference updated to archive path, new Latest Verified Wave block.
- Fixed docs/app.js null guard: renderTrace accesses traceMeterLabel and traceMeterFoot but both were absent from the guard condition; added them so the check is complete.
- All 670 tests green; full verifier 4/4 gates passed; audit PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…tainty)

Adds `gpu_stack/uncertainty.py` with UncertainAssignment, three distribution types (uniform, normal, lognormal), and `propagate_uncertainty`, which resolves targets over n_samples draws. Uses a SymPy lambdify fast path for vectorised evaluation (200 samples in <1 ms vs ~14 s per-sample) with fallback to the per-sample resolver. Returns structured TargetUncertaintyStats with mean, sample std, p5/p50/p95, failure count, and echoed input specs.

35 tests cover determinism, quantile ordering, analytic correctness, failure counting, and all three distribution types.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
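For orientation, the sampling-and-summary shape described in this commit can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in `gpu_stack/uncertainty.py`: the function `propagate`, the `specs` layout, and the `energy_kwh` example are hypothetical, and the sketch evaluates one draw at a time rather than using the SymPy lambdify fast path.

```python
import math
import random
import statistics

def propagate(fn, specs, n_samples=200, seed=0):
    """Draw n_samples inputs, evaluate fn, and summarise the successful draws."""
    rng = random.Random(seed)  # a fixed seed keeps repeated runs identical
    samplers = {
        "uniform": lambda a, b: rng.uniform(a, b),
        "normal": lambda mu, sigma: rng.gauss(mu, sigma),
        "lognormal": lambda mu, sigma: rng.lognormvariate(mu, sigma),
    }
    values, failures = [], 0
    for _ in range(n_samples):
        draw = {name: samplers[kind](*params) for name, (kind, params) in specs.items()}
        try:
            y = fn(**draw)
        except (ValueError, ZeroDivisionError, OverflowError):
            failures += 1  # a failed draw is counted, not silently dropped
            continue
        if not math.isfinite(y):
            failures += 1
            continue
        values.append(y)
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "p5": cuts[4], "p50": cuts[49], "p95": cuts[94],
        "failures": failures,
        "inputs": dict(specs),  # echo the input specs back to the caller
    }

# Hypothetical target: energy in kWh from an uncertain power draw and runtime.
def energy_kwh(power_w, hours):
    return power_w * hours / 1000.0

specs = {"power_w": ("normal", (10200.0, 100.0)), "hours": ("uniform", (1.0, 2.0))}
stats = propagate(energy_kwh, specs)
```

Counting failures separately, rather than discarding them, keeps the summary honest when part of the input distribution falls outside the model's domain.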
Implements the site's #1 "Next work" item: a real dependency-cone browser.

(1) New CLI subcommand `export-graph-json` (gpu_stack/cli_export_graph.py) walks the registry and writes a bounded JSON slice for chosen target variables. Default targets: econ.cost.per_token, training.tokens_per_sec, thermal.dc.pue. Per-node fields: name, units, scope, description (trimmed to 160 chars), is_root_input, is_constant, defining_equations. Edges are value-defining dependency links. Deterministic ordering (sorted keys and edges) for stable diffs. 706 nodes, 1011 edges, 377 KB payload.

(2) Generated docs/data/registry-cone.json with that command and committed it as a build artifact. Regeneration command noted in docs/readme_fragments/data_pipeline.md.

(3) New section-window on the portfolio page (docs/index.html) with a nav tree-button. Vanilla JS cone browser in docs/cone-browser.js: fetches the JSON, renders the selected target's upstream cone as expandable OS-styled rows (depth-indented inset rows, gold "root" badge for root inputs, muted "const" badge for constants). Keyboard-operable buttons, aria-live status bar, graceful failure with informative notice when JSON cannot be loaded (e.g. file:// protocol). Bugs from code review fixed: stale openNodes entries purged recursively on collapse; aria-expanded omitted entirely on non-expandable leaf nodes.

(4) Design verification: impeccable detect docs/ reports only the known false positive (7 em-dashes = CLI --flag tokens in console sample). Zero new findings.

Tests: 21 new tests in tests/test_cli_export_graph.py covering subcommand wiring, JSON schema shape, determinism, bounds, file output, and error paths. Full pytest: 691 passed. Verify --profile full: 4/4 gates passed.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
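The cone export in item (1) amounts to an upstream graph walk followed by sorted serialisation. The sketch below illustrates that idea only; it is not the code in gpu_stack/cli_export_graph.py. The function `export_cone`, the `deps` mapping, and every variable name other than econ.cost.per_token are hypothetical, and a node is treated as a root input simply when it has no dependencies, which ignores the root/constant distinction the real export makes.

```python
import json

def export_cone(deps, targets):
    """Collect every upstream node of the targets and emit sorted nodes and edges."""
    seen, stack = set(), list(targets)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, ()))  # walk towards the inputs
    # Sorting nodes and edges makes the output independent of traversal order.
    edges = sorted((src, dst) for dst in seen for src in deps.get(dst, ()))
    nodes = [{"name": n, "is_root_input": not deps.get(n)} for n in sorted(seen)]
    payload = {
        "targets": sorted(targets),
        "nodes": nodes,
        "edges": [{"from": s, "to": d} for s, d in edges],
    }
    return json.dumps(payload, sort_keys=True, indent=2)

# Hypothetical registry slice: each key depends on the variables in its list.
deps = {
    "econ.cost.per_token": ["econ.cost.total", "training.tokens_total"],
    "econ.cost.total": ["econ.power.cost"],
    "unrelated.x": ["unrelated.y"],
}
cone_json = export_cone(deps, ["econ.cost.per_token"])
```

Because both the node list and the edge list are sorted before serialisation, regenerating the file from an unchanged registry yields a byte-identical payload, which is what keeps diffs of a committed artifact readable.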
…labeled TCO closure

New gpu_stack/presets/dgx_h100_tco.py:
- dgx_h100_node_power_bom: sourced DGX H100 component power facts (CPU, NIC, RAM, misc) from public NVIDIA system documentation.
- pythia_70m_dgx_h100_run_closure_assumption: every non-sourced root needed by the cost rollup, explicitly labeled as an assumption with per-field rationale, never silent defaults.

New combined pack pythia_70m_dgx_h100_us_2024_industrial_full_tco_assumption resolves all four advertised targets end to end. Observed:
- tokens_per_second = 1268976.3 (ok, 21 trace steps)
- job_dc_power = 10200.0 W (ok, matches DGX H100 system spec)
- run_power_cost = 54.44 (ok, 30 trace steps)
- cost_per_token = 3.738e-9 (ok, 75 trace steps, missing=0)

The original sourced-only energy-floor pack keeps its 33 missing inputs visible by design; the closure lives in a separate assumption-labeled pack so sourced facts stay distinct from assumptions.

Full pytest: 707 passed. Audit gate: PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…lanations

New gpu_stack/core/resolver_advanced.py behind two opt-in flags:
- --fallback-on-violated-validity: when a selected Approximation's validity check is numerically violated and an alternative defining relation exists, retry with the alternative and record an explicit trace entry naming the switch. Default behavior unchanged.
- --solve-systems: when resolution stalls on a 2- or 3-variable cycle of mutually defined variables, solve the subsystem symbolically and accept only unique real solutions consistent with symbol assumptions. Larger systems stay missing as before.
- Trace steps now carry selection reasons (sole identity, variant choice, fallback, system solve) and unresolved inputs can name not-selectable alternatives.

Defaults are inert: existing invocations produce identical traces with no new fields populated, asserted by regression tests.

25 new tests in tests/test_resolver_advanced.py. Full pytest: 695 passed.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
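The validity-fallback behaviour can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: the function `resolve_with_fallback`, the `(name, compute, is_valid)` tuple layout, and the small-angle example are hypothetical and do not reflect the actual interfaces in gpu_stack/core/resolver_advanced.py.

```python
import math

def resolve_with_fallback(relations, inputs, allow_fallback=False):
    """relations: ordered candidates, each a (name, compute, is_valid) tuple."""
    primary_name, primary, is_valid = relations[0]
    violated = not is_valid(inputs)
    if allow_fallback and violated and len(relations) > 1:
        alt_name, alt, _ = relations[1]
        # Record the switch explicitly so the trace explains the result.
        trace = [{"relation": alt_name,
                  "reason": f"fallback: validity of {primary_name} violated"}]
        return alt(inputs), trace
    # Flag off (or nothing to switch to): behave exactly as before.
    return primary(inputs), [{"relation": primary_name, "reason": "selected"}]

# Hypothetical pair: a small-angle approximation and its exact alternative.
relations = [
    ("small_angle", lambda v: v["x"], lambda v: abs(v["x"]) < 0.1),
    ("exact_sine", lambda v: math.sin(v["x"]), lambda v: True),
]

default_value, default_trace = resolve_with_fallback(relations, {"x": 1.0})
fallback_value, fallback_trace = resolve_with_fallback(relations, {"x": 1.0}, allow_fallback=True)
```

With the flag off, the approximation is still used even though its validity check fails, mirroring the commit's statement that default behavior is unchanged; with the flag on, the exact relation is used and the trace names the switch.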
New gpu_stack/docs_stats_check.py compares the claim surfaces against live registry truth computed at runtime:
- README.md Current Snapshot table rows and the stats code block
- docs/index.html stat-grid values
- docs/app.js embedded fact-string literals

Mismatches report file, claim, expected (live), and found (document), with a nonzero exit. Wired as a fifth docs-stats gate in the full verify profile, naturally read-only safe. Runnable directly via python -m gpu_stack.docs_stats_check (currently: docs-stats: OK).

Tests derive expectations from the live Registry at test time and exercise planted-drift failures on tmp copies, never the real files.

Full pytest: 684 passed.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
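A drift check of this kind boils down to extracting each documented number and comparing it with a freshly computed value. The sketch below shows one way that could look; it is not the code in gpu_stack/docs_stats_check.py. The function `check_claims` and the pipe-delimited table-row pattern are assumptions, and the sample numbers are borrowed from the metadata-coverage commit in this PR purely as illustrative data.

```python
import re

def check_claims(text, live, source="README.md"):
    """Return one mismatch record per documented number that differs from live truth."""
    mismatches = []
    for claim, expected in sorted(live.items()):
        # Assumed row shape: "| claim_name | 1,234 |"
        m = re.search(rf"\|\s*{re.escape(claim)}\s*\|\s*([\d,]+)\s*\|", text)
        found = int(m.group(1).replace(",", "")) if m else None
        if found != expected:
            mismatches.append(
                {"file": source, "claim": claim, "expected": expected, "found": found}
            )
    return mismatches

# Illustrative document text with one stale row.
readme = "| with_sp_units | 1493 |\n| equations_with_references | 878 |\n"
live = {"with_sp_units": 1493, "equations_with_references": 959}
report = check_claims(readme, live)
```

A caller could turn the report into a gate result with something like `sys.exit(1 if report else 0)`, matching the nonzero-exit behaviour described above; a claim that is absent from the document is reported with `found` set to `None` rather than being skipped.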
New gpu_stack/presets/scenarios_cited_2026.py with two pack families, every numeric carrying a public source string:
- Pythia-160M on one DGX H100 node (EleutherAI Pythia repository and Hugging Face config.json for n_layers=12, d_model=768, n_heads=12, vocab=50304, seq_len=2048, 2M-token batches, ~300B total tokens), reusing the existing sourced DGX H100 hardware and EIA industrial tariff presets, with closure assumptions in separate named packs.
- Pythia-70M commercial-tariff variant substituting the EIA 2024 U.S. commercial average retail electricity price for the industrial rate, making tariff sensitivity explicit.

Packs register through SOURCED_SCENARIO_PACKS and SCENARIO_TARGET_SETS; statuses stay honest (open frontiers keep reporting missing inputs). Test helper hardened to exact-name pack matching so adding variants cannot make marker-based selection ambiguous.

Full pytest: 709 passed. Audit gate: PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
Metadata coverage across 33 non-lithography scope modules (memory, cluster, economics, training, kernel, architecture, interconnect, precision, optimizer, parallelism, collective, gpu, thermal, noise).

Observed coverage before -> after:
- with_sp_units 1428 -> 1493 (every non-constant variable)
- with_references 1324 -> 1493 (every non-constant variable)
- equations_with_references 878 -> 959 (every equation)
- equations_with_unit_check 799 -> 893

References are real public documents (vendor datasheets, JEDEC specs, IEEE 754, standard texts); no fabricated provenance. Unit checks were enabled only where the dimensional check passes.

Reconciliation applied on top of the interrupted agent's work:
- Registry snapshot test updated to the new coverage truth.
- Curated unchecked-equation ledgers shrunk (cluster set now empty; kernel matmul/attention FLOP equations now checked; opex checked set gains run_power_cost and water_cost_rate).
- Reverted check_units on the three optimizer schedule ordering inequalities: existing tests deliberately assert those feasibility relations stay unit-check-free, and overriding design tests is beyond a metadata pass.

Full pytest: 670 passed. Audit gate: PASS.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
…to claude/eager-cerf-kvmqta
…o claude/eager-cerf-kvmqta
…o claude/eager-cerf-kvmqta
…o claude/eager-cerf-kvmqta
…o claude/eager-cerf-kvmqta
…o claude/eager-cerf-kvmqta
…o claude/eager-cerf-kvmqta # Conflicts: # gpu_stack/presets/scenarios.py
…o claude/eager-cerf-kvmqta
Merges nine branches (n1, n3, n4, n5, n6, n7, n8, n9, n10) and applies integration reconciliation:
- Resolve the scenarios.py registration conflict between the full-TCO pack (n1) and the cited-2026 packs (n4) as the union: the target-set mapping keeps the MappingProxyType wrapper, the full_tco target set, and the 2026 spread.
- Regenerate docs/data/registry-cone.json against the merged registry.
- Refresh all ten README/docs coverage numbers flagged by the new docs-stats gate (it caught every one of them on its first integration run).
- Update the README and site honesty claims about simultaneous-system solving and validity fallback, which now exist as opt-in flags.
- Refresh planning-ledger metadata-gap claims closed by the wave and record the integration wave in CHANGELOG and SESSION_STATE.

Observed on this tree: full pytest 841 passed in 256.30s (one expected RuntimeWarning from the uncertainty failure-count test); full verifier 5/5 gates passed in 260.02s; read-only full verifier 5/5 gates passed in 262.07s; audit gate PASS; scenario-audit spans 8 sourced packs with 99 issues kept visible by design across three open cost frontiers; impeccable detect on docs/ reports only the known CLI-flag em-dash false positive.

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
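Resolving a registration conflict "as the union" while keeping a read-only wrapper can be sketched as follows. This is an illustration under assumptions, not the contents of gpu_stack/presets/scenarios.py: the helper `union_target_sets`, the key `cited_2026`, and the target tuples are hypothetical; only `MappingProxyType` and the `full_tco` name come from the commit message.

```python
from types import MappingProxyType

def union_target_sets(*sides):
    """Merge target-set mappings from several branches into one read-only view."""
    merged = {}
    for side in sides:
        for name, targets in side.items():
            # The same key on two sides is fine only if the definitions agree.
            if name in merged and merged[name] != tuple(targets):
                raise ValueError(f"conflicting definitions for {name!r}")
            merged[name] = tuple(targets)
    return MappingProxyType(merged)  # callers can read but not mutate

# Hypothetical registrations coming from two branches.
from_n1 = {"full_tco": ("econ.cost.per_token",)}
from_n4 = {"cited_2026": ("econ.cost.per_token", "training.tokens_per_sec")}
target_sets = union_target_sets(from_n1, from_n4)
```

Raising on a genuine disagreement, instead of letting the later side win, keeps a union merge from silently dropping one branch's registration.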
Integration: the ten-step expansion wave
Merges nine of the ten parallel work branches (#6 #7 #8 #9 #10 #11 #12 #13 #15) plus integration reconciliation. The tenth (#14, SEMF + quark decomposition) remains a draft pending its test reconciliation and will land separately.
What lands here
- `econ.cost.per_token = 3.738e-9` at the EIA 2024 industrial tariff, missing=0, 75 trace steps. The sourced-only pack keeps its 33 missing roots visible by design.
- `--fallback-on-violated-validity` and `--solve-systems` (2–3 variable cycles) with trace explanations; defaults inert.
- `export-graph-json` CLI + 706-node/1011-edge registry slice + CuperOS-styled browser pane.

Integration reconciliation
- `scenarios.py` (N1↔N4 registration), resolved as the union.
- `docs/data/registry-cone.json` regenerated against the merged registry.

Verification (observed on this tree)
- `docs-stats: OK`
- `scenario-audit`: 8 packs, 99 issues kept visible by design (three open sourced cost frontiers × ~33 missing economics roots each; the closure pack resolves 4/4)
- `docs/`: only the known CLI-flag em-dash false positive

https://claude.ai/code/session_01Eu2JVnPFgMQftwYTP3cGQZ
Generated by Claude Code