From 574c03a926bc9498aa71b11f019f3c8fa71dc716 Mon Sep 17 00:00:00 2001 From: Claude Date: Mon, 15 Jun 2026 21:13:48 +0000 Subject: [PATCH 1/7] docs: promote typed-data flow and reframe it end-to-end MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The "Typed objects vs dictionaries" concept is the page that shows how to write Python against the generated data structures, yet it sat last in the Concepts nav and was the 4th item in every reading order. Reframe it as "Typed data, end to end" — leading with the insight that once you are in Python the data starts from a structured set of parameters and stays in typed objects all the way to XML (CalculationResult -> to_model -> generated models -> XML), with the dict contrast as supporting material. All code examples stay bound to the real APIs and remain backed by tests/unit/test_typed_vs_dict.py. Promote it in the navigation: move it to second in Concepts (after schema-as-contract) and surface it from the home mental-model paragraph, the zero-to-green tutorial reading order, and the onboarding reading order. --- docs/concepts/typed-vs-dicts.md | 44 ++++++++++++++++++++++++++++----- docs/index.md | 5 +++- docs/onboarding.md | 6 ++--- docs/tutorials/01-start-here.md | 6 +++-- mkdocs.yml | 2 +- 5 files changed, 50 insertions(+), 13 deletions(-) diff --git a/docs/concepts/typed-vs-dicts.md b/docs/concepts/typed-vs-dicts.md index 725a9ff..f63397a 100644 --- a/docs/concepts/typed-vs-dicts.md +++ b/docs/concepts/typed-vs-dicts.md @@ -1,10 +1,39 @@ -# Typed objects vs. dictionaries +# Typed data, end to end -> **Explanation** — the input is loaded as JSON (nested `dict`s), but the pipeline maps it once -> onto typed objects generated from the schema and stores data in those. This page sets out what -> strong typing constrains at the point data is stored. 
See +> **Explanation** — once you are writing Python in this pipeline the data lives in **typed +> objects from start to finish**: a structured set of parameters, the calculation result, the +> generated models, then XML. The only loosely-typed moment is the raw JSON at the very edge, +> parsed *once* at a single boundary. This page shows that end-to-end typed flow, and what +> strong typing buys you over a generic `dict`. See > [ADR 0002](../decisions/0002-drop-csv-pickle-and-write_xml.md). +## Start from a structure, not a bag of keys + +If you begin from a **set of parameters held in a structure**, every later stage can stay +typed — there is never a point where the data degrades into an untyped `dict` you have to +trust by convention. The pipeline already works this way: the acoustic seams return a typed +`CalculationResult`, the single mapping turns that into generated model objects, and only those +objects are serialised. + +```python +from acoustic_dataset import acoustics +from acoustic_dataset.mapping import to_model + +result = acoustics.calculate_from_file("examples/calculation_input.json") +# result -> acoustics.CalculationResult (a typed dataclass) +# result.active_sonar -> acoustics.ActiveSonarResult (.source_level_db is a float) +# result.bands[0].sectors[0] -> acoustics.SectorResult(bearing_deg=..., level_db=...) + +platform = to_model(result) +# platform -> models.Platform (from the schema) +# platform.radiated_noise.band[0].directional.sector[0] -> models.Sector +``` + +Each arrow hands a **declared shape** to the next stage. Nothing in this chain is a `dict`: a +field that does not exist is an error on the line that names it, not a surprise three stages +later. Raw JSON is parsed into typed objects at exactly one place, and from there the whole +flow is typed data — which is the whole point of the sections below. + ## Storing data in a dictionary A `dict` places no constraints on what it holds: keys are arbitrary strings and values are `Any`. 
@@ -18,7 +47,8 @@ record["sourceLevel"] = "loud" # a string replaces the number, with no objec The structure exists only by convention. A misspelled key, a wrong value type, or an omitted field is stored as readily as correct data, so a mistake surfaces later — when something reads -the value, when the XML fails validation, or not at all. +the value, when the XML fails validation, or not at all. Carry data this way and *every* stage +downstream inherits that uncertainty; start from a typed structure and none of it does. ## Storing data in a typed object @@ -74,5 +104,7 @@ A `dict` would hold `9999` and pass it on; the typed boundary rejects it. | Value types | `Any` | declared (e.g. `Decimal`), checked by `mypy` | | Out-of-range values | stored as-is | rejected by the mapping (`MappingError`) | | Relationship to the schema | convention only | generated from it | +| What downstream stages receive | a shape to trust on faith | a declared structure, all the way to XML | -The behaviours above are checked in `tests/unit/test_typed_vs_dict.py`. +Hold the whole flow in typed data and these guarantees compound at every stage instead of +having to be re-checked. The behaviours above are checked in `tests/unit/test_typed_vs_dict.py`. diff --git a/docs/index.md b/docs/index.md index 5fe62af..2022519 100644 --- a/docs/index.md +++ b/docs/index.md @@ -25,7 +25,10 @@ different question — plus a set of **Architecture Decision Records (ADRs)** th The **schema (XSD) is the contract.** Everything derives from it: typed data classes (generated by `xsdata`), the validation gate (`xmlschema`), the HTML reference docs and Mermaid ERD, and the language bindings shipped to consumers. We never hand-write the -things the schema can generate — *configure, don't create*. Output is trusted only after +things the schema can generate — *configure, don't create*. 
When you write Python here you +start from a structured set of parameters and the data stays in **typed objects end to end**, +from that structure through to the XML — see +[Typed data, end to end](concepts/typed-vs-dicts.md). Output is trusted only after passing **two gates**: a mechanical structural gate (schema-valid + round-trips) and a human semantic gate (is the science right?). See [Schema as the contract](concepts/schema-as-contract.md) and diff --git a/docs/onboarding.md b/docs/onboarding.md index 5a69d2f..d0bc37d 100644 --- a/docs/onboarding.md +++ b/docs/onboarding.md @@ -39,9 +39,9 @@ before the same `make verify` / `make pipeline`. Both paths reach the *same* gre Read these, in order — they're short: 1. [Schema as the contract](concepts/schema-as-contract.md) — the idea everything follows from. -2. [The two verification gates](concepts/two-verification-gates.md) — why schema-valid isn't the same as correct. -3. [Pipeline data flow](concepts/pipeline-data-flow.md) — how input becomes validated XML. -4. [Typed objects vs dictionaries](concepts/typed-vs-dicts.md) — why typed data beats a generic `dict`. +2. [Typed data, end to end](concepts/typed-vs-dicts.md) — how you write Python here: start from a structured set of parameters and keep the data typed all the way to XML. +3. [The two verification gates](concepts/two-verification-gates.md) — why schema-valid isn't the same as correct. +4. [Pipeline data flow](concepts/pipeline-data-flow.md) — how input becomes validated XML. !!! 
tip "What's done, what's next" The full Phase 1 pipeline is in place — environment, end-to-end pipeline, migration-safety diff --git a/docs/tutorials/01-start-here.md b/docs/tutorials/01-start-here.md index f9a716b..42d09bd 100644 --- a/docs/tutorials/01-start-here.md +++ b/docs/tutorials/01-start-here.md @@ -53,11 +53,13 @@ Markdown that lives next to the code, so they travel with the project ## Step 4 — Build the mental model -Read these two short pages, in order: +Read these short pages, in order: 1. [Schema as the contract](../concepts/schema-as-contract.md) — the one idea everything follows from. -2. [The two verification gates](../concepts/two-verification-gates.md) — why schema-valid +2. [Typed data, end to end](../concepts/typed-vs-dicts.md) — how you write Python here: start + from a structured set of parameters and keep the data in typed objects all the way to XML. +3. [The two verification gates](../concepts/two-verification-gates.md) — why schema-valid isn't the same as correct. Then skim the [decisions overview](../decisions/index.md). Each ADR tells you *why* a choice diff --git a/mkdocs.yml b/mkdocs.yml index ccfe65c..dd73e8f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -52,9 +52,9 @@ nav: - Add a decision record: how-to/add-a-decision-record.md - Concepts: - Schema as the contract: concepts/schema-as-contract.md + - Typed data, end to end: concepts/typed-vs-dicts.md - The two verification gates: concepts/two-verification-gates.md - Pipeline data flow (ERD): concepts/pipeline-data-flow.md - - Typed objects vs dictionaries: concepts/typed-vs-dicts.md - Decisions (ADRs): - Overview: decisions/index.md - ADR template: decisions/0000-template.md From 9583a20c5b390196abb240b183deb8382fa14bed Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 16 Jun 2026 19:35:47 +0000 Subject: [PATCH 2/7] docs: surface "Typed data, end to end" at the top level Reaching the page still took too many steps (open menu -> Concepts -> the entry). 
Promote it further: - Add it as a top-level nav item (after Onboarding), so it is one tap from the navigation drawer rather than nested under Concepts. It remains listed under Concepts too, for the Diataxis grouping. - Add a prominent tip callout near the top of the home page linking straight to it, so it is reachable in one click from the landing page. --- docs/index.md | 6 ++++++ mkdocs.yml | 2 ++ 2 files changed, 8 insertions(+) diff --git a/docs/index.md b/docs/index.md index 2022519..943afe5 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,6 +4,12 @@ A schema-driven pipeline that turns acoustic calculation output into validated **XML Acoustic Dataset**, plus the documentation that lets you *learn it* and *defend the decisions behind it*. +!!! tip "Writing Python here? Start with the data" + The pipeline keeps everything in **typed objects from a structured parameter set all the + way to XML**. The one page to read first is + **[Typed data, end to end](concepts/typed-vs-dicts.md)** — how to write code against the + generated structures. + This site is organised so it can **grow with your understanding**. It follows the [Diátaxis](https://diataxis.fr) model — four kinds of documentation, each answering a different question — plus a set of **Architecture Decision Records (ADRs)** that record diff --git a/mkdocs.yml b/mkdocs.yml index dd73e8f..3faf1e0 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -42,6 +42,7 @@ markdown_extensions: nav: - Home: index.md - Onboarding: onboarding.md + - Typed data, end to end: concepts/typed-vs-dicts.md - Tutorials: - Start here (zero to green): tutorials/01-start-here.md - How-to guides: @@ -55,6 +56,7 @@ nav: - Typed data, end to end: concepts/typed-vs-dicts.md - The two verification gates: concepts/two-verification-gates.md - Pipeline data flow (ERD): concepts/pipeline-data-flow.md + # ^ "Typed data, end to end" is also surfaced as a top-level nav item above for prominence. 
- Decisions (ADRs): - Overview: decisions/index.md - ADR template: decisions/0000-template.md From a470cb000d99fa20c73550a378a296bc094df88e Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 16 Jun 2026 19:51:55 +0000 Subject: [PATCH 3/7] docs: keep typed-data code samples within mobile width The aligned trailing comments (padding the -> annotations into columns) made the lines wide enough to clip off-screen on a phone, hiding the right-hand side of the first sample. Reformat so every Python line stays narrow: - Replace the aligned inline annotations in the end-to-end sample with a short code block plus an "expression -> type" table. - Put TypeError / mypy / MappingError notes on their own short comment lines instead of long trailing comments, and wrap the dataclasses.replace call. - Bind the example path to a short `input_path` variable so the calculate_from_file lines stay narrow while keeping the real filename. --- docs/concepts/typed-vs-dicts.md | 57 +++++++++++++++++++++++---------- 1 file changed, 40 insertions(+), 17 deletions(-) diff --git a/docs/concepts/typed-vs-dicts.md b/docs/concepts/typed-vs-dicts.md index f63397a..ab8881f 100644 --- a/docs/concepts/typed-vs-dicts.md +++ b/docs/concepts/typed-vs-dicts.md @@ -19,17 +19,26 @@ objects are serialised. from acoustic_dataset import acoustics from acoustic_dataset.mapping import to_model -result = acoustics.calculate_from_file("examples/calculation_input.json") -# result -> acoustics.CalculationResult (a typed dataclass) -# result.active_sonar -> acoustics.ActiveSonarResult (.source_level_db is a float) -# result.bands[0].sectors[0] -> acoustics.SectorResult(bearing_deg=..., level_db=...) +# Parse the JSON input once; from here on the data is typed. +input_path = "examples/calculation_input.json" +result = acoustics.calculate_from_file(input_path) +# The single mapping -> schema-generated model objects. 
platform = to_model(result) -# platform -> models.Platform (from the schema) -# platform.radiated_noise.band[0].directional.sector[0] -> models.Sector ``` -Each arrow hands a **declared shape** to the next stage. Nothing in this chain is a `dict`: a +Every value above carries a **declared type**, not a `dict`: + +| Expression | Type it holds | +|---|---| +| `result` | `acoustics.CalculationResult` (a dataclass) | +| `result.active_sonar` | `acoustics.ActiveSonarResult` | +| `result.active_sonar.source_level_db` | `float` | +| `result.bands[0].sectors[0]` | `acoustics.SectorResult` | +| `platform` | `models.Platform` (from the schema) | +| `platform.radiated_noise.band[0].directional.sector[0]` | `models.Sector` | + +Each stage hands a **declared shape** to the next. Nothing in this chain is a `dict`: a field that does not exist is an error on the line that names it, not a surprise three stages later. Raw JSON is parsed into typed objects at exactly one place, and from there the whole flow is typed data — which is the whole point of the sections below. @@ -40,9 +49,9 @@ A `dict` places no constraints on what it holds: keys are arbitrary strings and ```python record = {} -record["sourceLevel"] = 215.0 # any key, any value type -record["sorceLevel"] = 9999 # a misspelled key is just another entry -record["sourceLevel"] = "loud" # a string replaces the number, with no objection +record["sourceLevel"] = 215.0 # any key, any value type +record["sorceLevel"] = 9999 # misspelled key, stored anyway +record["sourceLevel"] = "loud" # a string replaces the number ``` The structure exists only by convention. 
A misspelled key, a wrong value type, or an omitted @@ -59,8 +68,12 @@ is stored is defined up front: from decimal import Decimal from acoustic_dataset.models.acoustic_dataset import Sector -Sector(bearing=Decimal("30.000"), level=Decimal("134.000")) # the declared fields -Sector(bering=Decimal("30.000"), level=Decimal("134.000")) # TypeError: unexpected 'bering' +# The declared fields are accepted: +Sector(bearing=Decimal("30.000"), level=Decimal("134.000")) + +# An unknown field fails at construction: +Sector(bering=Decimal("30.000"), level=Decimal("134.000")) +# -> TypeError: unexpected keyword argument 'bering' ``` - A name that is not a declared field is rejected when the object is constructed (`TypeError`), @@ -69,7 +82,9 @@ Sector(bering=Decimal("30.000"), level=Decimal("134.000")) # TypeError: unexp (`mypy`, run by `make verify`) reports a wrong-typed value before the code runs: ```python - Sector(bearing="thirty", level=Decimal("134.000")) # mypy: incompatible type "str" + # mypy flags a wrong-typed value before the code runs: + Sector(bearing="thirty", level=Decimal("134.000")) + # -> mypy: incompatible type "str" ``` - The fields and their documentation are generated from the schema, so the stored object follows @@ -86,12 +101,20 @@ import dataclasses from acoustic_dataset import acoustics from acoustic_dataset.mapping import to_model, MappingError -result = acoustics.calculate_from_file("examples/calculation_input.json") -# Decibels are bounded to [-200, 300]; attempt to store an impossible source level: +input_path = "examples/calculation_input.json" +result = acoustics.calculate_from_file(input_path) + +# Decibels are bounded to [-200, 300]. 
+# Force an impossible source level, then map it: bad = dataclasses.replace( - result, active_sonar=dataclasses.replace(result.active_sonar, source_level_db=9999.0) + result, + active_sonar=dataclasses.replace( + result.active_sonar, source_level_db=9999.0 + ), ) -to_model(bad) # MappingError — rejected as it is stored, not left for a later stage +to_model(bad) +# -> MappingError: rejected as it is stored, +# not left for a later stage ``` A `dict` would hold `9999` and pass it on; the typed boundary rejects it. From 321748cf101429a881adadaef3c2101526cd7e80 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 16 Jun 2026 20:19:26 +0000 Subject: [PATCH 4/7] docs: persuasive typed-data example; remove ADRs from published site Two changes from review: 1. Make the first typed-data sample persuasive without going wide. Replace the short block + type table with a taller, narrow, runnable drill-down: print statements walk the structure at increasing depth (CalculationResult -> ActiveSonarResult -> SectorResult, then the schema-generated Platform and Sector), each followed by its real output. Every printed value was captured by running the pipeline, and tests/unit/test_typed_vs_dict.py still passes. 2. Remove the ADRs and the "Add a decision record" how-to from the published site while keeping them in the repo. They are excluded from the MkDocs build via exclude_docs (still browsable on GitHub under docs/decisions/). In-text ADR citations are kept as plain text (unlinked); the few navigation pointers that targeted the removed section now point at docs/decisions/ in the repo. 
--- docs/concepts/pipeline-data-flow.md | 4 +- docs/concepts/schema-as-contract.md | 8 +-- docs/concepts/two-verification-gates.md | 6 +- docs/concepts/typed-vs-dicts.md | 55 +++++++++++++------ docs/glossary.md | 8 +-- docs/how-to/build-the-docs-site.md | 2 +- docs/how-to/change-the-schema.md | 6 +- .../run-the-migration-safety-comparison.md | 2 +- docs/how-to/use-the-codespace.md | 4 +- docs/index.md | 6 +- docs/onboarding.md | 4 +- docs/reference/commands.md | 4 +- docs/reference/index.md | 2 +- docs/tutorials/01-start-here.md | 8 +-- mkdocs.yml | 19 ++----- 15 files changed, 75 insertions(+), 63 deletions(-) diff --git a/docs/concepts/pipeline-data-flow.md b/docs/concepts/pipeline-data-flow.md index 8d7f2f0..b876ba2 100644 --- a/docs/concepts/pipeline-data-flow.md +++ b/docs/concepts/pipeline-data-flow.md @@ -24,7 +24,7 @@ flowchart TD The **populated domain objects, before serialisation, are the typed testable boundary** — tests assert on them directly, or diff the serialised XML against a golden file. No separate intermediate (no CSV, no pickle) is needed to get testability; that whole chain was removed -([ADR 0002](../decisions/0002-drop-csv-pickle-and-write_xml.md)). +(ADR 0002). ## The entities as an ER diagram @@ -32,7 +32,7 @@ This is the kind of **Mermaid ERD** the docs render — here drawn by hand for t (it deliberately includes pipeline entities like the golden and reference files). The **[schema reference](../reference/schema/index.md)** ERD, by contrast, is produced **automatically from the schema** by `make gen-schema-docs` -([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)). +(ADR 0009). ```mermaid erDiagram diff --git a/docs/concepts/schema-as-contract.md b/docs/concepts/schema-as-contract.md index 53a4993..0f51503 100644 --- a/docs/concepts/schema-as-contract.md +++ b/docs/concepts/schema-as-contract.md @@ -25,7 +25,7 @@ diagram a third. Each is a place the truth can rot. 
If there is exactly **one** source (the XSD) and everything else is generated from it, drift becomes structurally impossible — you change the schema and re-generate, or you don't change the format at all. CI enforces this by regenerating and failing on any difference -(see [ADR 0008](../decisions/0008-generated-models-no-drift.md)). +(see ADR 0008). ## What flows from the one source @@ -66,6 +66,6 @@ right (and enriched) is the high-leverage work. - [The two verification gates](two-verification-gates.md) — why schema-valid isn't enough. - [Pipeline data flow](pipeline-data-flow.md) — the entities and how they move. -- ADRs [0001](../decisions/0001-schema-driven-generation-with-xsdata.md), - [0008](../decisions/0008-generated-models-no-drift.md), - [0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md). +- ADRs 0001, + 0008, + 0009. diff --git a/docs/concepts/two-verification-gates.md b/docs/concepts/two-verification-gates.md index 7c171ea..327c51f 100644 --- a/docs/concepts/two-verification-gates.md +++ b/docs/concepts/two-verification-gates.md @@ -29,10 +29,10 @@ flowchart LR Three mechanical checks, all in CI: - **Conformant by construction** — output comes from schema-generated objects - ([ADR 0001](../decisions/0001-schema-driven-generation-with-xsdata.md)), so it starts in + (ADR 0001), so it starts in the right shape. - **XSD validation** — `xmlschema` confirms the document validates against the contract - ([ADR 0003](../decisions/0003-xmlschema-as-validation-gate.md)). + (ADR 0003). - **Round-trip** — parse the XML back and re-serialise; if it changed meaningfully, a binding or serialisation loss occurred. This catches things validation alone cannot. @@ -54,7 +54,7 @@ generator, new output can be perfectly schema-valid yet differ from files a cons depends on (perhaps relying on a quirk of the old hand-rolled output). The migration-safety comparison diffs new output against a known-good **reference** file to surface exactly this. 
See [Change the schema](../how-to/change-the-schema.md) and -[ADR 0004](../decisions/0004-two-gate-verification.md). +ADR 0004. ## Why split them at all? diff --git a/docs/concepts/typed-vs-dicts.md b/docs/concepts/typed-vs-dicts.md index ab8881f..6a1212f 100644 --- a/docs/concepts/typed-vs-dicts.md +++ b/docs/concepts/typed-vs-dicts.md @@ -5,7 +5,7 @@ > generated models, then XML. The only loosely-typed moment is the raw JSON at the very edge, > parsed *once* at a single boundary. This page shows that end-to-end typed flow, and what > strong typing buys you over a generic `dict`. See -> [ADR 0002](../decisions/0002-drop-csv-pickle-and-write_xml.md). +> ADR 0002. ## Start from a structure, not a bag of keys @@ -19,29 +19,50 @@ objects are serialised. from acoustic_dataset import acoustics from acoustic_dataset.mapping import to_model -# Parse the JSON input once; from here on the data is typed. input_path = "examples/calculation_input.json" + +# Parse the JSON input once; everything below is typed. result = acoustics.calculate_from_file(input_path) +# It is a typed object, not a dict — drill in by attribute: +print(type(result).__name__) +# CalculationResult + +print(result.name) +# Reference Platform A + +# One level down: the active sonar is its own typed object. +print(type(result.active_sonar).__name__) +# ActiveSonarResult + +print(result.active_sonar.source_level_db) +# 215.0 + +# Deeper still: bands -> sectors, each a typed record. +print(result.bands[0].sectors[0]) +# SectorResult(bearing_deg=0.0, level_db=134.0) + # The single mapping -> schema-generated model objects. 
platform = to_model(result) +print(type(platform).__name__) +# Platform + +# The same drill-down works on the generated models, which +# now carry the schema's Decimal type all the way down: +sector = platform.radiated_noise.band[0].directional.sector[0] +print(type(sector).__name__) +# Sector + +print(sector.bearing, sector.level) +# 0.000 134.000 ``` -Every value above carries a **declared type**, not a `dict`: - -| Expression | Type it holds | -|---|---| -| `result` | `acoustics.CalculationResult` (a dataclass) | -| `result.active_sonar` | `acoustics.ActiveSonarResult` | -| `result.active_sonar.source_level_db` | `float` | -| `result.bands[0].sectors[0]` | `acoustics.SectorResult` | -| `platform` | `models.Platform` (from the schema) | -| `platform.radiated_noise.band[0].directional.sector[0]` | `models.Sector` | - -Each stage hands a **declared shape** to the next. Nothing in this chain is a `dict`: a -field that does not exist is an error on the line that names it, not a surprise three stages -later. Raw JSON is parsed into typed objects at exactly one place, and from there the whole -flow is typed data — which is the whole point of the sections below. +Each `print` reaches one level deeper, and every value is a **declared object** — +`CalculationResult`, then `ActiveSonarResult`, then `SectorResult`, then the schema-generated +`Platform` and `Sector`. Nothing in this chain is a `dict`: a field that does not exist is an +error on the line that names it, not a surprise three stages later. Raw JSON is parsed into +typed objects at exactly one place, and from there the whole flow is typed data — which is the +whole point of the sections below. ## Storing data in a dictionary diff --git a/docs/glossary.md b/docs/glossary.md index 3461899..9ab600a 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -51,21 +51,21 @@ from the contract. 
See the [generated schema reference](reference/schema/index.m ## Tooling **xsdata** — generates typed Python dataclasses from an XSD and binds objects ↔ XML. -See [ADR 0001](decisions/0001-schema-driven-generation-with-xsdata.md). +See ADR 0001. **xmlschema** — pure-Python library used as the XSD validation gate. -See [ADR 0003](decisions/0003-xmlschema-as-validation-gate.md). +See ADR 0003. **MkDocs Material** — static-site generator that turns this Markdown into the attractive HTML site, with native Mermaid support. -See [ADR 0009](decisions/0009-mkdocs-material-mermaid-html-docs.md). +See ADR 0009. **Mermaid** — text-based diagramming (flowcharts, ER diagrams) rendered in the browser; keeps diagrams in version control as plain text. **Devcontainer / GitHub Codespaces** — a declarative development environment so a contributor gets a ready-to-run setup with no manual installation. -See [ADR 0006](decisions/0006-codespaces-with-local-fallback.md). +See ADR 0006. **Diátaxis** — the documentation framework (tutorials / how-to / reference / explanation) this site is organised around. diff --git a/docs/how-to/build-the-docs-site.md b/docs/how-to/build-the-docs-site.md index 757f65c..65ecd86 100644 --- a/docs/how-to/build-the-docs-site.md +++ b/docs/how-to/build-the-docs-site.md @@ -3,7 +3,7 @@ > **How-to** — produce and preview the attractive HTML documentation. The docs are written in Markdown under `docs/` and rendered by **MkDocs Material** -([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)). Mermaid diagrams render +(ADR 0009). Mermaid diagrams render natively. ## Preview while you write diff --git a/docs/how-to/change-the-schema.md b/docs/how-to/change-the-schema.md index fc842a3..8d9eec8 100644 --- a/docs/how-to/change-the-schema.md +++ b/docs/how-to/change-the-schema.md @@ -18,7 +18,7 @@ redesign*: models, validation, bindings and the schema docs all regenerate from ``` Runs `xsdata` over the schema and rewrites `src/acoustic_dataset/models/`. 
**Do not hand-edit** the result — it's a generated artifact - ([ADR 0008](../decisions/0008-generated-models-no-drift.md)). Generation is pinned to the 3.9 + (ADR 0008). Generation is pinned to the 3.9 toolchain so the output is byte-reproducible for the drift gate. 3. **Regenerate the schema docs + ERD.** @@ -26,7 +26,7 @@ redesign*: models, validation, bindings and the schema docs all regenerate from make gen-schema-docs ``` Produces the reference pages and the Mermaid ERD from the schema - ([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)). + (ADR 0009). 4. **Update the mapping.** `src/acoustic_dataset/mapping.py` is the **one place** that knows element names — update it @@ -59,7 +59,7 @@ python -m acoustic_dataset.cli compare build/acoustic_dataset.xml examples/refer A clean match exits 0; a meaningful difference prints a diff and exits non-zero — catching output that is schema-valid but differs from what a consumer depends on -([ADR 0004](../decisions/0004-two-gate-verification.md)). +(ADR 0004). ## What you should *not* touch diff --git a/docs/how-to/run-the-migration-safety-comparison.md b/docs/how-to/run-the-migration-safety-comparison.md index 5a158f2..8d63301 100644 --- a/docs/how-to/run-the-migration-safety-comparison.md +++ b/docs/how-to/run-the-migration-safety-comparison.md @@ -5,7 +5,7 @@ Schema validity proves the output fits the contract; it does **not** prove it says the same thing as a file a consumer already depends on. The `compare` command closes that gap -([ADR 0004](../decisions/0004-two-gate-verification.md), FR-015). +(ADR 0004, FR-015). ## Steps diff --git a/docs/how-to/use-the-codespace.md b/docs/how-to/use-the-codespace.md index e8621fc..81b0fcc 100644 --- a/docs/how-to/use-the-codespace.md +++ b/docs/how-to/use-the-codespace.md @@ -8,7 +8,7 @@ 1. On the repository page: **Code → Codespaces → Create codespace on `claude/zealous-davinci-n4ssvn`** (or your branch). 2. Wait for provisioning. 
 The devcontainer pins **Python 3.9.4** to match the target system
- ([ADR 0007](../decisions/0007-pin-python-3-9-4.md)) and runs `make bootstrap` automatically.
+ (ADR 0007) and runs `make bootstrap` automatically.

 ## Everyday commands

@@ -45,4 +45,4 @@ make bootstrap

 Everything above works locally too — you only need Python 3.9.x and `make`. Run
 `make bootstrap` once, then the same targets. See
-[ADR 0006](../decisions/0006-codespaces-with-local-fallback.md) for why we keep both paths.
+ADR 0006 for why we keep both paths.
diff --git a/docs/index.md b/docs/index.md
index 943afe5..3efc10a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -22,7 +22,6 @@ different question — plus a set of **Architecture Decision Records (ADRs)** th
 | Get the environment running and see it work | **[Tutorials](tutorials/01-start-here.md)** | "Show me, step by step." |
 | Do a specific task | **[How-to guides](how-to/use-the-codespace.md)** | "How do I X?" |
 | Understand *why* it works this way | **[Concepts](concepts/schema-as-contract.md)** | "Help me understand." |
-| Defend or revisit a decision | **[Decisions (ADRs)](decisions/index.md)** | "Why did we choose this?" |
 | Look up a command or the schema shape | **[Reference](reference/index.md)** | "What exactly is X?" |
 | Decode a term | **[Glossary](glossary.md)** | "What does that word mean?" |

@@ -42,9 +41,8 @@ human semantic gate (is the science right?). See

 ## How this set grows

-- Every significant choice becomes a numbered **ADR** — start at the
-  [decisions overview](decisions/index.md). Adding one is a documented how-to:
-  [Add a decision record](how-to/add-a-decision-record.md).
+- Every significant choice is recorded as a numbered **ADR**, kept in the
+  repository under `docs/decisions/` (not published to this site).
 - New understanding lands as a **concept** page; new recipes as **how-to** guides.
 - The **schema reference and ERD** are *generated from the enriched XSD* — see the
   [generated schema reference](reference/schema/index.md) — so the docs can never drift from the contract.
diff --git a/docs/onboarding.md b/docs/onboarding.md
index d0bc37d..1c2acf9 100644
--- a/docs/onboarding.md
+++ b/docs/onboarding.md
@@ -18,7 +18,7 @@ make pipeline # produce build/acoustic_dataset.xml (schema-valid, round-trip-e

 **Local fallback:** you need Python 3.9.x and `make`, then run `make bootstrap` yourself
 before the same `make verify` / `make pipeline`. Both paths reach the *same* green state
-([ADR 0006](decisions/0006-codespaces-with-local-fallback.md)). Run `make help` to see every target.
+(ADR 0006). Run `make help` to see every target.

 ## Where things live

@@ -32,7 +32,7 @@ before the same `make verify` / `make pipeline`. Both paths reach the *same* gre
 | Tests (unit / integration / golden) | `tests/` |
 | The plan & design artifacts | `specs/001-codespace-xml-scaffold/` (`spec.md`, `plan.md`, `tasks.md`) |
 | The generated schema reference + ERD | [reference/schema](reference/schema/index.md) (run `make gen-schema-docs`) |
-| Why each choice was made | [Decision records](decisions/index.md) |
+| Why each choice was made | `docs/decisions/` (ADRs, kept in the repo) |

 ## Build the mental model

diff --git a/docs/reference/commands.md b/docs/reference/commands.md
index 2c3c084..1fcc49f 100644
--- a/docs/reference/commands.md
+++ b/docs/reference/commands.md
@@ -20,7 +20,7 @@ All command targets are implemented.

 The CI drift gate regenerates the models and schema docs on the Python 3.9 target and
 fails if the committed artifacts are stale (see
-[ADR 0008](../decisions/0008-generated-models-no-drift.md)).
+ADR 0008).

 ## CLI subcommands

@@ -37,6 +37,6 @@ above; full input/exit-code semantics are in the contract file. Summary:

 ## Tooling versions

-- **Python**: 3.9.4 (pinned — [ADR 0007](../decisions/0007-pin-python-3-9-4.md))
+- **Python**: 3.9.4 (pinned — ADR 0007)
 - **xsdata**: ≥ 24 · **xmlschema**: ≥ 3 · **mkdocs-material**: ≥ 9.5
 - Lint/type floors: `ruff target-version = py39`, `mypy python_version = 3.9`
diff --git a/docs/reference/index.md b/docs/reference/index.md
index 3449b43..3051dd7 100644
--- a/docs/reference/index.md
+++ b/docs/reference/index.md
@@ -21,5 +21,5 @@ the docs site, under version control):

 `make gen-schema-docs` (re)generates the [schema reference](schema/index.md) from the enriched
 XSD — the entity tables, every field's `xs:documentation` prose, and the Mermaid ERD — produced
-*from the schema* so they cannot drift ([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)).
+*from the schema* so they cannot drift (ADR 0009).
 The committed page is regenerated by CI; a stale copy fails the drift gate.
diff --git a/docs/tutorials/01-start-here.md b/docs/tutorials/01-start-here.md
index 42d09bd..1fc138c 100644
--- a/docs/tutorials/01-start-here.md
+++ b/docs/tutorials/01-start-here.md
@@ -21,7 +21,7 @@ The fastest path is a **GitHub Codespace** (no local installs):
 Python 3.9.4) — see [Use the Codespace](../how-to/use-the-codespace.md) for detail.

 Prefer local? You need Python 3.9.x and `make`, then run `make bootstrap` yourself. Either
-way the commands below are identical ([ADR 0006](../decisions/0006-codespaces-with-local-fallback.md)).
+way the commands below are identical (ADR 0006).

 ## Step 2 — Verify it's green

@@ -49,7 +49,7 @@ Open . This is the very site you're reading, rendered by
 Material — including the **Mermaid diagrams** (try the
 [pipeline ERD](../concepts/pipeline-data-flow.md)). The docs are generated from the same Markdown
 that lives next to the code, so they travel with the project
-([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)).
+(ADR 0009).

 ## Step 4 — Build the mental model

@@ -62,14 +62,14 @@ Read these short pages, in order:
 3. [The two verification gates](../concepts/two-verification-gates.md) — why schema-valid isn't
    the same as correct.

-Then skim the [decisions overview](../decisions/index.md). Each ADR tells you *why* a choice
+Then skim the ADRs in `docs/decisions/` (kept in the repo). Each one tells you *why* a choice
 was made and *what was rejected* — that's the material you'll use to defend the design.

 ## Where to go next

 - Want to do a specific task? → [How-to guides](../how-to/use-the-codespace.md)
 - Need to change the schema? → [Change the schema](../how-to/change-the-schema.md)
-- Want to record your own decision? → [Add a decision record](../how-to/add-a-decision-record.md)
+- Want to record your own decision? → add an ADR under `docs/decisions/`

 !!! tip "This set grows with you"
     As you learn, add a concept page; as you find a recipe, add a how-to; as you decide
diff --git a/mkdocs.yml b/mkdocs.yml
index 3faf1e0..e4ebe11 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -6,6 +6,12 @@ docs_dir: docs
 # Build into a gitignored folder; the HTML is a generated artifact.
 site_dir: site

+# The ADRs (decisions/) and the ADR how-to are kept in the repo for the decision
+# record, but excluded from the published site. They remain browsable on GitHub.
+exclude_docs: |
+  decisions/
+  how-to/add-a-decision-record.md
+
 theme:
   name: material
   features:
@@ -50,25 +56,12 @@ nav:
     - Build the docs site: how-to/build-the-docs-site.md
     - Change the schema: how-to/change-the-schema.md
     - Run the migration-safety comparison: how-to/run-the-migration-safety-comparison.md
-    - Add a decision record: how-to/add-a-decision-record.md
   - Concepts:
     - Schema as the contract: concepts/schema-as-contract.md
     - Typed data, end to end: concepts/typed-vs-dicts.md
     - The two verification gates: concepts/two-verification-gates.md
     - Pipeline data flow (ERD): concepts/pipeline-data-flow.md
     # ^ "Typed data, end to end" is also surfaced as a top-level nav item above for prominence.
-  - Decisions (ADRs):
-    - Overview: decisions/index.md
-    - ADR template: decisions/0000-template.md
-    - 0001 Schema-driven generation with xsdata: decisions/0001-schema-driven-generation-with-xsdata.md
-    - 0002 Drop CSV, pickle and write_xml: decisions/0002-drop-csv-pickle-and-write_xml.md
-    - 0003 xmlschema as the validation gate: decisions/0003-xmlschema-as-validation-gate.md
-    - 0004 Two-gate verification: decisions/0004-two-gate-verification.md
-    - 0005 Placeholder schema, runnable now: decisions/0005-placeholder-schema-runnable-now.md
-    - 0006 Codespaces with a local fallback: decisions/0006-codespaces-with-local-fallback.md
-    - 0007 Pin to Python 3.9.4: decisions/0007-pin-python-3-9-4.md
-    - 0008 Generated models, no drift: decisions/0008-generated-models-no-drift.md
-    - 0009 MkDocs Material + Mermaid for HTML docs: decisions/0009-mkdocs-material-mermaid-html-docs.md
   - Reference:
     - Overview: reference/index.md
     - Commands: reference/commands.md

From c030fb1d1f54ec716372067a455ce702ba620f98 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 16 Jun 2026 20:42:26 +0000
Subject: [PATCH 5/7] docs: frame the typed-data example around IDE autocomplete

Clarify that the example explores only typed structures, never the raw JSON.
calculate_from_file returns a typed CalculationResult (not the parsed
dict), so the walkthrough now says so explicitly and leads with the
payoff: you use attributes rather than string keys, the IDE autocompletes
each step, and a typo fails immediately. The JSON is parsed once
internally and never indexed by key. Printed outputs are unchanged and
still captured from a real run.
---
 docs/concepts/typed-vs-dicts.md | 35 ++++++++++++---------------------
 1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/docs/concepts/typed-vs-dicts.md b/docs/concepts/typed-vs-dicts.md
index 6a1212f..977858f 100644
--- a/docs/concepts/typed-vs-dicts.md
+++ b/docs/concepts/typed-vs-dicts.md
@@ -21,48 +21,39 @@ from acoustic_dataset.mapping import to_model

 input_path = "examples/calculation_input.json"

-# Parse the JSON input once; everything below is typed.
+# calculate_from_file parses the JSON once and returns a typed
+# object. From here you use attributes, never string keys — so
+# the IDE autocompletes each step and a typo fails immediately.
 result = acoustics.calculate_from_file(input_path)

-# It is a typed object, not a dict — drill in by attribute:
+# result is a typed CalculationResult; explore it by attribute:
 print(type(result).__name__)
 # CalculationResult

-print(result.name)
-# Reference Platform A
-
-# One level down: the active sonar is its own typed object.
-print(type(result.active_sonar).__name__)
-# ActiveSonarResult
-
 print(result.active_sonar.source_level_db)
 # 215.0

-# Deeper still: bands -> sectors, each a typed record.
 print(result.bands[0].sectors[0])
 # SectorResult(bearing_deg=0.0, level_db=134.0)

-# The single mapping -> schema-generated model objects.
+# The single mapping -> the schema-generated typed model.
 platform = to_model(result)
 print(type(platform).__name__)
 # Platform

-# The same drill-down works on the generated models, which
-# now carry the schema's Decimal type all the way down:
+# Same attribute access on the model generated from the XSD;
+# values now carry the schema's Decimal type all the way down.
 sector = platform.radiated_noise.band[0].directional.sector[0]
-print(type(sector).__name__)
-# Sector
-
 print(sector.bearing, sector.level)
 # 0.000 134.000
 ```

-Each `print` reaches one level deeper, and every value is a **declared object** —
-`CalculationResult`, then `ActiveSonarResult`, then `SectorResult`, then the schema-generated
-`Platform` and `Sector`. Nothing in this chain is a `dict`: a field that does not exist is an
-error on the line that names it, not a surprise three stages later. Raw JSON is parsed into
-typed objects at exactly one place, and from there the whole flow is typed data — which is the
-whole point of the sections below.
+Every value here is a **typed object** with declared fields — `CalculationResult`, then
+`SectorResult`, then the schema-generated `Platform` and `Sector`. Because each step is typed,
+your IDE autocompletes the attribute names and a type checker flags a wrong one; you never have
+to remember a JSON key or risk a silent typo. The raw JSON is parsed into typed objects at
+exactly one place, and from there the whole flow is typed data — which is the whole point of the
+sections below.

 ## Storing data in a dictionary

From 2652499faa6df3cb768c60f59956631302bca636 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 16 Jun 2026 20:47:45 +0000
Subject: [PATCH 6/7] docs: explain what to_model() does in the typed-data example
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The jump to "platform = to_model(result)" was unexplained. Expand the
inline comment so it is self-describing — to_model copies the calculation
result into the schema-generated classes, converts to the schema's types,
and range-checks each value before serialisation — and name to_model in
the intro prose so the reader meets it before the code.
---
 docs/concepts/typed-vs-dicts.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/concepts/typed-vs-dicts.md b/docs/concepts/typed-vs-dicts.md
index 977858f..a665c28 100644
--- a/docs/concepts/typed-vs-dicts.md
+++ b/docs/concepts/typed-vs-dicts.md
@@ -12,8 +12,8 @@
 If you begin from a **set of parameters held in a structure**, every later stage can stay
 typed — there is never a point where the data degrades into an untyped `dict` you have to
 trust by convention. The pipeline already works this way: the acoustic seams return a typed
-`CalculationResult`, the single mapping turns that into generated model objects, and only those
-objects are serialised.
+`CalculationResult`, the single mapping (`to_model`) turns that into the schema-generated model
+objects, and only those objects are serialised.

 ```python
 from acoustic_dataset import acoustics
@@ -36,7 +36,9 @@ print(result.active_sonar.source_level_db)
 print(result.bands[0].sectors[0])
 # SectorResult(bearing_deg=0.0, level_db=134.0)

-# The single mapping -> the schema-generated typed model.
+# to_model() is the one mapping step: it copies result into
+# the schema-generated classes, converts to the schema's types,
+# and range-checks each value -> ready to serialise as XML.
 platform = to_model(result)
 print(type(platform).__name__)
 # Platform

From 75ad5c253b58a0b72880955460f913f32f5e2ec2 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 16 Jun 2026 21:30:26 +0000
Subject: [PATCH 7/7] docs: front the typed page on the schema data object, not the intermediate

Interim trim ahead of the planned refactor that removes the intermediate
CalculationResult.
The headline example no longer teaches the result -> to_model -> platform
two-step; it collapses production into a single "produce the schema's
data object" line and explores only that object (a Platform generated
from the XSD) with IDE autocomplete and the schema's Decimal types.
Output values are captured from a real run. The page now tells the
defensible story: we build one data object that meets the schema.
---
 docs/concepts/typed-vs-dicts.md | 57 +++++++++++----------------------
 1 file changed, 19 insertions(+), 38 deletions(-)

diff --git a/docs/concepts/typed-vs-dicts.md b/docs/concepts/typed-vs-dicts.md
index a665c28..0a80ff9 100644
--- a/docs/concepts/typed-vs-dicts.md
+++ b/docs/concepts/typed-vs-dicts.md
@@ -1,19 +1,15 @@
 # Typed data, end to end

-> **Explanation** — once you are writing Python in this pipeline the data lives in **typed
-> objects from start to finish**: a structured set of parameters, the calculation result, the
-> generated models, then XML. The only loosely-typed moment is the raw JSON at the very edge,
-> parsed *once* at a single boundary. This page shows that end-to-end typed flow, and what
-> strong typing buys you over a generic `dict`. See
-> ADR 0002.
+> **Explanation** — once you are writing Python in this pipeline the data lives in a **typed
+> object that meets the schema**, not a loosely-typed `dict`. The only loosely-typed moment is
+> the raw JSON at the very edge, parsed *once* inside the pipeline. This page shows what strong
+> typing buys you over a generic `dict`. See ADR 0002.

-## Start from a structure, not a bag of keys
+## Work with the schema's data object

-If you begin from a **set of parameters held in a structure**, every later stage can stay
-typed — there is never a point where the data degrades into an untyped `dict` you have to
-trust by convention. The pipeline already works this way: the acoustic seams return a typed
-`CalculationResult`, the single mapping (`to_model`) turns that into the schema-generated model
-objects, and only those objects are serialised.
+The pipeline turns the calculation into **one typed data object that meets the schema** — a
+`Platform` generated from the XSD. You work with that object directly: every field is declared,
+typed, and documented by the contract, so values never have to live in a loosely-typed `dict`.

 ```python
 from acoustic_dataset import acoustics
@@ -21,41 +17,26 @@ from acoustic_dataset.mapping import to_model

 input_path = "examples/calculation_input.json"

-# calculate_from_file parses the JSON once and returns a typed
-# object. From here you use attributes, never string keys — so
-# the IDE autocompletes each step and a typo fails immediately.
-result = acoustics.calculate_from_file(input_path)
-
-# result is a typed CalculationResult; explore it by attribute:
-print(type(result).__name__)
-# CalculationResult
-
-print(result.active_sonar.source_level_db)
-# 215.0
+# Produce the schema's data object from the calculation input:
+platform = to_model(acoustics.calculate_from_file(input_path))

-print(result.bands[0].sectors[0])
-# SectorResult(bearing_deg=0.0, level_db=134.0)
-
-# to_model() is the one mapping step: it copies result into
-# the schema-generated classes, converts to the schema's types,
-# and range-checks each value -> ready to serialise as XML.
-platform = to_model(result)
+# It is generated from the XSD; explore it by attribute.
+# The IDE autocompletes each step and the values carry
+# the schema's Decimal type — no raw JSON key in sight:
 print(type(platform).__name__)
 # Platform

-# Same attribute access on the model generated from the XSD;
-# values now carry the schema's Decimal type all the way down.
+print(platform.radiated_noise.band[0].centre_frequency)
+# 50.000
+
 sector = platform.radiated_noise.band[0].directional.sector[0]
 print(sector.bearing, sector.level)
 # 0.000 134.000
 ```

-Every value here is a **typed object** with declared fields — `CalculationResult`, then
-`SectorResult`, then the schema-generated `Platform` and `Sector`. Because each step is typed,
-your IDE autocompletes the attribute names and a type checker flags a wrong one; you never have
-to remember a JSON key or risk a silent typo. The raw JSON is parsed into typed objects at
-exactly one place, and from there the whole flow is typed data — which is the whole point of the
-sections below.
+Every value here is a typed attribute of the schema's data object, not a `dict` key. Your IDE
+autocompletes each step and a type checker flags a wrong one; the values carry the schema's
+`Decimal` type. The raw JSON is parsed once, inside the pipeline, and you never index it by key.

 ## Storing data in a dictionary