4 changes: 2 additions & 2 deletions docs/concepts/pipeline-data-flow.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,15 +24,15 @@ flowchart TD
The **populated domain objects, before serialisation, are the typed testable boundary** —
tests assert on them directly, or diff the serialised XML against a golden file. No separate
intermediate (no CSV, no pickle) is needed to get testability; that whole chain was removed
([ADR 0002](../decisions/0002-drop-csv-pickle-and-write_xml.md)).
(ADR 0002).
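
To make "tests assert on them directly" concrete, here is a minimal sketch of such a test. It assumes a stand-in `Sector` dataclass and a hypothetical `build_sector` mapping step; neither is the project's generated model or real mapping code, they only mirror the shape used elsewhere in these docs.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Sector:
    """Hypothetical stand-in for a schema-generated class."""

    bearing: Decimal
    level: Decimal


def build_sector(bearing_deg: float, level_db: float) -> Sector:
    """Hypothetical mapping step: floats in, typed Decimal fields out."""
    return Sector(
        bearing=Decimal(f"{bearing_deg:.3f}"),
        level=Decimal(f"{level_db:.3f}"),
    )


def test_sector_is_populated_as_expected() -> None:
    # Assert on the populated object itself, before any XML is written.
    sector = build_sector(30.0, 134.0)
    assert sector == Sector(bearing=Decimal("30.000"), level=Decimal("134.000"))
```

Because dataclasses compare field by field, the assertion needs no intermediate file: the object is the boundary under test.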

## The entities as an ER diagram

This is the kind of **Mermaid ERD** the docs render — here drawn by hand for the data-flow story
(it deliberately includes pipeline entities like the golden and reference files). The
**[schema reference](../reference/schema/index.md)** ERD, by contrast, is produced
**automatically from the schema** by `make gen-schema-docs`
([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)).
(ADR 0009).

```mermaid
erDiagram
8 changes: 4 additions & 4 deletions docs/concepts/schema-as-contract.md
Expand Up @@ -25,7 +25,7 @@ diagram a third. Each is a place the truth can rot.
If there is exactly **one** source (the XSD) and everything else is generated from it, drift
becomes structurally impossible — you change the schema and re-generate, or you don't change
the format at all. CI enforces this by regenerating and failing on any difference
(see [ADR 0008](../decisions/0008-generated-models-no-drift.md)).
(see ADR 0008).

## What flows from the one source

Expand Down Expand Up @@ -66,6 +66,6 @@ right (and enriched) is the high-leverage work.

- [The two verification gates](two-verification-gates.md) — why schema-valid isn't enough.
- [Pipeline data flow](pipeline-data-flow.md) — the entities and how they move.
- ADRs [0001](../decisions/0001-schema-driven-generation-with-xsdata.md),
[0008](../decisions/0008-generated-models-no-drift.md),
[0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md).
- ADRs 0001,
0008,
0009.
6 changes: 3 additions & 3 deletions docs/concepts/two-verification-gates.md
Expand Up @@ -29,10 +29,10 @@ flowchart LR
Three mechanical checks, all in CI:

- **Conformant by construction** — output comes from schema-generated objects
([ADR 0001](../decisions/0001-schema-driven-generation-with-xsdata.md)), so it starts in
(ADR 0001), so it starts in
the right shape.
- **XSD validation** — `xmlschema` confirms the document validates against the contract
([ADR 0003](../decisions/0003-xmlschema-as-validation-gate.md)).
(ADR 0003).
- **Round-trip** — parse the XML back and re-serialise; if it changed meaningfully, a binding
or serialisation loss occurred. This catches things validation alone cannot.

Expand All @@ -54,7 +54,7 @@ generator, new output can be perfectly schema-valid yet differ from files a cons
depends on (perhaps relying on a quirk of the old hand-rolled output). The migration-safety
comparison diffs new output against a known-good **reference** file to surface exactly this.
See [Change the schema](../how-to/change-the-schema.md) and
[ADR 0004](../decisions/0004-two-gate-verification.md).
ADR 0004.

## Why split them at all?

84 changes: 67 additions & 17 deletions docs/concepts/typed-vs-dicts.md
@@ -1,24 +1,58 @@
# Typed objects vs. dictionaries
# Typed data, end to end

> **Explanation** — the input is loaded as JSON (nested `dict`s), but the pipeline maps it once
> onto typed objects generated from the schema and stores data in those. This page sets out what
> strong typing constrains at the point data is stored. See
> [ADR 0002](../decisions/0002-drop-csv-pickle-and-write_xml.md).
> **Explanation** — once you are writing Python in this pipeline the data lives in a **typed
> object that meets the schema**, not a loosely-typed `dict`. The only loosely-typed moment is
> the raw JSON at the very edge, parsed *once* inside the pipeline. This page shows what strong
> typing buys you over a generic `dict`. See ADR 0002.

## Work with the schema's data object

The pipeline turns the calculation into **one typed data object that meets the schema** — a
`Platform` generated from the XSD. You work with that object directly: every field is declared,
typed, and documented by the contract, so values never have to live in a loosely-typed `dict`.

```python
from acoustic_dataset import acoustics
from acoustic_dataset.mapping import to_model

input_path = "examples/calculation_input.json"

# Produce the schema's data object from the calculation input:
platform = to_model(acoustics.calculate_from_file(input_path))

# It is generated from the XSD; explore it by attribute.
# The IDE autocompletes each step and the values carry
# the schema's Decimal type — no raw JSON key in sight:
print(type(platform).__name__)
# Platform

print(platform.radiated_noise.band[0].centre_frequency)
# 50.000

sector = platform.radiated_noise.band[0].directional.sector[0]
print(sector.bearing, sector.level)
# 0.000 134.000
```

Every value here is a typed attribute of the schema's data object, not a `dict` key. Your IDE
autocompletes each step and a type checker flags a wrong one; the values carry the schema's
`Decimal` type. The raw JSON is parsed once, inside the pipeline, and you never index it by key.

## Storing data in a dictionary

A `dict` places no constraints on what it holds: keys are arbitrary strings and values are `Any`.

```python
record = {}
record["sourceLevel"] = 215.0 # any key, any value type
record["sorceLevel"] = 9999 # a misspelled key is just another entry
record["sourceLevel"] = "loud" # a string replaces the number, with no objection
record["sourceLevel"] = 215.0 # any key, any value type
record["sorceLevel"] = 9999 # misspelled key, stored anyway
record["sourceLevel"] = "loud" # a string replaces the number
```

The structure exists only by convention. A misspelled key, a wrong value type, or an omitted
field is stored as readily as correct data, so a mistake surfaces later — when something reads
the value, when the XML fails validation, or not at all.
the value, when the XML fails validation, or not at all. Carry data this way and *every* stage
downstream inherits that uncertainty; start from a typed structure and no stage does.

## Storing data in a typed object

Expand All @@ -29,8 +63,12 @@ is stored is defined up front:
from decimal import Decimal
from acoustic_dataset.models.acoustic_dataset import Sector

Sector(bearing=Decimal("30.000"), level=Decimal("134.000")) # the declared fields
Sector(bering=Decimal("30.000"), level=Decimal("134.000")) # TypeError: unexpected 'bering'
# The declared fields are accepted:
Sector(bearing=Decimal("30.000"), level=Decimal("134.000"))

# An unknown field fails at construction:
Sector(bering=Decimal("30.000"), level=Decimal("134.000"))
# -> TypeError: unexpected keyword argument 'bering'
```

- A name that is not a declared field is rejected when the object is constructed (`TypeError`),
Expand All @@ -39,7 +77,9 @@ Sector(bering=Decimal("30.000"), level=Decimal("134.000")) # TypeError: unexp
(`mypy`, run by `make verify`) reports a wrong-typed value before the code runs:

```python
Sector(bearing="thirty", level=Decimal("134.000")) # mypy: incompatible type "str"
# mypy flags a wrong-typed value before the code runs:
Sector(bearing="thirty", level=Decimal("134.000"))
# -> mypy: incompatible type "str"
```

- The fields and their documentation are generated from the schema, so the stored object follows
Expand All @@ -56,12 +96,20 @@ import dataclasses
from acoustic_dataset import acoustics
from acoustic_dataset.mapping import to_model, MappingError

result = acoustics.calculate_from_file("examples/calculation_input.json")
# Decibels are bounded to [-200, 300]; attempt to store an impossible source level:
input_path = "examples/calculation_input.json"
result = acoustics.calculate_from_file(input_path)

# Decibels are bounded to [-200, 300].
# Force an impossible source level, then map it:
bad = dataclasses.replace(
result, active_sonar=dataclasses.replace(result.active_sonar, source_level_db=9999.0)
result,
active_sonar=dataclasses.replace(
result.active_sonar, source_level_db=9999.0
),
)
to_model(bad) # MappingError — rejected as it is stored, not left for a later stage
to_model(bad)
# -> MappingError: rejected as it is stored,
# not left for a later stage
```

A `dict` would hold `9999` and pass it on; the typed boundary rejects it.
Expand All @@ -74,5 +122,7 @@ A `dict` would hold `9999` and pass it on; the typed boundary rejects it.
| Value types | `Any` | declared (e.g. `Decimal`), checked by `mypy` |
| Out-of-range values | stored as-is | rejected by the mapping (`MappingError`) |
| Relationship to the schema | convention only | generated from it |
| What downstream stages receive | a shape to trust on faith | a declared structure, all the way to XML |

The behaviours above are checked in `tests/unit/test_typed_vs_dict.py`.
Hold the whole flow in typed data and these guarantees compound at every stage instead of
having to be re-checked. The behaviours above are checked in `tests/unit/test_typed_vs_dict.py`.
8 changes: 4 additions & 4 deletions docs/glossary.md
Expand Up @@ -51,21 +51,21 @@ from the contract. See the [generated schema reference](reference/schema/index.m
## Tooling

**xsdata** — generates typed Python dataclasses from an XSD and binds objects ↔ XML.
See [ADR 0001](decisions/0001-schema-driven-generation-with-xsdata.md).
See ADR 0001.

**xmlschema** — pure-Python library used as the XSD validation gate.
See [ADR 0003](decisions/0003-xmlschema-as-validation-gate.md).
See ADR 0003.

**MkDocs Material** — static-site generator that turns this Markdown into the attractive
HTML site, with native Mermaid support.
See [ADR 0009](decisions/0009-mkdocs-material-mermaid-html-docs.md).
See ADR 0009.

**Mermaid** — text-based diagramming (flowcharts, ER diagrams) rendered in the browser;
keeps diagrams in version control as plain text.

**Devcontainer / GitHub Codespaces** — a declarative development environment so a
contributor gets a ready-to-run setup with no manual installation.
See [ADR 0006](decisions/0006-codespaces-with-local-fallback.md).
See ADR 0006.

**Diátaxis** — the documentation framework (tutorials / how-to / reference / explanation)
this site is organised around.
2 changes: 1 addition & 1 deletion docs/how-to/build-the-docs-site.md
Expand Up @@ -3,7 +3,7 @@
> **How-to** — produce and preview the attractive HTML documentation.

The docs are written in Markdown under `docs/` and rendered by **MkDocs Material**
([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)). Mermaid diagrams render
(ADR 0009). Mermaid diagrams render
natively.

## Preview while you write
6 changes: 3 additions & 3 deletions docs/how-to/change-the-schema.md
Expand Up @@ -18,15 +18,15 @@ redesign*: models, validation, bindings and the schema docs all regenerate from
```
Runs `xsdata` over the schema and rewrites `src/acoustic_dataset/models/`. **Do not
hand-edit** the result — it's a generated artifact
([ADR 0008](../decisions/0008-generated-models-no-drift.md)). Generation is pinned to the 3.9
(ADR 0008). Generation is pinned to the 3.9
toolchain so the output is byte-reproducible for the drift gate.

3. **Regenerate the schema docs + ERD.**
```bash
make gen-schema-docs
```
Produces the reference pages and the Mermaid ERD from the schema
([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)).
(ADR 0009).

4. **Update the mapping.**
`src/acoustic_dataset/mapping.py` is the **one place** that knows element names — update it
Expand Down Expand Up @@ -59,7 +59,7 @@ python -m acoustic_dataset.cli compare build/acoustic_dataset.xml examples/refer

A clean match exits 0; a meaningful difference prints a diff and exits non-zero — catching
output that is schema-valid but differs from what a consumer depends on
([ADR 0004](../decisions/0004-two-gate-verification.md)).
(ADR 0004).
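
The exit-code behaviour described above can be sketched with the standard library. This is an illustration only, not the project's `compare` implementation: the `compare` function below is hypothetical, and its notion of "meaningful" is an assumption (attribute order and quoting are ignored through canonical XML, whitespace is not).

```python
import difflib
import xml.etree.ElementTree as ET


def compare(new_xml: str, reference_xml: str) -> int:
    """Return 0 on a clean match; otherwise print a unified diff and return 1."""
    # Canonical XML normalises attribute order, quoting and empty-element form.
    new_lines = ET.canonicalize(new_xml).splitlines()
    ref_lines = ET.canonicalize(reference_xml).splitlines()
    if new_lines == ref_lines:
        return 0
    diff = difflib.unified_diff(ref_lines, new_lines, "reference", "new", lineterm="")
    print("\n".join(diff))
    return 1
```

A caller would pass the return value to `sys.exit` so that CI sees the non-zero status.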

## What you should *not* touch

2 changes: 1 addition & 1 deletion docs/how-to/run-the-migration-safety-comparison.md
Expand Up @@ -5,7 +5,7 @@

Schema validity proves the output fits the contract; it does **not** prove it says the same
thing as a file a consumer already depends on. The `compare` command closes that gap
([ADR 0004](../decisions/0004-two-gate-verification.md), FR-015).
(ADR 0004, FR-015).

## Steps

4 changes: 2 additions & 2 deletions docs/how-to/use-the-codespace.md
Expand Up @@ -8,7 +8,7 @@
1. On the repository page: **Code → Codespaces → Create codespace on
`claude/zealous-davinci-n4ssvn`** (or your branch).
2. Wait for provisioning. The devcontainer pins **Python 3.9.4** to match the target system
([ADR 0007](../decisions/0007-pin-python-3-9-4.md)) and runs `make bootstrap` automatically.
(ADR 0007) and runs `make bootstrap` automatically.

## Everyday commands

Expand Down Expand Up @@ -45,4 +45,4 @@ make bootstrap

Everything above works locally too — you only need Python 3.9.x and `make`. Run
`make bootstrap` once, then the same targets. See
[ADR 0006](../decisions/0006-codespaces-with-local-fallback.md) for why we keep both paths.
ADR 0006 for why we keep both paths.
17 changes: 12 additions & 5 deletions docs/index.md
Expand Up @@ -4,6 +4,12 @@ A schema-driven pipeline that turns acoustic calculation output into validated
**XML Acoustic Dataset**, plus the documentation that lets you *learn it* and *defend
the decisions behind it*.

!!! tip "Writing Python here? Start with the data"
The pipeline keeps everything in **typed objects from a structured parameter set all the
way to XML**. The one page to read first is
**[Typed data, end to end](concepts/typed-vs-dicts.md)** — how to write code against the
generated structures.

This site is organised so it can **grow with your understanding**. It follows the
[Diátaxis](https://diataxis.fr) model — four kinds of documentation, each answering a
different question — plus a set of **Architecture Decision Records (ADRs)** that record
Expand All @@ -16,7 +22,6 @@ different question — plus a set of **Architecture Decision Records (ADRs)** th
| Get the environment running and see it work | **[Tutorials](tutorials/01-start-here.md)** | "Show me, step by step." |
| Do a specific task | **[How-to guides](how-to/use-the-codespace.md)** | "How do I X?" |
| Understand *why* it works this way | **[Concepts](concepts/schema-as-contract.md)** | "Help me understand." |
| Defend or revisit a decision | **[Decisions (ADRs)](decisions/index.md)** | "Why did we choose this?" |
| Look up a command or the schema shape | **[Reference](reference/index.md)** | "What exactly is X?" |
| Decode a term | **[Glossary](glossary.md)** | "What does that word mean?" |

Expand All @@ -25,17 +30,19 @@ different question — plus a set of **Architecture Decision Records (ADRs)** th
The **schema (XSD) is the contract.** Everything derives from it: typed data classes
(generated by `xsdata`), the validation gate (`xmlschema`), the HTML reference docs and
Mermaid ERD, and the language bindings shipped to consumers. We never hand-write the
things the schema can generate — *configure, don't create*. Output is trusted only after
things the schema can generate — *configure, don't create*. When you write Python here you
start from a structured set of parameters and the data stays in **typed objects end to end**,
from that structure through to the XML — see
[Typed data, end to end](concepts/typed-vs-dicts.md). Output is trusted only after
passing **two gates**: a mechanical structural gate (schema-valid + round-trips) and a
human semantic gate (is the science right?). See
[Schema as the contract](concepts/schema-as-contract.md) and
[The two verification gates](concepts/two-verification-gates.md).
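
The round-trip half of the structural gate can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the pipeline's implementation: `survives_round_trip` and both `reserialise` helpers are hypothetical names, and `xml.etree.ElementTree` stands in for the generated bindings.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def survives_round_trip(xml_text: str, reserialise: Callable[[str], str]) -> bool:
    """Compare canonical forms of the document before and after a round trip."""
    return ET.canonicalize(xml_text) == ET.canonicalize(reserialise(xml_text))


def faithful_reserialise(xml_text: str) -> str:
    """Parse and write back without touching the tree."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


def lossy_reserialise(xml_text: str) -> str:
    """Simulate a binding that silently drops attributes."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.attrib.clear()
    return ET.tostring(root, encoding="unicode")
```

A document can be perfectly valid on the way out of `lossy_reserialise` and still have lost data, which is why the round-trip check sits alongside validation rather than being replaced by it.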

## How this set grows

- Every significant choice becomes a numbered **ADR** — start at the
[decisions overview](decisions/index.md). Adding one is a documented how-to:
[Add a decision record](how-to/add-a-decision-record.md).
- Every significant choice is recorded as a numbered **ADR**, kept in the
repository under `docs/decisions/` (not published to this site).
- New understanding lands as a **concept** page; new recipes as **how-to** guides.
- The **schema reference and ERD** are *generated from the enriched XSD* — see the
[generated schema reference](reference/schema/index.md) — so the docs can never drift from the contract.
10 changes: 5 additions & 5 deletions docs/onboarding.md
Expand Up @@ -18,7 +18,7 @@ make pipeline # produce build/acoustic_dataset.xml (schema-valid, round-trip-e

**Local fallback:** you need Python 3.9.x and `make`, then run `make bootstrap` yourself
before the same `make verify` / `make pipeline`. Both paths reach the *same* green state
([ADR 0006](decisions/0006-codespaces-with-local-fallback.md)). Run `make help` to see every target.
(ADR 0006). Run `make help` to see every target.

## Where things live

Expand All @@ -32,16 +32,16 @@ before the same `make verify` / `make pipeline`. Both paths reach the *same* gre
| Tests (unit / integration / golden) | `tests/` |
| The plan & design artifacts | `specs/001-codespace-xml-scaffold/` (`spec.md`, `plan.md`, `tasks.md`) |
| The generated schema reference + ERD | [reference/schema](reference/schema/index.md) (run `make gen-schema-docs`) |
| Why each choice was made | [Decision records](decisions/index.md) |
| Why each choice was made | `docs/decisions/` (ADRs, kept in the repo) |

## Build the mental model

Read these, in order — they're short:

1. [Schema as the contract](concepts/schema-as-contract.md) — the idea everything follows from.
2. [The two verification gates](concepts/two-verification-gates.md) — why schema-valid isn't the same as correct.
3. [Pipeline data flow](concepts/pipeline-data-flow.md) — how input becomes validated XML.
4. [Typed objects vs dictionaries](concepts/typed-vs-dicts.md) — why typed data beats a generic `dict`.
2. [Typed data, end to end](concepts/typed-vs-dicts.md) — how you write Python here: start from a structured set of parameters and keep the data typed all the way to XML.
3. [The two verification gates](concepts/two-verification-gates.md) — why schema-valid isn't the same as correct.
4. [Pipeline data flow](concepts/pipeline-data-flow.md) — how input becomes validated XML.

!!! tip "What's done, what's next"
The full Phase 1 pipeline is in place — environment, end-to-end pipeline, migration-safety
4 changes: 2 additions & 2 deletions docs/reference/commands.md
Expand Up @@ -20,7 +20,7 @@

All command targets are implemented. The CI drift gate regenerates the models and schema docs
on the Python 3.9 target and fails if the committed artifacts are stale (see
[ADR 0008](../decisions/0008-generated-models-no-drift.md)).
ADR 0008).
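
The drift gate amounts to comparing the committed artifacts with a fresh regeneration. A minimal sketch, assuming both trees are available as directories; `tree_digests` and `find_drift` are hypothetical helpers, not the project's CI script:

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to a SHA-256 digest of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def find_drift(committed: Path, regenerated: Path) -> list:
    """Return the relative paths that are missing, extra, or different."""
    old = tree_digests(committed)
    new = tree_digests(regenerated)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

A non-empty result would fail the build. Comparing bytes only works because generation is pinned to one toolchain, so the same schema always yields the same output.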

## CLI subcommands

Expand All @@ -37,6 +37,6 @@ above; full input/exit-code semantics are in the contract file. Summary:

## Tooling versions

- **Python**: 3.9.4 (pinned — [ADR 0007](../decisions/0007-pin-python-3-9-4.md))
- **Python**: 3.9.4 (pinned — ADR 0007)
- **xsdata**: ≥ 24 · **xmlschema**: ≥ 3 · **mkdocs-material**: ≥ 9.5
- Lint/type floors: `ruff target-version = py39`, `mypy python_version = 3.9`
2 changes: 1 addition & 1 deletion docs/reference/index.md
Expand Up @@ -21,5 +21,5 @@ the docs site, under version control):

`make gen-schema-docs` (re)generates the [schema reference](schema/index.md) from the enriched
XSD — the entity tables, every field's `xs:documentation` prose, and the Mermaid ERD — produced
*from the schema* so they cannot drift ([ADR 0009](../decisions/0009-mkdocs-material-mermaid-html-docs.md)).
*from the schema* so they cannot drift (ADR 0009).
The committed page is regenerated by CI; a stale copy fails the drift gate.