Scoring rules for filters by mattlevine22 · Pull Request #259 · BasisResearch/dynestyx

mattlevine22 · 2026-06-15T20:22:48Z

Summary

This PR adds first-class support for observation scoring on filter-predicted observation distributions, focused on the continuous-time cd_dynamax Gaussian filter path. Users can now attach a scoring_config to Filter(...) to compute proper scoring rules at each observation time, while keeping predictive-summary recording as a separate concern.

Dependency

Relies on an upstream CD-Dynamax PR.

What’s included

New scoring API in dynestyx/inference/scoring.py:
- ObservationScoringConfig
- GaussianLogProbScore
- DawidSebastianiScore
- ObservationWiseCRPSScore
- EnergyScore
New predicted-observation enrichment layer in dynestyx/inference/observation_predictions.py that:
- canonicalizes backend-specific predicted-observation outputs
- computes predictive observation covariances
- optionally records predicted observation means, covariances, and ensembles
- computes score arrays and records them as NumPyro sites when requested
Integration into the continuous-time filter handler path:
- dynestyx/inference/filters.py
- dynestyx/inference/integrations/cd_dynamax/continuous_filter.py
New BaseFilterConfig options in dynestyx/inference/filter_configs.py for predicted-observation recording:
- record_predicted_observations_mean
- record_predicted_observations_cov
- record_predicted_observations_ensemble
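
For reference, the three moment-based rules can each be written in a few lines of JAX. This is a minimal sketch, not the code in dynestyx/inference/scoring.py. The function names are hypothetical, the inputs are one time step's observation `y` with its predictive mean and covariance, and all three use the negatively oriented convention (lower is better), which may differ from the sign convention Dynestyx records.

```python
import jax.numpy as jnp
from jax.scipy.stats import multivariate_normal, norm


def gaussian_log_prob_score(y, mean, cov):
    """Negative Gaussian log predictive density of y (lower is better)."""
    return -multivariate_normal.logpdf(y, mean, cov)


def dawid_sebastiani_score(y, mean, cov):
    """log det(cov) + Mahalanobis distance; depends only on the first two moments."""
    resid = y - mean
    _, logdet = jnp.linalg.slogdet(cov)
    return logdet + resid @ jnp.linalg.solve(cov, resid)


def observation_wise_gaussian_crps(y, mean, cov):
    """Closed-form CRPS of each coordinate's Gaussian marginal; returns shape (d,)."""
    sigma = jnp.sqrt(jnp.diagonal(cov))
    z = (y - mean) / sigma
    return sigma * (
        z * (2.0 * norm.cdf(z) - 1.0) + 2.0 * norm.pdf(z) - 1.0 / jnp.sqrt(jnp.pi)
    )
```

To score a whole trajectory, each function can be wrapped in `jax.vmap` over the leading time axis. For a Gaussian predictive, the Dawid-Sebastiani score equals twice the negative log density minus d log(2π), so the two rules rank forecasts identically when the predictive really is Gaussian.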

Behavior / API notes

Scoring is defined on the one-step-ahead predictive observation distribution, not on the filtered state posterior.
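
Concretely, for a linear observation model y_t = H x_t + v_t with v_t ~ N(0, R), the predictive observation distribution is Gaussian, with moments built from the filter's predicted (pre-update) state moments. The sketch below assumes those moments are available as `m_pred` and `P_pred`; the names are hypothetical, and nonlinear observation models would need a linearization or an ensemble approximation instead.

```python
import jax.numpy as jnp


def predictive_observation_moments(m_pred, P_pred, H, R):
    """Moments of p(y_t | y_{1:t-1}) for y_t = H x_t + v_t, v_t ~ N(0, R)."""
    y_mean = H @ m_pred             # predicted observation mean
    S = H @ P_pred @ H.T + R        # predictive observation covariance
    return y_mean, 0.5 * (S + S.T)  # symmetrize to guard against round-off
```

The filtered posterior p(x_t | y_{1:t}) has already conditioned on y_t, which is why scoring uses these pre-update moments instead.
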
Scoring and predicted-observation recording are separate:
- users can score without recording predicted means/covariances/ensembles
- users can record predicted means/covariances/ensembles without scoring
In the Filter(...) handler path, scores are computed only when the score arrays will be recorded as NumPyro sites. If record_as_numpyro_sites=False, the handler path skips score computation entirely.
ObservationScoringConfig.sample_seed is now the single scoring-level seed for any synthetic predictive sampling performed by Dynestyx during scoring, including:
- adding observation noise to a latent predictive ensemble
- drawing predictive observation samples from Gaussian moments for EnergyScore
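
One way a single integer seed can drive all synthetic sampling reproducibly is to build one PRNG key from it and split that key per time step. A minimal sketch of the Gaussian-moments case, assuming predictive means of shape (T, d) and covariances of shape (T, d, d); the helper name and the per-time splitting scheme are assumptions, not necessarily what Dynestyx does.

```python
import jax
import jax.numpy as jnp


def sample_from_gaussian_moments(mean, cov, num_samples, sample_seed):
    """Draw (T, num_samples, d) predictive samples from per-time Gaussian moments."""
    key = jax.random.PRNGKey(sample_seed)
    keys = jax.random.split(key, mean.shape[0])  # one independent key per time step

    def one_time(k, m, c):
        return jax.random.multivariate_normal(k, m, c, shape=(num_samples,))

    return jax.vmap(one_time)(keys, mean, cov)
```

Calling it twice with the same `sample_seed` returns identical samples, which keeps sample-based scores stable across reruns.
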
Rules that only need predictive moments (GaussianLogProbScore, DawidSebastianiScore, ObservationWiseCRPSScore) do not depend on ensemble availability or sample_source.
Rules that need predictive samples (EnergyScore) use sample_source to choose between:
- a backend-provided predictive observation ensemble
- a predictive observation ensemble synthesized by adding observation noise to a latent ensemble
- predictive samples drawn from Gaussian predictive moments
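
The selection logic can be summarized as a small dispatcher. This sketch follows the `"auto"` fallback order described in the `sample_source` docstring further down (backend ensemble, then latent ensemble plus noise, then Gaussian moments). The strings `"auto"`, `"backend_ensemble"` and `"latent_ensemble_plus_noise"` appear in this PR's diff; `"gaussian_moments"` and the function name are hypothetical placeholders.

```python
from typing import Any, Optional


def resolve_sample_source(
    sample_source: str,
    backend_ensemble: Optional[Any],
    latent_ensemble: Optional[Any],
    noise_cov: Optional[Any],
) -> str:
    """Return the sampling path a sample-based rule would use (hypothetical helper)."""
    if sample_source == "auto":
        # Prefer what the backend already produced; synthesize only as a fallback.
        if backend_ensemble is not None:
            return "backend_ensemble"
        if latent_ensemble is not None and noise_cov is not None:
            return "latent_ensemble_plus_noise"
        return "gaussian_moments"
    if sample_source == "backend_ensemble" and backend_ensemble is None:
        raise NotImplementedError("No backend predictive observation ensemble is available.")
    if sample_source == "latent_ensemble_plus_noise" and (
        latent_ensemble is None or noise_cov is None
    ):
        raise NotImplementedError(
            "Needs both a latent predictive ensemble and observation noise covariance."
        )
    if sample_source in {"backend_ensemble", "latent_ensemble_plus_noise", "gaussian_moments"}:
        return sample_source
    raise NotImplementedError(f"Unsupported scoring sample source: {sample_source}.")
```

Explicit choices raise rather than silently falling back, so a user who asks for a specific source either gets it or gets an error.
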

Performance / implementation details

Added a structured fast path for fixed observation noise covariances:
- if the observation model is LinearGaussianObservation or GaussianObservation
- and R is non-callable
- then the covariance is broadcast across time directly instead of being recomputed per observation
Fallback per-time covariance construction uses jax.lax.map rather than a Python loop.
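
A simplified sketch of the two covariance paths, ignoring plate dimensions and controls: a fixed R is broadcast to shape (T, d, d) with no per-time work, while a time-dependent R(t) is evaluated with `jax.lax.map`, so the loop body is traced once instead of being unrolled in Python. Here "fixed" is approximated by a non-callable check alone (the PR additionally requires a LinearGaussianObservation or GaussianObservation model), and the helper name is hypothetical.

```python
import jax
import jax.numpy as jnp


def observation_noise_covs(R, obs_times):
    """Return per-time observation noise covariances with shape (T, d, d)."""
    if not callable(R):
        # Fast path: broadcast the fixed matrix across time.
        R = jnp.asarray(R)
        return jnp.broadcast_to(R, (obs_times.shape[0], *R.shape))
    # Fallback: evaluate R(t) at each observation time without a Python loop.
    return jax.lax.map(R, obs_times)
```
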
EnergyScore supports both:
- fast vectorized pairwise computation
- lower-memory scan-based pairwise computation via vectorized_pairwise=False
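
The two EnergyScore strategies compute the same pairwise term and differ only in peak memory. A minimal sketch for a single time step with an (N, d) ensemble; the function names are hypothetical, and `beta` plays the role of the exponent `self.beta` seen in the diff.

```python
import jax
import jax.numpy as jnp


def pairwise_term_vectorized(ens, beta=1.0):
    """0.5 * mean_{i,j} ||x_i - x_j||^beta via an (N, N, d) difference tensor."""
    diffs = ens[:, None, :] - ens[None, :, :]
    return 0.5 * jnp.mean(jnp.linalg.norm(diffs, axis=-1) ** beta)


def pairwise_term_scan(ens, beta=1.0):
    """Same quantity, accumulated one ensemble member at a time with lax.scan."""

    def body(total, x_i):
        row = jnp.linalg.norm(ens - x_i, axis=-1) ** beta  # distances from x_i, shape (N,)
        return total + jnp.sum(row), None

    total, _ = jax.lax.scan(body, jnp.zeros((), dtype=ens.dtype), ens)
    return 0.5 * total / ens.shape[0] ** 2
```

The vectorized form materializes an (N, N, d) tensor, while the scan form holds one (N, d) slice per step, trading some speed for O(N * d) working memory.
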

Docs and tutorials

Added public and developer API pages:
- docs/api_reference/public/inference/scoring.md
- docs/api_reference/developer/inference/scoring.md
Updated filter and filter-config API docs to explain scoring vs. predicted-observation recording.
Added a new gentle-intro tutorial:
- docs/tutorials/gentle_intro/12_observation_scoring_with_filters.ipynb
Updated gentle-intro navigation/index so scoring is now Part 12, after the missing-observations tutorials.

Testing

Added a dedicated scoring test suite in tests/test_filter_scoring.py
Coverage includes:
- score-site correctness against backend outputs
- predicted-observation recording correctness
- Gaussian vs. ensemble-based scoring paths
- unsupported/skip behavior
- synthetic sampling from Gaussian moments
- backend observation ensemble precedence
- fast-path/fallback observation covariance behavior
- vectorized vs. scan equivalence for EnergyScore

Dependency / build notes

cd-dynamax is pinned to the required upstream commit SHA in pyproject.toml until the upstream changes are included in a cd-dynamax release.

Copilot

Pull request overview

Adds first-class support for computing and (optionally) recording proper scoring rules for one-step-ahead predicted observation distributions produced by continuous-time CD-Dynamax Gaussian filters, including integration into the Filter handler, tests, and documentation updates.

Changes:

Introduces dynestyx.inference.scoring (score definitions + ObservationScoringConfig) and dynestyx.inference.observation_predictions (backend-to-canonical prediction enrichment + trace recording).
Wires scoring/enrichment through the continuous-time CD-Dynamax filter path and the Filter handler (including batched/plate execution).
Adds a comprehensive test suite for scoring/recording behavior and updates tutorials/API docs navigation to include the new scoring topic.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| `tests/test_filter_scoring.py` | New tests covering scoring rule outputs and trace recording behavior across continuous-time filter configs. |
| `pyproject.toml` | Switches `cd-dynamax` dependency to a Git branch reference needed for the new backend outputs. |
| `mkdocs.yml` | Adds tutorial + API nav entries for observation scoring and related missing-observations tutorials. |
| `dynestyx/inference/scoring.py` | New scoring-rule implementations and scoring configuration dataclass. |
| `dynestyx/inference/observation_predictions.py` | New canonicalization/enrichment layer to derive prediction summaries and scores from backend outputs and record them into the trace. |
| `dynestyx/inference/integrations/cd_dynamax/continuous_filter.py` | Plumbs `scoring_config` into the continuous-time CD-Dynamax filter run and records prediction/score sites. |
| `dynestyx/inference/filters.py` | Adds `scoring_config` to the `Filter` handler and enforces current support constraints; wires scoring through continuous-time paths and plate/batched execution. |
| `dynestyx/inference/filter_configs.py` | Adds `record_predicted_observations_*` fields to filter configs and includes them in recording kwargs. |
| `dynestyx/inference/__init__.py` | Exposes the new `scoring` module at the package level. |
| `docs/tutorials/gentle_intro/11c_missing_observations_hmms.ipynb` | Updates tutorial “Next” navigation to point to the new scoring tutorial. |
| `docs/tutorials/gentle_intro/00_index.ipynb` | Adds the Part 12 scoring tutorial to the gentle intro index. |
| `docs/api_reference/public/inference/filters.md` | Documents `Filter` scoring support and links to the Scoring page. |
| `docs/api_reference/public/inference/filter_configs.md` | Mentions predicted-observation recording fields and links to the Scoring page. |
| `docs/api_reference/developer/inference/filters.md` | Developer-facing note about scoring entry point and where backend translation lives. |
| `docs/api_reference/developer/inference/filter_configs.md` | Developer-facing note about predicted-observation recording fields and scoring linkage. |


+    if scoring_config.target == "latent_predictive":
+        if scoring_config.sample_source in {"auto", "backend_ensemble"}:
+            return predictions.ensemble
+        if scoring_config.sample_source == "latent_ensemble_plus_noise":
+            if predictions.ensemble is None or predictions.noise_cov is None:
+                raise NotImplementedError(
+                    "Sampling a data-predictive ensemble from a latent ensemble "
+                    "requires both a latent predictive ensemble and observation "
+                    "noise covariance."
+                )
+            return _sample_data_predictive_ensemble(
+                predictions.ensemble,
+                predictions.noise_cov,
+                sample_seed=scoring_config.sample_seed,
+            )
+        raise NotImplementedError(
+            f"Unsupported scoring sample source: {scoring_config.sample_source}."
+        )
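
`_sample_data_predictive_ensemble` itself is not shown in this hunk. A plausible minimal sketch of what "latent ensemble plus noise" means, assuming the latent ensemble already holds noise-free predicted observations h(x) with shape (T, N, d) and the noise covariance has shape (T, d, d): draw standard normals from the scoring seed and correlate them with a Cholesky factor of R_t. The function name below is hypothetical and the real helper may differ.

```python
import jax
import jax.numpy as jnp


def add_observation_noise(latent_ens, noise_cov, sample_seed):
    """Turn a noise-free predicted-observation ensemble into a data-predictive one.

    latent_ens: (T, N, d) values of h(x) for each ensemble member.
    noise_cov:  (T, d, d) observation noise covariance R_t (positive definite).
    """
    key = jax.random.PRNGKey(sample_seed)
    chol = jnp.linalg.cholesky(noise_cov)  # R_t = L_t L_t^T
    eps = jax.random.normal(key, latent_ens.shape, dtype=latent_ens.dtype)
    # noise[t, n] = L_t @ eps[t, n], so each draw has covariance R_t
    noise = jnp.einsum("tij,tnj->tni", chol, eps)
    return latent_ens + noise
```
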


    "effectful>=0.2.0",
    "cuthbert>=0.0.10",
    "cuthbertlib>=0.0.10",
-    "cd-dynamax>=0.3.3",
+    "cd-dynamax @ git+https://github.com/hd-UQ/cd_dynamax.git@ml-return-ypreds",
    "matplotlib>=3.10.7",


+        pairwise = pred_ensemble[..., :, None, :] - pred_ensemble[..., None, :, :]
+        second_term = 0.5 * jnp.mean(
+            jnp.linalg.norm(pairwise, axis=-1) ** self.beta,
+            axis=(-2, -1),
+        )
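
For context, this hunk computes the second term of the energy score. Written out for an ensemble x_1, ..., x_N and observation y, and assuming the first term is the usual ensemble mean of ‖x_i − y‖^β (it is not shown in this hunk), the score and its ensemble estimator are:

$$
\mathrm{ES}_\beta(F, y) = \mathbb{E}_F\,\lVert X - y\rVert^\beta - \tfrac{1}{2}\,\mathbb{E}_F\,\lVert X - X'\rVert^\beta,
\qquad
\widehat{\mathrm{ES}}_\beta = \frac{1}{N}\sum_{i=1}^{N}\lVert x_i - y\rVert^\beta - \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lVert x_i - x_j\rVert^\beta
$$

Because `jnp.mean` runs over all N² pairs, the zero diagonal terms (i = j) are included, which gives the 1/(2N²) normalization; the "fair" variant excludes them and divides by 2N(N − 1) instead. The score is strictly proper for β in (0, 2).
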


+        record_predicted_observations_mean (bool): Save the predicted
+            observation mean at each observation time, before conditioning on
+            that observation. Defaults to `False`, and scoring can be used
+            without automatically recording predictive summaries.
+        record_predicted_observations_cov (bool): Save the predicted
+            observation covariance at each observation time, before
+            conditioning on that observation. Defaults to `False`.
+        record_predicted_observations_ensemble (bool): Save the
+            predicted observation ensemble at each observation time
+            (ensemble-based filters only). Defaults to `False`.



Copilot

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 5 comments.

+    assert predictions.mean is not None
+    if predictions.obs_cov is None:
+        raise NotImplementedError(
+            "Observation scoring requires predictive observation covariance."
+        )
+    return (
+        predictions.mean,
+        predictions.obs_cov,
+        _select_scoring_ensemble(
+            predictions,
+            scoring_config=scoring_config,
+        ),
+    )




+    t_len = _time_len_from_array(obs_times, plate_shapes)
+    state_shape = (*plate_shapes, dynamics.state_dim)
+    x_probe = jnp.zeros(state_shape, dtype=jnp.asarray(obs_times).dtype)
+    covs = []
+    for t_idx in range(t_len):
+        t = _slice_time_axis(obs_times, t_idx, plate_shapes)
+        u_t = (


Copilot

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.


+        sample_source: Strategy for obtaining predictive observation
+            ensembles when a rule needs samples. `"auto"` prefers a
+            backend-provided predictive observation ensemble, then falls back
+            to adding observation noise to a latent predictive ensemble, and
+            finally to Gaussian moments if the rule supports that path.
+        sample_seed: PRNG seed used when Dynestyx needs to synthesize
+            predictive ensembles from moments or latent ensembles plus noise.


    "effectful>=0.2.0",
    "cuthbert>=0.0.10",
    "cuthbertlib>=0.0.10",
-    "cd-dynamax>=0.3.3",
+    "cd-dynamax @ git+https://github.com/hd-UQ/cd_dynamax.git@0fd1bbf9dba5154af70d9dae9b925e572c023368",
    "matplotlib>=3.10.7",


Copilot

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

+    posterior, predictions, score_arrays = enrich_continuous_filter_output(
+        posterior,
+        dynamics=dynamics,
+        filter_config=filter_config,
+        obs_times=obs_times,
+        obs_values=obs_values,
+        ctrl_values=ctrl_values,
+        scoring_config=scoring_config,
+        plate_shapes=plate_shapes,
+    )




DanWaxman · 2026-06-17T11:49:16Z

Will take a closer look later, but we already have `dynestyx/diagnostics`. I think we should consider renaming `dynestyx/diagnostics` to `dynestyx/evaluation` and putting this there.

mattlevine22

Agreed with moving out of inference and into `dynestyx/diagnostics`; I'm thinking we pull out `ObservationScoringConfig` and put it into a new `inference/scoring_configs.py`. Would be a short file, but mimics `filter_configs` and `smoother_configs`.

I'm open to renaming to evaluation, but it does create solid churn across notebooks etc. Worth it?

DanWaxman

> Agreed with moving out of inference and into `dynestyx/diagnostics`; I'm thinking we pull out `ObservationScoringConfig` and put it into a new `inference/scoring_configs.py`. Would be a short file, but mimics `filter_configs` and `smoother_configs`.

Great! I can picture it being not-so-short down the line, anyway.

> I'm open to renaming to evaluation, but it does create solid churn across notebooks etc. Worth it?

I think so! But I'm biased. Maybe we should focus-group it...

mattlevine22

Okay, sounds good! I'll change the name; I like evaluation a bit better too. I just wanted to keep the PR from changing too many files, but if you like it, let's do it.

mattlevine22 added 6 commits June 13, 2026 23:54

wip

61dca28

tests run; notebook works

859b93c

Merge branch 'main' into ml-filter-scoring

ce38287

make the notebook a tutorial

383e06b

improve docs and small refactors

5980bfc

improving docs, tests, and tutorial

7fec431

mattlevine22 requested a review from Copilot June 15, 2026 23:12

Copilot started reviewing on behalf of mattlevine22 June 15, 2026 23:13

Copilot AI reviewed Jun 15, 2026

add jaxtyping, add pairwise dist options, remove latent-predictive options, address copilot comments

5729c5e

mattlevine22 requested a review from Copilot June 16, 2026 00:33

Copilot started reviewing on behalf of mattlevine22 June 16, 2026 00:33

Copilot AI reviewed Jun 16, 2026

mattlevine22 added 2 commits June 15, 2026 20:51

oops adding forgotten docs

7f34629

fix when ensemble_scoring happens; jaxify a for loop

4baadda

mattlevine22 requested a review from Copilot June 16, 2026 01:47

Copilot started reviewing on behalf of mattlevine22 June 16, 2026 01:48

Copilot AI reviewed Jun 16, 2026

use a fixed SHA for CD-Dynamax; exploit constant obs covs when possible

c945520

mattlevine22 requested a review from Copilot June 16, 2026 02:09

Copilot started reviewing on behalf of mattlevine22 June 16, 2026 02:09

Copilot AI reviewed Jun 16, 2026

streamline scoring sample seed

379ccfc

mattlevine22 requested a review from Copilot June 16, 2026 02:27

Copilot started reviewing on behalf of mattlevine22 June 16, 2026 02:27

Copilot AI reviewed Jun 16, 2026

Refine filter scoring outputs and docs

7ec18c3

mattlevine22 marked this pull request as ready for review June 16, 2026 21:05

mattlevine22 requested review from DanWaxman and baptistar June 16, 2026 21:07

mattlevine22 requested a review from LukeSnow0 June 16, 2026 21:08

DanWaxman reviewed Jun 17, 2026

mattlevine22 wants to merge 12 commits into main from ml-filter-scoring

Review thread (3 participants): DanWaxman (Jun 17, 2026), mattlevine22 (Jun 18, 2026), DanWaxman (Jun 18, 2026), mattlevine22 (Jun 18, 2026).