STEF-3047 fm forecaster #970

Draft
egordm wants to merge 23 commits into feature/foundational-models from feature/STEF-3047-fm-forecaster

egordm (Collaborator) commented Jun 16, 2026

No description provided.

egordm added 22 commits June 15, 2026 15:58
… backends

Phase 1 + 2 of the generic ONNX foundation-model forecaster (STEF-3047).

- Wire openstef-models dependency and cpu/gpu/hub/torch extras into the
  openstef-foundation-models package.
- Add CheckpointMetadata / ResolvedCheckpoint and Local/Hub checkpoint refs
  (discriminated union) with lazy huggingface-hub download.
- Add inference layer: InferenceBackend protocol, execution-provider configs
  (cpu/cuda/tensorrt/coreml + session options), and fail-early OnnxBackend and
  model-independent TorchBackend with use-after-close guards.
- Add throwaway chronos-2 torch loader script for ONNX-vs-Torch parity checks.
- Document the optional-dependency hybrid pattern (fail-early implementation
  modules + import-light aggregators) in the code style guide.
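
The use-after-close guard mentioned for the backends could look roughly like the following. This is a minimal sketch, not the code in this PR: `GuardedBackend`, `BackendClosedError`, and the `run` signature are hypothetical names chosen for illustration.

```python
from typing import Callable, Optional


class BackendClosedError(RuntimeError):
    """Raised when a backend is used after close() (hypothetical name)."""


class GuardedBackend:
    """Illustrative wrapper; not the actual OnnxBackend or TorchBackend."""

    def __init__(self, run_fn: Callable[[dict], dict]) -> None:
        self._run_fn: Optional[Callable[[dict], dict]] = run_fn
        self._closed = False

    def run(self, inputs: dict) -> dict:
        # Fail loudly instead of touching a released session.
        if self._closed or self._run_fn is None:
            raise BackendClosedError("backend used after close()")
        return self._run_fn(inputs)

    def close(self) -> None:
        # Idempotent: closing twice is harmless.
        self._closed = True
        self._run_fn = None  # drop the reference so resources can be freed
```

The guard turns a silent misuse into an immediate, descriptive error at the call site.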

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
…e backend

Phase 3 of the generic ONNX foundation-model forecaster (STEF-3047).

- Add interpolate_quantiles: a pure NumPy helper that resamples model-native
  quantile predictions onto a requested quantile grid (piecewise-linear,
  constant-clamped extrapolation), unit tested in isolation.
- Implement Chronos2Forecaster(Forecaster) composing an InferenceBackend. It
  owns Chronos-2 preprocessing (context / attention_mask / group_ids from the
  raw target history, left-padded and masked) and postprocessing (horizon slice
  + quantile resample into a ForecastDataset). Zero-shot: fit is a no-op and the
  model is always fitted. Batch-first: predict_batch runs the backend once and
  predict is a batch-of-one wrapper.
- Add unit tests mirroring the src layout (tests/unit/utils, tests/unit/models/
  forecasting) using a recording stub backend; drop the placeholder example test.
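
For readers unfamiliar with the resampling step, here is a minimal sketch of piecewise-linear quantile interpolation with constant-clamped extrapolation. It assumes the quantile axis is the last axis and that the native quantile levels are sorted ascending; the function name and signature are illustrative and may differ from the actual `interpolate_quantiles` helper.

```python
import numpy as np


def interpolate_quantiles_sketch(predictions, native_quantiles, target_quantiles):
    """Resample predictions of shape (..., n_native) onto target quantile levels."""
    predictions = np.asarray(predictions, dtype=float)
    native = np.asarray(native_quantiles, dtype=float)  # assumed sorted ascending
    target = np.asarray(target_quantiles, dtype=float)

    # np.interp is 1-D, linear between knots, and clamps to the edge values
    # outside the knot range, which gives the constant extrapolation.
    flat = predictions.reshape(-1, native.size)
    resampled = np.stack([np.interp(target, native, row) for row in flat])
    return resampled.reshape(*predictions.shape[:-1], target.size)
```

For example, native levels 0.1/0.5/0.9 with values 10/20/30 resampled at 0.3 give 15, while a request for 0.05 or 0.95 returns the nearest native value (10 or 30).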

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Add presets/forecasting_workflow.py providing declarative config for
building foundation-model forecasters:

- OnnxBackendConfig wraps a checkpoint + execution providers and builds an
  OnnxBackend, lazy-importing ONNX Runtime inside build() so the preset
  stays importable without the cpu/gpu extra installed.
- FoundationForecasterConfig selects a model family (chronos2) plus
  quantiles/horizons; create_foundation_forecaster() composes the built
  backend into a Chronos2Forecaster.

The backend type is a named alias (BackendConfig) so it can grow into a
discriminated union when a second backend lands. The torch backend arm is
deferred to avoid coupling the generic preset to throwaway loader code.

Tests cover factory wiring, checkpoint resolution + option forwarding,
config JSON round-trip, and an import-light guard asserting onnxruntime is
not eagerly imported.
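
The lazy-import idea behind build() can be sketched as follows. This is an assumption-laden illustration: `LazyBackendConfig` and `require_optional` are hypothetical names, and the real `OnnxBackendConfig` has different fields.

```python
import importlib
from dataclasses import dataclass
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, pointing at the extra that provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required here; install the '{extra}' extra."
        ) from exc


@dataclass
class LazyBackendConfig:
    """Hypothetical config: importing this class never imports the heavy module."""

    module_name: str = "onnxruntime"
    extra: str = "cpu"

    def build(self) -> ModuleType:
        # The import happens only when a backend is actually requested.
        return require_optional(self.module_name, self.extra)
```

Because the import sits inside build(), the config stays constructible and JSON-serialisable on machines without the runtime installed.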

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Add integrations/backtesting.py (behind a new [benchmarking] extra pulling
openstef-beam) bridging any Forecaster to beam's backtesting interface:

- FoundationModelBacktestForecaster wraps a single, pre-built forecaster and
  reuses it across every window (load-once: one shared backend session). Each
  window becomes a ForecastInputDataset whose forecast_start is the window
  horizon; the batch path runs a whole batch in a single backend call.
- requires_training defaults to False (foundation models are zero-shot, so
  fit() is a no-op). Windows without observed history before the horizon yield
  None and never reach the backend.
- create_foundation_model_backtest_forecaster() builds the adapter, deriving
  the predict horizon from the forecaster when not given.

The integrations package stays import-light: the module carries its own
optional dependency and is imported directly. Tests use a counting fake
forecaster to assert single-backend-call batching, None-position preservation,
horizon-aligned output, and instance reuse.
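
The load-once and None-position behaviour can be sketched with plain Python. The class below is a hypothetical stand-in for the adapter: it uses integer timestamps and a callable instead of beam's interfaces, and it forecasts one window at a time rather than in a batch.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class LoadOnceBacktestAdapter:
    """Wraps one pre-built forecaster and reuses it for every window."""

    forecaster: Callable[[Sequence[float]], float]
    calls: int = field(default=0, init=False)

    def predict_window(
        self, history: Sequence[Tuple[int, float]], horizon: int
    ) -> Optional[float]:
        # Only observations strictly before the horizon are visible.
        observed = [value for timestamp, value in history if timestamp < horizon]
        if not observed:
            return None  # no history: never reaches the forecaster
        self.calls += 1
        return self.forecaster(observed)

    def predict_windows(
        self, history: Sequence[Tuple[int, float]], horizons: Sequence[int]
    ) -> List[Optional[float]]:
        return [self.predict_window(history, horizon) for horizon in horizons]


adapter = LoadOnceBacktestAdapter(forecaster=fmean)
results = adapter.predict_windows([(1, 10.0), (2, 12.0)], horizons=[1, 2, 3])
```

The first window has no observed history, so its position in `results` is None and the forecaster is called only twice.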

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Add slow+integration tests that exercise the real exported Chronos-2 ONNX
checkpoint end to end. They self-skip when the artifact (or torch/chronos)
is unavailable, so CI and default runs stay fast.

- conftest: session-scoped fixtures resolving the artifact (env override
  OPENSTEF_CHRONOS2_ONNX_PATH), building checkpoint metadata + a sidecar,
  and loading a single shared OnnxBackend (load-once).
- test_chronos2_onnx: real backend shape/finiteness, forecaster end-to-end
  (positive raw-scale, horizon-sized, quantile-monotone), batch==single
  consistency, metadata exposure.
- test_onnx_torch_parity: ONNX-vs-Torch parity via the export-lab wrapper,
  gated on torch + chronos-forecasting.
- test_backtesting_integration: real forecaster through the beam adapter,
  asserting one ONNX session is reused across all backtest windows.

Replaces the placeholder integration stub.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Add a jupytext-paired tutorial demonstrating zero-shot probabilistic
forecasting with Chronos-2 through the ONNX inference backend:

- Build a Chronos2Forecaster from a LocalCheckpoint + metadata sidecar via
  create_foundation_forecaster (Phase 4 preset).
- Feed raw Liander load history (model owns normalization) and read raw-scale
  P10/P50/P90, then plot with ForecastTimeSeriesPlotter.
- Points at the local lab export (override with OPENSTEF_CHRONOS2_ONNX_PATH);
  collapses to a one-line HubCheckpoint once the checkpoint is published.

Wired into docs/examples.rst under a new "Foundation Models" section. The
checkpoint is large and not published yet, so the notebook is excluded from
docs execution (nb_execution_excludepatterns) and rendered without running.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
The quantile-grid resampling helper is broadly useful beyond foundation
models, so relocate it from openstef-foundation-models into
openstef_core.utils. Update the Chronos2Forecaster import and move the
unit tests alongside the other core util tests. The now-empty
foundation-models utils package is removed.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Chronos-2's headline feature is conditioning on covariates. Treat every
non-target feature column as a known covariate spanning both history and
the forecast horizon: each series now contributes a target row plus one
row per covariate, all sharing a group id so Chronos-2 attends across
them. Covariate history feeds extra context rows and covariate horizon
values feed the new future_covariates / future_covariates_mask inputs,
while the target row's future is masked out. After inference, each
series' target row is sliced back out of the grouped output.

Add unit tests covering covariate row layout, per-series grouping,
known-future values and masking, out-of-range horizon masking, and
target-row selection.
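
The row layout described above can be sketched with NumPy. The function name, argument shapes, and output order are assumptions made for illustration; the sketch also assumes at least one covariate and a target array that spans context plus horizon.

```python
import numpy as np


def build_group_rows(target, covariates, context_length, group_id):
    """Stack a target row and covariate rows that share one group id.

    target: shape (context + horizon,); covariates: shape (n_cov, context + horizon).
    """
    target = np.asarray(target, dtype=np.float32)
    covariates = np.atleast_2d(np.asarray(covariates, dtype=np.float32))
    rows = np.vstack([target[None, :], covariates])  # target is row 0

    context = rows[:, :context_length]
    future = rows[:, context_length:].copy()

    # A future value is known only where it is finite ...
    future_mask = np.isfinite(future).astype(np.float32)
    # ... and the target's own future is always hidden from the model.
    future_mask[0, :] = 0.0
    future = np.where(future_mask > 0, future, 0.0).astype(np.float32)

    group_ids = np.full(rows.shape[0], group_id, dtype=np.int64)
    return context, future, future_mask, group_ids
```

After inference, selecting row 0 of each group would recover the target forecast for that series.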

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Reshape the foundation-model preset to return a CustomForecastingWorkflow
instead of a bare forecaster. The workflow wraps the Chronos-2 forecaster
with a feature Selector (preprocessing) and QuantileSorter (postprocessing),
matching the structure of other OpenSTEF models. Fit is a no-op beyond
fitting the selector, since Chronos-2 is zero-shot.
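
A quantile-sorting postprocessing step typically removes quantile crossing by sorting values along the quantile axis. The following is a sketch under that assumption, with quantile columns ordered by ascending level on the last axis; it is not the `QuantileSorter` implementation.

```python
import numpy as np


def sort_quantiles(predictions):
    """Return predictions with values non-decreasing along the quantile axis."""
    return np.sort(np.asarray(predictions, dtype=float), axis=-1)
```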

Add a configurable top-level checkpoint field (default inferred inline from
model) and configurable covariate column names defaulting to the Liander
2024 conventions. With no explicit selection, the target plus the radiation,
wind-speed and temperature columns are kept and forwarded as known
covariates.

Rename FoundationForecasterConfig -> ForecastingWorkflowConfig and
create_foundation_forecaster -> create_forecasting_workflow.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
The backtesting adapter now wraps a CustomForecastingWorkflow instead of a
bare forecaster, so the model's preprocessing (feature selection /
covariates) and postprocessing (quantile sorting) apply to every backtest
window. The wrapped workflow is built once and reused across all windows
(load-once), and the batched path still runs a whole batch of windows in a
single backend call by driving the underlying forecaster directly.

create_foundation_model_backtest_forecaster now takes a workflow and derives
the target column and default horizon from it.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
…variate graph

Rework the Chronos-2 tutorial around the new workflow API and the
re-exported five-input covariate graph:

- Tutorial now builds a CustomForecastingWorkflow via
  create_forecasting_workflow(ForecastingWorkflowConfig(...)), feeds a
  history+horizon window so weather columns act as known-future
  covariates, and predicts through workflow.predict.
- Drop the OPENSTEF_CHRONOS2_ONNX_PATH env var and the hand-built
  metadata sidecar; the LocalCheckpoint now auto-discovers the
  .metadata.json written next to the weights by the export script.
- Integration conftest reads CheckpointMetadata from the real sidecar
  and resolves the checkpoint via auto-discovery.
- Raw-backend and parity tests pass all five inputs (context, group_ids,
  attention_mask, future_covariates, future_covariates_mask); add a
  forecaster covariate-conditioning test and a known-future-covariate
  backend test.
- Backtesting integration test wraps the forecaster in a workflow.
- Consolidate the foundation-model tutorial into the main tutorials list.
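
Sidecar auto-discovery could be sketched as below. The naming convention is an assumption (a `<weights stem>.metadata.json` file in the same directory), and `discover_metadata` is a hypothetical helper rather than part of `LocalCheckpoint`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def discover_metadata(weights_path: Union[str, Path]) -> Dict[str, Any]:
    """Load the metadata sidecar assumed to sit next to the weights file."""
    weights = Path(weights_path)
    # Assumed convention: model.onnx -> model.metadata.json
    sidecar = weights.with_name(weights.stem + ".metadata.json")
    if not sidecar.is_file():
        raise FileNotFoundError(f"no metadata sidecar found at {sidecar}")
    return json.loads(sidecar.read_text(encoding="utf-8"))
```

Failing with the expected sidecar path in the message makes a missing or misnamed export easy to diagnose.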

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
The Torch backend, its Chronos-2 loader script, and the ONNX-vs-Torch
parity test exist only to validate the ONNX export against the original
weights. Add explicit "REMOVE WHEN THE TORCH BACKEND IS DROPPED" banners
so they can be deleted together once the export is trusted and the heavy
[torch] extra is no longer wanted.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Replace the separate _build_context and _build_future_covariate with a
single module-level _reindex_to_matrix(frame, index) that aligns the whole
target+covariates frame onto one context+horizon grid in a single
vectorised pass. _build_series_rows now slices the context and future
blocks out of that matrix and masks the target's own future.

The helper is model-agnostic and importable so other foundation-model
forecasters can reuse the forecast-input -> (matrix, mask) step.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Let callers control the inference device:

- OnnxBackend.from_checkpoint now auto-selects providers when none are
  given, preferring CUDA when onnxruntime reports it available and falling
  back to CPU otherwise (new _default_providers helper). TensorRT/CoreML
  stay opt-in.
- create_forecasting_workflow accepts a pre-built `backend`, so a caller
  can hand in a GPU ONNX session or a Torch backend directly instead of
  going through provider config.

Provider docstrings updated to reflect the CUDA-then-CPU default.
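
The CUDA-then-CPU selection can be sketched as a pure function over the list of available providers. In practice that list would come from `onnxruntime.get_available_providers()`; the function name here is illustrative and not the PR's `_default_providers`.

```python
from typing import Iterable, List

CUDA_PROVIDER = "CUDAExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def default_providers(available: Iterable[str]) -> List[str]:
    """Prefer CUDA when available, always keeping CPU as the fallback."""
    available_set = set(available)
    if CUDA_PROVIDER in available_set:
        return [CUDA_PROVIDER, CPU_PROVIDER]
    # TensorRT and CoreML are deliberately ignored here: they stay opt-in.
    return [CPU_PROVIDER]
```

Keeping the function free of the onnxruntime import makes the selection logic testable without the runtime installed.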

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
The adapter no longer reaches into the workflow's model. Removed the
_validate_template_model validator (the workflow's interface is trusted)
and the _model property. Every window is now forecast via
CustomForecastingWorkflow.predict with the window horizon as the forecast
start.

To keep the adapter on the workflow surface, expose quantiles,
max_horizon and target_column accessors on CustomForecastingWorkflow.

Forecasting runs one window at a time; batching multiple windows into a
single backend call is left as a separate, planned optimisation, so the
batch mixin, batch_size field and predict_batch path are dropped. Tests
updated accordingly.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Broaden the pure-NumPy helper module from quantiles.py to numpy.py and add
zero_fill_with_mask, which replaces non-finite values with zero and returns a
companion float32 finite-mask. Foundation-model wrappers use it to pack model
inputs without scattering NaN handling.
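
A minimal sketch of the behaviour described for zero_fill_with_mask, assuming it takes an array-like and returns a (values, mask) pair; the actual signature and input dtype handling may differ.

```python
import numpy as np


def zero_fill_with_mask_sketch(values):
    """Replace NaN/inf with zero and return a float32 mask of finite entries."""
    values = np.asarray(values, dtype=np.float32)
    finite = np.isfinite(values)
    filled = np.where(finite, values, np.float32(0.0))
    return filled, finite.astype(np.float32)
```

Returning the mask alongside the filled values lets the model distinguish a genuine zero from a missing observation.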

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Build one matrix per attention group via _build_group_matrix using a single
regular date_range (context + horizon), with the target as row zero. Drop the
_SeriesRows dataclass and the _reindex_to_matrix helper, concatenate groups on
the batch dimension, and reuse zero_fill_with_mask for NaN packing.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Make model and checkpoint mandatory so users always declare what they run.
Decouple the checkpoint from the compute backend: OnnxBackendConfig now holds
only providers/session options and takes the checkpoint as a build() argument,
and it is nested under a single backend field. Make selected_features (default
all columns) the sole covariate selector, dropping the per-weather-column knobs
and the pre-built backend escape hatch. Update the tutorial accordingly.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Remove the quantiles/max_horizon/target_column properties that merely delegated
to the wrapped model; the backtest adapter now reads workflow.model directly.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Replace the subprocess-based onnxruntime check, which is unreliable when the
interpreter already has ONNX Runtime loaded, with a plain import that asserts
the preset module loads without the heavy backend.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
Forecast a 7-day horizon from a November start with P30/P50/P70 quantiles for a
more illustrative example, and generalise the plot prose to a quantile band.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
These power local notebook execution/conversion (jupytext --execute, the
notebooks-clear poe task). nbconvert was already assumed by notebooks-clear but
never pinned.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
egordm changed the title from "Feature/stef 3047 fm forecaster" to "STEF-3047 fm forecaster" Jun 16, 2026
github-actions bot added the "feature" (New feature or request) label Jun 16, 2026
The backtest adapter built each window with end=horizon, so no rows past
the forecast start reached the model. Chronos-2 is covariate-aware, so it
received zero future weather forecasts (wind, radiation, temperature) —
the dominant predictor for wind/solar — and produced poor forecasts.

Extend the window to horizon + predict_length (matching the openstef4
baseline). available_before=horizon still excludes future target actuals,
so future weather forecasts are used without look-ahead leakage.
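
The window change can be illustrated with a small sketch. The row layout, the `build_window` name, and the strict `timestamp < horizon` cut-off for target visibility are assumptions; the real adapter works on beam datasets rather than tuples.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (timestamp, target actual, weather covariate forecast)
Row = Tuple[datetime, Optional[float], float]


def build_window(
    rows: List[Row],
    horizon: datetime,
    context_length: timedelta,
    predict_length: timedelta,
) -> List[Row]:
    start = horizon - context_length
    end = horizon + predict_length  # the fix: the window used to stop at horizon
    window: List[Row] = []
    for timestamp, target, covariate in rows:
        if not (start <= timestamp < end):
            continue
        # Target actuals at or after the forecast start stay hidden (no look-ahead).
        visible_target = target if timestamp < horizon else None
        window.append((timestamp, visible_target, covariate))
    return window
```

Covariate values past the horizon now reach the model, while the target column for the same rows remains empty.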

On a wind-park smoke benchmark: rCRPS 0.147 -> 0.059, rMAE 0.271 -> 0.091.

Adds a regression test and the Chronos-2 torch/onnx benchmark notebooks.

Signed-off-by: Egor Dmitriev <egor.dmitriev@alliander.com>
@sonarqubecloud


Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

