DecisionRules.jl

DecisionRules.jl is a Julia package for training parametric decision rules through multi-stage optimization, following the Two-Stage General / Deep Decision Rules (TS-GDR / TS-DDR) framework. The main use case is multi-stage stochastic control in which the feasible (closed-loop) action at each stage is obtained by solving an optimization problem (e.g., OPF, MPC), and the goal is to train, end-to-end, a policy that maps observed states and uncertainties to target trajectories.

Motivation and workflow

In TS-GDR, the policy does not directly output a control action. Instead, it outputs targets (typically a target state trajectory) that are enforced inside an optimization model through target constraints with slack. For a sampled uncertainty trajectory $w_{1:T}$, the workflow is:

sample $w_{1:T}$
predict targets $\hat x_{1:T} = \pi(\cdot;\theta)$
solve an optimization problem that projects targets onto the feasible set (dynamics + constraints)
differentiate to update $\theta$ using dual information and/or implicit sensitivities (via DiffOpt)
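
As a rough sketch of the projection step, the problem solved for one sampled trajectory might be written as follows. The stage cost $c_t$, dynamics $f_t$, feasible set $\mathcal{X}_t$, slack $\delta_t$, penalty weight $\rho$, and multiplier $\lambda_t$ are notation introduced here for illustration, not taken from the package:

```latex
\begin{aligned}
Q(\hat x_{1:T}; w_{1:T}) \;=\; \min_{x,\,u,\,\delta} \quad
  & \sum_{t=1}^{T} \Big( c_t(x_t, u_t) + \rho \,\lVert \delta_t \rVert \Big) \\
\text{s.t.} \quad
  & x_t = f_t(x_{t-1}, u_t, w_t), && t = 1, \dots, T, \\
  & x_t + \delta_t = \hat x_t \quad : \lambda_t, && t = 1, \dots, T, \\
  & (x_t, u_t) \in \mathcal{X}_t, && t = 1, \dots, T.
\end{aligned}
```

Under these assumptions, and wherever $Q$ is differentiable, the multiplier $\lambda_t$ of the target constraint equals $\partial Q / \partial \hat x_t$ up to the solver's sign convention. When the targets depend on $\theta$ only through the policy, the chain rule then gives $\nabla_\theta Q = \sum_{t} (\partial \hat x_t / \partial \theta)^\top \lambda_t$. This is the sense in which dual information can drive the update of $\theta$.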

DecisionRules.jl implements this workflow in three flavors:

Deterministic equivalent (direct transcription): one coupled optimization over the full horizon.
Stage-wise decomposition (single shooting): one optimization per stage in a sequential rollout.
Windowed decomposition (multiple shooting): one coupled optimization per window, chained by the realized end-state.

Installation

using Pkg
Pkg.add(url="https://github.com/LearningToOptimize/DecisionRules.jl.git")

What you need to provide

DecisionRules.jl is intentionally “model-first”: you describe your problem in JuMP (DiffOpt-enabled for all but the deterministic equivalent approach), then the package handles simulation and training.

For any multi-stage model you will need:

subproblems::Vector{JuMP.Model}: one JuMP model per stage (DiffOpt-enabled).
state_params_in[t]: a vector of parameter variables for the incoming state at stage t.
state_params_out[t]: a vector of (target_param, realized_state_var) tuples at stage t. The target_param is the parameter variable that the policy sets; the realized_state_var is the JuMP decision variable whose value becomes the realized state.
an uncertainty_sampler that returns per-stage samples in the format used by DecisionRules.sample(...).
a differentiable policy built with Flux.jl or a compatible library. Its input size is the number of uncertainty components per stage plus the size of the initial state; its output size is the size of the target state at each stage.
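
To make the sizing rule for the policy concrete, here is a minimal sketch in plain Julia that uses an affine map as a stand-in for a neural network. The sizes, the names `W`, `b`, and `policy`, and the order in which uncertainties and state are concatenated are all assumptions made for illustration:

```julia
# Hypothetical sizes, chosen only for illustration.
num_uncertainties = 3               # uncertainty components per stage
initial_state = [0.5, 0.2]          # two state components

input_dim = num_uncertainties + length(initial_state)
output_dim = length(initial_state)

# Affine stand-in for a neural policy: target = W * input + b.
W = zeros(output_dim, input_dim)
b = copy(initial_state)
policy(input) = W * input .+ b

w_t = [1.0, 2.0, 3.0]
# Concatenation order (uncertainties first, then state) is an assumption.
target = policy(vcat(w_t, initial_state))
```

The output has one entry per state component, which is the shape a target state at a given stage needs to have.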

Working patterns are provided in examples/.

Deterministic equivalent (TS-DDR / direct transcription)

This corresponds to solving one coupled optimization over all stages (the deterministic equivalent of a sampled trajectory). You build the deterministic-equivalent JuMP model with DecisionRules.deterministic_equivalent! and then train with the deterministic-equivalent overload of train_multistage.

using DecisionRules, JuMP, DiffOpt, Flux
using SCS

# 1) Build per-stage subproblems (DiffOpt-enabled) and collect:
#    subproblems, state_params_in, state_params_out, uncertainty_sampler, uncertainties_structure

# 2) Build the deterministic equivalent over the full horizon
det = DiffOpt.diff_model(() -> DiffOpt.diff_optimizer(SCS.Optimizer))

det, uncertainties_structure_det = DecisionRules.deterministic_equivalent!(
    det,
    subproblems,
    state_params_in,
    state_params_out,
    Float64.(initial_state),
    uncertainties_structure,
)

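# 3) Train a TS-DDR policy end-to-end
#    (state_in_det and state_out_det are not defined in this snippet;
#    see examples/ for how they are constructed)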
num_uncertainties = length(uncertainty_sampler()[1])  # number of uncertainty components per stage
policy = Chain(
    Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
    Dense(64, length(initial_state)),
)

DecisionRules.train_multistage(
    policy,
    initial_state,
    det,
    state_in_det,
    state_out_det,
    uncertainty_sampler;
    num_batches=100,
    num_train_per_batch=32,
    optimizer=Flux.Adam(1e-3),
)

This mode typically gives the most faithful gradient signal (full coupling across the horizon), but it requires solving the largest inner problem per sample.

Stage-wise decomposition (single shooting)

Single shooting solves one optimization per stage and rolls forward using the realized state returned by the solver. The policy can be closed-loop because it receives the realized state $x_{t-1}$ when predicting the next target $\hat x_t$.

using DecisionRules, Flux

num_uncertainties = length(uncertainty_sampler()[1])
policy = Chain(
    Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
    Dense(64, length(initial_state)),
)

DecisionRules.train_multistage(
    policy,
    initial_state,
    subproblems,
    state_params_in,
    state_params_out,
    uncertainty_sampler;
    num_batches=100,
    num_train_per_batch=32,
    optimizer=Flux.Adam(1e-3),
)
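
The sequential rollout behind single shooting can be pictured with a toy problem in plain Julia. This is only a sketch: `solve_stage`, `rollout`, and `track_demand` are hypothetical names, and the "solver" is a closed-form clamp standing in for a real stage optimization.

```julia
# Toy stage "solver": the feasible set is a ramp-limited box around the
# previous realized state, so projecting a target reduces to a clamp.
function solve_stage(x_prev, target; ramp=1.0, lo=0.0, hi=10.0)
    lower = max(lo, x_prev - ramp)
    upper = min(hi, x_prev + ramp)
    realized = clamp(target, lower, upper)
    slack = abs(target - realized)   # unmet part of the target
    return realized, slack
end

# Sequential rollout: the policy sees the realized state from the last solve.
function rollout(policy, x0, w; kwargs...)
    x_prev = x0
    states = Float64[]
    total_slack = 0.0
    for w_t in w
        target = policy(w_t, x_prev)
        x_prev, slack = solve_stage(x_prev, target; kwargs...)
        push!(states, x_prev)
        total_slack += slack
    end
    return states, total_slack
end

# Hypothetical policy: aim directly at the observed demand.
track_demand(w_t, x_prev) = w_t

states, total_slack = rollout(track_demand, 0.0, [0.5, 3.0, 3.0, 2.0])
```

In this toy run the ramp limit prevents the state from jumping to 3.0, so part of the target is absorbed by slack in stages 2 and 3 before the realized state catches up.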

Internally, gradients are obtained by combining (i) dual information for target parameters and (ii) solution sensitivities computed through DiffOpt along the rollout.
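
To illustrate what "dual information for target parameters" means, the sketch below uses a scalar toy stage whose optimal value is known in closed form. It is not the package's implementation; the function names and the parameters `c`, `rho`, `lo`, and `hi` are assumptions made for illustration.

```julia
# Toy scalar stage: minimize c*x + rho*|target - x| over x in [lo, hi].
# With rho > |c| the minimizer is the projection clamp(target, lo, hi).
function stage_value(target; c=1.0, rho=10.0, lo=0.0, hi=1.0)
    x = clamp(target, lo, hi)
    return c * x + rho * abs(target - x)
end

# Closed-form sensitivity of the optimal value with respect to the target,
# which is what the target constraint's dual reports (up to sign convention).
function target_sensitivity(target; c=1.0, rho=10.0, lo=0.0, hi=1.0)
    target > hi && return rho     # target above the box: slack penalty slope
    target < lo && return -rho    # target below the box
    return c                      # followable target: marginal cost
end

# Central finite difference used as an independent check.
fd(target; h=1e-6) = (stage_value(target + h) - stage_value(target - h)) / (2h)
```

For a followable target the sensitivity is the marginal cost `c`; once the target leaves the feasible set it jumps to `±rho`, which is the signal that pushes the policy back toward targets that can actually be reached.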

Windowed decomposition (multiple shooting)

Multiple shooting partitions the horizon into windows of length window_size. Each window solves a deterministic equivalent over its stages, then passes the realized end state to the next window. This can stabilize learning over long horizons compared to pure single shooting, while remaining cheaper than a full-horizon deterministic equivalent.

using DecisionRules, Flux, DiffOpt
using SCS

num_uncertainties = length(uncertainty_sampler()[1])
policy = Chain(
    Dense(DecisionRules.policy_input_dim(num_uncertainties, length(initial_state)), 64, relu),
    Dense(64, length(initial_state)),
)

windows = DecisionRules.setup_shooting_windows(
    subproblems,
    state_params_in,
    state_params_out,
    Float64.(initial_state),
    uncertainty_samples;
    window_size=24,
    model_factory=() -> DiffOpt.nonlinear_diff_model(optimizer_with_attributes(
        SCS.Optimizer,
        "verbose" => 0,
    )),
)

DecisionRules.train_multiple_shooting(
    policy,
    initial_state,
    windows,
    state_params_in,
    state_params_out,
    uncertainty_sampler;
    window_size=24,  # e.g., 6, 24, ...
    num_batches=100,
    num_train_per_batch=32,
    optimizer=Flux.Adam(1e-3),
)
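
The way a horizon is cut into windows can be sketched in plain Julia. The helper `stage_windows` is hypothetical, and the treatment of a shorter final window is an assumption of this sketch, not a statement about how the package handles it.

```julia
# Split stages 1:T into consecutive windows of at most `window_size` stages.
stage_windows(T, window_size) = collect(Iterators.partition(1:T, window_size))

stage_ranges = stage_windows(10, 4)

# Stages whose realized end state is handed to the next window.
handoffs = [last(r) for r in stage_ranges[1:end-1]]
```

With 10 stages and a window size of 4, each of the first two windows couples four stages, the last couples two, and the realized states at stages 4 and 8 are the only information passed between windows.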

Evaluation: stage-wise rollout and target-violation share

Evaluating a trained policy only through the deterministic equivalent can overstate its quality. The coupled solve re-optimizes all stages jointly, so it can use the slack penalty to absorb targets that cannot be followed stage by stage, which is exactly what deployment cannot do. The stage-wise rollout matches the deployment semantics of a target-trajectory policy, so report it as the headline metric, together with a target-violation measure.

The training loops record metrics through a per-sample SampleLog cache and a per-batch record(sample_log, iter, model) callback. RolloutEvaluation is a ready-made helper that evaluates the policy stage-wise on a fixed held-out scenario set; call it from within record:

using DecisionRules, Random

# Materialize a FIXED held-out evaluation set once, before training
Random.seed!(1234)
eval_scenarios = [DecisionRules.sample(uncertainty_samples) for _ in 1:8]

rollout_eval = RolloutEvaluation(
    subproblems, state_params_in, state_params_out, initial_state, eval_scenarios;
    stride=25,  # evaluate every 25 batches
    policy_state=:realized,
)

train_multistage(policy, initial_state, det, state_in_det, state_out_det, uncertainty_sampler;
    num_batches=100,
    record=(sample_log, iter, model) -> begin
        rollout_eval(iter, model)
        return false
    end)

policy_state selects which state is fed back to the policy between stage solves:

:realized feeds the previous realized optimizer state to the policy. This is the closed-loop/deployment rollout and is the default.
:target feeds the previous target/predicted state to the policy, matching the deterministic-equivalent target-generation path while still solving the stage subproblems sequentially.

Use policy_state=:target when comparing against a deterministic-equivalent training loss, because both metrics then call the policy on the same target-state history. Log :realized separately as the harder deployment diagnostic.
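
The difference between the two feedback modes can be seen on a toy rollout in plain Julia. This sketch only mimics the semantics described above; `solve_stage`, `policy_inputs`, and `halfway` are hypothetical names and the stage solve is a closed-form clamp.

```julia
# Toy stage solve: realized state is the target clamped to a ramp limit.
solve_stage(x_prev, target; ramp=1.0) = clamp(target, x_prev - ramp, x_prev + ramp)

# Roll out and record what the policy is fed at each stage.
function policy_inputs(policy, x0, w; policy_state=:realized)
    x_realized = x0     # state handed to the next stage solve
    x_fed = x0          # state handed to the policy
    fed = Float64[]
    for w_t in w
        push!(fed, x_fed)
        target = policy(w_t, x_fed)
        x_realized = solve_stage(x_realized, target)
        x_fed = policy_state === :realized ? x_realized : target
    end
    return fed
end

# Hypothetical policy: move halfway from the fed state toward the demand.
halfway(w_t, x) = x + 0.5 * (w_t - x)

w = [4.0, 4.0, 4.0]
fed_realized = policy_inputs(halfway, 0.0, w; policy_state=:realized)
fed_target = policy_inputs(halfway, 0.0, w; policy_state=:target)
```

In both modes the stage solves start from the realized state; only the policy's input differs. Once a target is clipped by the ramp limit, the two input histories diverge, which is why the two metrics should be logged separately.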

Each evaluation reports (a) the rollout objective excluding the target-slack penalty term (the operational cost) and (b) the target-violation share, defined as the realized slack penalty divided by the full objective. Policy comparisons are only trustworthy when the violation share is small (roughly ≤ 0.05): a larger share means the policy's targets cannot be followed stage by stage, so the reported cost is not what deployment would realize. When training drives the violation share to about 0, the deterministic-equivalent and rollout views are expected to coincide; the rollout metric is the guard that detects when they do not.
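
The two reported quantities follow directly from the full rollout objective and the slack penalty. A minimal sketch of the arithmetic, with a hypothetical helper `rollout_metrics` and made-up numbers:

```julia
# Split a rollout objective into operational cost and target-violation share.
function rollout_metrics(full_objective, slack_penalty)
    operational_cost = full_objective - slack_penalty
    violation_share = slack_penalty / full_objective
    return (; operational_cost, violation_share)
end

# Made-up numbers: a full objective of 105 of which 5 is slack penalty.
m = rollout_metrics(105.0, 5.0)
trustworthy = m.violation_share <= 0.05
```

Here the share is 5/105, just under the 0.05 guideline, so the operational cost of 100 would be considered a usable basis for comparing policies.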

Per-sample debugging hooks can be attached with SampleLog(on_sample=(s, models, log) -> ...); the training loop calls the hook after each sample's solve with the live JuMP model(s). The previous record_loss=(iter, model, loss, tag) -> ... keyword keeps working as a deprecated adapter.

Examples and tests

Examples live in examples/. Run tests with:

julia --project -e 'using Pkg; Pkg.test()'

Citation

If you use this package in academic work, please cite:

@article{rosemberg2024efficiently,
  title={Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality},
  author={Rosemberg, Andrew and Street, Alexandre and Vallad{\~a}o, Davi M and Van Hentenryck, Pascal},
  journal={arXiv preprint arXiv:2405.14973},
  year={2024}
}

DiffOpt (for differentiating through optimization): https://github.com/jump-dev/DiffOpt.jl

License

MIT. See LICENSE.
