Temporally enforced metrics for 3D / 4D perception.
The stmetrics package is a set of DDP-compatible PyTorch metrics for perception, understanding, and reconstruction tasks that span multiple temporal observations. When evaluated on a single timestep, the metrics reduce to their standard 3D form.
Submodules:
- Instances: standard mAP, temporal AP (t-AP), per-timestep AP, per-aux-label recall, instance feature similarity (t-SIM)
- Planned: points, tracking, geometry, lidar

```bash
git clone https://github.com/GradientSpaces/stmetrics.git
cd stmetrics
pip install -e .
```

Requires Python >= 3.8 and PyTorch >= 1.10.

Each evaluator applies one temporal-enforcement policy. The configured list of
evaluators is the metric set; recall=True adds per-aux-label recall.

```python
from stmetrics.instances import (
    InstanceMetrics, LegacyAPEvaluator, TemporalEvaluator, SelectTimestepEvaluator,
)

metric = InstanceMetrics(
    dataset="/path/to/my_dataset.yaml",
    heads=[  # omit for a single TemporalEvaluator()
        TemporalEvaluator(recall=True, aux="changes"),  # t-AP + t-REC
        LegacyAPEvaluator(),                            # AP (overlap pooled over timesteps)
        SelectTimestepEvaluator(timesteps=[0]),         # stage1-AP
        SelectTimestepEvaluator(timesteps=[1]),         # stage2-AP
    ],
    log_prefix="val",
)

for preds, targets in your_validation_loader:
    metric.update(preds, targets)

results = metric.compute()
print(results["val_mean_AP"], results["val_mean_t-AP"])
```

See `examples/minimal_instances.py` for a runnable end-to-end script with synthetic data.

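
To make the difference between the temporal-enforcement policies concrete, here is a minimal pure-Python sketch of the matching criterion only. It is not the stmetrics implementation: the helper names (`iou`, `pooled_iou`, `per_timestep_iou`, `temporal_match`) are hypothetical, masks are represented as sets of point indices instead of tensors, and skipping timesteps where neither mask has points is an assumption.

```python
from collections import defaultdict

def iou(pred, gt):
    """IoU of two sets of point indices (0.0 when both are empty)."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 0.0

def split_by_timestep(points, timesteps):
    """Group point indices by the timestep id of each point."""
    groups = defaultdict(set)
    for p in points:
        groups[timesteps[p]].add(p)
    return groups

def pooled_iou(pred, gt):
    """Overlap pooled over timesteps: a single IoU over the whole scene."""
    return iou(pred, gt)

def per_timestep_iou(pred, gt, timesteps):
    """One IoU per timestep in which either mask has points (assumption)."""
    pred_t = split_by_timestep(pred, timesteps)
    gt_t = split_by_timestep(gt, timesteps)
    return {t: iou(pred_t.get(t, set()), gt_t.get(t, set()))
            for t in sorted(set(pred_t) | set(gt_t))}

def temporal_match(pred, gt, timesteps, threshold=0.5):
    """Temporal criterion: IoU must exceed the threshold in every timestep."""
    ious = per_timestep_iou(pred, gt, timesteps)
    return bool(ious) and all(v > threshold for v in ious.values())

# Eight points: the first four belong to timestep 0, the rest to timestep 1.
timesteps = [0, 0, 0, 0, 1, 1, 1, 1]
gt = {0, 1, 2, 3, 4, 5, 6}
pred = {0, 1, 2, 3, 4}  # perfect at timestep 0, poor at timestep 1

print(pooled_iou(pred, gt) > 0.5)           # True
print(temporal_match(pred, gt, timesteps))  # False
```

In this toy scene the pooled overlap accepts the prediction while the temporal criterion rejects it, because the second timestep is poorly segmented. With a single timestep the two criteria coincide, which is consistent with the metrics reducing to their 3D form.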
InstanceMetrics.update(preds, targets) takes two lists of dicts of the
same length — one entry per scene. Tensors can live on CPU or GPU; the metric
moves them to its own device.

Prediction dict (one per scene):

| Key | Shape | dtype | Description |
|---|---|---|---|
| `pred_classes` | `(K,)` | int | Predicted class id (matches `valid_class_ids` in the dataset spec) for each of K predicted instances |
| `pred_scores` | `(K,)` | float | Confidence in [0, 1] per instance |
| `pred_masks` | `(N, K)` | bool / int | Per-point predicted masks; N is the total number of points/voxels in the scene |


Target dict (one per scene):

| Key | Shape | dtype | Description |
|---|---|---|---|
| `ids` | `(G,)` | int | GT instance id per GT object (>= 0 is valid; negative ids are ignored) |
| `labels` | `(G,)` | int | GT class id per GT object |
| `masks` | `(G, N)` | bool | Per-point GT masks |
| `timesteps` | `(N,)` | int | Timestep id per point. Optional; if omitted, the scene is evaluated as standard 3D. The key is configurable via `InstanceMetrics(timestep_key=...)` |


Additional target keys:

| Key | Shape | dtype | Description |
|---|---|---|---|
| `aux_labels` | `(G,)` | int | Change-type id per GT object (matches `valid_aux_ids` in the spec) |
| `ambiguities` | `list[list[int]]` | -- | Optional. Groups of instance ids that are interchangeable across stages (e.g. identical objects swapped between time t and t+1). Empty list if none. |

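
As an illustration of the `ambiguities` format, the sketch below turns the list of groups into a lookup that answers whether two instance ids may be treated as the same object. The function names `build_equivalence` and `same_instance` are hypothetical, and how stmetrics actually consumes these groups during matching is not specified here.

```python
def build_equivalence(ambiguities):
    """Map each instance id to the set of ids it is interchangeable with."""
    lookup = {}
    for group in ambiguities:
        members = frozenset(group)
        for inst_id in members:
            # An id listed in several groups accumulates all of their members.
            lookup[inst_id] = lookup.get(inst_id, frozenset()) | members
    return lookup

def same_instance(a, b, lookup):
    """True if the ids are equal or declared interchangeable."""
    return a == b or b in lookup.get(a, frozenset())

# Two identical chairs (ids 3 and 7) and three identical boxes (10, 11, 12).
lookup = build_equivalence([[3, 7], [10, 11, 12]])
print(same_instance(3, 7, lookup))   # True
print(same_instance(3, 10, lookup))  # False
```

An empty `ambiguities` list yields an empty lookup, so only exact id equality counts.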
To enable pairwise temporal feature similarity:

| Key (configurable) | Shape | dtype | Description |
|---|---|---|---|
| `features` | `(M, D)` | float | Per-segment / per-point embedding of dimension D |
| `ids` | `(M,)` | int | GT instance id each segment belongs to (the head compares same-id features across stages) |
| `timesteps` | `(M,)` | int | Stage id per segment. Values must be restricted to two distinct integer labels |

Each dataset is described by a small YAML file. See examples/rio.yaml
for a complete example (an 18-class indoor 4D-scene benchmark).

```yaml
# my_dataset.yaml
name: my_dataset

# Human-readable class names, in iteration order. Must align with valid_class_ids.
class_labels:
  - cabinet
  - chair
  - table
  - door

# Integer class IDs corresponding 1:1 to class_labels. These are the values
# that appear in target.labels / pred.pred_classes. Omit to assume 0..N-1.
valid_class_ids:
  - 3
  - 5
  - 7
  - 8

# Key on each target dict holding the per-instance aux label (e.g. "changes").
aux: changes

# Aux-label names, in iteration order. Output keys use these (e.g. val_rigid_REC).
# Datasets with no aux concept can use a single label, e.g. ["all"] / [0].
aux_labels:
  - static
  - rigid
  - nonrigid

# Integer aux IDs corresponding 1:1 to aux_labels. Match target[aux].
valid_aux_ids:
  - 0
  - 1
  - 2

# Optional. Group class names into categories for "mean per-category" output keys:
#   val_mean_head_AP, val_mean_common_AP, val_mean_tail_AP, ...
# Omit the categories block entirely if your dataset has no such structure.
categories:
  head:
    - cabinet
    - chair
  common:
    - table
  tail:
    - door
```

Pass the path to your YAML file:


```python
from stmetrics import load_dataset_spec

spec = load_dataset_spec("/path/to/my_dataset.yaml")
```

`InstanceMetrics(dataset=...)` accepts the same form.

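
The alignment rules stated in the spec comments can be checked before handing a file to the library. The following is a minimal sketch under stated assumptions: `check_spec` is a hypothetical helper (not part of stmetrics) that works on a plain dict, such as one parsed from the YAML file, and it makes no claim about which checks `load_dataset_spec` itself performs.

```python
def check_spec(spec):
    """Validate a dataset spec dict and return a {class_id: class_name} map."""
    labels = spec["class_labels"]
    # Per the spec comments, omitted valid_class_ids default to 0..N-1.
    ids = spec.get("valid_class_ids", list(range(len(labels))))
    if len(ids) != len(labels):
        raise ValueError("valid_class_ids must align 1:1 with class_labels")
    if len(set(ids)) != len(ids):
        raise ValueError("valid_class_ids must be unique")

    # Aux ids and names are compared only when both are present.
    if "aux_labels" in spec and "valid_aux_ids" in spec:
        if len(spec["aux_labels"]) != len(spec["valid_aux_ids"]):
            raise ValueError("valid_aux_ids must align 1:1 with aux_labels")

    # Every class named in a category must be a declared class label.
    for category, names in spec.get("categories", {}).items():
        unknown = [n for n in names if n not in labels]
        if unknown:
            raise ValueError(f"category {category!r} has unknown classes: {unknown}")

    return dict(zip(ids, labels))

example = {
    "class_labels": ["cabinet", "chair", "table", "door"],
    "valid_class_ids": [3, 5, 7, 8],
    "aux_labels": ["static", "rigid", "nonrigid"],
    "valid_aux_ids": [0, 1, 2],
    "categories": {"head": ["cabinet", "chair"], "common": ["table"], "tail": ["door"]},
}
print(check_spec(example))  # {3: 'cabinet', 5: 'chair', 7: 'table', 8: 'door'}
```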
`compute()` returns a `dict[str, torch.Tensor]` with keys of the form:

```
<log_prefix>_<class_or_change_or_'mean'>_<metric_type>[_50|_25]
```

where `<metric_type>` is one of:

| Suffix | Source evaluator |
|---|---|
| `AP` | `LegacyAPEvaluator`: mAP, overlap pooled over timesteps, IoU thresholds 0.5 to 0.9 |
| `t-AP` | `TemporalEvaluator`: temporal mAP (IoU > threshold in every timestep) |
| `stage1-AP`, `stage2-AP`, ... | `SelectTimestepEvaluator(timesteps=[k])`: per-timestep AP |
| `REC`, `t-REC`, `stage1-REC`, ... | any evaluator with `recall=True`: per-aux-label recall |

`_50` / `_25` variants are computed at IoU = 0.5 / 0.25 only.
Per-category mean keys (`<prefix>_mean_<cat>_<metric>`) appear if and only if the
dataset spec declares a `categories:` block.
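
When reading results programmatically, it can help to build key names instead of typing them by hand. The helpers below are a hypothetical convenience (not part of stmetrics) that simply follow the naming pattern described above.

```python
def metric_key(log_prefix, name, metric_type, iou=None):
    """Build '<log_prefix>_<name>_<metric_type>' with an optional _50/_25 suffix.

    `name` is a class label, an aux (change) label, or 'mean'.
    """
    key = f"{log_prefix}_{name}_{metric_type}"
    if iou is not None:
        if iou not in (50, 25):
            raise ValueError("only the _50 and _25 variants are described")
        key += f"_{iou}"
    return key

def category_mean_key(log_prefix, category, metric_type):
    """Build '<prefix>_mean_<cat>_<metric>' for a per-category mean."""
    return f"{log_prefix}_mean_{category}_{metric_type}"

print(metric_key("val", "mean", "t-AP"))       # val_mean_t-AP
print(metric_key("val", "rigid", "REC"))       # val_rigid_REC
print(metric_key("val", "chair", "AP", 50))    # val_chair_AP_50
print(category_mean_key("val", "head", "AP"))  # val_mean_head_AP
```

A lookup such as `results.get(metric_key("val", "mean", "t-AP"))` then avoids a `KeyError` when the corresponding evaluator was not configured.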
`TSimHead` emits:

```
<log_prefix>_tsim_mean
<log_prefix>_tsim_median
<log_prefix>_tsim_std
<log_prefix>_tsim_n_pairs
```

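
To show what these four statistics could summarize, here is a pure-Python sketch of pairwise same-id similarity across two stages. Several details are assumptions and not taken from the library: cosine similarity as the measure, pairing every stage-one segment with every stage-two segment of the same instance, the population standard deviation, and the function names `cosine` and `tsim_summary`.

```python
import math
import statistics
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def tsim_summary(features, ids, timesteps, log_prefix="val"):
    """Summarize same-id feature similarity between exactly two stages."""
    stages = sorted(set(timesteps))
    if len(stages) != 2:
        raise ValueError("expected exactly two distinct stage labels")
    first, second = stages

    # Collect each instance's features separately for the two stages.
    by_id = defaultdict(lambda: {first: [], second: []})
    for feat, inst_id, stage in zip(features, ids, timesteps):
        by_id[inst_id][stage].append(feat)

    # One similarity per cross-stage pair of segments with the same id.
    sims = [cosine(a, b)
            for groups in by_id.values()
            for a in groups[first]
            for b in groups[second]]

    nan = float("nan")
    return {
        f"{log_prefix}_tsim_mean": statistics.fmean(sims) if sims else nan,
        f"{log_prefix}_tsim_median": statistics.median(sims) if sims else nan,
        f"{log_prefix}_tsim_std": statistics.pstdev(sims) if sims else nan,
        f"{log_prefix}_tsim_n_pairs": len(sims),
    }

# Instance 1 keeps its feature across stages; instance 2 changes completely.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
ids = [1, 2, 1, 2]
timesteps = [0, 0, 1, 1]
print(tsim_summary(features, ids, timesteps))
```

Instances that appear in only one stage contribute no pairs, so `n_pairs` indicates how much evidence the other three values rest on.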
If you already drive your project via Hydra, the composite accepts a list of
{"_target_": ...} head configs directly:

```yaml
# my_metric.yaml
_target_: stmetrics.instances.InstanceMetrics
dataset: /path/to/my_dataset.yaml
heads:
  - _target_: stmetrics.instances.TemporalEvaluator
    recall: true
    aux: changes
  - _target_: stmetrics.instances.LegacyAPEvaluator
  - _target_: stmetrics.instances.SelectTimestepEvaluator
    timesteps: [0]
```

If you find our code and paper useful, please cite our work ReScene4D, which first introduced t-mAP.

```bibtex
@inproceedings{steiner2026rescene4d,
  author    = {Steiner, Emily and Zheng, Jianhao and Howard-Jenkins, Henry and Xie, Chris and Armeni, Iro},
  title     = {ReScene4D: Temporally Consistent Semantic Instance Segmentation of Evolving Indoor 3D Scenes},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}
```