
stmetrics

Temporally enforced metrics for 3D / 4D perception.

stmetrics is a package of DDP-compatible PyTorch metrics for perception, understanding, and reconstruction tasks that span multiple temporal observations. When only one timestep is evaluated, each metric collapses to its standard 3D form.

Submodules:

  • Instances: standard mAP, temporal AP (t-AP), per-timestep AP, per-aux-label recall, instance feature similarity (t-SIM)
  • Planned: points, tracking, geometry, lidar

Installation

git clone https://github.com/GradientSpaces/stmetrics.git
cd stmetrics
pip install -e .

Requires Python >= 3.8 and PyTorch >= 1.10.


Quickstart — instance segmentation

Each evaluator (head) applies one temporal-enforcement policy, and the list of evaluators passed as heads defines the metric set. Setting recall=True on an evaluator adds per-aux-label recall.

from stmetrics.instances import (
    InstanceMetrics, LegacyAPEvaluator, TemporalEvaluator, SelectTimestepEvaluator,
)

metric = InstanceMetrics(
    dataset="/path/to/my_dataset.yaml",
    heads=[                                          # omit for a single TemporalEvaluator()
        TemporalEvaluator(recall=True, aux="changes"),   # t-AP + t-REC
        LegacyAPEvaluator(),                             # AP (overlap pooled over timesteps)
        SelectTimestepEvaluator(timesteps=[0]),          # stage1-AP
        SelectTimestepEvaluator(timesteps=[1]),          # stage2-AP
    ],
    log_prefix="val",
)

for preds, targets in your_validation_loader:
    metric.update(preds, targets)

results = metric.compute()
print(results["val_mean_AP"], results["val_mean_t-AP"])

See examples/minimal_instances.py for a runnable end-to-end script with synthetic data.
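update() takes lists of per-scene dicts. Scenes usually have different point counts N, so PyTorch's default collate function cannot batch them, because it stacks tensors key by key. The minimal sketch below keeps the dicts as two plain lists with a custom collate_fn. It assumes a hypothetical ToyScenes dataset that yields one (pred, target) pair per scene. Neither ToyScenes nor collate_scenes is part of stmetrics.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyScenes(Dataset):
    """Hypothetical dataset: each item is one scene's (pred, target) dict pair."""

    def __init__(self, num_points_per_scene):
        self.num_points = num_points_per_scene

    def __len__(self):
        return len(self.num_points)

    def __getitem__(self, idx):
        n = self.num_points[idx]  # scenes may have different point counts N
        pred = {
            "pred_classes": torch.tensor([3, 5]),
            "pred_scores": torch.tensor([0.9, 0.4]),
            "pred_masks": torch.zeros(n, 2, dtype=torch.bool),  # (N, K)
        }
        target = {
            "ids": torch.tensor([0]),
            "labels": torch.tensor([3]),
            "masks": torch.zeros(1, n, dtype=torch.bool),  # (G, N)
            "timesteps": torch.zeros(n, dtype=torch.long),  # (N,)
        }
        return pred, target


def collate_scenes(batch):
    """Keep per-scene dicts as two parallel lists instead of stacking tensors."""
    preds = [pred for pred, _ in batch]
    targets = [target for _, target in batch]
    return preds, targets


loader = DataLoader(ToyScenes([100, 250, 80]), batch_size=2, collate_fn=collate_scenes)
```

Each batch can then be passed straight through, as in `for preds, targets in loader: metric.update(preds, targets)`.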

Input format (instances)

InstanceMetrics.update(preds, targets) takes two lists of dicts of the same length — one entry per scene. Tensors can live on CPU or GPU; the metric moves them to its own device.

preds[i] — one scene's predictions

| Key | Shape | dtype | Description |
|---|---|---|---|
| pred_classes | (K,) | int | Predicted class id (matches valid_class_ids in the dataset spec) for each of the K predicted instances |
| pred_scores | (K,) | float | Confidence in [0, 1] per instance |
| pred_masks | (N, K) | bool / int | Per-point predicted masks; N is the total number of points/voxels in the scene |

targets[i] — one scene's ground truth

| Key | Shape | dtype | Description |
|---|---|---|---|
| ids | (G,) | int | GT instance id per GT object (>= 0 valid; negative means ignore) |
| labels | (G,) | int | GT class id per GT object |
| masks | (G, N) | bool | Per-point GT masks |
| timesteps | (N,) | int | Timestep id per point. Optional; if omitted, the scene is treated as a single 3D timestep. The key name is configurable via InstanceMetrics(timestep_key=...) |
Optional temporal annotations

| Key | Shape | dtype | Description |
|---|---|---|---|
| aux_labels | (G,) | int | Change-type id per GT object (matches valid_aux_ids in the spec) |
| ambiguities | - | list[list[int]] | Optional. Groups of instance ids that are interchangeable across stages (e.g. identical objects swapped between time t and t+1). Empty list if none. |
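Before calling update(), it can help to check each scene against the shape tables above. check_scene is a hypothetical helper and is not part of stmetrics. It only encodes the relationships the tables state: pred_masks and masks share the same N, there are K predictions, and there are G ground-truth objects. It also assumes the default key names.

```python
import torch


def check_scene(pred, target, timestep_key="timesteps", aux_key="aux_labels"):
    """Hypothetical helper: assert one scene matches the documented input shapes."""
    K = pred["pred_classes"].shape[0]
    N = pred["pred_masks"].shape[0]  # pred_masks is (N, K): points first
    G = target["ids"].shape[0]

    assert tuple(pred["pred_scores"].shape) == (K,), "pred_scores must be (K,)"
    assert tuple(pred["pred_masks"].shape) == (N, K), "pred_masks must be (N, K)"
    if K > 0:  # min()/max() are undefined on empty tensors
        scores = pred["pred_scores"]
        assert 0.0 <= scores.min().item() and scores.max().item() <= 1.0, "scores must lie in [0, 1]"

    assert tuple(target["labels"].shape) == (G,), "labels must be (G,)"
    # GT masks are (G, N): note the transposed layout relative to pred_masks.
    assert tuple(target["masks"].shape) == (G, N), "masks must be (G, N) with the same N"
    if timestep_key in target:  # optional: without it the scene is treated as 3D
        assert tuple(target[timestep_key].shape) == (N,), "timesteps must be (N,)"
    if aux_key in target:
        assert tuple(target[aux_key].shape) == (G,), "aux labels must be (G,)"
    for group in target.get("ambiguities", []):
        assert all(isinstance(i, int) for i in group), "ambiguity groups hold int ids"
    return {"N": N, "K": K, "G": G}
```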

Optional inputs for TSimHead

To enable pairwise temporal feature similarity, provide:

| Key (configurable) | Shape | dtype | Description |
|---|---|---|---|
| features | (M, D) | float | Per-segment / per-point embedding of dimension D |
| ids | (M,) | int | GT instance id each segment belongs to (the head compares same-id features across stages) |
| timesteps | (M,) | int | Stage id per segment. Must be restricted to two distinct integer values. |
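The README states only that TSimHead compares same-id features across the two stages and reports a mean, median, std, and pair count. Below is a minimal sketch of one plausible reading. It assumes cosine similarity between each instance's mean-pooled embedding in one stage and in the other, with one pair per id that appears in both stages. The pooling, pairing, and similarity used by stmetrics may differ, and tsim_sketch is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def tsim_sketch(features, ids, timesteps):
    """Assumed t-SIM: cosine similarity of each id's mean embedding in stage A vs stage B."""
    stages = torch.unique(timesteps)  # sorted distinct stage ids
    assert stages.numel() == 2, "timesteps must hold exactly two distinct values"
    stage_a, stage_b = stages.tolist()

    sims = []
    for gid in torch.unique(ids).tolist():
        in_a = (ids == gid) & (timesteps == stage_a)
        in_b = (ids == gid) & (timesteps == stage_b)
        if in_a.any() and in_b.any():  # only ids seen in both stages form a pair
            f_a = features[in_a].mean(dim=0)
            f_b = features[in_b].mean(dim=0)
            sims.append(F.cosine_similarity(f_a, f_b, dim=0))

    if not sims:
        nan = torch.tensor(float("nan"))
        return {"tsim_mean": nan, "tsim_median": nan, "tsim_std": nan, "tsim_n_pairs": torch.tensor(0)}

    sims = torch.stack(sims)
    mean = sims.mean()
    return {
        "tsim_mean": mean,
        "tsim_median": sims.median(),  # torch returns the lower middle value for even counts
        "tsim_std": ((sims - mean) ** 2).mean().sqrt(),  # population std
        "tsim_n_pairs": torch.tensor(sims.numel()),
    }
```

Mean-pooling per id keeps the pair count equal to the number of instances observed in both stages. A segment-to-segment pairing would be the alternative reading.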

Dataset spec (YAML)

Each dataset is described by a small YAML file. See examples/rio.yaml for a complete example (an 18-class indoor 4D-scene benchmark).

# my_dataset.yaml
name: my_dataset

# Human-readable class names, in iteration order. Must align with valid_class_ids.
class_labels:
- cabinet
- chair
- table
- door

# Integer class IDs corresponding 1:1 to class_labels. These are the values
# that appear in target.labels / pred.pred_classes. Omit to assume 0..N-1.
valid_class_ids:
- 3
- 5
- 7
- 8

# Key on each target dict holding the per-instance aux label (e.g. "changes").
aux: changes

# Aux-label names, in iteration order. Output keys use these (e.g. val_rigid_REC).
# Datasets with no aux concept can use a single label, e.g. ["all"] / [0].
aux_labels:
- static
- rigid
- nonrigid

# Integer aux IDs corresponding 1:1 to aux_labels. Match target[aux].
valid_aux_ids:
- 0
- 1
- 2

# Optional. Group class names into categories for "mean per-category" output keys:
#   val_mean_head_AP, val_mean_common_AP, val_mean_tail_AP, ...
# Omit the categories block entirely if your dataset has no such structure.
categories:
  head:
  - cabinet
  - chair
  common:
  - table
  tail:
  - door
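The comments in the example YAML imply a few consistency rules:

- class_labels and valid_class_ids align 1:1, and valid_class_ids defaults to 0..N-1.
- aux_labels and valid_aux_ids align 1:1.
- categories list class names.

validate_spec is a hypothetical validator, not part of stmetrics, that checks those rules. It runs on the plain dict you would get from a YAML loader such as PyYAML's yaml.safe_load.

```python
def validate_spec(spec):
    """Hypothetical check of the consistency rules stated in the example YAML comments."""
    labels = spec["class_labels"]
    # valid_class_ids may be omitted, in which case 0..N-1 is assumed.
    class_ids = spec.get("valid_class_ids", list(range(len(labels))))
    if len(class_ids) != len(labels):
        raise ValueError("class_labels and valid_class_ids must align 1:1")

    aux_labels = spec.get("aux_labels")
    aux_ids = spec.get("valid_aux_ids")
    if aux_labels is not None and aux_ids is not None and len(aux_labels) != len(aux_ids):
        raise ValueError("aux_labels and valid_aux_ids must align 1:1")

    # The optional categories block groups class *names*, so every member must be known.
    for cat, members in spec.get("categories", {}).items():
        unknown = set(members) - set(labels)
        if unknown:
            raise ValueError(f"category {cat!r} lists unknown classes: {sorted(unknown)}")

    return dict(zip(labels, class_ids))  # class name -> integer id
```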

To load a spec directly, pass the path to your YAML file:

from stmetrics import load_dataset_spec
spec = load_dataset_spec("/path/to/my_dataset.yaml")

InstanceMetrics(dataset=...) accepts the same path, as in the quickstart above.

Output keys (instances)

compute() returns a dict[str, torch.Tensor] with keys of the form:

<log_prefix>_<class_or_change_or_'mean'>_<metric_type>[_50|_25]

Where <metric_type> is one of:

| Suffix | Source evaluator |
|---|---|
| AP | LegacyAPEvaluator: mAP with overlap pooled over timesteps, IoU thresholds 0.5 to 0.9 |
| t-AP | TemporalEvaluator: temporal mAP (IoU > threshold in every timestep) |
| stage1-AP, stage2-AP, ... | SelectTimestepEvaluator(timesteps=[k]): per-timestep AP |
| REC, t-REC, stage1-REC, ... | Any evaluator with recall=True: per-aux-label recall |
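The evaluators in the table differ in how a prediction's overlap with a ground-truth instance is measured across timesteps. The minimal pure-Python sketch below covers the three policies as described:

- AP uses one IoU pooled over all timesteps.
- t-AP requires IoU above the threshold in every timestep.
- stageK-AP computes IoU on one selected timestep only.

The function names are hypothetical. Skipping timesteps where both masks are empty is an assumption, not something the README specifies.

```python
def iou(pred, gt):
    """IoU of two equal-length 0/1 point masks; None if both are empty."""
    inter = sum(p and g for p, g in zip(pred, gt))
    union = sum(p or g for p, g in zip(pred, gt))
    return inter / union if union else None


def restrict(mask, timesteps, t):
    """Keep only the points observed at timestep t."""
    return [m for m, ts in zip(mask, timesteps) if ts == t]


def pooled_match(pred, gt, timesteps, thr):
    """AP-style: one IoU over all points, timesteps pooled (timesteps unused, kept for a uniform signature)."""
    value = iou(pred, gt)
    return value is not None and value > thr


def temporal_match(pred, gt, timesteps, thr):
    """t-AP-style: IoU must exceed thr in every timestep (assumed: skip steps where both masks are empty)."""
    per_step = [iou(restrict(pred, timesteps, t), restrict(gt, timesteps, t)) for t in sorted(set(timesteps))]
    observed = [v for v in per_step if v is not None]
    return bool(observed) and all(v > thr for v in observed)


def select_match(pred, gt, timesteps, thr, k):
    """stageK-AP-style: IoU computed on timestep k only."""
    value = iou(restrict(pred, timesteps, k), restrict(gt, timesteps, k))
    return value is not None and value > thr
```

With a single timestep, all three policies reduce to the same standard IoU test, which matches the collapse-to-3D behaviour described at the top.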

The _50 / _25 variants are computed at a single IoU threshold of 0.5 / 0.25, respectively.
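The AP row averages over IoU thresholds from 0.5 to 0.9, while the _50 / _25 variants use a single threshold. The minimal single-class sketch below shows that relationship. It assumes ScanNet-style greedy, score-ordered matching, a 0.05 step between thresholds, and an all-point interpolated precision/recall area. stmetrics' own matching and interpolation may differ, and both function names are hypothetical.

```python
def average_precision(scores, ious, n_gt, thr):
    """Single-class AP at one IoU threshold; ious[i][g] is prediction i vs GT g."""
    if n_gt == 0:
        return float("nan")  # undefined without ground truth
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # highest score first
    matched, tp_flags = set(), []
    for i in order:
        # Greedily take the best still-unmatched GT whose IoU exceeds thr.
        best_gt, best_iou = None, thr
        for g, value in enumerate(ious[i]):
            if g not in matched and value > best_iou:
                best_gt, best_iou = g, value
        if best_gt is not None:
            matched.add(best_gt)
        tp_flags.append(1 if best_gt is not None else 0)

    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(9)]  # 0.5, 0.55, ..., 0.9


def mean_over_thresholds(scores, ious, n_gt):
    """AP averaged over IOU_THRESHOLDS; a _50 variant would call average_precision(..., 0.5)."""
    return sum(average_precision(scores, ious, n_gt, t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```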

Per-category mean keys (<prefix>_mean_<cat>_<metric>) appear only if the dataset spec declares a categories: block.
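A per-category mean is presumably the average of the per-class values for the classes listed under that category. The sketch below assumes NaN marks a class with no ground truth and skips it, which the README does not specify. category_means is a hypothetical helper.

```python
import math


def category_means(per_class, categories, log_prefix, metric_type):
    """Average per-class values within each category, skipping NaN entries (assumed: classes without GT)."""
    out = {}
    for cat, members in categories.items():
        values = [per_class[c] for c in members if c in per_class and not math.isnan(per_class[c])]
        key = f"{log_prefix}_mean_{cat}_{metric_type}"
        out[key] = sum(values) / len(values) if values else float("nan")
    return out
```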

TSimHead emits:

<log_prefix>_tsim_mean
<log_prefix>_tsim_median
<log_prefix>_tsim_std
<log_prefix>_tsim_n_pairs
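Every metric key follows the <log_prefix>_<class_or_change_or_'mean'>_<metric_type>[_50|_25] pattern, so results can be looked up without hard-coding strings. metric_key is a hypothetical helper, not part of stmetrics. It covers only that pattern, not the tsim_* keys.

```python
def metric_key(log_prefix, name, metric_type, iou=None):
    """Assemble <log_prefix>_<name>_<metric_type>[_50|_25]."""
    key = f"{log_prefix}_{name}_{metric_type}"
    if iou is not None:
        if iou not in (50, 25):
            raise ValueError("only the _50 and _25 variants exist")
        key += f"_{iou}"
    return key
```

For example, `results[metric_key("val", "mean", "t-AP", 50)]` reads the temporal mAP at IoU 0.5.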

Hydra integration

If your project is already configured with Hydra, InstanceMetrics accepts a list of {"_target_": ...} head configs directly:

# my_metric.yaml
_target_: stmetrics.instances.InstanceMetrics
dataset: /path/to/my_dataset.yaml
heads:
  - _target_: stmetrics.instances.TemporalEvaluator
    recall: true
    aux: changes
  - _target_: stmetrics.instances.LegacyAPEvaluator
  - _target_: stmetrics.instances.SelectTimestepEvaluator
    timesteps: [0]


Citation

If you find our code and paper useful, please cite ReScene4D, the work that first introduced t-mAP:

@inproceedings{steiner2026rescene4d,
      author = {Steiner, Emily and Zheng, Jianhao and Howard-Jenkins, Henry and Xie, Chris and Armeni, Iro},
      title = {ReScene4D: Temporally Consistent Semantic Instance Segmentation of Evolving Indoor 3D Scenes},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2026},
}
