Solus-sano/OASIS

OASIS: On-Demand Hierarchical Event Memory for Streaming Video Reasoning

arXiv | License: MIT | CVPR 2026

Streaming video reasoning requires models to operate in a setting where history grows without bound while meaningful evidence remains scarce. In such a landscape, the relevant signal is like an oasis: small, critical, and easily lost in a desert of redundancy. Enlarging memory only widens the desert; aggressive compression dries up the oasis. The real difficulty lies in discovering where to look, not how much to remember.

OASIS is a training-free, plug-and-play framework that organizes streaming history into hierarchical events and treats reasoning as controlled refinement: it first infers from a short context and performs semantically grounded retrieval only when uncertainty arises.
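To make "hierarchical events with a bounded number of top-level entries" concrete, here is a minimal Python sketch of an event store that keeps at most `max_roots` root events. It is not the OASIS implementation: the `Event` and `EventForestSketch` classes and the merge rule (fold the two oldest roots under a new parent, so older history becomes coarser first) are hypothetical choices for illustration, and merged summaries are joined strings rather than model-generated text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    """A node in the event hierarchy: a time span plus a text summary."""
    start: float  # seconds
    end: float
    summary: str
    children: list[Event] = field(default_factory=list)


class EventForestSketch:
    """Hypothetical event store that keeps at most `max_roots` top-level events."""

    def __init__(self, max_roots: int) -> None:
        if max_roots < 1:
            raise ValueError("max_roots must be >= 1")
        self.max_roots = max_roots
        self.roots: list[Event] = []  # kept in temporal order

    def insert_root(self, event: Event) -> None:
        self.roots.append(event)
        self._enforce_budget()

    def _enforce_budget(self) -> None:
        # Assumed merge rule: while over budget, fold the two oldest roots
        # under a new parent so recent events stay separate and fine-grained.
        while len(self.roots) > self.max_roots:
            a, b = self.roots[0], self.roots[1]
            parent = Event(
                start=a.start,
                end=b.end,
                # Placeholder: a real system would re-summarize the merged span.
                summary=f"{a.summary}; {b.summary}",
                children=[a, b],
            )
            self.roots[0:2] = [parent]


forest = EventForestSketch(max_roots=2)
for i, text in enumerate(["enter kitchen", "chop onions", "boil water"]):
    forest.insert_root(Event(start=10.0 * i, end=10.0 * (i + 1), summary=text))

print([r.summary for r in forest.roots])
# ['enter kitchen; chop onions', 'boil water']
```

Because merging only ever replaces two roots with one parent, the original events remain reachable as children, so detail is coarsened at the top level without being discarded.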

Data Flow

VideoStreamer ─► ShortMemory.push()
                  ├─ NowWindow  (recent fine-grained frames)
                  ├─ Buffer     (downsampled short-term context)
                  └─ Segment ready? ──► MLLM summarize ──► EventForest.insert_root()
                                                               └─ enforce_root_budget() ──► merge if #roots > K

User Question ──► process_query()
                   ├─ Coarse: NowWindow + Buffer + root summaries + QA summary ──► MLLM
                   └─ Fine (if tool call): retrieve events & QAs ──► MLLM with retrieved clips
                        └─ Store QA, update QA summary
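The query path works as a gate: answer from the short context first and pay for retrieval only when the model asks for it. The Python sketch below mimics that control flow under stated assumptions: `mllm` is a stand-in callable returning either an `Answer` or a hypothetical `ToolCall` request, and retrieval ranks stored event summaries by cosine similarity of embedding vectors (a plausible use of the Qwen3-Embedding model listed under Models, though the actual scoring in OASIS may differ). None of these names come from the OASIS codebase, and QA storage and the QA summary update are omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Answer:
    text: str


@dataclass
class ToolCall:
    """Hypothetical signal that the model wants more evidence for `query`."""
    query: str


Reply = Union[Answer, ToolCall]
Memory = list[tuple[str, list[float]]]  # (event summary, embedding vector)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], memory: Memory, top_k: int) -> list[str]:
    """Return the top_k summaries whose vectors are most similar to query_vec."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [summary for summary, _ in ranked[:top_k]]


def process_query(
    question: str,
    coarse_context: list[str],
    memory: Memory,
    mllm: Callable[[str, list[str]], Reply],
    embed: Callable[[str], list[float]],
    top_k: int = 2,
) -> tuple[str, list[str]]:
    # Stage 1 (coarse): answer from the short context alone if possible.
    first = mllm(question, coarse_context)
    if isinstance(first, Answer):
        return first.text, []
    # Stage 2 (fine): the model requested evidence, so add the closest events.
    evidence = retrieve(embed(first.query), memory, top_k)
    second = mllm(question, coarse_context + evidence)
    # This sketch allows one retrieval round; a second tool call yields "".
    text = second.text if isinstance(second, Answer) else ""
    return text, evidence


# Toy stand-ins: a 3-d "embedding" keyed on words, and a fake MLLM that
# requests retrieval whenever "knife" does not appear in its context.
def toy_embed(text: str) -> list[float]:
    return [float(w in text) for w in ("knife", "water", "door")]


def toy_mllm(question: str, context: list[str]) -> Reply:
    hits = [c for c in context if "knife" in c]
    return Answer(f"Based on: {hits[0]}") if hits else ToolCall("where is the knife")


memory = [(s, toy_embed(s)) for s in ["knife on board", "water boiling", "door opened"]]
answer, used = process_query(
    "Where is the knife?", ["stirring a pot"], memory, toy_mllm, toy_embed, top_k=1
)
print(answer)  # Based on: knife on board
```

The point of the gate is cost: questions answerable from recent context never touch long-term memory, and only the few retrieved events are added to the second prompt.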

Project Structure

OASIS/
├── setup.sh                           # Environment setup
├── src/
│   ├── configs/default.yaml           # Reference configuration
│   ├── oasis/                         # Core library
│   │   ├── config.py                  # OasisConfig (Pydantic)
│   │   ├── types.py                   # FramePacket, EventNode, QANode, etc.
│   │   ├── model.py                   # OasisModel — stream processing & two-stage QA
│   │   ├── event/
│   │   │   ├── forest.py              # EventForest — insert, merge, retrieve, persist
│   │   │   ├── segmenter.py           # ShortMemory — NowWindow + buffer management
│   │   │   └── compression.py         # uniform_keyframes
│   │   ├── io/stream.py               # VideoStreamer (decord/torchvision), AudioStreamer
│   │   └── utils/logging.py           # Colored logger
│   └── scripts/
│       └── eval.py                    # Unified evaluation script
├── metadata/                          # Benchmark metadata (unified JSON format)
│   ├── eval_ovo_bench.json
│   ├── eval_streambench.json
│   └── StreamingBench/
│       ├── Sequential_Question_Answering.json
│       ├── Misleading_Context_Recognition.json
│       ├── Anomaly_Context_Understanding.json
│       └── Real_Time_Visual_Understanding.json
└── datasets/                          # Video files (user-provided)
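compression.py is listed as providing `uniform_keyframes`, but its signature is not documented here. As a guess at what uniform keyframe selection typically means, this sketch picks `k` evenly spaced indices from a segment of `num_frames` frames by taking the center frame of each of `k` equal bins. The function name `uniform_keyframe_indices`, its parameters, and the center-of-bin rule are all hypothetical.

```python
def uniform_keyframe_indices(num_frames: int, k: int) -> list[int]:
    """Hypothetical helper: k evenly spaced frame indices in range(num_frames)."""
    if num_frames <= 0 or k <= 0:
        return []
    if k >= num_frames:
        # Asking for more keyframes than frames: keep every frame.
        return list(range(num_frames))
    # Split the segment into k equal bins and take the center frame of each.
    step = num_frames / k
    return [int(step * i + step / 2) for i in range(k)]


print(uniform_keyframe_indices(10, 4))  # [1, 3, 6, 8]
```

Taking bin centers rather than bin starts avoids always including the very first frame, which is often a transition at a segment boundary.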

Getting Started

Requirements

  • Python 3.12
  • CUDA 12.1

Installation

conda create -n OASIS python=3.12
conda activate OASIS

conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia

pip install "qwen-vl-utils[decord]"
pip install accelerate==1.10.1
pip install opencv-python pydantic sentence-transformers
pip install transformers==4.57.6

# Flash Attention (Linux x86_64, CUDA 12, PyTorch 2.5)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.5cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
pip install flash_attn-2.8.3+cu12torch2.5cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
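After installation, a quick way to confirm the environment matches the pinned versions is to query package metadata with the standard library's `importlib.metadata`. This is a sketch, not part of the repository; the pins are copied from the commands above, and local build tags such as `+cu121` are ignored when comparing.

```python
"""Sanity-check installed package versions against the pinned install steps."""
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation commands.
PINS = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "accelerate": "1.10.1",
    "transformers": "4.57.6",
    "flash-attn": "2.8.3",
}


def base_version(v: str) -> str:
    """Drop a local build tag such as '+cu121' from a version string."""
    return v.split("+", 1)[0]


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    report = {}
    for name, wanted in pins.items():
        try:
            found = base_version(version(name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"found {found}, expected {wanted}"
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(3.12 expected)")
    for pkg, status in check_pins(PINS).items():
        print(f"{pkg:>14}: {status}")
```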

Models

Download the following models and place them under ./models/:

Model                | HuggingFace repo           | Local path
Qwen3-VL-8B-Instruct | Qwen/Qwen3-VL-8B-Instruct  | models/Qwen3-VL-8B-Instruct
Qwen3-Embedding-0.6B | Qwen/Qwen3-Embedding-0.6B  | models/Qwen3-Embedding-0.6B
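One way to fetch both checkpoints into this layout is `huggingface_hub.snapshot_download` with its `local_dir` argument. This sketch assumes the `huggingface_hub` package is installed (it is normally pulled in as a dependency of `transformers`) and that you have network access; the repository IDs and target paths come from the table. The helper names are hypothetical.

```python
from pathlib import Path

# (HuggingFace repo id -> local directory), taken from the table above.
MODELS = {
    "Qwen/Qwen3-VL-8B-Instruct": Path("models/Qwen3-VL-8B-Instruct"),
    "Qwen/Qwen3-Embedding-0.6B": Path("models/Qwen3-Embedding-0.6B"),
}


def pending_downloads(models: dict[str, Path]) -> dict[str, Path]:
    """Return only the models whose target directory does not exist yet."""
    return {repo: path for repo, path in models.items() if not path.is_dir()}


def download_all(models: dict[str, Path]) -> None:
    # Imported lazily so pending_downloads works without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in pending_downloads(models).items():
        local_dir.parent.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Run `download_all(MODELS)` from the project root. Directories that already exist are skipped, so a partially completed download must be removed before retrying.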

Datasets

Download the benchmark datasets and place the video files under ./datasets/:

Benchmark      | Source                        | Expected path under datasets/
OVO-Bench      | JoeLeelyf/OVO-Bench           | OVO-Bench/chunked_videos/
StreamBench    | Barry-12138/StreamBench_v0.3  | StreamBench/{Ego,Movie,WebVideo}/
StreamingBench | mjuicem/StreamingBench        | StreamingBench/{task_name}/

The metadata/ directory already contains our pre-processed annotation files in a unified format.
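Before running evaluation, it can help to confirm that the video folders match the expected layout. This sketch checks the paths from the table above. The StreamingBench sub-task folder names are not listed there, so the example assumes, without verification, that they match the stems of the JSON files in metadata/StreamingBench/; adjust `streamingbench_tasks` if your download uses different names. The helper names are hypothetical.

```python
from pathlib import Path


def expected_dirs(root: Path, streamingbench_tasks: list[str]) -> list[Path]:
    """Video directories the evaluation expects under the dataset root."""
    dirs = [root / "OVO-Bench" / "chunked_videos"]
    dirs += [root / "StreamBench" / split for split in ("Ego", "Movie", "WebVideo")]
    dirs += [root / "StreamingBench" / task for task in streamingbench_tasks]
    return dirs


def missing_dirs(root: Path, streamingbench_tasks: list[str]) -> list[Path]:
    return [d for d in expected_dirs(root, streamingbench_tasks) if not d.is_dir()]


if __name__ == "__main__":
    # Assumption: sub-task folder names equal the metadata file stems.
    tasks = sorted(p.stem for p in Path("metadata/StreamingBench").glob("*.json"))
    for d in missing_dirs(Path("datasets"), tasks):
        print("missing:", d)
```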

Evaluation

All benchmarks are evaluated through a single script. Run from the project root:

# OVO-Bench
CUDA_VISIBLE_DEVICES=0 python src/scripts/eval.py \
    --metadata metadata/eval_ovo_bench.json \
    --dataset_root datasets/ \
    --output_dir output/ovobench

# StreamBench
CUDA_VISIBLE_DEVICES=0 python src/scripts/eval.py \
    --metadata metadata/eval_streambench.json \
    --dataset_root datasets/ \
    --output_dir output/streambench

# StreamingBench (per sub-task)
CUDA_VISIBLE_DEVICES=0 python src/scripts/eval.py \
    --metadata metadata/StreamingBench/Anomaly_Context_Understanding.json \
    --dataset_root datasets/ \
    --output_dir output/streamingbench_ACU
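Because StreamingBench is evaluated one sub-task at a time, a small driver can loop over every metadata file. The sketch below uses only the flags shown in the commands above (`--metadata`, `--dataset_root`, `--output_dir`). The output folder naming (`output/streamingbench_<file stem>`) is our own choice rather than the abbreviations used in the example, and jobs run sequentially; set `CUDA_VISIBLE_DEVICES` in the shell before launching, since child processes inherit it.

```python
import subprocess
import sys
from pathlib import Path


def streamingbench_commands(metadata_dir: Path, dataset_root: str = "datasets/") -> list[list[str]]:
    """Build one eval.py invocation per sub-task JSON file in metadata_dir."""
    commands = []
    for meta in sorted(metadata_dir.glob("*.json")):
        commands.append([
            sys.executable, "src/scripts/eval.py",
            "--metadata", str(meta),
            "--dataset_root", dataset_root,
            "--output_dir", f"output/streamingbench_{meta.stem}",
        ])
    return commands


if __name__ == "__main__":
    # Run from the project root, one sub-task after another.
    for cmd in streamingbench_commands(Path("metadata/StreamingBench")):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing sub-task
```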

Citation

@inproceedings{liang2026oasis,
  title     = {OASIS: On-Demand Hierarchical Event Memory for Streaming Video Reasoning},
  author    = {Liang, Zhijia and Li, Jiaming and Chen, Weikai and Zhang, Yanhao and Lu, Haonan and Li, Guanbin},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

License

This project is licensed under the MIT License.
