This repository contains the experimental engine for Plura, a research-focused VLM (Visual Language Model) designed to audit and outperform GPT-5.2 in specific UI navigation tasks.
- Core Logic: `plura_engine.py` - The main orchestration pipeline.
- Visual Physics: `indexing/visual_physics/` - Modules for "Spectral Saliency" and click refinement (auditing perception artifacts).
- Architecture: `indexing/ocumamba_lite/` - Implementation of Mamba-based vision encoders for high-efficiency inference.
- Benchmarking: `scripts/gpt52_benchmark_fixed.py` - The evaluation harness used to compare speed/cost against SOTA.
- See `scripts/VASTAI_DEPLOY.md` for GPU cluster deployment notes.
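As a rough illustration of what a speed/cost comparison involves, the sketch below times a single model call and prices it from its token counts. It is not the code in `scripts/gpt52_benchmark_fixed.py`; the names `RunStats`, `measure`, and `fake_model_call`, and the per-1k-token prices, are all hypothetical placeholders.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RunStats:
    latency_s: float  # wall-clock time of the call, in seconds
    cost_usd: float   # estimated cost derived from token counts


def measure(
    call: Callable[[], Tuple[int, int]],
    usd_per_1k_in: float,
    usd_per_1k_out: float,
) -> RunStats:
    """Time one call and price it.

    `call` is a hypothetical callable that performs one model request and
    returns (input_tokens, output_tokens).
    """
    start = time.perf_counter()
    tokens_in, tokens_out = call()
    latency = time.perf_counter() - start
    cost = tokens_in / 1000 * usd_per_1k_in + tokens_out / 1000 * usd_per_1k_out
    return RunStats(latency_s=latency, cost_usd=cost)


def fake_model_call() -> Tuple[int, int]:
    # Stand-in for a real request; the token counts are made up.
    return 1200, 150


# Prices here are illustrative only, not real pricing for any model.
stats = measure(fake_model_call, usd_per_1k_in=0.01, usd_per_1k_out=0.03)
print(f"latency={stats.latency_s:.6f}s cost=${stats.cost_usd:.4f}")
```

Running the same `measure` wrapper over each model's call path with identical inputs keeps the latency and cost figures comparable.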
Note: This is an active research repo. You will see scripts from failed experiments (e.g., files suffixed _v1 or _debug); these are preserved as an audit trail.
Research and benchmarking for GUI visual grounding models.
- `scripts/` - Benchmark and evaluation scripts
- `indexing/` - Active Inference GUI grounding implementation
- `docs/` - Research reports and documentation
- `security/` - Rate limiting and security utilities
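For readers unfamiliar with client-side rate limiting for API benchmarks, here is a minimal token-bucket sketch. It only illustrates the general technique; the contents of `security/` may use a different approach, and the class name `TokenBucket` and its parameters are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` calls in a burst, refilled at `rate_per_s`."""

    def __init__(
        self,
        rate_per_s: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock            # injectable so the limiter can be tested
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if a call is permitted right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `bucket.allow()` before each API request and sleep or skip when it returns `False`.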
- `gpt52_cot_benchmark.py` - GPT-5.2 benchmark with Chain-of-Thought reasoning
- `gpt_benchmark.py` - GPT-4o benchmark script
- `active_gui_grounding.py` - OcuMamba-Lite + Active Inference implementation
pip install openai datasets pillow
export OPENAI_API_KEY="your-key"
python scripts/gpt52_cot_benchmark.py

See `docs/research_report.md` for full analysis.
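Before launching a benchmark run, it can help to confirm that the setup steps above took effect. The following is a small pre-flight sketch, not part of the repository: the function name `preflight` is hypothetical, and it only checks that the three installed packages are importable (Pillow is imported as `PIL`) and that `OPENAI_API_KEY` is non-empty.

```python
import importlib.util
import os
from typing import List, Mapping

# pip package name -> importable module name
REQUIRED_MODULES = {"openai": "openai", "datasets": "datasets", "pillow": "PIL"}


def preflight(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable setup problems (empty if none)."""
    problems = []
    for package, module in REQUIRED_MODULES.items():
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {package}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(issue)
    if not issues:
        print("environment looks ready")
```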