data(leaderboard): publish the 4 remaining LLM entries (GPT-5.5, Gemini, Qwen, DeepSeek) by FlyM1ss · Pull Request #75 · Open-Finance-Lab/AgenticTrading

FlyM1ss · 2026-07-05T17:02:21Z

Summary

Adds full 161-step contest-window runs (2026-04-15 → 2026-05-15) for three CommonStack-gateway LLM entries to the committed seed DB. Each is 100% LLM-decided (161/161 calls) and clears the H6 95%-coverage guard from #74 — no rule-based fallback masquerading as a model.

Model	Return	Sharpe	Max DD	Trades	Est. cost
GPT-5.5	+1.45%	1.56	2.55%	239	~$13.89
Gemini 3.1 Pro Preview	+2.32%	2.03	3.29%	267	~$11.26
Qwen3.7 Plus	+2.49%	2.19	2.59%	205	~$1.59
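The Return and Max DD columns can be recomputed from a run's stored equity curve. The sketch below is illustrative only: the function names and the sample curve are made up, and it does not read the real `backtest.db` schema. Sharpe is left out because the annualization convention used by the leaderboard is not stated here.

```python
from typing import Sequence


def total_return(equity: Sequence[float]) -> float:
    """Fractional return from the first to the last point of the curve."""
    if len(equity) < 2 or equity[0] <= 0:
        raise ValueError("need at least two points and a positive starting equity")
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not equity or equity[0] <= 0:
        raise ValueError("need a non-empty curve with positive starting equity")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # drop from that peak
    return worst


# Hypothetical five-point curve, not taken from any leaderboard run.
curve = [100.0, 102.0, 99.0, 101.0, 104.0]
print(f"{total_return(curve):+.2%}")  # +4.00%
print(f"{max_drawdown(curve):.2%}")   # 2.94%
```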

This brings the leaderboard to 5 of 6 LLM models (Claude Haiku 4.5 + Sonnet 4.6 already live via #72/#73).
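The coverage guard referenced in the summary can be pictured as a simple ratio check. This is a minimal sketch, assuming the guard compares LLM-decided steps against total steps with a 0.95 threshold. `RunStats`, its fields, and the function names are hypothetical and are not taken from the implementation in #74.

```python
from dataclasses import dataclass

# Assumed threshold, inferred from the "95%-coverage guard" wording.
MIN_LLM_COVERAGE = 0.95


@dataclass(frozen=True)
class RunStats:
    """Hypothetical per-run summary; field names are illustrative."""
    model: str
    total_steps: int
    llm_decided_steps: int


def llm_coverage(run: RunStats) -> float:
    """Fraction of steps whose decision came from the LLM itself."""
    if run.total_steps <= 0:
        raise ValueError("run has no steps")
    return run.llm_decided_steps / run.total_steps


def passes_coverage_guard(run: RunStats, threshold: float = MIN_LLM_COVERAGE) -> bool:
    """True when the run is LLM-decided often enough to be published."""
    return llm_coverage(run) >= threshold


full = RunStats("example-model", total_steps=161, llm_decided_steps=161)
partial = RunStats("example-model", total_steps=161, llm_decided_steps=150)
print(passes_coverage_guard(full))     # True: 161/161 is 100%
print(passes_coverage_guard(partial))  # False: 150/161 is about 93.2%
```

Under this reading, a run that fell back to rule-based decisions on more than 5% of its steps would be rejected rather than published under the model's name.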

Notes

DeepSeek V4 Pro is deliberately not in this PR — it's a slow reasoning model still running (~2 min/call); it will land in a follow-up PR once complete, so the board isn't blocked on the slowest entry.
Only dashboard/storage/data/backtest.db changes (additive: 3 new runs + their equity curves), mirroring #72 (feat(leaderboard): publish Claude Haiku 4.5 + Sonnet 4.6 contest entries).
The DB is in WAL mode; the snapshot was taken with VACUUM INTO so all WAL-committed runs are captured in the committed file.
leaderboard.json already lists all six LLM entries, so no config change is needed — these entries simply needed cached runs.
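The snapshot note above matters because, in WAL mode, recently committed rows can still sit in the `-wal` sidecar file rather than in the main database file, so committing the `.db` file alone could miss them. The sketch below shows one way to take such a snapshot with Python's standard `sqlite3` module. The `snapshot_db` helper, the file names, and the toy `runs` table are assumptions for illustration, not the project's actual tooling or schema. It requires SQLite 3.27 or newer for `VACUUM INTO`.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_db(live_path: Path, snapshot_path: Path) -> None:
    """Write a compact, self-contained copy of live_path to snapshot_path."""
    if snapshot_path.exists():
        # VACUUM INTO will not overwrite an existing non-empty file.
        raise FileExistsError(snapshot_path)
    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(live_path, isolation_level=None)
    try:
        conn.execute("VACUUM INTO ?", (str(snapshot_path),))
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live.db"
    snap = Path(tmp) / "snapshot.db"

    # Toy stand-in for the live database, switched to WAL mode.
    writer = sqlite3.connect(live, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, model TEXT)")
    writer.executemany(
        "INSERT INTO runs (model) VALUES (?)",
        [("model-a",), ("model-b",), ("model-c",)],
    )
    # The writer stays open, so these rows may exist only in the WAL file.

    snapshot_db(live, snap)

    reader = sqlite3.connect(snap)
    copied = reader.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
    reader.close()
    writer.close()
    print(copied)  # 3
```

The snapshot is a single file containing every committed row, which is what makes it safe to commit to the repository on its own.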

🤖 Generated with Claude Code

Adds full 161-step contest-window runs (2026-04-15→05-15) for three CommonStack-gateway LLM entries to the seed DB, each 100% LLM-decided (161/161 calls), clearing the H6 95%-coverage guard (PR #74):

GPT-5.5              +1.45%  Sharpe 1.56  239 trades  ~$13.89
Gemini 3.1 Pro Prev  +2.32%  Sharpe 2.03  267 trades  ~$11.26
Qwen3.7 Plus         +2.49%  Sharpe 2.19  205 trades  ~$1.59

Brings the board to 5 of 6 LLM models (Claude Haiku 4.5 + Sonnet 4.6 already live via #72/#73). DeepSeek V4 Pro to follow in a separate PR (slow reasoning model, still running).

Snapshot taken via VACUUM INTO so all WAL-committed runs are captured in the committed file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014t1xHNTNh5mybdjMQAvQcG

DeepSeek V4 Pro finished its full 161-step contest-window run, 100% LLM-decided (161/161 calls), clearing the H6 95%-coverage guard:

DeepSeek V4 Pro  +7.49%  Sharpe 5.01  277 trades  ~$0.76

It tops the LLM field and is the only agent to beat the passive baselines. Completes the CommonStack LLM set alongside GPT-5.5, Gemini 3.1 Pro, and Qwen3.7 Plus in this PR; all six leaderboard LLM models now have runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014t1xHNTNh5mybdjMQAvQcG

FlyM1ss and others added 2 commits July 6, 2026 01:02

FlyM1ss changed the title ~~data(leaderboard): publish GPT-5.5, Gemini 3.1 Pro, Qwen3.7 Plus entries~~ data(leaderboard): publish the 4 remaining LLM entries (GPT-5.5, Gemini, Qwen, DeepSeek) Jul 5, 2026

FlyM1ss wants to merge 2 commits into main from leaderboard-llm-entries-round2
