
data(leaderboard): publish the 4 remaining LLM entries (GPT-5.5, Gemini, Qwen, DeepSeek)#75

Open
FlyM1ss wants to merge 2 commits into main from leaderboard-llm-entries-round2



@FlyM1ss FlyM1ss commented Jul 5, 2026

Collaborator

Summary

Adds full 161-step contest-window runs (2026-04-15 → 2026-05-15) for three CommonStack-gateway LLM entries to the committed seed DB. Each is 100% LLM-decided (161/161 calls) and clears the H6 95%-coverage guard from #74 — no rule-based fallback masquerading as a model.

| Model | Return | Sharpe | Max DD | Trades | Est. cost |
| --- | --- | --- | --- | --- | --- |
| GPT-5.5 | +1.45% | 1.56 | 2.55% | 239 | ~$13.89 |
| Gemini 3.1 Pro Preview | +2.32% | 2.03 | 3.29% | 267 | ~$11.26 |
| Qwen3.7 Plus | +2.49% | 2.19 | 2.59% | 205 | ~$1.59 |

This brings the leaderboard to 5 of 6 LLM models (Claude Haiku 4.5 + Sonnet 4.6 already live via #72/#73).
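
For readers unfamiliar with the coverage guard referenced in the summary, here is a minimal sketch of what such a check could look like. It is not the implementation from #74: the `RunStats` type, the function names, and the `MIN_LLM_COVERAGE` constant are hypothetical. Only the 95% threshold and the 161-step run length come from this PR.

```python
from dataclasses import dataclass

# Assumption: threshold taken from the "95%-coverage" wording in this PR.
MIN_LLM_COVERAGE = 0.95


@dataclass(frozen=True)
class RunStats:
    """Hypothetical per-run summary: how many steps the LLM actually decided."""
    model: str
    total_steps: int
    llm_decided_steps: int


def llm_coverage(stats: RunStats) -> float:
    """Fraction of steps decided by the LLM rather than a rule-based fallback."""
    if stats.total_steps <= 0:
        return 0.0
    return stats.llm_decided_steps / stats.total_steps


def passes_coverage_guard(stats: RunStats, threshold: float = MIN_LLM_COVERAGE) -> bool:
    """True when the run is LLM-decided often enough to be published under the model's name."""
    return llm_coverage(stats) >= threshold


if __name__ == "__main__":
    run = RunStats(model="example-model", total_steps=161, llm_decided_steps=161)
    print(run.model, f"{llm_coverage(run):.1%}", passes_coverage_guard(run))
```

Under this sketch, a 161-step run would need at least 153 LLM-decided steps to pass (153/161 is about 95.03%), so the 161/161 runs in this PR clear the guard with room to spare.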

Notes

  • DeepSeek V4 Pro is deliberately not in this PR — it's a slow reasoning model still running (~2 min/call); it will land in a follow-up PR once complete, so the board isn't blocked on the slowest entry.
  • Only dashboard/storage/data/backtest.db changes (additive: 3 new runs + their equity curves), mirroring feat(leaderboard): publish Claude Haiku 4.5 + Sonnet 4.6 contest entries #72.
  • The DB is in WAL mode; the snapshot was taken with VACUUM INTO so all WAL-committed runs are captured in the committed file.
  • leaderboard.json already lists all six LLM entries, so no config change is needed — these entries simply needed cached runs.
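
The snapshot step described in the notes above can be sketched roughly as follows. This is an illustration rather than the script used for this PR: `snapshot_db` and the file names are hypothetical. `VACUUM INTO` itself is a standard SQLite statement (available since SQLite 3.27) that writes a consistent, compacted copy of the database, including content that currently lives only in the WAL file.

```python
import sqlite3
from pathlib import Path


def snapshot_db(source: Path, dest: Path) -> None:
    """Write a self-contained copy of `source` to `dest` using VACUUM INTO.

    The copy is a single file that includes all committed transactions,
    even those still sitting in the source database's -wal file.
    """
    if dest.exists():
        # VACUUM INTO refuses to overwrite a non-empty file, so fail early and clearly.
        raise FileExistsError(f"snapshot target already exists: {dest}")

    # isolation_level=None puts the connection in autocommit mode;
    # VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(source, isolation_level=None)
    try:
        conn.execute("VACUUM INTO ?", (str(dest),))
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    snapshot_db(Path("live-backtest.db"), Path("backtest-snapshot.db"))
```

Copying only the main `.db` file of a WAL-mode database can miss recent commits that have not been checkpointed yet, which is presumably why the snapshot was taken this way instead of with a plain file copy.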

🤖 Generated with Claude Code

FlyM1ss and others added 2 commits July 6, 2026 01:02
Adds full 161-step contest-window runs (2026-04-15→05-15) for three
CommonStack-gateway LLM entries to the seed DB, each 100% LLM-decided
(161/161 calls) — clearing the H6 95%-coverage guard (PR #74):

  GPT-5.5               +1.45%  Sharpe 1.56  239 trades  ~$13.89
  Gemini 3.1 Pro Prev   +2.32%  Sharpe 2.03  267 trades  ~$11.26
  Qwen3.7 Plus          +2.49%  Sharpe 2.19  205 trades  ~$1.59

Brings the board to 5 of 6 LLM models (Claude Haiku 4.5 + Sonnet 4.6
already live via #72/#73). DeepSeek V4 Pro to follow in a separate PR
(slow reasoning model, still running). Snapshot taken via VACUUM INTO so
all WAL-committed runs are captured in the committed file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014t1xHNTNh5mybdjMQAvQcG

DeepSeek V4 Pro finished its full 161-step contest-window run, 100%
LLM-decided (161/161 calls), clearing the H6 95%-coverage guard:

  DeepSeek V4 Pro       +7.49%  Sharpe 5.01  277 trades  ~$0.76

It tops the LLM field and is the only agent to beat the passive baselines.
Completes the CommonStack LLM set alongside GPT-5.5, Gemini 3.1 Pro, and
Qwen3.7 Plus in this PR; all six leaderboard LLM models now have runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014t1xHNTNh5mybdjMQAvQcG
FlyM1ss changed the title from "data(leaderboard): publish GPT-5.5, Gemini 3.1 Pro, Qwen3.7 Plus entries" to "data(leaderboard): publish the 4 remaining LLM entries (GPT-5.5, Gemini, Qwen, DeepSeek)" on Jul 5, 2026