Comet is a Muon-family matrix optimizer research artifact extracted from the nanochat optimizer experiments. It is not a proven best optimizer. The current evidence supports a narrower claim: Comet is competitive, sometimes better than tuned Muon on local proxies, and faster than full Aurora, but Aurora still has the best local d12 validation-bpb result.
Comet keeps the useful parts of Muon and adds a conservative Aurora-style correction:
- Muon-style momentum followed by matrix polar/sign projection.
- Polar Express-style GPU-friendly polar iterations.
- Aurora-inspired row-balanced projection for tall matrices.
- A tall-matrix blend between the Muon polar update and the row-balanced update.
- Muon/NorMuon-style factored variance reduction after projection.
The standalone default is conservative: pp_beta=0.60, pp_blend=0.10.
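The steps listed above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the Comet implementation. An exact SVD stands in for the Polar Express-style iterations, and `row_balanced` is one plausible reading of "row-balanced projection" (every row rescaled to the same norm). The function names and the `momentum` value are hypothetical, and the factored variance reduction step is omitted. `pp_beta` is not modeled because its role is not described here.

```python
import numpy as np

def polar_factor(m):
    # Exact polar factor via SVD. This is a stand-in (assumption) for the
    # iterative, GPU-friendly polar steps an optimizer would use in practice.
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def row_balanced(p, eps=1e-12):
    # Hypothetical row balancing: rescale every row to a common norm.
    # The target keeps the Frobenius norm of a tall semi-orthogonal
    # matrix, which is sqrt(cols).
    rows, cols = p.shape
    target = np.sqrt(cols / rows)
    norms = np.linalg.norm(p, axis=1, keepdims=True)
    return p * (target / (norms + eps))

def comet_like_update(grad, momentum_buf, momentum=0.95, pp_blend=0.10):
    # Muon-style momentum accumulation; momentum_buf is updated in place.
    momentum_buf *= momentum
    momentum_buf += grad
    polar = polar_factor(momentum_buf)
    rows, cols = polar.shape
    if rows <= cols:
        # Square and wide matrices keep the plain polar update.
        return polar
    # Tall matrices blend the polar update with the row-balanced one.
    return (1.0 - pp_blend) * polar + pp_blend * row_balanced(polar)
```

With `pp_blend=0.0` the sketch reduces to the plain polar update, and with `pp_blend=1.0` a tall matrix receives only the row-balanced update, so the parameter interpolates between the two behaviors.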
The best local nanochat d12 Comet proxy used a scheduled blend from 0.55 to 0.10.
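A blend schedule of this kind might look like the sketch below. The shape of the schedule is not stated here, so linear interpolation over the run is an assumption, and `scheduled_pp_blend` is a hypothetical helper name.

```python
def scheduled_pp_blend(step, total_steps, start=0.55, end=0.10):
    # Linearly interpolate the blend from `start` at step 0 to `end` at the
    # final step. Linear decay is an assumption, not a documented choice.
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac
```

For a 200-step proxy, `scheduled_pp_blend(step, 200)` would be evaluated once per step and passed to the optimizer as the current blend value.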
Local d12 200-step proxy, GTX 1080 Ti:
| Optimizer | Validation bpb | Time | Peak memory |
|---|---|---|---|
| Aurora, 2 projection iterations | 2.130442 | 3.41m | 4944.41 MiB |
| Aurora, 1 projection iteration | 2.131034 | 2.27m | 4944.41 MiB |
| Scheduled Comet | 2.131265 | 1.62m | 5134.68 MiB |
| Tuned Muon | 2.131409 | 1.25m | 5080.40 MiB |
Comet is close, but its bpb lead over tuned Muon (about 0.00014) is too small to be conclusive, and it still trails both Aurora configurations on bpb. The standard nanochat leaderboard benchmark remains the real gate: validation bpb, full CORE, wall-clock time, MFU, and peak memory.
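To make the size of these gaps concrete, the short script below recomputes each row of the table relative to tuned Muon. The numbers are copied from the table above; the `results` layout and the `compare` helper are illustrative only.

```python
# (validation bpb, wall-clock minutes) copied from the table above.
results = {
    "Aurora, 2 projection iterations": (2.130442, 3.41),
    "Aurora, 1 projection iteration": (2.131034, 2.27),
    "Scheduled Comet": (2.131265, 1.62),
    "Tuned Muon": (2.131409, 1.25),
}

baseline_bpb, baseline_minutes = results["Tuned Muon"]

def compare(name):
    # Return (bpb delta vs. tuned Muon, time ratio vs. tuned Muon).
    # A negative delta means a lower (better) bpb than the baseline.
    bpb, minutes = results[name]
    return bpb - baseline_bpb, minutes / baseline_minutes

for name in results:
    delta, ratio = compare(name)
    print(f"{name}: bpb delta {delta:+.6f}, time ratio {ratio:.2f}x")
```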
