vishruthb/comet-optimizer

comet

Comet is a Muon-family matrix optimizer research artifact extracted from the nanochat optimizer experiments. It has not been shown to be the best optimizer. The current evidence supports a narrower claim: Comet is competitive, sometimes better than tuned Muon on local proxies, and faster than full Aurora, but Aurora still has the best local d12 validation-bpb result.

Comet keeps the useful parts of Muon and adds a conservative Aurora-style correction:

  1. Muon-style momentum followed by matrix polar/sign projection.
  2. Polar Express-style GPU-friendly polar iterations.
  3. Aurora-inspired row-balanced projection for tall matrices.
  4. A tall-matrix blend between the Muon polar update and the row-balanced update.
  5. Muon/NorMuon-style factored variance reduction after projection.
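As a rough illustration of steps 1 to 4, here is a minimal pure-Python sketch, not Comet's actual code. Several pieces are assumptions made for the example: the classic cubic Newton-Schulz iteration stands in for the Polar Express polynomials, `row_balance` is a hypothetical stand-in that simply rescales rows to equal norm (the real Aurora-style projection may differ), and the names `blended_direction`, `momentum`, and `blend` are invented here. The variance-reduction step is omitted.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def polar_newton_schulz(g, steps=30):
    """Approximate the orthogonal polar factor of g.

    Assumption: classic cubic Newton-Schulz, X <- 1.5 X - 0.5 X (X^T X),
    used here in place of the tuned Polar Express iterations.
    """
    fro = math.sqrt(sum(v * v for row in g for v in row)) or 1.0
    x = [[v / fro for v in row] for row in g]  # singular values are now <= 1
    for _ in range(steps):
        x_xtx = matmul(x, matmul(transpose(x), x))
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rc)] for rx, rc in zip(x, x_xtx)]
    return x

def row_balance(u, eps=1e-12):
    """Hypothetical row balancing: rescale every row to norm sqrt(n / m)."""
    m, n = len(u), len(u[0])
    target = math.sqrt(n / m)
    out = []
    for row in u:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v * target / (norm + eps) for v in row])
    return out

def blended_direction(grad, buf, momentum=0.95, blend=0.10):
    """Return (direction, new_buf) for one matrix parameter.

    Assumption: `blend` is the weight on the row-balanced update, so a small
    value stays close to the plain Muon-style polar update.
    """
    new_buf = [[momentum * b + g for b, g in zip(rb, rg)] for rb, rg in zip(buf, grad)]
    polar = polar_newton_schulz(new_buf)
    if len(new_buf) <= len(new_buf[0]):
        return polar, new_buf  # only tall matrices get the row-balanced blend
    balanced = row_balance(polar)
    direction = [[(1.0 - blend) * p + blend * q for p, q in zip(rp, rq)]
                 for rp, rq in zip(polar, balanced)]
    return direction, new_buf
```

With `blend=0.0` the sketch reduces to the plain polar update, and with `blend=1.0` it returns only the row-balanced update.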

The standalone default is conservative: pp_beta=0.60, pp_blend=0.10. The best local nanochat d12 Comet proxy used a scheduled blend from 0.55 to 0.10.
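The shape of the scheduled blend is not specified here. The sketch below assumes a linear decay over training steps; `scheduled_blend` and its parameters are hypothetical names, and only the endpoints 0.55 and 0.10 come from the text above.

```python
def scheduled_blend(step, total_steps, start=0.55, end=0.10):
    """Linearly interpolate the blend weight from `start` to `end`.

    Assumption: linear decay; the actual schedule may use another shape.
    """
    if total_steps <= 1:
        return end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac

# Example: blend weights at the start, midpoint, and end of a 200-step run.
weights = [scheduled_blend(s, 200) for s in (0, 100, 199)]
```

Under this assumption the blend starts with a heavier row-balanced contribution and ends at the conservative standalone default.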

latest benchmarks

Local d12 200-step proxy, GTX 1080 Ti:

| Optimizer | Validation bpb | Time | Peak memory |
| --- | --- | --- | --- |
| Aurora, 2 projection iterations | 2.130442 | 3.41m | 4944.41 MiB |
| Aurora, 1 projection iteration | 2.131034 | 2.27m | 4944.41 MiB |
| Scheduled Comet | 2.131265 | 1.62m | 5134.68 MiB |
| Tuned Muon | 2.131409 | 1.25m | 5080.40 MiB |

Comet is close, but its margin over tuned Muon (about 0.00014 bpb) is too small to be conclusive, and it still trails both Aurora runs on bpb. The standard nanochat leaderboard benchmark remains the real gate: validation bpb, full CORE, wall-clock time, MFU, and peak memory.
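To make the trade-offs in the table concrete, the following small script recomputes each run relative to tuned Muon. The numbers are copied from the table above; the `RESULTS` layout and the `relative_to` helper are illustrative names, not part of the repository.

```python
# (validation bpb, wall-clock minutes, peak memory in MiB), copied from the table
RESULTS = {
    "Aurora, 2 projection iterations": (2.130442, 3.41, 4944.41),
    "Aurora, 1 projection iteration": (2.131034, 2.27, 4944.41),
    "Scheduled Comet": (2.131265, 1.62, 5134.68),
    "Tuned Muon": (2.131409, 1.25, 5080.40),
}

def relative_to(name, baseline="Tuned Muon"):
    """Compare one run against the baseline run (lower bpb is better)."""
    bpb, minutes, mem = RESULTS[name]
    base_bpb, base_minutes, base_mem = RESULTS[baseline]
    return {
        "bpb_delta": bpb - base_bpb,
        "time_ratio": minutes / base_minutes,
        "mem_delta_mib": mem - base_mem,
    }

for name in RESULTS:
    r = relative_to(name)
    print(f"{name:32s} bpb {r['bpb_delta']:+.6f}  "
          f"time x{r['time_ratio']:.2f}  mem {r['mem_delta_mib']:+.2f} MiB")
```

On these figures Scheduled Comet takes roughly 1.3 times as long as tuned Muon and uses about 54 MiB more peak memory, while the 2-iteration Aurora run takes roughly 2.7 times as long.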

About

Optimizer playground for model training.