Popular repositories
-
Qwen3.5-9B-OSS-Distill
Public. Distilling GPT-OSS reasoning traces into Qwen 3.5 9B to fix reasoning spirals: no-answer rate 36.2% → 0.5% on a hard holdout.
-
falcon-h1-slerp-merge
Public. First SLERP merge of Mamba-2 hybrid LLMs (Falcon-H1-7B-Instruct × H1R-7B). Includes merge script, benchmarks, and architecture documentation.
Python
-
Zamba2-SLERP-Merge
Public. SLERP merge of Zamba2-7B hybrid models. The merge succeeds, but the weight-sharing architecture prevents evaluation. Second in a series on non-transformer SLERP merging.
Python
-
falcon-h1-deep-reasoning
Public. QLoRA math reasoning adapter for Falcon-H1-1.5B-Deep, the deepest Mamba-2 hybrid (66 layers, 1.5B params). 50% → 65% on math benchmarks with 2000 training examples.
Python
-
Qwen3.5-9B-Dense-To-Moe
Public. Attempted dense-to-MoE conversion of Qwen 3.5 9B (DeltaNet hybrid) using CMoE and D2DMoE. Documents why post-hoc MoEfication fails on SwiGLU models without extensive sparsification. Negative result.