Kronecker Embeddings: byte-level structured token representations for parameter-efficient language models. Reference implementation.
-
Updated
May 29, 2026 - Python
Kronecker Embeddings: byte-level structured token representations for parameter-efficient language models. Reference implementation.
🐍This is a fast, lightweight, and clean CPython extension for the Byte Pair Encoding (BPE) algorithm, which is commonly used in LLM tokenization and NLP tasks.
An efficient openblocks parser module.
A deterministic byte-level BPE tokenizer in pure Python, built from scratch with strict tests, typed code, and polished docs.
An editable, auditable 807K-param byte-level LLM: CRUD single facts with provable per-edit locality, and abstain when unsure instead of guessing. CPU, offline.
Byte-level 16MB language model — the only byte-level submission in 622+ OpenAI Parameter Golf entries. Built the right size, not shrunk from the wrong one.
File security system using remote authentication
NeuralPiece: An adaptive byte-level neural tokenizer designed to surpass traditional BPE and Unigram via deep learning chunking.
Engram without a tokenizer
Non-learned byte-level signal encoder for PyTorch - one modality-agnostic 27-D exact base (anchor rule, Delta = Gray code), losslessly invertible. pip install hsl-embedding
Add a description, image, and links to the byte-level topic page so that developers can more easily learn about it.
To associate your repository with the byte-level topic, visit your repo's landing page and select "manage topics."