byte-level

Here are 10 public repositories matching this topic...

theschoolofai / kronecker-embeddings

Kronecker Embeddings: byte-level structured token representations for parameter-efficient language models. Reference implementation.

nlp transformers pytorch embeddings language-models tokenization byte-level

Updated May 29, 2026
Python

neluca / tinybpe

🐍 This is a fast, lightweight, and clean CPython extension for the Byte Pair Encoding (BPE) algorithm, which is commonly used in LLM tokenization and NLP tasks.

tokenizer byte-level cpython-extensions bpe llm bpe-tokenizer

Updated Jun 17, 2026
Python

OpenBlocksTeam / whinstone

An efficient openblocks parser module.

byte-level openblocks openblocks-module

Updated Apr 14, 2021
Java

dinesh-git17 / bpetite

A deterministic byte-level BPE tokenizer in pure Python, built from scratch with strict tests, typed code, and polished docs.

python nlp cli tokenizer byte-level bpe byte-pair-encoding deterministic-ai

Updated Jun 8, 2026
Python

TilelliLab / Yaz

An editable, auditable 807K-param byte-level LLM: CRUD single facts with provable per-edit locality, and abstain when unsure instead of guessing. CPU, offline.

machine-learning crud cpu pytorch grace knowledge-base language-model byte-level interpretability rome serac abstention tiny-ml selective-prediction llm model-editing knowledge-editing memit

Updated Jun 20, 2026
Python

pauljump / itchy

Byte-level 16MB language model — the only byte-level submission in 622+ OpenAI Parameter Golf entries. Built the right size, not shrunk from the wrong one.

python nlp machine-learning deep-learning transformer openai language-model byte-level parameter-golf

Updated Apr 13, 2026
Python

sohanpatil / fissra-server

File security system using remote authentication

desktop-app encryption byte-level

Updated Dec 18, 2019
C#

juan3861 / NeuralPiece

NeuralPiece: An adaptive byte-level neural tokenizer designed to surpass traditional BPE and Unigram via deep learning chunking.

nlp machine-learning deep-learning tokenizer pytorch neural-networks byte-level bpe sentencepiece llm subword-tokenization

Updated Jun 12, 2026
Python

bobwen-dev / CharLevelEngram

Engram without a tokenizer

machine-learning transformers pytorch ngram byte-level huggingface engram deepseek causal-lm tokenizer-free deterministic-lookup

Updated Jan 13, 2026
Python

Woojiggun / hsl-embedding

Non-learned byte-level signal encoder for PyTorch - one modality-agnostic 27-D exact base (anchor rule, Delta = Gray code), losslessly invertible. pip install hsl-embedding

signal-processing pytorch embedding byte-level gray-code multimodal tokenizer-free

Updated Jun 11, 2026
Python
