feat: Add MNN_LLM and FP4_ULTRA processors by Subaskar-S · Pull Request #6 · GeniusVentures/SGProcessingManager

Subaskar-S · 2026-05-26T12:51:08Z

Summary

Adds two new processor types:

MNN_LLM — Autoregressive text generation processor (token-by-token with sampling)
MNN_FP4Ultra — FP4_ULTRA quantized input processor (4-bit NF4 dequantize + inference)

Both are registered in ProcessingManager, validated in CheckProcessValidity(), and wired into the build system.

MNN_LLM Processor

Parses input as space-separated token IDs
Runs MNN forward loop with dynamic sequence length resize per step
Samples next token using temperature + top-K + top-P nucleus sampling
Stops on EOS token or max_tokens limit
Outputs generated token IDs as raw int32 bytes
Chains SHA256 hashes per generation step for proof-of-work
Generation parameters (maxTokens, temperature, topP, topK, eosTokenId) read from JSON parameters array
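
The sampling step in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `SampleNextToken` is a hypothetical helper, and the order of operations (top-K filter, then temperature-scaled softmax, then top-P cutoff) and the greedy fallback for non-positive temperature are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: choose the next token ID from raw logits.
// Assumption: temperature <= 0 means greedy (argmax) decoding.
int32_t SampleNextToken(const std::vector<float>& logits, float temperature,
                        int topK, float topP, std::mt19937& rng) {
    if (logits.empty()) return -1;

    // Order candidate token IDs by descending logit.
    std::vector<int32_t> ids(logits.size());
    std::iota(ids.begin(), ids.end(), 0);
    std::sort(ids.begin(), ids.end(),
              [&](int32_t a, int32_t b) { return logits[a] > logits[b]; });
    if (temperature <= 0.0f) return ids.front();

    // Top-K: keep only the K highest-scoring candidates (K <= 0 disables it).
    if (topK > 0 && static_cast<std::size_t>(topK) < ids.size()) {
        ids.resize(static_cast<std::size_t>(topK));
    }

    // Temperature-scaled softmax over the survivors, shifted by the max
    // logit for numerical stability.
    const float maxLogit = logits[ids.front()];
    std::vector<double> probs(ids.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        probs[i] = std::exp((logits[ids[i]] - maxLogit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;

    // Top-P (nucleus): keep the smallest prefix whose cumulative
    // probability reaches topP. Values outside (0, 1) disable the cutoff.
    std::size_t keep = probs.size();
    if (topP > 0.0f && topP < 1.0f) {
        double cumulative = 0.0;
        for (std::size_t i = 0; i < probs.size(); ++i) {
            cumulative += probs[i];
            if (cumulative >= topP) { keep = i + 1; break; }
        }
    }

    // discrete_distribution renormalizes the remaining weights itself.
    std::discrete_distribution<std::size_t> dist(
        probs.begin(), probs.begin() + static_cast<std::ptrdiff_t>(keep));
    return ids[dist(rng)];
}
```

In a generation loop, the returned ID would be appended to the sequence and compared against `eosTokenId` before the next forward pass.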

MNN_FP4Ultra Processor

Dequantizes packed 4-bit nibbles + per-macroblock float32 scales to FLOAT32
Uses NF4 symmetric lookup table (same as NeoSwarm FP4Codec)
Runs windowed MNN inference with overlap-add stitching (same pattern as MNN_Float)
Data layout: [packed_nibbles: ceil(N/2) bytes][scales: num_macroblocks * sizeof(float)]
Macroblock size: 64x64 = 4096 elements
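
A rough sketch of the dequantization implied by this layout is shown below. Several details are assumptions, since the PR description does not state them: the low nibble is taken to hold the even-indexed element, each macroblock is taken to cover a consecutive run of 4096 elements, `num_macroblocks` is taken to be ceil(N/4096), and scales are read in host byte order. The lookup table is passed in by the caller because the actual NF4 values used by FP4Codec are not given here; `DequantizeFp4` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kMacroblockElems = 64 * 64;  // 4096, per the PR description

// Hypothetical helper: expand packed 4-bit codes into float32 values.
// Layout: [packed_nibbles: ceil(n/2) bytes][scales: numBlocks * sizeof(float)]
// Returns an empty vector if the buffer is too short for n elements.
std::vector<float> DequantizeFp4(const std::vector<uint8_t>& buf, std::size_t n,
                                 const std::array<float, 16>& lut) {
    const std::size_t packedBytes = (n + 1) / 2;
    // Assumption: one scale per run of 4096 consecutive elements.
    const std::size_t numBlocks = (n + kMacroblockElems - 1) / kMacroblockElems;
    if (buf.size() < packedBytes + numBlocks * sizeof(float)) return {};

    // Copy the scales out with memcpy to avoid unaligned float reads.
    std::vector<float> scales(numBlocks);
    if (numBlocks > 0) {
        std::memcpy(scales.data(), buf.data() + packedBytes,
                    numBlocks * sizeof(float));
    }

    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        const uint8_t byte = buf[i / 2];
        // Assumption: even index -> low nibble, odd index -> high nibble.
        const uint8_t code = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        out[i] = lut[code] * scales[i / kMacroblockElems];
    }
    return out;
}
```

If the real codec packs the high nibble first or tiles macroblocks in 2-D, only the `code` and `scales` index expressions would need to change.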

Add "llm" and "fp4_ultra" to the data_type enum in the GNUS processing schema. Regenerated all C++ headers via quicktype.
- LLM: autoregressive text generation processor type
- FP4_ULTRA: 4-bit NF4 quantized input processor type

…ation
Implements token-by-token LLM generation within SGProcessingManager:
- Parses input as space-separated token IDs
- Runs MNN forward loop with dynamic sequence length
- Samples via temperature + top-K + top-P nucleus sampling
- Stops on EOS token or max_tokens limit
- Outputs generated token IDs as raw int32 bytes
- Chains SHA256 hashes per step for proof-of-work
Generation parameters (maxTokens, temperature, topP, topK, eosTokenId) are read from the JSON parameters array.
Registered under DataType::LLM (JSON type "llm").
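
The input and output formats described above (space-separated token IDs in, raw int32 bytes out) could look roughly like this. Both function names are hypothetical, the output is assumed to use host byte order, and parsing is assumed to stop at the first non-numeric token; the PR does not specify either behavior.

```cpp
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-separated integers as token IDs.
// Assumption: parsing stops silently at the first token that is not a number.
std::vector<int32_t> ParseTokenIds(const std::string& text) {
    std::vector<int32_t> ids;
    std::istringstream in(text);
    int32_t id = 0;
    while (in >> id) ids.push_back(id);
    return ids;
}

// Hypothetical helper: serialize token IDs as raw int32 bytes.
// Assumption: host byte order, 4 bytes per token, no header or length prefix.
std::vector<uint8_t> TokensToBytes(const std::vector<int32_t>& ids) {
    std::vector<uint8_t> bytes(ids.size() * sizeof(int32_t));
    if (!ids.empty()) std::memcpy(bytes.data(), ids.data(), bytes.size());
    return bytes;
}
```

A consumer on a machine with a different byte order would need an explicit conversion, which is one reason to pin the byte order down in the processor's documentation.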

Handles FP4-quantized input data (4-bit NF4 with per-macroblock scales):
- Dequantizes packed nibbles + float32 scales → FLOAT32
- Runs windowed MNN inference with overlap-add stitching
- Same chunking/hashing pattern as MNN_Float
Data layout: [packed_nibbles: ceil(N/2) bytes][scales: num_macroblocks * 4 bytes]
Macroblock size: 64x64 = 4096 elements.
Registered under DataType::FP4_ULTRA (JSON type "fp4_ultra").
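
As a rough illustration of windowed inference with stitching, the sketch below runs a caller-supplied function over overlapping 1-D windows and averages the results where windows overlap. It is not taken from MNN_Float: the uniform averaging weights, the 1-D layout, and the names `RunWindowed` and `WindowFn` are all assumptions, and the inference call is replaced by a plain callback.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for a model call: maps one input window to one output window.
using WindowFn = std::function<std::vector<float>(const std::vector<float>&)>;

// Hypothetical helper: process `input` in windows of `window` elements that
// overlap by `overlap` elements, then average the overlapping outputs.
std::vector<float> RunWindowed(const std::vector<float>& input,
                               std::size_t window, std::size_t overlap,
                               const WindowFn& infer) {
    if (window == 0 || overlap >= window) return {};
    const std::size_t stride = window - overlap;

    std::vector<float> acc(input.size(), 0.0f);     // summed outputs
    std::vector<float> weight(input.size(), 0.0f);  // windows covering each element

    for (std::size_t start = 0; start < input.size(); start += stride) {
        // The last window may be shorter than `window`.
        const std::size_t len = std::min(window, input.size() - start);
        const auto first = input.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<float> chunk(first, first + static_cast<std::ptrdiff_t>(len));
        const std::vector<float> result = infer(chunk);

        for (std::size_t i = 0; i < len && i < result.size(); ++i) {
            acc[start + i] += result[i];
            weight[start + i] += 1.0f;
        }
        if (start + len == input.size()) break;  // reached the end of the input
    }

    // Normalize by the number of contributing windows (uniform weights).
    for (std::size_t i = 0; i < acc.size(); ++i) {
        if (weight[i] > 0.0f) acc[i] /= weight[i];
    }
    return acc;
}
```

A tapered window (for example, a triangular or Hann weighting) is a common alternative to uniform averaging when seams between windows are visible in the output.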

- Add factory registration for DataType::LLM and DataType::FP4_ULTRA
- Add validation cases in CheckProcessValidity()
- Uncomment FP4_ULTRA in TENSOR format validation
- Add includes for new processor headers
- Add new source files to CMakeLists.txt

Subaskar-S added 4 commits May 25, 2026 19:53

Subaskar-S self-assigned this May 26, 2026

Subaskar-S requested a review from itsafuu May 26, 2026 12:54

Subaskar-S wants to merge 4 commits into dev_proc_data_types from feat/add-llm-fp4ultra-processors
