An MIT App Inventor extension for running GGUF language models fully on-device using llama.cpp.
LlamaAndroid lets you load and run local LLMs inside any App Inventor app. It streams tokens as they are generated, caches the model across restarts so users only pick the file once, and ships with built-in stop strings for the Qwen, Llama 3, Mistral, Gemma and Phi model families.
- Streaming token output
- Automatic model caching across app restarts
- Accepts direct file paths or content:// URIs from the file picker
- Zero-copy loading when models are placed in Downloads/models/
- Built-in stop string properties for Qwen, Llama 3, Mistral, Gemma and Phi
- MIT App Inventor or a compatible builder (Kodular, Niotron, etc.)
- libllamajni.so (the native llama.cpp JNI bridge for Android)
- A GGUF model file (available from Hugging Face and similar model hubs)
- Java JDK 11+ and Apache Ant 1.10+ (only if building from source)
Clone with submodules so the build dependencies come along:
git clone --recurse-submodules https://github.com/pocketive/llamandroid.git
cd llamandroid
bash setup.sh
Build:
ant
The compiled extension will appear at out/com.pocketive.llamandroid.aix. Import it into App Inventor and you're ready.
- Call `LoadLib` with the path to `libllamajni.so`
- Wait for `LibsReady`
- Check `SavedModelPath`: if empty, show a file picker; otherwise the model is already auto-loading
- Call `LoadModel(path, contextSize, threads)`
- Wait for `ModelLoaded(success)`
- Call `Infer(prompt, maxTokens, stopString)`
- Handle `OnToken` for streaming and `OnComplete` for the full result
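
The sequence above is event-driven: each step runs from the handler of the event that precedes it. The sketch below models only that ordering in plain Python. `FakeLlama`, its synchronous callbacks, the canned output, and the values `2048`, `4` and `16` are all hypothetical stand-ins that borrow the names from this README; they do not reproduce the extension's real behavior or timing.

```python
class FakeLlama:
    """Hypothetical stand-in that only mimics the names used in this README."""

    def __init__(self, saved_model_path=""):
        self.saved_model_path = saved_model_path  # mirrors SavedModelPath
        self.handlers = {}  # event name -> callback
        self.log = []       # calls and events in the order they happened

    def on(self, event, callback):
        self.handlers[event] = callback

    def _fire(self, event, *args):
        self.log.append(event)
        handler = self.handlers.get(event)
        if handler:
            handler(*args)

    def load_lib(self, path):
        self.log.append("LoadLib")
        auto_load = self.saved_model_path != ""
        self._fire("LibsReady")
        if auto_load:
            # A saved model is loaded without a LoadModel call.
            self._fire("ModelLoaded", True)

    def load_model(self, path, context_size, threads):
        self.log.append("LoadModel")
        self.saved_model_path = path
        self._fire("ModelLoaded", True)

    def infer(self, prompt, max_tokens, stop_string):
        self.log.append("Infer")
        pieces = ["Hello", ", ", "world"][:max_tokens]  # canned text, no real model
        for piece in pieces:
            self._fire("OnToken", piece)
        self._fire("OnComplete", "".join(pieces))


def run_flow(llama, picked_path):
    """Wire handlers in the order of the steps; return (streamed, final) text."""
    streamed = []
    final = []

    def libs_ready():
        if llama.saved_model_path == "":
            # A real app would show a file picker here; picked_path stands in for it.
            llama.load_model(picked_path, 2048, 4)

    def model_loaded(success):
        if success:
            llama.infer("Say hello", 16, "")

    llama.on("LibsReady", libs_ready)
    llama.on("ModelLoaded", model_loaded)
    llama.on("OnToken", streamed.append)
    llama.on("OnComplete", final.append)
    llama.load_lib("/path/to/libllamajni.so")
    return "".join(streamed), final[0]
```

With an empty saved path the fake goes through `LoadModel`; with a saved path it skips straight from `LibsReady` to `ModelLoaded`, which is the branch the third step describes.
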
Pass the stop string as the third argument to Infer, or use the built-in properties:
| Model family | Property | Value |
|---|---|---|
| Qwen | StopQwen | `<\|im_end\|>` |
| Llama 3 | StopLlama3 | `<\|eot_id\|>` |
| Mistral | StopMistral | `</s>` |
| Gemma | StopGemma | `<end_of_turn>` |
| Phi | StopPhi | `<\|end\|>` |
Pass an empty string to disable stop-string matching; generation then runs until maxTokens is reached or the model emits its own end token.
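
To illustrate what a stop string does, the sketch below joins a stream of token batches and cuts the text at the first occurrence of the stop string, including the case where the marker arrives split across two batches. The dictionary copies the values from the table; `truncate_at_stop` is a hypothetical helper written for this illustration and makes no claim about how the extension implements stopping internally.

```python
# Stop strings copied from the table above, keyed by property name.
STOP_STRINGS = {
    "StopQwen": "<|im_end|>",
    "StopLlama3": "<|eot_id|>",
    "StopMistral": "</s>",
    "StopGemma": "<end_of_turn>",
    "StopPhi": "<|end|>",
}


def truncate_at_stop(batches, stop_string):
    """Join token batches and cut the text at the first stop string.

    An empty stop string disables the check, so every batch is kept.
    """
    text = ""
    for batch in batches:
        text += batch
        if stop_string:
            # Search the accumulated text so a marker split across
            # two batches is still found.
            cut = text.find(stop_string)
            if cut != -1:
                return text[:cut]
    return text
```

For example, `truncate_at_stop(["Hi", " there<|im", "_end|>", "ignored"], STOP_STRINGS["StopQwen"])` keeps only the text before the marker, while passing `""` as the stop string keeps everything.
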
- Place GGUF files in `Downloads/models/` for zero-copy loading. The `ModelsFolder` property gives you the exact path.
- Models loaded via file picker are copied to internal storage on first use and automatically load on every launch after that.
- Call `FreeModel` when you're done to release RAM.
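
The first two tips distinguish files that sit in the models folder from files that arrive through the file picker. As a rough sketch of that distinction, the hypothetical helper below labels a model source from its string form alone. The function name, the return labels, and the example folder path are assumptions for illustration; in an app the folder would come from the `ModelsFolder` property, and the README does not say how a direct path outside that folder is handled, so that case is labeled `"other"`.

```python
import posixpath


def describe_model_source(source, models_folder):
    """Label a model source as "copied", "zero-copy", or "other"."""
    if source.startswith("content://"):
        # File-picker URIs are copied to internal storage on first use.
        return "copied"
    folder = posixpath.normpath(models_folder)
    path = posixpath.normpath(source)  # resolves "." and ".." segments
    if path.startswith(folder + "/"):
        return "zero-copy"
    return "other"


# Example folder value; an assumption, not read from a device.
EXAMPLE_MODELS_FOLDER = "/storage/emulated/0/Download/models"
```
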
Load `libllamajni.so` from a file path or content:// URI.
Load a GGUF model. Fires `ModelLoaded(success)` when done.
Run inference on the loaded model. Streams via `OnToken`, finishes with `OnComplete`.
Free the loaded model from memory.
Clear the saved lib path so `LoadLib` must be called again.
Clear the saved model path so it will be re-picked next launch.
Delete the cached `.so` file and forget its path.
True if the native library is loaded and ready.
True if a model is currently loaded.
The cached lib path, or empty string.
The last successfully loaded model path, or empty string.
The device Downloads folder path.
The Downloads/models/ folder path. Put GGUF files here for zero-copy loading.
Built-in stop string values for each model family.
Fires when the native library is ready, including auto-load at startup.
Fires when model loading completes. `success` is true if the model is ready to use.
Fires for each batch of generated tokens during inference.
Fires when generation is complete with the full output text.
Fires when an error occurs. Progress updates are also sent here prefixed with `progress:`.
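
Because the error event carries both real errors and progress updates, a handler has to tell them apart by the `progress:` prefix. The sketch below shows one way to do that; `classify_message` and the sample messages are hypothetical, and only the prefix itself comes from this README.

```python
PROGRESS_PREFIX = "progress:"


def classify_message(message):
    """Split an error-event message into (kind, text).

    kind is "progress" when the message starts with the prefix,
    otherwise "error".
    """
    if message.startswith(PROGRESS_PREFIX):
        # Drop the prefix and any surrounding whitespace.
        return ("progress", message[len(PROGRESS_PREFIX):].strip())
    return ("error", message)
```

In a blocks-based app the same check would typically be a "starts with" text comparison at the top of the error handler, routing progress text to a status label and everything else to an error dialog.
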