A CLI tool to interact with Ollama instances. Supports streaming chat, tool calling, thinking-process handling, media understanding (image/video/audio), and flexible input/output options.
- Streaming Support — Real-time output as the model generates text.
- Thinking Process — Display or save the model's internal thought process
(
-t,--thoughtfile). - Output Redirection — Save generated content to a file (
-o). - Flexible Input — Direct string (
-i), file (-f), or stdin (--stdin). - Chat Mode — Multi-turn REPL with slash commands (
--chat). - Tool Calling — Load Python modules as callable tools (
--tool). - Tool Documentation — Inspect loaded tools, their signatures, and
environment variables (
--help-tools). - Media Understanding — Image description/OCR, video frame analysis, audio
transcription via bundled
tools.media_understanding_tools. - Model Listing — List available or running models (
-l,--ps). - Ollama Options — Pass through
temperature,num_ctx,num_gpu,keep_alive. - Model Transfer — Copy models between ollama instances (
--transfer). Supports localhost-to-remote and remote-to-remote via a built-in blob HTTP server (--serve-blobs).
- Ollama installed and running (ollama.com).
- Python 3.9+.
- The
ollamaPython library.
pip install ollamapython3 lama_ole.py -m gemma2:2b -i "Explain the theory of relativity."python3 lama_ole.py -m gemma2:2b -t -i "Solve a complex math problem step by step."python3 lama_ole.py -m gemma2:2b -i "Write a story" \
--thoughtfile thoughts.txt -o story.txtecho "Tell me a joke." | python3 lama_ole.py --stdin -m llama3.2:3bpython3 lama_ole.py --chat -m llama3.2:3bWith an initial system message:
python3 lama_ole.py --chat -m llama3.2:3b -i "You are a helpful assistant."Load one or more tool modules. The LLM can then invoke them automatically.
python3 lama_ole.py -m llama3.2:3b -i "What's the weather in Paris?" \
--tool tools.example_toolspython3 lama_ole.py --chat -m llama3.2:3b --tool tools.example_toolsUse --help-tools to see all loaded tools, their signatures, and which
environment variables they read:
python3 lama_ole.py --help-tools --tool tools.media_understanding_toolsOutput:
Tool Module: tools.media_understanding_tools
Environment Variables:
LAMA_OLE_VISION_HOST: Ollama host for vision/audio models (defaults to --host value)
Functions:
image_describe(path: string, [model: string]) — Describe the contents of an image using a vision model
image_ask(path: string, question: string, [model: string]) — Ask a specific question about an image
image_ocr(path: string, [lang: string]) — Extract text from an image using OCR (requires tesseract)
video_describe(path: string, [interval: number], [model: string]) — Describe a video by extracting frames
video_scene_changes(path: string, [threshold: number]) — Detect scene changes with ffmpeg
video_transcribe(path: string, [model: string]) — Extract audio and transcribe with Whisper
video_ask(path: string, question: string, [interval: number], [model: string]) — Ask about a video
audio_transcribe(path: string, [model: string]) — Transcribe speech with Whisper
audio_ask(path: string, question: string, [model: string]) — Transcribe and answer a question
list_vision_models() — List available vision models configured via --vision_model
python3 lama_ole.py --chat -m llama3.2:3b \
--tool tools.example_tools \
--tool tools.media_understanding_tools \
--tool tools.web_toolsThe bundled tools.media_understanding_tools module provides image, video, and
audio comprehension via Ollama vision models, Whisper transcription, and OCR.
Specify which vision models the tools should use with --vision_model
(repeatable). The first model is the default when the LLM doesn't pick one.
python3 lama_ole.py --chat -m llama3.2:3b \
--tool tools.media_understanding_tools \
--vision_model gemma3:12b --vision_model llava:13bThe LLM can call list_vision_models() to see which models are available, then
choose one by passing model="gemma3:12b" to any vision tool.
| Tool | Description |
|---|---|
image_describe(path, [model]) |
Describe image contents in detail |
image_ask(path, question, [model]) |
Ask a specific question about an image |
image_ocr(path, [lang]) |
Extract text via tesseract (default eng) |
| Tool | Description |
|---|---|
video_describe(path, [interval], [model]) |
Extract frames every interval seconds and describe each |
video_scene_changes(path, [threshold]) |
Detect scene cuts with ffmpeg |
video_transcribe(path, [model]) |
Extract audio and transcribe with Whisper |
video_ask(path, question, [interval], [model]) |
Transcribe audio + analyze a mid-video frame |
| Tool | Description |
|---|---|
audio_transcribe(path, [model]) |
Transcribe speech via Whisper |
audio_ask(path, question, [model]) |
Transcribe and answer a question |
# Describe an image
python3 lama_ole.py -m llama3.2:3b -i "Describe this image" \
--tool tools.media_understanding_tools \
--vision_model llava-phi3:3.8b
# Ask about a video
python3 lama_ole.py -m llama3.2:3b -i "What objects are in this video?" \
--tool tools.media_understanding_tools
# Transcribe audio
python3 lama_ole.py -m llama3.2:3b -i "Transcribe this recording" \
--tool tools.media_understanding_tools| Module | Description | Tools |
|---|---|---|
tools.example_tools |
Example/reference tools | get_weather, calculate, read_file |
tools.media_understanding_tools |
Image, video, audio comprehension | image_describe, image_ask, image_ocr, video_describe, video_scene_changes, video_transcribe, video_ask, audio_transcribe, audio_ask, list_vision_models |
tools.dev_tools |
Development (filesystem, code, git) | run_command, read_file, write_file, glob, grep, git_status, etc. |
tools.dev_tools_safer |
Safer subset of dev tools | (limited operations) |
tools.web_tools |
Internet access | web_fetch, web_search |
tools.image_tools |
Basic image operations | image format conversion, resizing |
tools.video_tools |
Basic video operations | video format conversion, trimming |
tools.audio_tools |
Basic audio operations | audio format conversion |
tools.read_base64 |
Base64 decoding | decode base64 strings |
lama_ole can transfer models between ollama instances using --transfer.
python3 lama_ole.py -m gemma4:12b --transfer localhost other_hostThe source must be localhost (the machine running lama_ole). The tool reads
model blobs directly from the local ollama model store and uploads them to the
destination ollama instance via its API.
On the source machine, start the blob HTTP server:
python3 lama_ole.py --serve-blobs --blob-port 9999On the orchestrator machine, point --transfer at the blob server URL:
python3 lama_ole.py -m gemma4:12b \
--transfer http://192.168.1.100:9999 other_hostThe orchestrator downloads manifests and blobs from the source's blob server, then uploads them to the destination's ollama API. Blobs are streamed in chunks to avoid memory spikes.
ssh user@remote_source "lama_ole --serve-blobs --blob-port 9999" &
python3 lama_ole.py -m gemma4:12b \
--transfer http://remote_source:9999 other_host- lama_ole reads the model manifest and all blob digests from the source
- Each blob is uploaded to the destination via
POST /api/blobs/ - The FROM path in the Modelfile is rewritten to match the destination's model store
- The model is created on the destination via
POST /api/create
In chat mode (--chat), lines starting with / are commands:
| Command | Description |
|---|---|
/feed <path> |
Read a file and send its content as a message |
/clear |
Clear the conversation history |
/model <name> |
Switch to a different model |
/save <path> |
Save the conversation to a JSON file |
/load <path> |
Load a conversation from a JSON file |
/tools |
List loaded tools |
/context |
Show message count and total character count |
/help |
Show this help message |
/exit, /quit |
Exit the chat |
| Flag | Description | Default |
|---|---|---|
-h, --help |
Show help message and exit | |
-V, --version |
Show program version and exit | |
--host HOST |
Ollama instance host | http://localhost:11434 |
-m, --model MODEL |
Model name to use | (required) |
-i, --input TEXT |
Input string for the model | |
-f, --inputfile PATH |
Read input from a file | |
--stdin |
Read input from standard input | |
-o, --outfile PATH |
Save main output to file | |
-t, --thinking |
Show model's thought process | |
--thoughtfile PATH |
Save thoughts to file (independent of -t) |
|
--temperature FLOAT |
Sampling temperature | 0.0 |
--num_ctx INT |
Context window size | (Ollama default) |
--num_gpu INT |
GPU layers to use | (Ollama default) |
--keep_alive DURATION |
Keep model in memory (5m, 1h) |
(Ollama default) |
--chat |
Start interactive chat REPL | |
--tool MODULE |
Load tool module (repeatable) | |
--vision_model MODEL |
Vision model for media tools (repeatable) | (auto-detect) |
--help-tools |
Show loaded tool documentation and exit | |
--safe |
Confirm before dangerous tool operations | |
--max_tool_rounds N |
Max tool-calling rounds | (no limit) |
--max_tool_rounds_continuation |
Behavior at limit: ask or fallback |
ask |
-l, --list |
List all available models | |
--ps |
List all running models | |
--stop MODEL |
Stop/unload a running model from memory | |
--ollama_websearch |
Activate Ollama's built-in web search tool (requires Ollama 0.5+) | |
--transfer SOURCE DEST |
Transfer a model from SOURCE to DEST ollama instance | |
--serve-blobs |
Start a blob HTTP server for remote transfer source | |
--blob-host HOST |
Host to bind blob server | 127.0.0.1 |
--blob-port PORT |
Port for blob server | random |
-v to -vvv |
Verbosity level (repeat for more) | silent |
| Level | Output |
|---|---|
| (default) | Silent — no debug output |
-v |
Tool call names + truncated results (500 chars) |
-vv |
Full tool results + messages payload before API calls |
-vvv |
Raw streaming chunks as they arrive |
Tools are Python functions decorated with @tool from tool_base:
from tool_base import tool
@tool(description="Multiply two numbers")
def multiply(a: int, b: int) -> int:
return a * b
@tool(description="Get the population of a city")
def get_population(city: str) -> str:
return f"Population of {city}: 2.5 million"Parameter types are inferred from annotations. For complex schemas, pass
explicit params:
@tool(
description="Search the web",
params={
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"},
},
"required": ["query"],
},
)
def web_search(query: str) -> str:
...Load your module:
python3 lama_ole.py -m llama3.2:3b -i "search for python tutorials" --tool mytoolsTools that read environment variables should define a module-level
__tool_env__ dict. These are displayed by --help-tools:
__tool_env__ = {
"MY_API_KEY": "API key for external service",
}| Variable | Default | Description |
|---|---|---|
LAMA_OLE_VISION_HOST |
--host value |
Ollama host used by media understanding tools. Overrides --host for vision/audio model calls. |
- Connection Error — Ensure Ollama is running and
--hostmatches your setup (defaulthttp://localhost:11434). - File Exists Error — The script refuses to overwrite existing files. Remove the target file first or use a different path.
- Missing Library — Run
pip install ollama. For media tools, alsopip install Pillow. - Tool not found — Use dotted module names, not file paths:
--tool tools.example_tools(not--tool tools/example_tools.py). - Media: no vision model found — Use
--vision_model MODELto specify which installed Ollama models are vision-capable. Run--help-toolswith your tool module to verify configuration. - Chat errors — Model errors in chat mode are caught gracefully and printed without exiting the REPL.
This project is open-source and available under the