A pure RAG pipeline with a TUI, designed to work well with smaller local LLMs (via Ollama).
- This project is a streamlined RAG application.
- By sticking to a pure RAG pipeline, this app works well with smaller models, such as llama3.2:3b.
- Smaller models don't handle agentic workflows and tool calling very well; they often get stuck in a loop or make things up.
- With this app you get answers referencing only the uploaded documents. When the information is not there, the app will tell you instead of making things up.
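
The "answer only from the uploaded documents" behavior described above can be illustrated with a minimal sketch. The app's real prompt is not shown in this README, so the function name `build_grounded_prompt`, the refusal text in `NO_ANSWER`, and the prompt wording are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical refusal text; the app's actual wording is not documented here.
NO_ANSWER = "The uploaded documents do not contain this information."


def build_grounded_prompt(question: str, chunks: list[str]) -> Optional[str]:
    """Return a prompt restricted to the retrieved chunks, or None if there are none."""
    if not chunks:
        # Nothing was retrieved: skip generation instead of letting the model guess.
        return None
    # Number each chunk so the model (and the reader) can refer back to it.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered context below.\n"
        f"If the context does not contain the answer, reply exactly: {NO_ANSWER}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because there is no tool calling or multi-step planning, the model only has to read one prompt and write one answer, which is the kind of task a 3B model can usually manage.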
- Python: 3.14 or newer.
- Package Manager: uv
- LLM Engine: Ollama installed and running locally.
- Make sure Ollama is running (ollama serve). Ideally, configure it to autostart on boot.
- You need to download an LLM (e.g., ollama pull llama3.2:3b) and an embedding model (e.g., ollama pull nomic-embed-text).
- Run ollama ls to verify that the models are downloaded.
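
If you prefer to script the check above, the following sketch queries Ollama's `/api/tags` endpoint, which lists locally available models. It assumes Ollama listens on its default address (`http://localhost:11434`); the helper names `missing_models`, `fetch_tags`, and `report` are made up for this example and are not part of rag-app.

```python
import json
import urllib.request

# Ollama's default local address; adjust if your setup differs.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(tags: dict) -> set[str]:
    """Extract model names from the JSON returned by /api/tags."""
    return {m["name"] for m in tags.get("models", [])}


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return the required models that are not installed, keeping their order."""
    have = installed_models(tags)
    # A bare name such as "nomic-embed-text" refers to the ":latest" tag.
    return [r for r in required if (r if ":" in r else f"{r}:latest") not in have]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> dict:
    """Query the local Ollama server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def report(required: list[str]) -> str:
    """Human-readable summary of which required models are still missing."""
    gaps = missing_models(fetch_tags(), required)
    return "All models present." if not gaps else f"Missing: {', '.join(gaps)}"
```

For example, `print(report(["llama3.2:3b", "nomic-embed-text"]))` prints either a confirmation or the names you still need to pull.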
Install globally via Git:
uv tool install git+https://github.com/jotalac/rag-app.git
Alternatively, install from a local clone:
- Clone the repository:
git clone git@github.com:jotalac/rag-app.git
cd rag-app
- Install the app:
uv tool install .
If you installed globally via Git: Run the upgrade command:
uv tool upgrade rag-app
If you installed via local clone (Alternative): Pull the latest code first, then upgrade:
git pull
uv tool install . --force
Start the TUI by running:
rag-app
Note on Cold Starts: The first generation query is always slow because Ollama needs to load the model into memory. This delay also occurs anytime you change the model in the configuration.
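
One possible way to soften the cold start is to load the model before opening the TUI. Ollama loads a model into memory when `/api/generate` receives a request with a model name and no prompt, and the `keep_alive` field controls how long it stays loaded. The sketch below assumes the default Ollama address; `build_preload_request` and `preload` are hypothetical helper names, and this README does not say whether rag-app does anything similar itself.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_preload_request(model: str, keep_alive: str = "10m") -> urllib.request.Request:
    """Build a prompt-less request, which asks Ollama to load the model only."""
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str, keep_alive: str = "10m", timeout: float = 120.0) -> None:
    """Send the preload request and wait for Ollama's response."""
    request = build_preload_request(model, keep_alive)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        resp.read()
```

Calling `preload("llama3.2:3b")` shortly before starting rag-app should make the first query feel closer to later ones, provided the app is configured to use the same model.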
- In the app, run /help to see all available options. ctrl+p opens the default menu, where you can change the theme or perform other actions.
- Create a folder anywhere on your device where all your resources will be stored.
- In the TUI config dialog (/config), set the resources directory to your created folder.
- Run /add-resources file1 file2 ... or /add-resources-dir dir_name to embed the files into the vector database.
- After the files are embedded, you can safely delete them from the resources directory.
- Type your prompt in the input field, and the app will automatically search the uploaded resources.
- Smaller models might struggle if the resources are not in English.
- If no relevant data is retrieved from the vector database, generation won't start, and you will see an info message.
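
The embed-then-retrieve flow described above can be sketched roughly as follows. This is not the app's implementation: the chunk size, overlap, `top_k`, and the `min_score` cutoff are assumed values, the in-memory list stands in for the real vector database, and the vectors would normally come from the embedding model (e.g., nomic-embed-text) rather than being written by hand.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    store: list[tuple[str, list[float]]],
    top_k: int = 3,
    min_score: float = 0.5,
) -> list[str]:
    """Return up to top_k chunks whose similarity is at least min_score, best first."""
    scored = sorted(
        ((cosine(query_vec, vec), chunk) for chunk, vec in store),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [chunk for score, chunk in scored[:top_k] if score >= min_score]
```

An empty result from `retrieve` corresponds to the "no relevant data" case: the caller would show an info message and skip generation rather than send an ungrounded prompt to the model.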
- Single Workspace: You cannot separate your resources; all resources are available for all prompts.
- Language Support: For smaller models, querying in languages other than English often yields poor or hallucinated results.
- Thinking Models: Thinking output is not currently visible.
- Add support for importing resources directly from web URLs.
- Add support for embedding and querying images, audio, and other media resources.
- Add support for cloud LLM providers.
- Add support for adding resources from any folder (not only from the single resources directory).