BookRetrievalAI: Flexible .NET RAG Chatbot System with Azure OpenAI, Ollama & Qdrant Vector DB
A modular Retrieval-Augmented Generation (RAG) system built with .NET 9, powered by Semantic Kernel, supporting:
- Azure OpenAI
- Local LLMs via Ollama
- Qdrant Vector Database (Docker)
- Configuration-based provider switching
- Custom local model & embedding selection
This project implements a complete RAG pipeline:
- Parse book summaries dataset
- Chunk content into smaller segments
- Generate embeddings
- Store vectors in Qdrant
- Retrieve relevant chunks
- Build contextual prompt
- Generate final answer using selected LLM provider
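The chunking step can be illustrated with a minimal sketch. It assumes word-based chunks with a fixed overlap; the function name `ChunkText` and the default sizes are hypothetical and may differ from what the project actually does.

```csharp
using System;
using System.Collections.Generic;

// Split text into chunks of at most `chunkSize` words. Consecutive chunks
// share `overlap` words so that context is not cut at a hard boundary.
// Both defaults are illustrative values, not the project's settings.
List<string> ChunkText(string text, int chunkSize = 200, int overlap = 40)
{
    if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

    string[] words = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var result = new List<string>();
    int step = chunkSize - overlap;

    for (int start = 0; start < words.Length; start += step)
    {
        int length = Math.Min(chunkSize, words.Length - start);
        result.Add(string.Join(' ', words, start, length));
        if (start + length >= words.Length) break; // the last chunk has been emitted
    }
    return result;
}

var demoChunks = ChunkText("one two three four five six seven", chunkSize: 4, overlap: 1);
Console.WriteLine(string.Join(" | ", demoChunks));
// prints: one two three four | four five six seven
```

Overlap trades a little extra storage for a lower chance that a relevant sentence is split between two chunks.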
You can switch between Azure OpenAI and local Ollama models without changing code; only the configuration changes.
Retrieval-Augmented Generation (RAG) improves LLM responses by:
- Searching relevant information from a vector database
- Injecting that context into the prompt
- Generating grounded, data-aware answers
Instead of relying only on model training data, RAG uses your own dataset.
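The search step can be pictured with a small in-memory sketch. It assumes cosine similarity as the metric and uses hypothetical names (`CosineSimilarity`, `TopK`) and toy two-dimensional vectors. In this project Qdrant performs the actual search, so this only illustrates the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two embedding vectors of equal dimension.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same dimension.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) return 0; // a zero vector has no direction
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Return the k stored chunks most similar to the query vector, best first.
List<(string Text, double Score)> TopK(
    float[] query, IEnumerable<(string Text, float[] Vector)> items, int k)
{
    return items
        .Select(item => (item.Text, Score: CosineSimilarity(query, item.Vector)))
        .OrderByDescending(hit => hit.Score)
        .Take(k)
        .ToList();
}

// Toy index with made-up chunks and vectors (real embeddings have many more dimensions).
var toyIndex = new List<(string Text, float[] Vector)>
{
    ("chunk about dragons", new float[] { 1f, 0f }),
    ("chunk about cooking", new float[] { 0f, 1f }),
    ("chunk about knights", new float[] { 0.8f, 0.6f }),
};

foreach (var hit in TopK(new float[] { 1f, 0f }, toyIndex, k: 2))
    Console.WriteLine(hit.Text); // dragons first, then knights
```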
User Question
↓
Embedding Model
↓
Qdrant Vector Search
↓
Context Builder
↓
Prompt Builder
↓
Chat Model (Azure or Ollama)
↓
Final Response
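The Context Builder and Prompt Builder stages could look roughly like the sketch below. The function name `BuildPrompt` and the prompt wording are assumptions made for illustration; the project's real prompt template is not shown in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Combine the retrieved chunks and the user question into one grounded prompt.
string BuildPrompt(string question, IReadOnlyList<string> contextChunks)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine("If the context does not contain the answer, say so.");
    sb.AppendLine();
    sb.AppendLine("Context:");
    for (int i = 0; i < contextChunks.Count; i++)
    {
        // Number each chunk so the model can tell the passages apart.
        sb.AppendLine($"[{i + 1}] {contextChunks[i]}");
    }
    sb.AppendLine();
    sb.AppendLine($"Question: {question}");
    sb.Append("Answer:");
    return sb.ToString();
}

string demoPrompt = BuildPrompt(
    "Who wrote Dune?",
    new[] { "Dune is a 1965 novel by Frank Herbert." });
Console.WriteLine(demoPrompt);
```

The resulting string is what would be sent to the chat model, whichever provider is enabled.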
Semantic Kernel is used for:
- Chat completion
- Embedding generation
- Prompt orchestration
- Multi-provider abstraction
Default Azure OpenAI configuration:
- Chat Model: gpt-4o-mini
- Embedding Model: text-embedding-3-small
You can change deployment names in appsettings.json to use any Azure deployment you create.
Default Ollama configuration:
- Chat Model: qwen2.5:3b
- Embedding Model: nomic-embed-text
This system is not limited to the default models.
You can use any chat model or embedding model supported by Ollama.
Simply update:
"ChatModel": "your-local-chat-model",
"EmbeddingModel": "your-local-embedding-model"
As long as the model exists in Ollama, the system can use it.
Qdrant is the vector database used to:
- Store embeddings
- Perform similarity search
- Retrieve relevant chunks
Runs locally via Docker.
Run using Docker:
docker run -p 6334:6333 qdrant/qdrant
Qdrant will be available at:
http://localhost:6334
Your config:
"QdrantEndpoint": "http://localhost:6334"
Download and install Ollama, then pull the required models:
ollama pull qwen2.5:3b
ollama pull nomic-embed-text
Start Ollama:
ollama serve
Default endpoint:
http://localhost:11434
All configuration is controlled via appsettings.json.
To use Azure OpenAI:
"AzureOpenAI": {
  "Endpoint": "https://your-endpoint.openai.azure.com/",
  "ApiKey": "YOUR_API_KEY",
  "ChatDeployment": "gpt-4o-mini",
  "EmbeddingDeployment": "text-embedding-3-small",
  "collectionName": "books",
  "Enabled": true
}
Disable Ollama:
"Ollama": {
  "Enabled": false
}
To use local Ollama models instead:
"Ollama": {
  "Endpoint": "http://localhost:11434",
  "ChatModel": "your-model",
  "EmbeddingModel": "your-embedding-model",
  "collectionName": "booksWithOllama",
  "Enabled": true
}
Disable Azure:
"AzureOpenAI": {
  "Enabled": false
}
Default dataset file:
dataset/booksummaries.txt
The included dataset contains 100 book summaries. It is intentionally small for:
- Fast testing
- Quick indexing
- Development purposes
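Parsing the dataset might look like the sketch below. It assumes the file follows the CMU Book Summary layout of seven tab-separated fields per line (Wikipedia ID, Freebase ID, title, author, publication date, genres, summary). The names `TryParseBook` and `LoadBooks` are hypothetical, and the project's own parser may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed layout: seven tab-separated fields per line, with the title at
// index 2, the author at index 3 and the plot summary at index 6.
bool TryParseBook(string line, out string title, out string author, out string summary)
{
    title = author = summary = string.Empty;
    string[] fields = line.Split('\t');
    if (fields.Length < 7) return false; // malformed or truncated row

    title = fields[2].Trim();
    author = fields[3].Trim();
    summary = fields[6].Trim();
    return summary.Length > 0; // rows without a summary are not useful for RAG
}

// Stream the file line by line and keep only the rows that parse.
List<(string Title, string Author, string Summary)> LoadBooks(string path)
{
    var books = new List<(string Title, string Author, string Summary)>();
    foreach (string line in File.ReadLines(path))
    {
        if (TryParseBook(line, out string title, out string author, out string summary))
            books.Add((title, author, summary));
    }
    return books;
}
```

With the default configuration, a call such as `LoadBooks("dataset/booksummaries.txt")` would return one entry per valid row.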
For high-scale RAG testing, you can download the full CMU Book Summary Dataset:
https://www.kaggle.com/datasets/ymaricar/cmu-book-summary-dataset
This dataset contains thousands of book summaries and is ideal for:
- Performance testing
- Large vector indexing
- Real-world RAG benchmarking
After downloading, update:
"DatasetFilePath": "your-new-dataset-path"

To run the project:
- Start Qdrant (Docker)
- (Optional) Start Ollama
- Configure appsettings.json
- Run project:
dotnet run

Notes:
- Qdrant must be running before indexing
- Ollama must be running if local mode is enabled
- Azure requires valid API key and deployment names
- Collections are separated per embedding model
- When changing embedding models, use a new collection name
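As a rough illustration of configuration-based switching, the sketch below reads the `Enabled` flags and picks the matching collection name. It assumes that `AzureOpenAI` and `Ollama` are top-level sections of appsettings.json and that the first enabled provider wins. `ResolveProvider` is a hypothetical helper, not the project's actual code.

```csharp
using System;
using System.Text.Json;

// Return the first enabled provider and the collection configured for it.
// Assumption: both sections sit at the root of the JSON document.
(string Provider, string CollectionName) ResolveProvider(string appSettingsJson)
{
    using JsonDocument doc = JsonDocument.Parse(appSettingsJson);
    foreach (string candidate in new[] { "AzureOpenAI", "Ollama" })
    {
        if (doc.RootElement.TryGetProperty(candidate, out JsonElement section)
            && section.TryGetProperty("Enabled", out JsonElement enabled)
            && enabled.ValueKind == JsonValueKind.True)
        {
            string collection = section.TryGetProperty("collectionName", out JsonElement nameElement)
                ? nameElement.GetString() ?? string.Empty
                : string.Empty;
            return (candidate, collection);
        }
    }
    throw new InvalidOperationException("No provider is enabled in the configuration.");
}

string demoJson = """
{
  "AzureOpenAI": { "Enabled": false },
  "Ollama": { "Enabled": true, "collectionName": "booksWithOllama" }
}
""";

var (activeProvider, activeCollection) = ResolveProvider(demoJson);
Console.WriteLine($"{activeProvider} -> {activeCollection}");
// prints: Ollama -> booksWithOllama
```

Keeping one collection per embedding model matters because vectors produced by different models generally have different dimensions and are not comparable.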

