AM_Eng is a full-stack AI chat application built with Django and powered by a fine-tuned Large Language Model (LLM). The system delivers a ChatGPT-like experience with a custom-trained LoRA model, enabling contextual, technical, and concise conversations focused on computer science and engineering topics.
The goal of AM_Eng is to explore how custom-trained LLMs can be embedded into real-world applications, focusing on:
- Context-aware technical assistance
- Persistent memory across interactions
- Efficient inference on limited hardware
- Clean and responsive user experience
- ChatGPT-like interface with real-time interaction
- Custom LoRA fine-tuned model for technical domains
- Persistent memory system (user fact extraction + reuse)
- Optimized inference pipeline using 4-bit quantization
- Secure Markdown rendering (XSS-safe with DOMPurify)
- Context-aware responses using conversation history
- Django (Python)
- REST-like endpoint for LLM interaction
- Session-based conversation handling
- Base Model:
Qwen2.5-3B-Instruct - Fine-tuning: LoRA (PEFT)
- Inference: Transformers + BitsAndBytes (4-bit)
Fine-tuned LoRA model available at: https://huggingface.co/PGFerraz/qwen-alpaca20k-amEng
Highlights:
- Reduced VRAM usage via quantization
- Domain-adapted responses for engineering contexts
- Modular integration with base model
- Python 3.10+
- Git
- (Recommended) NVIDIA GPU with CUDA
- pip / virtualenv
Optional:
- CUDA Toolkit
nvidia-smiworking
git clone https://github.com/PGFerraz/AM_Eng-Web.git
cd AM_Eng-Webpython -m venv .venv
source .venv/bin/activate # Linux / Mac
# Windows
.venv\Scripts\activatepip install --upgrade pip
pip install -r requirements.txtThe model is downloaded automatically from Hugging Face: https://huggingface.co/PGFerraz/qwen-alpaca20k-amEng First run may take time.
python manage.py runserverOpen: http://127.0.0.1:8000