AutoTuningLLM: Framework for Customized Text Generation and Fine-tuning LLM Models from PDF Documents using RAG (Retrieval-Augmented Generation)

This repository is the foundation of a framework that lets users build customized Large Language Models (LLMs) from their own PDF documents by combining fine-tuning with Retrieval-Augmented Generation (RAG).

Overview

The primary goal of this project is to develop a versatile and user-friendly platform that uses LLMs fine-tuned on personal data to generate high-quality text. The framework will provide a suite of tools for:

PDF Document Analysis: Converting PDF files into text using Meta's Nougat model
Context Retrieval: Utilizing Meta's FAISS library for efficient similarity search and context retrieval
Text Embeddings: Employing the "jina-embeddings-v2-base-en" model for generating dense vector representations of text
Model Fine-tuning: Adapting LLMs to perform better on specific document collections
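
The embedding and context-retrieval steps listed above can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the toy bag-of-words `embed` function stands in for the "jina-embeddings-v2-base-en" model, and the brute-force cosine search stands in for a FAISS index. All function names here are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy bag-of-words embedding, L2-normalised (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(chunks):
    """Embed every chunk once; returns the vocabulary and the chunk vectors."""
    vocabulary = sorted({word for chunk in chunks for word in chunk.lower().split()})
    return vocabulary, [embed(chunk, vocabulary) for chunk in chunks]


def retrieve(query, chunks, vocabulary, vectors, k=2):
    """Return the k chunks most similar to the query (brute-force cosine search)."""
    q = embed(query, vocabulary)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scores = [sum(a * b for a, b in zip(q, v)) for v in vectors]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:k]]


chunks = [
    "faiss performs similarity search over dense vectors",
    "nougat converts pdf pages into text",
    "fine-tuning adapts a model to a document collection",
]
vocabulary, vectors = build_index(chunks)
top = retrieve("convert pdf into text", chunks, vocabulary, vectors, k=1)
print(top)
```

In a real pipeline the retrieved chunks would be passed to the LLM as context alongside the user's question; a dedicated vector index becomes worthwhile once the number of chunks makes brute-force search too slow.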

Dependencies

This project relies on the following dependencies:

torch
transformers
faiss-gpu
sentence-transformers
jina
pdf2image
pymupdf
python-Levenshtein
nltk
ollama

Current Status

So far, the initial components of the framework have been implemented: PDF-to-text conversion using Nougat, context retrieval using FAISS, and text embeddings using "jina-embeddings-v2-base-en". Text post-processing, model loading, the GUI, and fine-tuning are still in progress.

Future Development

In the coming phases, we plan to:

Implement advanced post-processing techniques to enhance text quality
Develop a model loading system that enables users to easily fine-tune their preferred open models
Establish a robust fine-tuning pipeline to optimize performance on diverse document collections
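
As an illustration of what the planned post-processing step might involve, the sketch below cleans up common artifacts in text extracted from PDFs. The specific rules (ligature normalisation, re-joining hyphenated line breaks, whitespace cleanup) are assumptions about typical extraction problems, not the techniques this project will necessarily adopt, and `clean_extracted_text` is a hypothetical name.

```python
import re
import unicodedata


def clean_extracted_text(text):
    """Apply simple, conservative cleanup rules to text extracted from a PDF."""
    # Normalise Unicode compatibility forms (e.g. the "ﬁ" ligature becomes "fi").
    text = unicodedata.normalize("NFKC", text)
    # Re-join words hyphenated across a line break: "retrie-\nval" -> "retrieval".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


raw = "Context retrie-\nval uses  dense\nvectors.\n\n\nNext  paragraph."
cleaned = clean_extracted_text(raw)
print(cleaned)
```

Note that the hyphen rule is deliberately naive: it also joins genuine compound words that happen to break at a line end, so a production version would likely need a dictionary check or a similar safeguard.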

Installation

To install the required dependencies, run the following command:

pip install torch transformers faiss-gpu sentence-transformers jina pdf2image pymupdf python-Levenshtein nltk ollama

Note: This is not an exhaustive list of dependencies.
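
After installation, a quick way to confirm that the packages are importable is a check like the one below. It is a sketch rather than part of the repository: `missing_packages` is a hypothetical helper, and the mapping reflects the fact that several pip package names differ from their import names (for example, `faiss-gpu` is imported as `faiss` and `pymupdf` as `fitz`).

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in "import ..." statements.
IMPORT_NAMES = {
    "torch": "torch",
    "transformers": "transformers",
    "faiss-gpu": "faiss",
    "sentence-transformers": "sentence_transformers",
    "jina": "jina",
    "pdf2image": "pdf2image",
    "pymupdf": "fitz",
    "python-Levenshtein": "Levenshtein",
    "nltk": "nltk",
    "ollama": "ollama",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in import_names.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Because `find_spec` only locates a module without importing it, the check runs quickly and does not load heavy libraries such as torch.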

Repository contents:

GUI
LoadModel
PdfToText
RAG
.gitignore
LICENSE
README.md
main.py
