FRED_NLP

As of 4/15/24, this is my current project.

To install all necessary packages, run this command:

pip install -r requirements.txt

Repository and File Information

code folder: Contains the Jupyter Notebook and all code files. Run the Fred_Runner file to use our BERT model on your data. Please contact adikrish6824@gmail.com for access to our weights file. Use the code in the Jupyter Notebook to generate your own model.
sampleData folder: Sample data. Due to FERPA, I cannot publish the real student survey data that the model will be trained on. Instead, I used ChatGPT to generate the training data for this demo.
README.md file: This file.
.ipynb_checkpoints folder: Checkpoints for my Jupyter Notebook file.

Project Story

Large online courses have surveys that instructors must sort through manually, which can take hours and, for certain MOOCs, may be borderline impossible.
Dr. Mayer, a professor I have researched with and my former Linear Algebra and Multivariable Calculus professor, brought this problem up with me.
To reduce instructor effort in large courses, this project categorizes survey responses using machine learning and then either automates the appropriate instructor action (sending an email with a response to the FAQ) or flags responses that require instructor attention.
I am working with Dr. Mayer on this project.

Important Dates

10/31/23: First Sprint Deadline (Minimum Viable Product) ✅
11/6/23: Proposal Deadline ✅
4/15/23 -> 4/16/23: University System of Georgia Education Conference ✅

Current Progress

Sprint 1 complete: basic three-pronged classifier created with TensorFlow. No major issues were detected in testing.
Sprint 2 complete: 4-pronged classifier that combines various flows. Sprint 2 issue: most of the training data is labeled NC (no concern), so the model overfit to that class and started categorizing every response as NC. Possible options to fix this:
1. Organically improve the data by adding more examples that have concerns and fewer that are "no concern". (This would be best for just this problem, but it lacks generalization.)
2. Data augmentation (maybe just cloning?). (I think this is the best option right now for long-term expansion.)
3. Switching to semi-supervised learning. (Not enough data.)
4. Converting to a model that uses transfer learning instead. (Need to look into this more.)
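As a rough illustration of the "cloning" idea in option 2, here is a minimal sketch of random oversampling: minority-class rows are duplicated until every label has as many rows as the majority label. This is not the project's actual code. The function name `oversample_by_cloning`, the toy responses, and the labels `ACCESS` and `FAQ` are hypothetical; only `NC` comes from the notes above.

```python
import random
from collections import Counter


def oversample_by_cloning(rows, seed=0):
    """Duplicate minority-class rows until every label matches the majority count.

    rows: list of (text, label) pairs. Returns a new, shuffled list.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    balanced = list(rows)
    for label, count in counts.items():
        pool = [row for row in rows if row[1] == label]
        # Sample with replacement to make up this label's shortfall.
        balanced.extend(rng.choices(pool, k=target - count))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: "NC" (no concern) dominates, as in the Sprint 2 issue.
sample = [
    ("Everything is fine", "NC"),
    ("No issues so far", "NC"),
    ("Course is going well", "NC"),
    ("All good", "NC"),
    ("I can't open the homework", "ACCESS"),
    ("When is the exam?", "FAQ"),
]
balanced = oversample_by_cloning(sample)
print(Counter(label for _, label in balanced))
```

If something like this is used, it is usually safer to oversample only the training split, so that cloned rows do not also end up in the validation or test data.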
Sprint 3 complete: back-translation augmentation attempt. Sprint 3 issue: major roadblock. The Google Translate API is extremely slow and unreliable, so an alternative solution needs to be found.
Sprint 4 complete: look into an alternate data augmentation method using the OpenAI API.

Important Resource: https://cookbook.openai.com/examples/how_to_handle_rate_limits
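The resource above covers retrying rate-limited requests with exponential backoff. Below is a minimal, library-independent sketch of that pattern, not the code used in this project. The helper name `call_with_backoff` and all of its parameters are assumptions; in real use, `retry_on` would be set to the rate-limit exception raised by whichever client library is in use.

```python
import random
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable error, wait and retry with a growing delay."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt, is capped at max_delay,
            # and is scaled by random jitter between 0.5x and 1.0x.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (0.5 + random.random() / 2))


# Demo with a hypothetical flaky call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


# A no-op sleep is passed in so the demo does not actually wait.
result = call_with_backoff(flaky_request, sleep=lambda seconds: None)
print(result, attempts["count"])  # prints: ok 3
```

The jitter keeps several concurrent workers from retrying at the same instant, which matters when many augmentation requests are sent in parallel.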

OpenAI API fine-tuning problems: https://medium.com/@abhishekmazumdar94/fine-tuning-an-open-ai-model-dc78e6ad5a07 This has been deemed feasible. However, it is time-consuming, and I will get back to it after I have a viable first product. I have managed to generate augmented data.

This method has been successfully implemented on a smaller scale.
Sprint 5 complete: use an LLM for text categorization.

Important Resources: https://towardsdatascience.com/choosing-the-right-language-model-for-your-nlp-use-case-1288ef3c4929

Success with BERT: added a feature to automatically categorize any column of an Excel sheet.
https://www.youtube.com/watch?v=IzbjGaYQB-U&ab_channel=PritishMishra
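The column-categorization feature mentioned above could be structured roughly as follows. This is only a sketch under assumptions, not the code in Fred_Runner: rows are represented as plain dictionaries, `categorize_column` and its parameters are hypothetical names, and `keyword_classifier` is a placeholder standing in for the BERT model.

```python
def categorize_column(rows, column, classify_batch, batch_size=32,
                      out_column="category"):
    """Return copies of rows with a predicted label added for `column`.

    Rows whose cell is empty keep None as their label. `classify_batch`
    takes a list of texts and returns one label per text.
    """
    # Indexes of rows that actually contain text to classify.
    todo = [i for i, row in enumerate(rows)
            if str(row.get(column) or "").strip()]
    labeled = [dict(row, **{out_column: None}) for row in rows]
    # Send texts to the model in batches rather than one at a time.
    for start in range(0, len(todo), batch_size):
        idx = todo[start:start + batch_size]
        texts = [str(rows[i][column]).strip() for i in idx]
        for i, label in zip(idx, classify_batch(texts)):
            labeled[i][out_column] = label
    return labeled


def keyword_classifier(texts):
    # Placeholder for the real model: treats any question as an FAQ item.
    return ["FAQ" if "?" in text else "NC" for text in texts]


rows = [
    {"student": "A", "response": "When is the exam?"},
    {"student": "B", "response": ""},
    {"student": "C", "response": "All good"},
]
result = categorize_column(rows, "response", keyword_classifier, batch_size=2)
for row in result:
    print(row["student"], row["category"])
```

If the sheet is loaded with pandas, `pandas.read_excel` followed by `DataFrame.to_dict("records")` yields rows in this shape. Blank cells then arrive as NaN, so they would need something like `fillna("")` first to be skipped.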
Sprint 6: improve performance on more complicated cases OR implement unsupervised learning.

Important Resources: To be found

Method: To be determined

Possible ideas: augment REAL data using the same method I used to generate augmented data. Reinforcement training of sorts with real data. Transfer learning?
