This project shows a simple way to prep data before using any machine learning model.
It’s meant to be easy to follow and learn from.
Here’s what the code does:
- Handles missing values
- Encodes categorical (non-numeric) data
- Splits the dataset into training and testing sets
- Scales numerical columns so features with large values (like Salary) don’t outweigh features with small ones (like Age)
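
The four steps above can be sketched roughly like this. This is a minimal illustration, not the project's actual code: it assumes pandas and scikit-learn, uses made-up rows in place of `Data.csv`, and picks mean imputation, one-hot encoding, a 75/25 split, and `StandardScaler` as example choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Made-up rows with the same columns as Data.csv (illustrative values only).
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain",
                "Germany", "France", "Spain", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48],
    "Salary": [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes"],
})

X = df[["Country", "Age", "Salary"]].copy()
y = df["Purchased"]

# 1. Handle missing values: fill gaps in Age and Salary with the column mean.
imputer = SimpleImputer(strategy="mean")
X[["Age", "Salary"]] = imputer.fit_transform(X[["Age", "Salary"]])

# 2. Encode categorical data: one-hot encode Country, label-encode the target.
#    sparse_threshold=0 asks ColumnTransformer for a dense NumPy array.
encoder = ColumnTransformer(
    [("country", OneHotEncoder(), ["Country"])],
    remainder="passthrough",  # keep Age and Salary as the last two columns
    sparse_threshold=0,
)
X_encoded = encoder.fit_transform(X)
y_encoded = LabelEncoder().fit_transform(y)  # "No" -> 0, "Yes" -> 1

# 3. Split into training and testing sets.
x_train, x_test, y_train, y_test = train_test_split(
    X_encoded, y_encoded, test_size=0.25, random_state=0
)

# 4. Scale the numeric columns (Age, Salary). The scaler is fitted on the
#    training rows only and then reused on the test rows.
scaler = StandardScaler()
x_train[:, 3:] = scaler.fit_transform(x_train[:, 3:])
x_test[:, 3:] = scaler.transform(x_test[:, 3:])

print(x_train.shape, x_test.shape)
```

The imputer here is fitted before the split only to mirror the order of the steps listed above. A stricter setup would fit it on the training rows alone, the same way the scaler is handled.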
The file Data.csv has:
- Country (France, Spain, Germany)
- Age
- Salary
- Purchased (Yes / No)
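
As a quick sketch of how a file with these columns could be loaded and checked for gaps, assuming pandas and a header row with exactly these column names (the rows below are invented stand-ins, not the real contents of `Data.csv`):

```python
import io

import pandas as pd

# Stand-in for Data.csv: invented rows, same four columns.
# Empty fields are read by pandas as missing values (NaN).
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,,No
Spain,,61000,Yes
"""

# With the real file this would be: pd.read_csv("Data.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column to see what needs filling.
missing = df.isna().sum()
print(missing)

# Separate the features from the target column.
X = df.drop(columns="Purchased")
y = df["Purchased"]
```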

To run it:
- Put `Data.csv` in the same folder as the workflow file you want to run
- Run the Python file
- After running, you’ll get:
  - `x_train` and `x_test`: prepped features
  - `y_train` and `y_test`: encoded target labels
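
Notes:
- Some steps aren’t needed for every model. Scaling, for example, can be skipped for tree-based models (like Decision Trees or Random Forest), since they split on one feature at a time and never compare magnitudes across columns.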
- The code has comments explaining each step in a simple way.
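
To illustrate the point about tree models and scaling, here is a small check, assuming scikit-learn and made-up Age / Salary values. It trains one decision tree on raw features and one on standardized features, then compares their predictions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Made-up Age / Salary rows and 0/1 labels (illustrative only).
X = np.array([
    [44, 72000], [27, 48000], [30, 54000], [38, 61000],
    [40, 65000], [35, 58000], [50, 83000], [37, 67000],
], dtype=float)
y = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# A few unseen rows to predict on.
X_new = np.array([[29, 50000], [46, 75000], [33, 60000]], dtype=float)

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
X_new_scaled = scaler.transform(X_new)

# Same model settings, one tree per version of the data.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Standardizing keeps the order of values within each column, so the
# trees pick the same splits and should predict the same labels.
same_on_train = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
same_on_new = np.array_equal(tree_raw.predict(X_new), tree_scaled.predict(X_new_scaled))
print(same_on_train, same_on_new)
```

Distance-based or gradient-based models (k-nearest neighbors, logistic regression, SVMs) do not share this property, which is why the scaling step is still part of the workflow.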