Titanic Survival Prediction (Dockerized App)

This project is a Python-based data science application that predicts passenger survival on the Titanic using machine learning. It is fully containerized with Docker to ensure a consistent environment for training and evaluation.

Features
Data Processing: Cleans the Titanic dataset, handles missing values, and encodes categorical features (e.g., Sex).
Machine Learning: Trains a LogisticRegression model to predict survival outcomes.
Data Visualization: Automatically generates visual insights:
survival.png: A bar chart showing survival counts.
age.png: A histogram representing the age distribution of passengers.
Containerization: Environment-agnostic execution using Docker.
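
The data processing and visualization features above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the median imputation for Age, the 0/1 mapping for Sex, the helper names clean and save_plots, and the inline sample rows are all assumptions made for the example. Only the output file names survival.png and age.png come from this README.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so figures can be saved inside a container
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in rows (not real passenger records) using the usual
# Titanic column names; the real app would read titanic.csv instead.
SAMPLE_CSV = """Survived,Pclass,Sex,Age
0,3,male,22
1,1,female,38
1,3,female,
0,3,male,35
1,2,female,14
0,1,male,
"""


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing ages with the median and encode Sex as 0/1 (assumed strategy)."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


def save_plots(df: pd.DataFrame) -> None:
    """Write survival.png (bar chart) and age.png (histogram) to the working directory."""
    fig, ax = plt.subplots()
    df["Survived"].value_counts().sort_index().plot(kind="bar", ax=ax)
    ax.set_xlabel("Survived")
    ax.set_ylabel("Passengers")
    fig.savefig("survival.png")
    plt.close(fig)

    fig, ax = plt.subplots()
    ax.hist(df["Age"], bins=10)
    ax.set_xlabel("Age")
    ax.set_ylabel("Passengers")
    fig.savefig("age.png")
    plt.close(fig)


df = clean(pd.read_csv(io.StringIO(SAMPLE_CSV)))
save_plots(df)
```

Selecting the Agg backend before importing pyplot matters in a container, where no display is available and an interactive backend would fail.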

Tech Stack
Language: Python 3.11
Data Analysis: Pandas, Scikit-learn
Visualization: Matplotlib
DevOps: Docker, GitHub Actions

Getting Started
Prerequisites
Docker installed on your machine.
(Optional) Git to clone the repository.
Local Setup & Execution
Build the Docker image:

    docker build -t titanic-app .

Run the container:

    docker run --name titanic-container titanic-app

View results: The model accuracy is printed in your terminal. To view the generated graphs, copy them from the container to your local machine:

    docker cp titanic-container:/app/survival.png ./survival.png
    docker cp titanic-container:/app/age.png ./age.png

GitHub Deployment (CI/CD)
This project is configured to run automatically via GitHub Actions. On every push to the repository, GitHub will:
Initialize an Ubuntu runner.
Build the Docker image.
Run the container to verify the code and model training.

File Structure
app.py: The main script containing data cleaning, model training, and plotting logic.
Dockerfile: Configuration for creating the Docker image.
requirements.txt: Python dependencies (pandas, matplotlib, scikit-learn).
titanic.csv: The dataset used for training and testing.

Project Insights
The model uses features like Passenger Class (Pclass), Sex, and Age to determine survival probability. By using a Dockerized approach, we eliminate the "it works on my machine" problem, making the analysis reproducible anywhere.
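
As a rough sketch of how Pclass, Sex, and Age can feed a LogisticRegression model, the example below trains on a handful of synthetic rows and reads back an accuracy score and per-passenger survival probabilities. The rows, the 75/25 split, random_state, and max_iter are assumptions chosen for illustration. The real app trains on titanic.csv and may be configured differently.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in rows (not real passenger records).
# Sex is assumed to be already encoded: 0 = male, 1 = female.
data = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 2, 3, 2, 1, 3, 2, 3, 1],
    "Sex":      [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
    "Age":      [22, 38, 26, 54, 14, 35, 40, 29, 19, 31, 4, 47],
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
})

X = data[["Pclass", "Sex", "Age"]]
y = data["Survived"]

# Hold out a quarter of the rows, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on the held-out rows, plus the predicted probability of survival
# (column 1 of predict_proba corresponds to class 1, "survived").
accuracy = accuracy_score(y_test, model.predict(X_test))
proba = model.predict_proba(X_test)[:, 1]
print(f"Accuracy: {accuracy:.2f}")
```

With so few rows the printed accuracy says nothing about real model quality. The point is only the shape of the pipeline: select feature columns, split, fit, then score.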
Developed as part of a Containerization & Docker learning journey.