Nduka99/AppliedAI

Applied AI Portfolio by Nduka Nwagbo

Welcome to my Applied AI repository. This collection of projects demonstrates my approach to solving complex machine learning challenges across different domains, from time-series forecasting of physiological data to computer vision and natural language processing.

Each project prioritizes rigorous data handling, thoughtful feature engineering, and algorithm optimization over plug-and-play solutions.


1. A Rigorous Machine Learning Approach to Heart-Rate Forecasting

Notebook: heartrate_forecast.ipynb

Overview

This project tackles the difficult task of forecasting a patient's heart rate 20 minutes ahead using only 226 minutes of wearable-sensor data. The raw data presented significant ethical and technical challenges, including severe sensor errors and missing oximeter readings.

Methodology & Results

* Data Cleaning: Impossible values were replaced with NaNs and backfilled using oximeter pulse data as a biologically consistent proxy, while remaining gaps were filled with a 5-minute cubic spline interpolation.
* Feature Engineering: Constructed 36 features across 9 clinically driven groups, eventually using Mutual Information to select the top 12 features for the K-Nearest Neighbor (KNN) model.
* Optimization: The most impactful optimization was restricting the KNN model's memory to a localized window of the most recent 45 minutes of patient history, preventing older, differing physiological states from contaminating the prediction.
* Performance: Achieved a highly accurate, physiologically plausible forecast with an RMSE of 5.36.
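The localized-window idea can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the function name `windowed_knn_forecast`, the one-sample-per-minute assumption, the single toy feature, and `k=3` are all hypothetical choices made for the example.

```python
import math


def windowed_knn_forecast(history, query, window=45, k=3):
    """Average the targets of the k nearest neighbours among the last `window` samples.

    history: list of (feature_vector, target) pairs in time order, oldest first.
    query:   feature vector describing the moment to forecast from.
    Assumes one sample per minute, so window=45 means the last 45 minutes.
    """
    recent = history[-window:]  # samples older than the window are never searched
    if not recent:
        raise ValueError("history is empty")
    nearest = sorted(recent, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy data (made up): an old resting regime near 60 bpm, then a recent regime near 90 bpm.
old_regime = [((0.01 * i,), 60.0) for i in range(100)]
new_regime = [((5.0 + 0.01 * i,), 90.0) for i in range(45)]
history = old_regime + new_regime

query = (1.0,)
full_memory = windowed_knn_forecast(history, query, window=len(history))
local_memory = windowed_knn_forecast(history, query, window=45)
print(full_memory, local_memory)
```

With the full history, the nearest neighbours come from the old regime and pull the forecast toward it. With the 45-sample window, only the recent regime can contribute.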


2. The Cat vs Dog Machine Learning Classification Challenge

Notebook: imgprocessing.ipynb

Overview

Can traditional machine learning models tell a cat from a dog without relying on CNNs that learn their features automatically? Using a dataset of 10,000 evenly split images, I set a strict target of at least 80% on both accuracy and F1 score.

Methodology & Results

* Feature Engineering: Generated 10,221 hand-crafted features spanning 8 feature families, including HOG for shape silhouettes, LBP and Gabor for fur texture, and Hu moments for pose-invariant geometry.
* The PCA Bottleneck: Initial model baselines failed to reach the 80% mark. An investigation revealed that applying PCA compression rotated the feature space, which fundamentally conflicted with the axis-aligned splits required by tree-based algorithms.
* Optimization: Discarding PCA and feeding the raw high-dimensional features directly into the tree-based models drastically improved performance.
* Performance: The improved XGBoost model exceeded the project goal, delivering an accuracy of 0.8033 and an F1 Macro of 0.8032.
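A toy example may help show why a rotated feature space can hurt axis-aligned splits. The sketch below is not taken from the notebook: it uses a fixed 45-degree rotation as a stand-in for PCA's change of basis, a made-up 2-D grid, and a hypothetical helper `best_stump_accuracy` that scores a single tree node.

```python
import math


def best_stump_accuracy(points, labels):
    """Best accuracy achievable by one axis-aligned threshold split (a single tree node)."""
    best = 0.0
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        # Candidate thresholds sit halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            correct = sum((p[f] > threshold) == y for p, y in zip(points, labels))
            accuracy = correct / len(points)
            # Either side of the split may be the positive class.
            best = max(best, accuracy, 1 - accuracy)
    return best


def rotate(points, degrees):
    """Rotate 2-D points about the origin."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Toy grid: the class depends only on the sign of the first feature.
raw = [(x, y) for x in range(-5, 6) if x != 0 for y in range(-5, 6)]
labels = [x > 0 for x, _ in raw]
rotated = rotate(raw, 45)

raw_acc = best_stump_accuracy(raw, labels)
rot_acc = best_stump_accuracy(rotated, labels)
print(raw_acc, rot_acc)
```

On the raw grid a single split separates the classes perfectly. After rotation the boundary is diagonal, so no single split can, and a tree would need a staircase of splits to approximate it.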


3. Optimizing Sentiment Analysis for Offensive Social Media Language

Notebooks: tweet_classification.ipynb and Deep_Learning_tweet_sentiment_analysis.ipynb

Overview

This natural language processing challenge involved classifying 13,240 tweets into three categories: Not Offensive (NOT), Targeted Insult (TIN), and Untargeted Insult (UNT). The primary hurdle was extreme class imbalance, as the UNT category contained only 524 instances.

Methodology & Results

* Preprocessing & Feature Extraction: Utilized a combination of de-censoring, wordnet lemmatization, and entity-abstraction to restore masked profanity and create a stable vocabulary.
* Matrix Construction: The most effective approach was a "kitchen-sink" matrix of 278 dimensions, combining TF-IDF, GloVe embeddings, and 28 specifically engineered signals.
* Two-Stage Pipeline: Reframed the classification task into a two-stage binary pipeline to isolate the difficult UNT class. An XGBoost "gatekeeper" first separated NOT tweets from Offensive tweets. Then, a specialized CatBoost classifier split TIN from UNT using targeted class-weight optimization.
* Performance: The final cascade model achieved an accuracy of 0.74 and an F1 Macro of 0.57.
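The control flow of a two-stage cascade like the one described above might look roughly as follows. This is only a sketch: `cascade_predict`, both threshold parameters, and the keyword-based toy scorers are hypothetical stand-ins for the trained XGBoost and CatBoost models, whose real interfaces are not shown in this README.

```python
from typing import Callable


def cascade_predict(
    tweet: str,
    gatekeeper: Callable[[str], float],  # hypothetical: returns P(offensive)
    specialist: Callable[[str], float],  # hypothetical: returns P(UNT | offensive)
    gate_threshold: float = 0.5,
    specialist_threshold: float = 0.5,
) -> str:
    """Stage 1 filters out NOT; stage 2 only ever sees tweets flagged as offensive."""
    if gatekeeper(tweet) < gate_threshold:
        return "NOT"
    return "UNT" if specialist(tweet) >= specialist_threshold else "TIN"


# Toy scorers (assumptions for illustration only, not real models).
def toy_gatekeeper(tweet: str) -> float:
    return 0.9 if "stupid" in tweet.lower() else 0.1


def toy_specialist(tweet: str) -> float:
    # Treat a mention or a second-person pronoun as a sign the insult is targeted.
    targeted = "@" in tweet or "you" in tweet.lower()
    return 0.2 if targeted else 0.8


for text in ["Nice weather today", "@user you are stupid", "this is so stupid"]:
    print(text, "->", cascade_predict(text, toy_gatekeeper, toy_specialist))
```

Because the specialist is only trained and evaluated on the offensive subset, its class weights can be tuned for the TIN versus UNT balance without being swamped by the much larger NOT class.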

* Deep Learning Cascade Architecture: Upgraded the pipeline to a hierarchical deep learning model to better capture semantic nuance. A RoBERTaTwitter "Gatekeeper" classifier handles the initial NOT vs. OFFENSIVE split, while a specialized SupConTweetClassifier (leveraging Supervised Contrastive Learning and BERTweet) handles only the OFFENSIVE subset, isolating the difficult UNT class from TIN.
* Advanced Optimization: The deep learning pipeline incorporates rigorous threshold optimization (tuning the specialist threshold to 0.80) and explores hybrid architectures that concatenate the 28 engineered features directly with the BERT embeddings to maximize the final F1-Macro score.
* Performance: The final deep learning cascade model achieved an accuracy of 0.81 and an F1 Macro of 0.73.
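Threshold optimization of this kind can be sketched as a simple sweep over candidate cut-offs on held-out data. Everything below is an assumption for illustration: the helper names, the choice to threshold the probability of UNT, the candidate grid, and the made-up validation scores, which do not reproduce the notebook's results.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def tune_threshold(probs_unt, y_true, candidates):
    """Return the candidate threshold on P(UNT) with the highest macro F1."""
    best_t, best_score = None, -1.0
    for t in candidates:
        y_pred = ["UNT" if p >= t else "TIN" for p in probs_unt]
        score = macro_f1(y_true, y_pred, labels=("TIN", "UNT"))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up validation scores for eight offensive tweets (toy data only).
probs_unt = [0.95, 0.85, 0.70, 0.60, 0.55, 0.30, 0.20, 0.10]
y_true = ["UNT", "UNT", "TIN", "TIN", "TIN", "TIN", "TIN", "TIN"]

best_t, best_score = tune_threshold(probs_unt, y_true, [0.5, 0.6, 0.7, 0.8, 0.9])
print(best_t, best_score)
```

The sweep should run on a validation split rather than the test set, otherwise the reported F1 Macro would be optimistically biased.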
