clyv/ReadMe.md

Hi, I'm Clivin John Geju 👋

Data Scientist & ML Engineer — M.S. Data Science, UT Arlington (2025) · Dubai, UAE

I build end-to-end data pipelines, ML models, and computer vision systems. Currently an ML Engineer Intern at Dezzex Technologies, working on real-time object detection for live CCTV video analytics pipelines using YOLO-based architectures.

LinkedIn Portfolio Email


What I Work On

  • Data Engineering — end-to-end pipelines, data quality contracts (Great Expectations, PyDeequ/AWS Deequ), CI/CD automation
  • Machine Learning — predictive modeling, consumer segmentation, time-series forecasting (95% accuracy on ERCOT grid data)
  • Computer Vision — YOLOv8 object detection, real-time video analytics, image detection for surveillance pipelines
  • Analytics & BI — Power BI, Tableau, Looker dashboards; reduced stakeholder insight time by 30% at Cardinality AI
  • Cloud & Scale — AWS (Glue, S3, EMR), GCP (BigQuery, Dataflow), PySpark on billion-row datasets

💻 Tech Stack

Languages: Python, SQL, R, Scala, C, C++

ML & AI: PyTorch, TensorFlow, scikit-learn, OpenCV, NumPy, Pandas

Data Engineering: Apache Spark, Apache Kafka, Apache Airflow, dbt

Cloud & Infra: AWS, Google Cloud, Azure, Docker

Visualization & BI: Power BI, Tableau, Matplotlib, Plotly


📌 Featured Projects

Medicare GX Data Contracts · Python, Great Expectations, PostgreSQL, Docker, GitHub Actions

Data quality contracts on the CMS Medicare 2023 dataset (10M+ provider records). 20+ typed expectations across 7 quality dimensions. CI/CD pipeline auto-generates browsable GX Data Docs HTML reports on every push.
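
As a rough illustration of what a typed expectation in a data contract can look like, here is a minimal plain-Python sketch. It deliberately does not use the Great Expectations API, and it is not the project's code: the `Expectation` class, the `validate` helper, the column names (`npi`, `state`), and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Expectation:
    """One named rule applied to a single column (hypothetical helper)."""
    column: str
    dimension: str                 # quality dimension, e.g. "completeness"
    check: Callable[[Any], bool]   # returns True when the value passes


def validate(rows, expectations):
    """Return (row_index, column, dimension) for every failed check."""
    failures = []
    for i, row in enumerate(rows):
        for exp in expectations:
            if not exp.check(row.get(exp.column)):
                failures.append((i, exp.column, exp.dimension))
    return failures


# Hypothetical contract: column names are assumptions, not the real schema.
CONTRACT = [
    Expectation("npi", "completeness", lambda v: v is not None),
    Expectation("npi", "validity",
                lambda v: isinstance(v, str) and len(v) == 10 and v.isdigit()),
    Expectation("state", "validity",
                lambda v: isinstance(v, str) and len(v) == 2),
]

# Made-up sample rows: the first passes, the second fails every rule.
rows = [
    {"npi": "1234567890", "state": "TX"},
    {"npi": None, "state": "Texas"},
]
failures = validate(rows, CONTRACT)
for row_index, column, dimension in failures:
    print(f"row {row_index}: {column} failed {dimension}")
```

Tagging each rule with a quality dimension keeps the failure report groupable by dimension, which is one plausible way to organize a contract that spans several of them.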

Data Quality Validation Pipeline · PySpark, PyDeequ, AWS Deequ, Python

Automated validation pipeline on 3–4M rows/month of NYC Yellow Taxi records. 12 constraint checks, column profiling, drift detection. Documented AWS Glue migration path for billion-row scale.
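
Drift detection can take many forms, and the description above does not say which one the pipeline uses. The sketch below shows one simple variant in plain Python (not PySpark or PyDeequ): flag a batch when a column's mean moves more than a relative tolerance away from a baseline batch. The function name `mean_drift`, the 10% tolerance, and the fare values are illustrative assumptions.

```python
from statistics import mean


def mean_drift(baseline, current, tolerance=0.10):
    """Compare the mean of two batches.

    Returns (drifted, relative_change), where relative_change is measured
    against the baseline mean and drifted is True when its magnitude
    exceeds the tolerance.
    """
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    change = (mean(current) - base) / abs(base)
    return abs(change) > tolerance, change


# Made-up fare amounts for two monthly batches (not real taxi data).
january = [10.0, 12.0, 14.0]
february = [13.0, 15.0, 17.0]

drifted, change = mean_drift(january, february)
print(f"drifted={drifted}, relative change={change:.2%}")
```

A mean comparison is cheap to compute per month but blind to changes in spread or shape, so a real pipeline would likely track several statistics per column.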

ERCOT Grid Analytics Dashboard · Python, Tableau, Time-Series Analysis

Aggregated 10+ years of ERCOT API data (~100K data points). Time-series models achieved 95% forecast accuracy on energy production and grid stability patterns.
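
The description does not state how forecast accuracy was measured. One common convention, assumed here purely for illustration, is 100% minus the mean absolute percentage error (MAPE). The numbers below are made up and are not ERCOT data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up values: every forecast is off by 5% of the actual value.
actual = [100.0, 200.0, 400.0]
forecast = [95.0, 210.0, 380.0]

accuracy = 100.0 - mape(actual, forecast)
print(f"accuracy: {accuracy:.1f}%")
```

Other definitions (for example, ones based on RMSE or on the share of forecasts within a tolerance band) would give different figures for the same model.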



Popular repositories

  1. Emed (Public)

    The main goal of this project is to build an automated system that handles the different operations of a medical store. The system will provide ease and comfort of use to the customer buying medical su…

    PHP

  2. VirQueue (Public)

    Forked from Chris-george-anil/VirQueue

    HTML

  3. Coronavirus-Probability-Checker (Public)

    Forked from Siddhant-K-code/Coronavirus-Probability-Checker

    A probability checker for COVID-19: people enter values and symptoms, and based on that data the patient gets the probability of testing positive for COVID-19.

    Jupyter Notebook

  4. fullstack-course4 (Public)

    Forked from jhu-ep-coursera/fullstack-course4

    Example code for HTML, CSS, and Javascript for Web Developers Coursera Course

    JavaScript

  5. coursera-test (Public)

    Coursera test repository

    HTML

  6. PDCFinal (Public)

    C++