Samir Caizapasto - Junior Data Engineer and Data Analyst Value Proposition

Junior Data Engineer & Data Analyst | Building Reliable Pipelines + BI-Ready Data Products

I turn messy, multi-source data into validated pipelines, analytics-ready models, and decision-focused dashboards.

Proof of impact: 133+ automated tests · 1.2M+ records processed · up to 40% faster SQL workloads · $16.66K opportunity identified

📍 Guayaquil, Ecuador · Open to Trainee / Junior Data Engineer and Data Analyst / BI Analyst roles · Remote / Hybrid LATAM-US


⚡ Recruiter Snapshot

| Area | Signal |
|---|---|
| Target roles | Trainee / Junior Data Engineer · Data Analyst / BI Analyst |
| Main value | I build reliable data pipelines and turn them into BI-ready decision products |
| Engineering proof | 133+ tests · Pandera validation · DuckDB/dbt · CI/CD · scheduled workflows |
| Analytics proof | Power BI dashboards · DAX · KPI modeling · revenue gap analysis |
| Business impact | $16.66K opportunity identified · 1.2M+ records processed · up to 40% faster SQL workloads |
| Availability | Remote / Hybrid · LATAM / US-friendly teams |

🧭 Dual Track Positioning

🛠️ Data Engineering Track

I build reproducible data systems with validation, testing, and automated delivery.

  • ETL/ELT pipeline development.
  • Data contracts and quality gates with Pandera.
  • DuckDB/dbt analytical transformation layers.
  • CI/CD validation and scheduled refresh workflows.
  • Versioned artifacts and runbooks for maintainable delivery.

🔗 Best-fit projects:

📊 Data Analysis / BI Track

I translate data into KPI models, dashboards, and business recommendations.

  • KPI definition and business metric modeling.
  • Power BI dashboards for executive reporting.
  • DAX measures and storytelling layouts.
  • Customer, seller and market segmentation.
  • Insight-to-action recommendations for stakeholders.

🔗 Best-fit projects:


📌 Impact Metrics

| Metric | Proof of Impact |
|---|---|
| 133+ automated tests | Production-style data platforms with CI quality gates |
| Up to 40% faster queries | SQL tuning and indexing for analytical workloads |
| 💰 $16.66K performance gap identified | BI analysis for business decision prioritization |
| 📦 1.2M+ records processed | Dynamic pricing pipeline with explainable ML artifacts |
| 🧪 126 tests in eSports project | Reliable ETL + analytics delivery workflow |
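
The indexing part of the "up to 40% faster" claim can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. This is an assumption for illustration only: the engine, tables and queries behind the original workloads are not described here, and the `rides` table and `idx_rides_city` index are hypothetical. The sketch shows how `EXPLAIN QUERY PLAN` moves from a full table scan to an index search once a suitable index exists; it does not reproduce or measure the 40% figure.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical analytical table, filled with synthetic rows.
conn.execute("CREATE TABLE rides (ride_id INTEGER PRIMARY KEY, city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO rides (city, fare) VALUES (?, ?)",
    [("Guayaquil" if i % 2 else "Quito", 2.5 + i % 7) for i in range(10_000)],
)

lookup = "SELECT AVG(fare) FROM rides WHERE city = 'Quito'"

before = query_plan(conn, lookup)  # no index yet: the planner scans every row
conn.execute("CREATE INDEX idx_rides_city ON rides (city)")
after = query_plan(conn, lookup)   # the planner can now search via the index

print("before:", before)
print("after: ", after)
```

Checking the plan before and after a change is a cheap first step; actual speedups still have to be measured on real data volumes.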

🧪 Portfolio Quality Index

Automated audit of flagship projects (CI/CD, tests, docs, demos, and latest updates).

| Project | Last update | Signal |
|---|---|---|
| RideFare | 2026-05-11 | CLI pipeline · Pandera · DuckDB/dbt · ML artifacts |
| Technology Trends | 2026-05-25 | Multi-source ETL · data contracts · scheduled refresh |
| Customer Profile | 2026-04-11 | Python preprocessing · Power BI semantic model |
| Grocery Sales BI | | Revenue gap analysis · KPI modeling |

🚀 Featured Projects

RideFare

Self-directed Data Engineering · ML · Analytics Product

End-to-end pipeline modernization with reproducible commands, validation, ML artifacts and public delivery.

  • Problem: Ride pricing analysis started as a notebook-style workflow with limited reproducibility and weak delivery structure.
  • Built: CLI-based pipeline stages for ingestion, transformation, training and web export (ridefare ingest, transform, train, export-web).
  • Engineering work: Pandera validation, DuckDB/dbt transformations, XGBoost + SHAP explainability, CI checks, deterministic JSON exports, preview/prod deploy pipelines and release automation.
  • Impact: Production-style pricing intelligence platform processing 1.2M+ records with public demo routes and transparent ML outputs.
  • Stack: Python, Pandera, DuckDB, dbt, XGBoost, SHAP, GitHub Actions, Next.js.
   

Technology Trends

Self-directed Data Engineering · Analytics Engineering

Multi-source ETL platform with data contracts, CI/CD, DuckDB analytics and dashboard delivery.

  • Problem: Developer trend signals are fragmented across GitHub, Stack Overflow and Reddit.
  • Built: Unified analytics platform for ingesting, validating, transforming and exposing trend metrics.
  • Engineering work: Python ETL, Pandera data contracts, DuckDB analytical transformations, CI/CD validation, scheduled refresh workflows and frontend-ready outputs.
  • Impact: 133+ passing tests, automated quality gates and public dashboard for technology ranking and monitoring.
  • Stack: Python, Pandera, DuckDB, GitHub Actions, Flutter Web, APIs.
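
The project uses Pandera for its data contracts. The sketch below deliberately does not use Pandera's API; it shows the same idea in plain Python: declare per-column expectations once, validate every incoming batch against them, and stop the pipeline (for example, fail a CI job) on any violation. The column names, the allowed `source` values and the sample batch are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Column:
    """One column rule in a hypothetical data contract."""
    name: str
    dtype: type
    check: Callable[[Any], bool] = lambda value: True
    nullable: bool = False


# Hypothetical contract for a unified trend table fed by GitHub, Stack Overflow and Reddit.
TREND_CONTRACT = [
    Column("source", str, lambda v: v in {"github", "stackoverflow", "reddit"}),
    Column("technology", str, lambda v: v.strip() != ""),
    Column("mentions", int, lambda v: v >= 0),
]


def validate(rows: list[dict[str, Any]], contract: list[Column]) -> list[str]:
    """Return readable violations; an empty list means the batch satisfies the contract."""
    errors: list[str] = []
    for i, row in enumerate(rows):
        for col in contract:
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    errors.append(f"row {i}: '{col.name}' is missing")
                continue
            if not isinstance(value, col.dtype):
                errors.append(
                    f"row {i}: '{col.name}' expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
            elif not col.check(value):
                errors.append(f"row {i}: '{col.name}' failed its check ({value!r})")
    return errors


def quality_gate(rows: list[dict[str, Any]], contract: list[Column]) -> None:
    """Raise (and so stop the pipeline) when any contract rule is violated."""
    errors = validate(rows, contract)
    if errors:
        raise ValueError(f"{len(errors)} contract violation(s):\n" + "\n".join(errors))


batch = [
    {"source": "github", "technology": "duckdb", "mentions": 42},
    {"source": "reddit", "technology": "dbt", "mentions": -1},
    {"source": "mastodon", "technology": "pandas", "mentions": 7},
]
for problem in validate(batch, TREND_CONTRACT):
    print(problem)
```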
   

Customer Profile

Bootcamp-backed BI Case · Independently polished for portfolio delivery

Customer segmentation, KPI storytelling and stakeholder-ready Power BI reporting.

  • Problem: Business stakeholders needed a clearer view of customer value, premium spending behavior and segment-level opportunities.
  • Built: Reproducible workflow from raw dataset to Python preprocessing, clean CSV and Power BI dashboard.
  • Analytics work: DAX measures, customer segmentation, desktop/mobile layouts and executive narrative from context to insight to action.
  • Impact: Decision-focused dashboard supporting campaign planning through clear KPI storytelling and premium-spend segment discovery.
  • Stack: Power BI, DAX, Python, data cleaning, KPI modeling, dashboard storytelling.
 

🛒 Grocery Sales BI Dashboard (Analytical Case)

BI / Revenue Opportunity Analysis

Commercial KPI analysis with measurable revenue gap identification.

  • Problem: Seller performance varied significantly across markets and categories, reducing total revenue potential.
  • Built: Power BI dashboard with seller segmentation, revenue comparison, category analysis and market-level performance views.
  • Analytics work: DAX measures, KPI modeling, seller contribution analysis, variance analysis and commercial prioritization.
  • Impact: Identified a $16.66K seller performance gap, surfaced a top revenue category worth $80.05K, analyzed 23 active sellers and highlighted Tulsa as the strongest market.
  • Stack: Power BI, DAX, Excel, KPI analysis, revenue analytics.
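
This README does not say how the $16.66K seller gap was calculated, so the following is only one possible definition, sketched in Python rather than DAX: the total shortfall of below-average sellers against the average seller's revenue, plus a ranking of those sellers for prioritization. The seller IDs and revenue figures are hypothetical and unrelated to the dashboard's data.

```python
from statistics import mean

# Hypothetical seller revenue totals; not the dashboard's data.
revenue_by_seller = {"S1": 5200.0, "S2": 3100.0, "S3": 4700.0, "S4": 2000.0}


def performance_gap(revenue: dict[str, float]) -> float:
    """Assumed definition: summed shortfall of below-average sellers versus the seller average."""
    benchmark = mean(revenue.values())
    return sum(benchmark - r for r in revenue.values() if r < benchmark)


def underperformers(revenue: dict[str, float]) -> list[str]:
    """Below-average sellers, largest shortfall first."""
    benchmark = mean(revenue.values())
    return sorted((s for s, r in revenue.items() if r < benchmark), key=lambda s: revenue[s])


print(performance_gap(revenue_by_seller))  # 2400.0
print(underperformers(revenue_by_seller))  # ['S4', 'S2']
```

Other benchmarks (median seller, top-quartile seller, per-market targets) would give different gap totals, so the benchmark choice should be stated alongside any reported figure.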

📁 Additional Projects

eSports Analytics

Hybrid Data Engineering + Analytics delivery for the LATAM eSports ecosystem.

  • Built a full pipeline: MySQL → Python ETL → validated JSON contracts → web dashboard.
  • Integrated Random Forest projections for 2026 to combine descriptive and predictive analytics.
  • Delivered reliable outputs with 126 automated tests and CI-driven deployment.
  • Consolidated visibility across teams, players, competitions, and prize performance.
 

Operational analytics platform combining ETL outputs, KPI views and strategic recovery modeling.

  • Engineered pipeline: MySQL → Python ETL → JSON outputs → 5-view web dashboard.
  • Modeled strategic recovery from -5.58% ROI to +15% target (+20.6 pts).
  • Projected +75% productivity uplift with KPI-driven operational analysis.
  • Delivered reproducible implementation backed by automated ETL tests.
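
The "+20.6 pts" figure above is a percentage-point difference between two ROI values. A quick check of that arithmetic, using only the two numbers from the bullet:

```python
current_roi_pct = -5.58  # current ROI, in percent (from the bullet above)
target_roi_pct = 15.0    # recovery target, in percent

# A change between two percentages is measured in percentage points: a plain difference.
uplift_pts = target_roi_pct - current_roi_pct

print(f"{uplift_pts:.2f} pts")  # 20.58 pts
print(f"{uplift_pts:.1f} pts")  # 20.6 pts, the rounded figure quoted above
```

Percentage points are the right unit here because a relative percentage change computed from a negative starting ROI would not be meaningful.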
 

Statistical modeling case packaged into dashboard-ready JSON/PNG outputs and a lightweight web report.

  • Validated a Negative Binomial model with a goodness-of-fit test that did not reject the fit (p = 0.6603).
  • Processed 309 observations and confirmed a mean serve time under 2 seconds (1.945 s).
  • Automated JSON/PNG exports from R pipeline for dashboard-ready delivery.
  • Improved interpretability by packaging statistical outputs into a lightweight web report.
 

48-hour full-stack + applied analytics MVP recognized as a NASA Space Apps Global Nominee.

  • Built MVP in 48 hours during NASA Space Apps Challenge.
  • Processed 10 years of climate-related data for 195+ countries.
  • Delivered interactive map workflows with <2s response time for user exploration.
  • Recognized as Galactic Problem Solver (Global Nominee).
 

🛠️ Technical Stack

| Category | Technologies |
|---|---|
| 💻 Languages | Python, R, SQL, TypeScript, Dart |
| ⚙️ Data Engineering & DBs | DuckDB, MySQL, SQLite, pandas, Jupyter |
| 🤖 Machine Learning | scikit-learn |
| 🧪 Testing & Quality | pytest, Pandera |
| 📊 Visualization & BI | Power BI, Tableau, Plotly, Excel |
| 🌐 Web & Mobile | React, Flutter, Flask, Tailwind CSS, Vite, Bootstrap, Leaflet |
| 🚀 DevOps & Cloud | GitHub Actions, Vercel, Git |
| 📚 Learning | AWS, dbt |

🏆 Certifications & Awards

| 🎖️ Certification / Award | 🏢 Issuer | 📅 Status / Date | 🔗 Link |
|---|---|---|---|
| 📗 Microsoft Office Specialist: Excel Associate (Microsoft 365 Apps) | Microsoft | Issued: Mar 2026 | 📄 Credential |
| 📊 Microsoft Certified: Power BI Data Analyst Associate (PL-300) | Microsoft | Credential verified | 📄 Credential |
| 📊 Data Analyst Associate | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🛠️ ETL y ELT en Python (ETL and ELT in Python) | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🌍 Galactic Problem Solver (Global Nominee) | NASA Space Apps Challenge | Oct 2025 | 📄 View |
| 🤖 Curso de IA: De 0 a Agentes (AI Course: From 0 to Agents) | BIG school | Issued: Mar 2026 | 📜 Credential |
| 📊 Data-Driven Decision Specialist (Bootcamp) | ESPOL & MINTEL | Credential verified | 📄 Credential |

⭐ Top Project

🌎 Spoken Languages

      
Actively preparing for C1 certification

👋 About Me

Junior Data Engineer & Data Analyst | Computer Engineering Student (ESPOL, 8th semester)

I’m a Computer Engineering student at ESPOL, currently in my 8th semester, building a dual-track data career across Data Engineering and Business Intelligence. My strongest focus is designing reliable, reproducible data pipelines with validation, testing and automated delivery, while also translating analytical outputs into dashboards and business recommendations.

What I bring

  • Self-directed Data Engineering projects with ETL/ELT workflows, Pandera validation, automated testing and CI/CD quality gates.
  • Analytics engineering foundations with SQL, DuckDB/dbt concepts, transformation layers and query optimization.
  • BI/Data Analysis delivery through Power BI, DAX, KPI modeling, customer segmentation and revenue opportunity analysis.
  • Portfolio-grade data products that connect raw data, validated artifacts, dashboards and clear stakeholder narratives.

🔭 Current Focus

| Focus Area | Current Work |
|---|---|
| 📊 Applied BI delivery | Strengthening Power BI modeling, DAX and business storytelling through stakeholder-ready dashboard projects. |
| ☁️ Cloud + dbt learning path | Building stronger foundations in modern data stack practices, transformation workflows and analytics engineering standards. |
| 🧩 Portfolio refinement | Improving project narratives, measurable impact and recruiter-facing positioning for Junior Data Engineer / Data Analyst opportunities. |

📊 GitHub Activity, WakaTime & Contribution Snake

GitHub stats, weekly coding activity tracked with WakaTime, a contribution trend graph and a contribution-grid snake animation.

🤝 Let’s Connect

Open to Junior Data Engineer / Data Analyst roles (remote/hybrid, LATAM/US).

I’m ready to contribute from day one in data pipeline automation, analytics engineering, and decision-focused BI.


Pinned repositories

  1. Technology-trend-analysis-platform (Public)

    Data intelligence platform for technology trends across GitHub, Stack Overflow, and Reddit using Python ETL, Pandera quality gates, a DuckDB trend engine, and Flutter Web.

    Dart

  2. Analisis-Ping-Pong (Public)

    Automated statistical analysis pipeline using R to model ping pong serve precision with a Negative Binomial distribution (309 observations). Includes an interactive web dashboard.

    HTML 1

  3. Analisis-Cultivo-Arroz (Public)

    End-to-end data engineering platform for agricultural analytics. ETL pipeline (Python) + Interactive dashboard (Chart.js) with KPIs, financial analysis, and strategic insights.

    HTML

  4. easyparker-pwa (Public)

    EasyParker is a PWA for booking parking in Guayaquil | Modes: Driver and Host | Real-time chat | Events with surge pricing | Ratings, etc. | React + TypeScript + Tailwind

    TypeScript

  5. eSports-Analytics-Dashboard (Public)

    End-to-end analytics dashboard for LATAM eSports with Python ETL, data validation, web visualization and a 2026 ML projection.

    Python

  6. RideFare-ETL-Pipeline (Public)

    Portfolio-grade pricing intelligence product for urban mobility, built with DuckDB, dbt, XGBoost, Next.js, and Vercel.

    Jupyter Notebook