Senior Data Engineer · Azure · Microsoft Fabric · Databricks · PySpark · Delta Lake
5+ years building mission-critical data platforms at petabyte scale
Senior Data Engineer architecting and shipping production-grade data infrastructure: real-time event streaming, ACID-compliant Lakehouse design, and ETL orchestration.
Deep expertise in PySpark, Delta Lake, and distributed data processing across Azure, Microsoft Fabric, and Databricks. Delivered mission-critical platforms spanning Workplace Analytics, Insurance, and Finance, supporting C-suite decisions through unified data models, real-time KPI pipelines, and self-serve BI layers.
| Domain | Technologies |
|---|---|
| Languages | Python · SQL · PySpark · YAML · Shell/Bash · PowerShell |
| Data Engineering | Apache Spark · Spark SQL · Hadoop · Hive · Delta Lake · Parquet · Avro |
| Cloud & Platform | Microsoft Azure · Microsoft Fabric · Azure Databricks · Synapse Analytics · ADLS Gen2 |
| Data Ingestion | Azure Event Hubs · Apache Kafka · Azure Data Factory · REST APIs · Schema Registry |
| Lakehouse & Storage | Delta Lake · Fabric Lakehouse · Snowflake · Azure Blob Storage · AWS S3 |
| Databases | SQL Server · Databricks SQL Warehouse · Cosmos DB · Azure SQL DB |
| DevOps & CI/CD | Azure DevOps · Git · GitHub · IaC · PowerShell · Azure Repos |
| Analytics & BI | Power BI · Tableau · ThoughtSpot · Palantir Foundry |
| Monitoring | Azure App Insights · Log Analytics Workspace · OpenTelemetry |
- Workplace Analytics Platform: scalable real-time analytics for workplace intelligence across bookings, assets, visitors, and spaces. Built on Microsoft Fabric with Azure Event Hubs streaming and a Delta Lakehouse medallion architecture.
- Insurance Data and Reporting Platform: end-to-end ETL framework for multi-layer data ingestion, validation, and transformation. Built on Azure Databricks with Delta Lake, feeding analytics and ML workloads.
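
As a rough illustration of the layered ingest, validate, transform pattern these platforms follow, here is a minimal plain-Python sketch. It is not code from either project, which run on Spark and Delta Lake rather than in-memory lists; the record fields, validation rule, and function names (`to_silver`, `to_gold`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records standing in for a bronze (as-ingested) layer.
bronze = [
    {"policy_id": "P1", "premium": "120.50"},
    {"policy_id": "P2", "premium": "not-a-number"},
    {"policy_id": None, "premium": "80.00"},
    {"policy_id": "P1", "premium": "30.00"},
]


def to_silver(rows):
    """Validate and type-cast rows; return (valid_rows, rejected_rows)."""
    valid, rejected = [], []
    for row in rows:
        try:
            if not row["policy_id"]:
                raise ValueError("missing policy_id")
            valid.append(
                {"policy_id": row["policy_id"], "premium": float(row["premium"])}
            )
        except (KeyError, TypeError, ValueError):
            # Keep bad rows aside instead of dropping them silently.
            rejected.append(row)
    return valid, rejected


def to_gold(rows):
    """Aggregate validated rows into a reporting-ready total per policy."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["policy_id"]] += row["premium"]
    return dict(totals)


silver, rejected = to_silver(bronze)
gold = to_gold(silver)
```

Keeping rejected rows in their own collection mirrors the common practice of routing invalid records to a quarantine table so they can be inspected and replayed.
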
| Repo | Description | Tech |
|---|---|---|
| pyspark-flatten-file | PySpark utility to flatten nested/complex file structures | Python · PySpark |
| aws-lambda-scripts | Lambda scripts to stop EC2 instances using boto3 | Python · AWS |
| NLP-spam-detection | Spam detection using NLP and NLTK | Python · NLP |
| TextUtils | Text manipulation web app with Django backend | Python · Django |
| tut-pandas | Pandas data manipulation practice notebooks | Python · Pandas |
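
The flattening idea behind pyspark-flatten-file can be illustrated without Spark. The sketch below is plain Python and is not the repo's code; the function name `flatten`, the `_` separator, and the sample record are assumptions made for illustration.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts and lists into a single-level dict.

    Nested keys are joined with `sep`; list elements are keyed by index.
    """
    if isinstance(record, dict):
        pairs = record.items()
    elif isinstance(record, list):
        pairs = enumerate(record)
    else:
        # Leaf value: emit it under the key path built so far.
        return {parent_key: record}

    items = {}
    for key, value in pairs:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten(value, new_key, sep))
    return items


sample = {"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}
flat = flatten(sample)
# flat == {"id": 1, "user_name": "a", "user_tags_0": "x", "user_tags_1": "y"}
```

In Spark, arrays are often exploded into separate rows rather than indexed columns; indexing here is a simplification to keep the example self-contained.
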
- DBaaS Migration Appreciation — zero data loss delivery
- Innovation Appreciation Award — improved pipeline efficiency
- On The Spot Award — critical delivery under tight timelines
Ask me about PySpark optimisation, Delta Lake architecture, or Azure data platform design.
LinkedIn · Portfolio
