ar2029/README.md

Hi, I'm Prasoon Garg 👋

Senior Data Engineer · Azure · Microsoft Fabric · Databricks · PySpark · Delta Lake
5+ years building mission-critical data platforms at petabyte scale


About Me

Senior Data Engineer with 5+ years architecting and shipping production-grade data infrastructure — real-time event streaming, ACID-compliant Lakehouse design, and ETL orchestration at petabyte scale.

Deep expertise in PySpark, Delta Lake, and distributed data processing across Azure Cloud, Microsoft Fabric, and Databricks. Delivered mission-critical platforms spanning Workplace Analytics, Insurance, and Finance — enabling C-suite decisions through unified data models, real-time KPI pipelines, and self-serve BI layers.


Tech Stack

| Domain | Technologies |
| --- | --- |
| Languages | Python · SQL · PySpark · YAML · Shell/Bash · PowerShell |
| Data Engineering | Apache Spark · Spark SQL · Hadoop · Hive · Delta Lake · Parquet · Avro |
| Cloud & Platform | Microsoft Azure · Microsoft Fabric · Azure Databricks · Synapse Analytics · ADLS Gen2 |
| Data Ingestion | Azure Event Hubs · Apache Kafka · Azure Data Factory · REST APIs · Schema Registry |
| Lakehouse & Storage | Delta Lake · Fabric Lakehouse · Snowflake · Azure Blob Storage · AWS S3 |
| Databases | SQL Server · Databricks SQL Warehouse · Cosmos DB · Azure SQL DB |
| DevOps & CI/CD | Azure DevOps · Git · GitHub · IaC · PowerShell · Azure Repos |
| Analytics & BI | Power BI · Tableau · ThoughtSpot · Palantir Foundry |
| Monitoring | Azure App Insights · Log Analytics Workspace · OpenTelemetry |

Featured Projects

Workplace Analytics Platform
Scalable real-time analytics for workplace intelligence across bookings, assets, visitors, and spaces. Built on Microsoft Fabric with Azure Event Hubs streaming and a Delta Lakehouse medallion architecture.

Insurance Data and Reporting Platform
End-to-end ETL framework for multi-layer data ingestion, validation, and transformation. Built on Azure Databricks with Delta Lake, feeding analytics and ML workloads.
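For readers unfamiliar with the medallion pattern mentioned above, here is a minimal plain-Python sketch of the bronze, silver, and gold layering. It is an illustration only, not code from either platform: the field names `event_id` and `space`, the function names, and the in-memory lists are all assumptions. A real pipeline would use Delta tables and Spark rather than Python lists.

```python
from collections import defaultdict


def to_bronze(raw_events):
    """Bronze layer: keep every event exactly as received (shallow copies)."""
    return [dict(event) for event in raw_events]


def to_silver(bronze_events):
    """Silver layer: drop malformed events and de-duplicate by event_id.

    The required fields (event_id, space) are hypothetical.
    """
    seen_ids = set()
    silver = []
    for event in bronze_events:
        if event.get("event_id") is None or event.get("space") is None:
            continue  # validation: skip events missing required fields
        if event["event_id"] in seen_ids:
            continue  # de-duplication: keep the first occurrence only
        seen_ids.add(event["event_id"])
        silver.append({"event_id": event["event_id"], "space": event["space"]})
    return silver


def to_gold(silver_events):
    """Gold layer: aggregate to a reporting-ready metric (events per space)."""
    counts = defaultdict(int)
    for event in silver_events:
        counts[event["space"]] += 1
    return dict(counts)


raw = [
    {"event_id": 1, "space": "A"},
    {"event_id": 1, "space": "A"},  # duplicate delivery
    {"event_id": 2, "space": "B"},
    {"space": "A"},                 # malformed: no event_id
    {"event_id": 3, "space": "A"},
]
bronze = to_bronze(raw)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer reads only from the one before it, so raw data stays replayable in bronze while cleaning rules and aggregates can change independently.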


Key Repositories

| Repo | Description | Tech |
| --- | --- | --- |
| pyspark-flatten-file | PySpark utility to flatten nested/complex file structures | Python · PySpark |
| aws-lambda-scripts | Lambda scripts to stop EC2 instances using boto3 | Python · AWS |
| NLP-spam-detection | Spam detection using NLP and NLTK | Python · NLP |
| TextUtils | Text manipulation web app with Django backend | Python · Django |
| tut-pandas | Pandas data manipulation practice notebooks | Python · Pandas |
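To give a sense of what flattening a nested structure involves, here is a small plain-Python sketch. It is not the code in pyspark-flatten-file and does not use PySpark: the function name `flatten_record` and the underscore separator are assumptions made for illustration. Lists are left untouched here; in Spark they would typically be expanded into rows with `explode`.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts into a single-level dict.

    Nested keys are joined with `sep`, e.g. {"a": {"b": 1}} -> {"a_b": 1}.
    Non-dict values (including lists) are copied through unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested structures, carrying the key prefix along
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"id": 1, "user": {"name": "a", "address": {"city": "x"}}, "tags": ["p", "q"]}
flat = flatten_record(nested)
```

With the sample input, the nested `user.address.city` field ends up under the single key `user_address_city`, while `tags` stays a list.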

Achievements

  • DBaaS Migration Appreciation — zero data loss delivery
  • Innovation Appreciation Award — improved pipeline efficiency
  • On The Spot Award — critical delivery under tight timelines


Ask me about PySpark optimisation, Delta Lake architecture, or Azure data platform design.
LinkedIn · Portfolio

Pinned

  1. aws-lambda-scripts (Public)

    Simple AWS scripts to stop instances, written in Python with boto3

    Python

  2. NLP-spam-detection (Public)

    A spam detection example using the NLTK package in Python

    Jupyter Notebook

  3. prasoon-portfolio (Public)

    TypeScript

  4. pyspark-flatten-file (Public)

    Python

  5. TextUtils (Public)

    A website for manipulating text, with a Django backend

    Python

  6. tut-pandas (Public)

    Python