shrvan30/README.md

👋 Hey, Shravan Upadhye Here

ENTC Undergrad | ML & Systems | Building at the edge of AI and Hardware | Semiconductor AI Enthusiast

🌐 LinkedIn • 📧 Email


🧠 About Me

  • 🎓 Undergrad in Electronics & Telecommunication Engineering at Pune Institute of Computer Technology (PICT): curious and hardworking
  • 💼 Currently a Software Development Intern @ DeepTek.ai, working at the intersection of medical AI, Transformer workflows, and scalable backend systems
  • ⚡ Passionate about GPU computing and AI systems, from writing low-level CUDA kernels to deploying end-to-end ML pipelines
  • 🎯 Driven by a long-term vision of becoming an AI Engineer in the semiconductor space, where hardware meets intelligence
  • 🏍️ Fitness enthusiast, bike rider, and occasional swimmer; I believe a strong body fuels a sharper mind

🔗 Socials


🖥️ Tech Stack

⚙️ Languages

⚡ GPU Computing

🤖 ML & AI

🗄️ Databases

🌐 Frontend

🛠️ Frameworks & Tools

☁️ Cloud & DevOps


🚀 Featured Projects

CUDA · C++ · Parallel Computing · GPU Architecture

  • Engineered a FlashAttention-style GPU kernel with shared-memory tiling, online softmax, and fused attention to minimize HBM memory movement
  • Achieved a 254× speedup over the CPU baseline and 70.69× over a simple GPU baseline, reaching 303 GFLOP/s on an NVIDIA RTX 3090
  • Applied kernel fusion, warp-synchronous computation, and SRAM reuse, avoiding materialization of the N×N intermediate matrix in memory
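
The online-softmax idea behind the bullets above can be illustrated outside CUDA. The following is a minimal pure-Python sketch for a single query row, not the kernel from the repository: the function names, the `tile` parameter, and the omission of the usual 1/sqrt(d) scaling are all assumptions made for illustration. It shows how a running maximum and running sum let the output be accumulated tile by tile without ever storing the full row of scores.

```python
import math

def attention_reference(q, keys, values):
    """Two-pass softmax attention for one query row (stores every score)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def attention_online(q, keys, values, tile=2):
    """Single-pass attention for one query row using online softmax.

    Keys/values are consumed in tiles (hypothetical `tile` size); only a
    running max, a running sum, and the output accumulator are kept.
    """
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale previous partial results to the new maximum
        # (the factor is 0.0 on the first tile, when running_max is -inf).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / running_sum for a in acc]

q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
ref = attention_reference(q, keys, values)
out = attention_online(q, keys, values, tile=2)
```

Both functions should agree up to floating-point rounding. In a GPU kernel the same recurrence would run per tile held in shared memory, which is what makes the N×N score matrix unnecessary.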

Python · FastAPI · FAISS · BM25 · Whisper · Docker · PostgreSQL

  • Distributed RAG system converting video into a searchable knowledge base via Whisper transcription, semantic chunking, and hybrid FAISS+BM25 retrieval with CrossEncoder re-ranking
  • LLM-based Q&A (llama.cpp / Phi-3), timestamp-level retrieval, Redis caching, and PostgreSQL metadata store
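
One common way to combine a dense (FAISS-style) ranking with a sparse (BM25-style) ranking before re-ranking is reciprocal rank fusion. The sketch below is an assumption for illustration only: the project description does not say which fusion method it uses, and the function name, the `k=60` constant, and the chunk IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical top-3 results from a dense index and a BM25 index.
dense_hits = ["chunk_07", "chunk_02", "chunk_11"]
sparse_hits = ["chunk_02", "chunk_05", "chunk_07"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Chunks that appear in both lists rise to the top, and the fused candidates could then be passed to a cross-encoder for final re-ranking.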

Python · OpenCV · TensorFlow Lite · MediaPipe · Flutter · Firebase

  • Real-time workout evaluation system achieving ~95% accuracy in posture detection and rep counting
  • Optimized TFLite inference, cutting latency by 40% for edge deployment; full-stack app built with Flutter + Firebase

📜 Certifications


📊 GitHub Stats


"Transforming attention from a memory-bound workload into a compute-efficient kernel, one CUDA thread at a time."

Popular repositories

  1. vidrag (Public)

    Distributed Video Retrieval-Augmented Generation (RAG) system using FastAPI, FAISS, Whisper, and LLMs

    Python 1

  2. flash-attention-cuda (Public)

    FlashAttention-style CUDA implementation with shared-memory tiling, online softmax fusion, IO-aware optimization, and GPU benchmarking.

    Cuda 1

  3. Nutrient-Management-Optimization (Public)

    Nutrient Management Optimization takes data such as soil color, soil pH, NPK values, temperature, and rainfall, and recommends the best fertilizer along with its market value

    Python

  4. The_personality-based_book_recommendation_system (Public)

    A personality-based book recommendation system that classifies users into 16 personality types through a short quiz and suggests books tailored to their traits, making reading more engaging and per…

    Jupyter Notebook

  5. Stock_Market_Trend_Analyzer_using_Moving_Averages_and_Machine_Learning (Public)

    Fetches 20 years of stock data by default and lets users change the date range interactively through a Streamlit UI, enabling flexible analysis of long-term or short-term market trends

    Jupyter Notebook

  6. DBMS_25-26 (Public)

    Python