
Ankit Mishra

I build software and AI products from 0 → 1. I thrive in high-ownership environments where curiosity, speed, and execution matter.

Currently a Founding AI Engineer at DocuraHealth (YC W26). Previously a Machine Learning Engineer at Pibit.ai (YC W21) and Founding AI Engineer at AarogyaID. During college, I worked across 5 startups and an open-source organization as an AI & Software Engineer while also teaching Python.

My interests span Agentic AI, Software Engineering, Backend Engineering, LLMs, Inference Engineering, Kernel Engineering, Computer Vision, and DevOps/MLOps. I enjoy understanding systems from first principles and taking ideas from research to production.

Attention Formula

The Mathematics That Bent The Trajectory of AI
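The original image is not reproduced here. Assuming it shows the standard scaled dot-product attention, the formula reads as follows, where $Q$, $K$ and $V$ are the query, key and value matrices and $d_k$ is the key dimension:

```math
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```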


Links

Pinned

  1. flash-attention

    A ground-up explanation of FlashAttention. We build up exactly why standard attention fails when sequences get long, and how FlashAttention uses smart memory tricks to fix it. We use simp…

    Jupyter Notebook

  2. int8

    LLM.int8() from First Principles. Most people understand that large language models have billions of parameters, but don't know exactly how we compress them to run on normal GPUs without destroying…

    Jupyter Notebook

  3. QTP-Quantization

    QTIP and QuIP Quantization from First Principles. Most people understand basic INT8 quantization, but don't know exactly why lower bit-widths like INT4 completely destroy model quality, and how we …

    Jupyter Notebook

  4. SnapKV

    The KV cache is one of the most critical optimization techniques in modern large language models, but it also creates one of the biggest memory bottlenecks in LLM inference.

    Jupyter Notebook
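To give a rough idea of the "smart memory tricks" behind the flash-attention project, here is a pure-Python sketch of the online-softmax recurrence that FlashAttention builds on. It handles a single query vector. This is not the repository's code: the function names (`naive_attention`, `tiled_attention`) and the tiny block size are hypothetical, chosen for readability. A real kernel tiles Q, K and V through on-chip GPU memory instead of looping in Python.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d)) over all keys at once, then weight the values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]


def tiled_attention(q, keys, values, block_size=2):
    """Online-softmax attention: visit keys/values one block at a time, keeping only
    a running max, a running normalizer and an unnormalized running output."""
    d = len(q)
    dim_v = len(values[0])
    running_max = -math.inf
    running_sum = 0.0          # sum of exp(score - running_max) over blocks seen so far
    acc = [0.0] * dim_v        # sum of exp(score - running_max) * value
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale the old statistics to the new max (exp(-inf) == 0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    # Normalize once at the end, exactly as the full softmax would.
    return [a / running_sum for a in acc]
```

For any `block_size`, `tiled_attention(q, keys, values, block_size)` should match `naive_attention(q, keys, values)` up to floating-point error. The difference is that the tiled version never needs every score in memory at the same time.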
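For the int8 project, here is a minimal sketch of the two ideas usually associated with LLM.int8(). The first is symmetric absmax quantization to the int8 range, applied row-wise to activations and column-wise to weights. The second is a mixed-precision decomposition in which "outlier" feature dimensions stay in floating point. This is an illustration, not the repository's implementation. The helper names are hypothetical, and the 6.0 outlier threshold mirrors the value reported in the LLM.int8() paper but should be treated as an assumption.

```python
def absmax_quantize(vec):
    """Symmetric absmax quantization of one vector into the int8 range [-127, 127]."""
    absmax = max((abs(x) for x in vec), default=0.0)
    scale = absmax / 127 if absmax > 0 else 1.0
    return [round(x / scale) for x in vec], scale


def int8_matmul(X, W, threshold=6.0):
    """X: n x d activations, W: d x m weights (lists of lists).
    Feature columns of X containing any value with |x| >= threshold stay in float;
    the remaining features use row-wise (X) / column-wise (W) absmax int8 quantization."""
    n, d, m = len(X), len(W), len(W[0])
    outliers = {j for j in range(d) if any(abs(row[j]) >= threshold for row in X)}
    regular = [j for j in range(d) if j not in outliers]
    xq = [absmax_quantize([row[j] for j in regular]) for row in X]
    wq = [absmax_quantize([W[j][c] for j in regular]) for c in range(m)]
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        qx, sx = xq[i]
        for c in range(m):
            qw, sw = wq[c]
            int_dot = sum(a * b for a, b in zip(qx, qw))  # integer accumulation
            out[i][c] = int_dot * sx * sw                  # dequantize with both scales
            # Outlier features contribute in full precision.
            out[i][c] += sum(X[i][j] * W[j][c] for j in outliers)
    return out
```

Keeping the few outlier columns in float is what lets the remaining values share a tight int8 scale. If a single large value were quantized together with them, it would inflate the scale for the whole row.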