Medicare Drug Dashboards

Interactive Streamlit app for exploring Medicare Part D prescribing patterns and Medicare Part B drug spending.

The app has four pages, accessible from a top navigation bar:

Med D Drugs Dashboard — aggregate analysis of pharmacy-dispensed drug costs, claims, and specialty patterns
Med B Drugs Dashboard — annual Medicare Part B (clinician-administered) drug spending by HCPCS code, brand, and generic
Med B Drugs by State & Provider Specialty — state-aware Medicare Part B drug dashboard sourced from CMS Physician PUF rendering-provider data
Provider Search — drill down to individual Part D prescribers or Part B rendering providers by city, radius, or name

The shared Ask AI assistant can query the aggregate Part D, Part B, and Physician PUF dashboard rollups. Individual provider-level radius/name searches remain in the Provider Search page controls.

The UI uses a shared healthcare analytics design system with top navigation, light page headers, soft chart cards, modern table styling, and a prominent Ask AI command area.

Data is hosted on Hugging Face and downloaded once per container into a local cache, so cold starts pay a small one-time cost and subsequent reads are local-disk speed.

Features

Med D Drugs Dashboard

Global filters for year, state, and specialty, plus a Brand / Generic grouping toggle
Total drug cost and total claims summary cards
Part D cost totals are gross CMS Tot_Drug_Cst values from the provider-drug detail file; they are before rebates and are not Medicare net program spending
Top drugs by year (brand or generic), with bar / treemap toggle
Top specialties by year
Yearly trend charts for selected drugs
Modern chart-detail tables are collapsed behind focused "View detailed rows" expanders to reduce dashboard clutter
US state choropleth with a Total Cost / Per Capita Cost toggle (Census Bureau Vintage 2023 population estimates)
Ask AI chatbot (Claude Haiku 4.5) that writes DuckDB SQL across the shared dashboard rollups and explains results in plain English

Med B Drugs Dashboard

Segmented year selector and Brand / Generic grouping toggle (defaults to brand)
Four metric cards: total spend, total claims, distinct drugs, and first→last-year spend growth
Total yearly spending chart with year-over-year change callouts
Top drugs by spend (Bar / Treemap toggle) with per-section Top N control (5 / 10 / 20)
Top specialties by spend (Bar / Treemap toggle), built from CMS Physician PUF drug-HCPCS rows
Yearly trend chart for selected drugs
Modern HCPCS-level drill-down table (one row per HCPCS code × brand × year) with formatted metrics and spend-share context
Indigo header banner and multi-hue chart palettes that keep the page visually distinct from Part D

Med B Drugs by State & Provider Specialty

2021-2023 Physician PUF drug-HCPCS dashboard with segmented year selector, state filter, and Brand / Generic grouping
Same metric-card and chart structure as the national Med B dashboard, using Total Services instead of Total Claims
Total yearly spending, top drugs, top rendering specialties, yearly trends, and HCPCS-level drill-down filtered to selected states
Modern detail tables use consistent formatting, summary chips, and spend-share context across dashboards
Caveats called out in-app: totals exclude facility-billed administrations and beneficiary counts are summed across rendering providers

Provider Search

Part D / Part B toggle at the top of the page
City + Radius (5/10/25/50/100 mi) + Drug → top prescribers within a centroid-distance radius
Provider Name → top drugs for matching prescribers or rendering providers
Hover help explains the centroid-based, all-or-nothing-per-city radius behavior
Part B drug input accepts brand (e.g. Keytruda), generic (e.g. pembrolizumab), HCPCS code (e.g. J9271), or HCPCS description; ranked result tables show Brand Name and Generic Name columns. Brand/Generic mapping is joined at query time from part_b_drug_spending.parquet.

Data Architecture

Source of truth for all production data is the public Hugging Face dataset mbateya/medicare_part_d_prescribers. The app fetches files via huggingface_hub.hf_hub_download, which caches under ~/.cache/huggingface/hub/, so each file is downloaded at most once per container.

File on HF	Size	Purpose
`processed/medicare_partd_2021_2023.parquet`	69 MB	Aggregated rows used by the Med D dashboard and Ask AI's Part D DuckDB view
`processed/medicare_partd_top_providers_by_drug_2021_2023.parquet`	14 MB	Top-providers-by-drug summary used by the Dashboard
`prescribers/year=YYYY/State=XX/data_0.parquet`	partitioned ~78M rows	Per-prescriber rows used by Provider Search Part D mode
`partb_prescribers/year=YYYY/State=XX/data_0.parquet`	partitioned ~1.5M rows	Per-rendering-provider Part B rows (drug HCPCS only); built from CMS Physician PUF; powers Provider Search Part B mode. Note: "Prescriber NPI/Name" columns are kept for partition-naming parallelism with Part D, but the values are the rendering clinician (who administered the drug and billed Medicare), not necessarily the ordering clinician. The UI surfaces these as "Rendering Provider" / "Rendering NPI".
`partb_drug_by_specialty.parquet`	~50 KB	Year × Specialty × HCPCS rollup for the Med B Drugs Dashboard's Top Specialties chart
`partb_drug_by_state.parquet`	~452 KB	Year × State × HCPCS rollup with Brand/Generic joined; powers Med B Drugs by State & Provider Specialty
`partb_drug_by_state_specialty.parquet`	~1.4 MB	Year × State × Specialty × HCPCS rollup for the Med B Drugs by State & Provider Specialty Top Specialties chart
`cities.parquet`	403 KB	(State, City) → (Latitude, Longitude) lookup for radius search; built from GeoNames
`state_population.parquet`	4.5 KB	Census Bureau Vintage 2023 state populations for per-capita map
`drug_atc.parquet`	82 KB	Generic Name → WHO ATC Levels 1-4 codes/names; built from NLM RxNav
`part_b_drug_spending.parquet`	229 KB	Annual Medicare Part B drug spending by HCPCS code, brand, generic (long format, 2019-2023)

Local-only artifacts (gitignored):

data/processed/*.parquet — optional dev cache; if present, the app prefers these over HF downloads
MUP_DPR_*.csv — raw CMS files used only by the offline build scripts
hf_staging/ — output of build scripts before upload
~/.cache/huggingface/hub/ — automatic HF download cache

Source-controlled data:

data/drug_atc_overrides.csv — manual ATC assignments for drugs the algorithmic pipeline can't resolve (TPN solutions, vaccines without ATC links, complex combos, etc.)

Setup

Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Configure secrets at .streamlit/secrets.toml (gitignored):

ANTHROPIC_API_KEY = "sk-ant-..."   # required for the Ask AI chatbot
HF_TOKEN          = "hf_..."        # required only for build scripts that upload to HF

Run the app:

streamlit run app.py

The first page load downloads ~83 MB of dashboard parquets from HF into the local cache. Provider Search downloads each state-year partition lazily on demand (~5-50 MB per state).

Project Structure

.
├── app.py                          # Streamlit entry point; declares pages + nav
├── Med_D_dashboard.py              # Med D Drugs Dashboard page
├── dashboard_ai.py                 # shared Ask AI dialog + DuckDB rollup views
├── dashboard_design.py             # shared visual system: headers, cards, chart shells
├── dashboard_tables.py             # shared modern table rendering + column formatting helpers
├── build_provider_summary.py       # offline ETL: top-providers-by-drug rollup
├── pages/
│   ├── 1_Provider_Search.py        # Provider Search page
│   ├── 2_Part_B_Drugs.py           # Med B Drugs Dashboard page
│   └── 3_Med_B_Drugs_State.py      # Med B Drugs by State & Provider Specialty page
├── data/
│   ├── drug_atc_overrides.csv      # tracked manual ATC overrides
│   └── processed/                  # gitignored local parquet cache (optional)
├── scripts/
│   ├── build_hf_prescriber_dataset.py   # builds the HF prescribers/year=…/State=… partitions
│   ├── upload_processed_to_hf.py        # uploads aggregated dashboard parquets to HF
│   ├── build_geocoded_cities.py         # builds cities.parquet from GeoNames
│   ├── build_state_population.py        # builds state_population.parquet from Census Vintage 2023
│   ├── build_drug_atc.py                # builds drug_atc.parquet via NLM RxNav + overrides
│   ├── build_part_b_drug_spending.py    # builds part_b_drug_spending.parquet from CMS Part B
│   └── build_partb_prescriber_dataset.py # builds partb_prescribers/ + Part B Physician PUF rollups
├── requirements.txt
└── README.md

Streamlit Cloud Deployment

The deployed app entry point is app.py. Streamlit Cloud's app settings should have:

Main file path: app.py
Python version: 3.12 (or any 3.11+ that satisfies requirements.txt)
Secrets: paste the contents of your local .streamlit/secrets.toml into the Secrets tab

Rebuilding the HF dataset

The Hugging Face artifacts only change when the underlying data changes (annual CMS release, new provider-level partitions, or refreshed lookups). The relevant scripts are:

Script	When to run
`scripts/build_hf_prescriber_dataset.py`	New year of CMS Part D Prescriber data
`build_provider_summary.py`	Same trigger; produces the top-providers-by-drug rollup
`scripts/upload_processed_to_hf.py`	After regenerating the rolled-up dashboard parquets locally
`scripts/build_geocoded_cities.py`	Rare — only if GeoNames updates significantly
`scripts/build_state_population.py`	Annually, when Census releases new population estimates
`scripts/build_drug_atc.py`	After adding new entries to `data/drug_atc_overrides.csv`, or annually for new drug approvals
`scripts/build_part_b_drug_spending.py`	Annually, when CMS releases the new run-year Part B Drug Spending file (around April-May). Update `CMS_URL` in the script to the new run-year URL.
`scripts/build_partb_prescriber_dataset.py`	Annually, when CMS releases new run-year Physician & Other Practitioners (Provider × Service) files. Downloads ~6 GB of CSVs, filters to drug HCPCS, partitions per state-year, and uploads provider partitions plus Year×Specialty×HCPCS, Year×State×HCPCS, and Year×State×Specialty×HCPCS rollups. Use `--rollup-only` to rebuild/upload rollups from existing local partitions. Update `CMS_PUF_URLS` in the script when CMS publishes new files.

All scripts read HF_TOKEN from the environment or .streamlit/secrets.toml. The drug-ATC builder caches every RxNav response under hf_staging/rxnav_cache/, so iterative re-runs after editing the override CSV are nearly instant.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Medicare Drug Dashboards

Features

Med D Drugs Dashboard

Med B Drugs Dashboard

Med B Drugs by State & Provider Specialty

Provider Search

Data Architecture

Setup

Project Structure

Streamlit Cloud Deployment

Rebuilding the HF dataset

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
.devcontainer		.devcontainer
data		data
pages		pages
scripts		scripts
.gitignore		.gitignore
MUP_DPR_RY22_20220715_DD_PRV_Drug.pdf		MUP_DPR_RY22_20220715_DD_PRV_Drug.pdf
Med_D_dashboard.py		Med_D_dashboard.py
README.md		README.md
app.py		app.py
build_provider_summary.py		build_provider_summary.py
dashboard_ai.py		dashboard_ai.py
dashboard_design.py		dashboard_design.py
dashboard_tables.py		dashboard_tables.py
medicare_D_dashboard.ipynb		medicare_D_dashboard.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Medicare Drug Dashboards

Features

Med D Drugs Dashboard

Med B Drugs Dashboard

Med B Drugs by State & Provider Specialty

Provider Search

Data Architecture

Setup

Project Structure

Streamlit Cloud Deployment

Rebuilding the HF dataset

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages