Skip to content

feat: comprehensive wandb logging (main-outcomes, panel routing, diag norms)#15

Open
RyanKim17920 wants to merge 2 commits into
MedARC-AI:mainfrom
RyanKim17920:feat/wandb-logging-only
Open

feat: comprehensive wandb logging (main-outcomes, panel routing, diag norms)#15
RyanKim17920 wants to merge 2 commits into
MedARC-AI:mainfrom
RyanKim17920:feat/wandb-logging-only

Conversation

@RyanKim17920

@RyanKim17920 RyanKim17920 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Panel routing: all wandb keys are now routed into logical panels instead of a single flat namespace. Headline losses (dino, ibot, kde, total, grad_norm, lr, and their val_* counterparts) land in main-train-results/; bookkeeping metrics (throughput, memory, ETA, etc.) land in train-tracking/; diagnostic keys keep their diag/ prefix; probe keys keep their probe/ prefix.
  • main-outcomes/ panel: after each probe run, headline downstream scores (linear, kNN, 16-shot, segmentation, progression, mutation, survival, robustness, mean) are logged to a clean main-outcomes/ panel and per-dataset breakdowns to main-outcomes-datasets/. This surfaces the numbers that actually matter at a glance rather than hunting through flat probe/* keys.
  • Diagnostic token/attention norms (diag/): when train.log_diagnostics_every is set in the config (default 0 = off), the student backbone runs a diagnostic forward pass that captures per-layer q/k/v norms, qk similarity, and cls/reg/patch token norms every N log steps. Four evenly-spaced layers are sampled plus a per-metric mean across all layers.

What was NOT changed

No architecture, dataset, batch size, probe logic, or config defaults were modified. The diagnostic forward path is opt-in and unreachable unless log_diagnostics_every > 0. The normal (non-diagnostic) forward path in both model.py and train.py is identical to before this PR.

Test plan

  • Smoke-test passes (WANDB_MODE=disabled python train.py ...) — existing training loop unchanged for default config
  • With log_diagnostics_every: 10 in config, confirm diag/ keys appear in wandb at the correct steps
  • After a probe run, confirm main-outcomes/mean and main-outcomes-datasets/ keys appear in wandb
  • Val loop still works (unpacks 4 values from compute_losses, discards diag)

🤖 Generated with Claude Code

… diag norms)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@PaulScotti

Copy link
Copy Markdown
Contributor

Thanks for the PR! I appreciate trying to clean up our logging/plotting to be more informative

I have several concerns though, you made lots of changes and I'm not sure all are good/intended:

  1. model.py (line 168): diagnostic mode bypasses activation checkpointing. With checkpoint=True, normal forward called checkpoint 4 times
  2. train.py (line 571): if log_diagnostics_every: 1, diagnostics run on step 1 while FlopCounterMode measures measured_flops_per_step, so the permanent FLOP estimate includes diagnostic-only work
  3. train.py (line 477): log_main_outcomes() runs every time log_probe_results() runs, not only when a new probe result is collected. Testing it out, two calls logged twice at the same W&B step
  4. train.py (line 508): not sure why you need to rename these probe outputs, isnt "probe/bracs_score" fine?
  5. train.py (line 488): usage of broad try/except conflicts with our preferred fail-loudly rule and MedARC coding standards
  6. train.py (line 297): renames existing W&B keys from stable train/* and val/* namespaces into new namespaces? Why do this
  7. train.py (line 536): train.log_diagnostics_every is an undocumented .get(..., 0) fallback, again not desirably coding style

Keep the new main-outcomes/ panel; fix the diagnostic path and stop renaming
stable namespaces per review:

- Keep train/* and val/* in their stable namespaces (cross-run history stays
  intact). Additionally mirror the headline losses into a clean
  main-train-results/ glance panel (dino/ibot/kde/total/grad_norm/lr +
  val_*), as an additive copy, not a rename. Bookkeeping stays under
  train/*. (point 6)
- log_main_outcomes now emits each probe result exactly once via a
  last-logged train_step guard instead of re-logging the latest at the
  same W&B step on every train log step. (point 3)
- Drop the broad try/except in log_main_outcomes to fail loudly, matching
  collect_probe_results. (point 5)
- main-outcomes-datasets/<dataset>_score keeps the _score suffix and, like
  collect_probe_results, strips the probe_ prefix so the keys match the
  existing probe/<dataset>_score names. (point 4)
- Diagnostic forward now honors activation checkpointing exactly like the
  normal path. (point 1)
- Diagnostics never run on the FLOP-measurement step, so the reused per-step
  FLOP estimate excludes diagnostic-only work. (point 2)
- log_diagnostics_every is now a documented config key (configs/main.yaml,
  configs/smoke.yaml) read via direct indexing, no .get(...,0) fallback. (point 7)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@RyanKim17920

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback! I've updated the code accordingly and addressed all your major points:

  1. The diagnostic forward now runs through the activation-checkpoint path too (matches the normal forward), verified on a GPU smoke test
  2. Diagnostics no longer run on the FLOP-measurement step
  3. Double logging is fixed now, main-outcomes logs on probe results instead of train log steps
  4. We use the same names now (keeping _score for consistency)
  5. Removed the try/excepts to match with code standards
  6. train/* and val/* are unchanged from beforehand, but I mirror the main losses into a main-train-results/ panel for visibility since train/* had lots of diagnostic/throughput metrics.
  7. log_diagnostics_every is defaulted to 0 in the yaml and uses this instead of a get fallback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants