Strict type-checking + unit tests for foundry.utils; fix dataset fallback weights; drop dead rf3 module#325
Open
lyskov-ai wants to merge 4 commits into
Conversation
instantiators is already fully annotated, so this adds it to the per-module strict mypy override (a lock-in for future defs) and fills the test gap with tests/test_instantiators.py covering the callback/logger instantiation control flow. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…st logging Adds the three remaining small foundry.utils modules to the per-module strict mypy override. squashfs was already fully annotated (pure lock-in); ddp needed one annotation (RankedLogger.log varargs); logging needed five. New tests/test_logging.py covers the two pure helpers (CachedDataFilter.filter and condense_count_columns_of_grouped_df). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
rf3.data.paired_msa no longer imports against the installed atomworks: MultiInputDatasetWrapper subclasses StructuralDatasetWrapper, which atomworks turned into a deprecated factory function, so subclassing it raises TypeError at import. The module was reachable only through domain_distillation.yaml, which is itself referenced only in commented-out lines of pdb_and_distillation.yaml, and its LoadPairedMSAs class was used nowhere. Remove the module, its orphaned domain_distillation.yaml config, the dangling commented references in pdb_and_distillation.yaml, and the stale pyproject mypy-exemption comment. rf3 now type-checks fully with no in-file suppressions. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…+ tests In wrap_dataset_and_sampler_with_fallbacks the fallback-weights branch was gated on `"weights" in sampler_to_fallback_to`. Sampler defines no __contains__, so `in` iterates the sampler's integer indices and never matches the string — the documented "use the sampler's weights" branch was dead and the membership test needlessly drew samples. Switch to `hasattr(sampler_to_fallback_to, "weights")` (the idiomatic form atomworks itself uses); mypy then narrows the type so the `# type: ignore[attr-defined]` on `.weights` is dropped. Behaviour change: assemble_distributed_loader's reachable samplers (DistributedSampler / DistributedMixedSampler) have no `.weights`, so it stays uniform (behaviour-neutral). But models/mpnn/src/mpnn/train.py passes a WeightedRandomSampler built from computed PDB weights as the fallback sampler, so its training fallback sampling now uses those weights instead of silently-uniform ones — the function's documented intent. mpnn train.py is a cluster-coupled script (not in CI), so this is unverified here; flagged for mpnn owners to validate. Also add foundry.utils.datasets to the strict mypy override (already fully annotated — a pure lock-in) and add tests/test_datasets.py (18 tests, no prior coverage) covering the fallback-weights fix, the config-key sampler dispatch, and the distributed-loader sampler conversions + validation guards. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bundles four related changes that tighten type-checking and test coverage on the shared
foundrylayer.Strict type-checking for the rest of
foundry.utilsEnables
disallow_untyped_defs+check_untyped_defsfordatasets,instantiators,ddp,squashfs, andlogging, completing strict type-checking acrosssrc/foundry/utils/. These modules were already annotated, so this is mostly a lock-in (new untyped defs now fail the gate);ddpandloggingneeded a handful of return/param annotations.Fix: dataset fallback sampler weights
wrap_dataset_and_sampler_with_fallbacksselected the fallback sampler's weights with"weights" in sampler. ASamplerhas no__contains__, soiniterates the sampler's integer indices and the string never matches — the "use the sampler's weights" branch was effectively dead, and the test needlessly drew samples. Switched tohasattr(sampler, "weights")(the form atomworks uses internally); the type checker then narrows the type, so a now-redundant suppression was removed.Reviewer / mpnn maintainers — behaviour change: for the distributed dataloader path this is a no-op (its samplers have no
weights, so still uniform). Butmodels/mpnn/src/mpnn/train.pypasses aWeightedRandomSamplerbuilt from computed PDB weights as the fallback sampler — so its training fallback sampling now uses those weights instead of silently-uniform ones (the function's documented intent).mpnn/train.pyis a cluster-only script and is not exercised in CI, so this could not be verified here; please validate when running mpnn training.New unit tests (CPU-only, no cluster data)
tests/test_datasets.py(18) — the fallback-weights selection (incl. a regression test for the fix), the config-key sampler dispatch ininstantiate_single_dataset_and_sampler, and the sampler conversions + input-validation guards inassemble_distributed_loader.tests/test_instantiators.py(10) andtests/test_logging.py(6) — the instantiation control flow and the two pure logging helpers.Remove dead, broken
rf3.data.paired_msaThe module subclassed an atomworks class that has since become a factory function, so it raised
TypeErroron import, and it was reachable only through a commented-out (disabled) dataset config. Removed the module, its orphaneddomain_distillation.yamlconfig, and the dangling commented references.Verification:
ruff format/ruff checkclean,mypyclean (152 files),pytest356 passed.