Pull requests: mosaicml/streaming

Pull requests list

Fix dataframe_to_mds for non-nullable ArrayType columns
#984 opened Jun 13, 2026 by discobot
6 of 8 tasks
Fix merge_index dropping nested directories from shard paths
#983 opened Jun 13, 2026 by discobot
6 of 8 tasks
Update streaming dependencies
#982 opened May 19, 2026 by bfontain
Upgrade Transformers to v5
#977 opened Feb 26, 2026 by sumedh-shenoy
3 of 7 tasks
Megatron streaming dataset
#976 opened Feb 12, 2026 by bodsul
[HF Datasets] Improve file download
#975 opened Feb 11, 2026 by lhoestq
Update pytest-cov requirement from <7,>=4 to >=4,<8 (labels: dependencies, python)
#972 opened Feb 2, 2026 by dependabot Bot
Bump pytest from 8.4.1 to 9.0.2 (labels: dependencies, python)
#969 opened Jan 26, 2026 by dependabot Bot
Bump pydantic from 2.11.7 to 2.12.5 (labels: dependencies, python)
#968 opened Jan 26, 2026 by dependabot Bot
Update google-cloud-storage requirement from <3.3.0,>=2.9.0 to >=2.9.0,<3.9.0 (labels: dependencies, python)
#967 opened Jan 26, 2026 by dependabot Bot
Make SparkConnect the data source
#934 opened Jun 25, 2025 by XiaohanZhangCMU Contributor
8 tasks
Update numpy requirement from <2.2.0,>=1.21.5 to >=1.21.5,<2.3.0 (labels: dependencies, python)
#896 opened Apr 7, 2025 by dependabot Bot
Add upper bound for prefix_int
#823 opened Nov 5, 2024 by XiaohanZhangCMU Contributor
8 tasks
add jpeg quality option
#818 opened Oct 28, 2024 by cabreraalex
8 tasks
Refactor spanner to avoid creating large array
#773 opened Sep 3, 2024 by XiaohanZhangCMU Contributor
8 tasks done
Heterogeneous
#684 opened May 24, 2024 by XiaohanZhangCMU Contributor Draft
8 tasks
Column logical (not physical) type and allow_schema_mismatch
#606 opened Feb 22, 2024 by knighton Contributor
parallel merge index
#590 opened Feb 5, 2024 by XiaohanZhangCMU Contributor
8 tasks
Add varint to MDS
#574 opened Jan 23, 2024 by knighton Contributor
Add options to precompute the epoch
#569 opened Jan 20, 2024 by knighton Contributor
Nuke 1) torch dist, 2) shared memory, and 3) filelock
#556 opened Dec 30, 2023 by knighton Contributor
Add fine-grained timings to Writers
#555 opened Dec 30, 2023 by knighton Contributor
Let's blow away dist, and also shared memory
#552 opened Dec 26, 2023 by knighton Contributor Draft
2 of 3 tasks
Parquet streaming [WIP]
#538 opened Dec 15, 2023 by knighton Contributor