Blazing fast data loading with HuggingFace Dataset and Ray Data
Updated Jan 12, 2024
Distributed pipeline on Hadoop + Spark that quantifies the impact of the digital divide on Saber 11 results in municipalities of Colombia. Processes 14M records with MLlib (regression, clustering, and a neural network).
TPC-H Data Migration to NoSQL Pipeline with Benchmarking
A highly scalable, distributed natural language processing (NLP) and machine learning pipeline built on PySpark to perform binary sentiment classification on large-scale text corpora.
Scalable distributed data storage system for healthcare data.
Distributed Data Processing Project