ray-data
Orchestra-Research/AI-Research-SKILLs
Ray Data is a scalable, distributed data processing library for ML workloads. It uses streaming execution across CPUs and GPUs, integrates with Ray Train, PyTorch, and TensorFlow, and handles Parquet, CSV, JSON, and image data for preprocessing, ETL, or batch inference, scaling from a single node to a cluster.