ray-train
Orchestra-Research/AI-Research-SKILLs
Ray Train orchestrates distributed PyTorch, TensorFlow, and Hugging Face training workloads from a single laptop to a multi-node cluster. It handles GPU placement, fault tolerance, and checkpointing, and it integrates with Ray Tune for hyperparameter sweeps, so teams can scale model training with minimal code changes.
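
As a rough illustration of the "minimal code changes" claim, here is a minimal sketch of a PyTorch training loop wrapped in Ray Train's `TorchTrainer`. It assumes Ray 2.x with PyTorch installed. The helper `scaling_kwargs`, the function `launch`, the toy linear model, and the synthetic data are hypothetical names and choices made for this example, not part of the skill itself. Ray and PyTorch are imported inside the functions so the module loads even where they are not installed.

```python
from typing import Any, Dict


def scaling_kwargs(num_workers: int, use_gpu: bool) -> Dict[str, Any]:
    """Hypothetical helper: validate and collect arguments for ScalingConfig."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {"num_workers": num_workers, "use_gpu": use_gpu}


def train_loop_per_worker(config: Dict[str, Any]) -> None:
    """Runs once on every worker; Ray Train sets up the process group."""
    # Local imports keep this module importable without Ray or PyTorch.
    import torch
    import ray.train
    import ray.train.torch

    # prepare_model moves the model to the worker's device and wraps it
    # for distributed data parallelism when there are multiple workers.
    model = ray.train.torch.prepare_model(torch.nn.Linear(4, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    device = ray.train.torch.get_device()

    for epoch in range(config["epochs"]):
        # Synthetic regression batch (assumption: stands in for real data).
        x = torch.randn(32, 4, device=device)
        y = x.sum(dim=1, keepdim=True)
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Report metrics back to the driver after each epoch.
        ray.train.report({"epoch": epoch, "loss": loss.item()})


def launch(num_workers: int = 2, use_gpu: bool = False):
    """Hypothetical entry point: build the trainer and run it."""
    from ray.train import FailureConfig, RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    trainer = TorchTrainer(
        train_loop_per_worker,
        train_loop_config={"lr": 0.01, "epochs": 3},
        scaling_config=ScalingConfig(**scaling_kwargs(num_workers, use_gpu)),
        # Retry the run up to two times if a worker fails.
        run_config=RunConfig(failure_config=FailureConfig(max_failures=2)),
    )
    return trainer.fit()
```

Calling `launch(num_workers=2)` on a machine with Ray and PyTorch installed should start two CPU workers locally. Under these assumptions, moving to a GPU cluster means changing `num_workers` and `use_gpu` rather than the training loop.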