experiment-queue
wanshuiyin/Auto-claude-code-research-in-sleep
A scheduler for orchestrating large-scale, multi-stage machine learning experiments on remote GPU servers. It manages complex workflows such as multi-seed grid sweeps, wave transitions, and sequential job chains (e.g., teacher-student distillation, where the student run must wait for the teacher to finish). Key features include OOM-aware retries, cleanup of stale screen sessions, and race-condition prevention. Use it when basic single-run scheduling is not enough for complex batch workloads.
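To make two of these ideas concrete, here is a minimal, hypothetical sketch, not experiment-queue's actual code or API: it expands a hyperparameter grid across several seeds into one job per combination, and decides whether a failed job should be retried with a halved batch size when its log looks like a GPU out-of-memory error. The names Job, expand_grid and next_attempt, the batch-halving policy, and the OOM log substrings (PyTorch-style messages) are all assumptions for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

# Hypothetical job record; experiment-queue's real data model may differ.
@dataclass
class Job:
    name: str
    params: dict
    batch_size: int
    attempts: int = 0


def expand_grid(base_name, grid, seeds, batch_size):
    """Expand a hyperparameter grid x seeds into one Job per combination."""
    keys = sorted(grid)  # deterministic ordering of hyperparameter names
    jobs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        for seed in seeds:
            params = dict(zip(keys, values), seed=seed)
            tag = "_".join(f"{k}{v}" for k, v in params.items())
            jobs.append(Job(f"{base_name}_{tag}", params, batch_size))
    return jobs


# Assumed OOM markers (PyTorch-style error text); adjust for your framework.
OOM_MARKERS = ("CUDA out of memory", "OutOfMemoryError")


def next_attempt(job, log_text, max_attempts=3, min_batch=1) -> Optional[Job]:
    """Return a retry Job with half the batch size if the failure looks like
    an OOM and the retry budget allows it; otherwise return None."""
    is_oom = any(marker in log_text for marker in OOM_MARKERS)
    if not is_oom:
        return None  # non-OOM failures are not retried by this policy
    if job.attempts + 1 >= max_attempts or job.batch_size // 2 < min_batch:
        return None  # out of retries, or batch cannot shrink further
    return Job(job.name, job.params, job.batch_size // 2, job.attempts + 1)


if __name__ == "__main__":
    jobs = expand_grid("distill", {"lr": [1e-3, 1e-4]}, seeds=[0, 1, 2], batch_size=64)
    print(len(jobs), jobs[0].name)
    retry = next_attempt(jobs[0], "RuntimeError: CUDA out of memory. Tried to allocate ...")
    print(retry.batch_size, retry.attempts)
```

Halving the batch size is only one plausible OOM response; a real scheduler might instead wait for a less loaded GPU or enable gradient accumulation to keep the effective batch size constant.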