training-check
wanshuiyin/Auto-claude-code-research-in-sleep
This skill provides interactive, periodic monitoring of deep learning training runs. It continuously checks critical metrics—such as loss trends, evaluation scores, and gradient norms—against defined baselines to detect signs of divergence, NaN values, plateaus, or general instability. It delivers a structured report and recommends whether to continue, wait, or stop the training process, ensuring computational resources are not wasted on failing models.