nemo-curator
Orchestra-Research/AI-Research-SKILLs
NeMo Curator provides GPU-accelerated data curation for LLM training. It supports text, image, video, and audio pipelines with fuzzy and semantic deduplication, quality classifiers, and PII/NSFW filters, and it scales across GPU clusters with RAPIDS.