nemo-curator
Orchestra-Research/AI-Research-SKILLs
NeMo Curator is a GPU-accelerated data curation toolkit for LLM training. It provides fast fuzzy deduplication, multimodal quality filters, semantic deduplication, PII redaction, and NSFW detection, with RAPIDS-based pipelines that scale to massive corpora.
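
To make "fuzzy deduplication" concrete, here is a minimal CPU-only sketch of the general MinHash idea in plain Python. It is an illustration under stated assumptions, not NeMo Curator's API or implementation: the function names (`shingles`, `minhash_signature`, `estimated_jaccard`, `fuzzy_dedup`), the 5-character shingles, the 64 hash functions, and the 0.8 threshold are all hypothetical choices made for this example.

```python
import hashlib


def shingles(text, k=5):
    """Character k-grams of the lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    if len(norm) <= k:
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def minhash_signature(text, num_perm=64, k=5):
    """One minimum hash value per salted hash function (illustrative parameters)."""
    grams = shingles(text, k)
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def fuzzy_dedup(docs, threshold=0.8, num_perm=64, k=5):
    """Keep the first document of each near-duplicate group.

    Compares every document against all kept ones (quadratic), which is
    only acceptable for a toy example.
    """
    kept = []  # list of (document, signature) pairs
    for doc in docs:
        sig = minhash_signature(doc, num_perm, k)
        if all(estimated_jaccard(sig, s) < threshold for _, s in kept):
            kept.append((doc, sig))
    return [doc for doc, _ in kept]


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the  quick BROWN fox jumps over the lazy dog.",  # same text after normalization
    "An entirely different sentence about GPUs.",
]
unique_docs = fuzzy_dedup(docs)
```

The all-pairs comparison here is only for readability. Large-scale systems typically bucket signatures with locality-sensitive hashing so that only candidate pairs are compared, and a GPU toolkit would run the hashing in parallel rather than in a Python loop.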