sparse-autoencoder-training
Orchestra-Research/AI-Research-SKILLs
SAELens provides a framework for training and analyzing Sparse Autoencoders (SAEs). SAEs decompose the dense, often polysemantic activations of large language models into sparse, monosemantic features. Use this when you need to discover the discrete, interpretable concepts a model has learned, study feature superposition, or analyze specific safety-relevant behaviors (like bias or deception) within deep neural networks.
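To make the decomposition concrete, here is a minimal, framework-free sketch of what an SAE computes: a dense activation vector is encoded into a wider, non-negative (and, after training, sparse) feature vector, then linearly decoded back. This is not the SAELens API; all names, dimensions, and the `l1_coeff` penalty weight are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d_model activations expanded into d_sae features.
d_model, d_sae = 8, 32

# Randomly initialized SAE parameters (a trained SAE would learn these).
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode a dense activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps features non-negative
    x_hat = f @ W_dec + b_dec               # linear decode back to activation space
    return f, x_hat

x = rng.normal(size=d_model)                # stand-in for an LLM activation vector
features, reconstruction = sae_forward(x)

# Training minimizes reconstruction error plus a sparsity penalty; the
# L1 coefficient below is an arbitrary illustrative value.
l1_coeff = 1e-3
loss = np.mean((x - reconstruction) ** 2) + l1_coeff * np.abs(features).sum()
```

The L1 term is what pushes most feature activations to zero, so that each input activates only a handful of features that can then be interpreted individually.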