llama-cpp
llama.cpp is a pure C/C++ library for efficient LLM inference, optimized for resource-constrained environments. It excels where high-end NVIDIA GPUs (CUDA) are unavailable, making it a strong fit for Apple Silicon Macs, AMD/Intel GPUs, and edge devices such as the Raspberry Pi. Models are loaded from the GGUF file format, whose quantization types (e.g. Q4_K_M, Q8_0) significantly reduce memory use and improve inference speed for local, cross-platform AI applications.
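As a rough illustration of the memory reduction quantization provides, the sketch below compares FP16 weight storage to a ~4.5 bits-per-weight 4-bit quantization. The 4.5 figure is an assumption (4-bit quants carry per-block scale overhead, and exact sizes vary by quantization type and model), so treat the numbers as back-of-the-envelope estimates, not actual GGUF file sizes.

```python
# Back-of-the-envelope weight-memory estimate for a 7B-parameter model.
# Assumption: a 4-bit GGUF quant costs ~4.5 bits per weight once
# per-block scales are included; real sizes vary by quant type.

def model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 2**30

n_params = 7e9
fp16 = model_size_gib(n_params, 16.0)  # full-precision baseline, ~13.0 GiB
q4 = model_size_gib(n_params, 4.5)     # assumed 4-bit quant, ~3.7 GiB

print(f"FP16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB, "
      f"reduction: {fp16 / q4:.1f}x")
```

This is why a 7B model that would not fit in 8 GB of RAM at full precision can run comfortably on a laptop once quantized.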