Optimize Deepgram integration performance through audio preprocessing (16kHz mono PCM), connection pooling, model selection, streaming for large files, parallel processing, and result caching.
Preprocess audio to 16-bit PCM, mono channel, 16kHz sample rate WAV format using ffmpeg. This is optimal for Deepgram's speech models.
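A minimal sketch of that conversion step: a helper that builds the ffmpeg argument list for 16-bit PCM, mono, 16 kHz output (the `ffmpegArgs` helper and file names are illustrative, not part of any SDK):

```typescript
// Build the ffmpeg argument list that converts any input file to
// Deepgram's preferred format: 16-bit signed PCM, mono, 16 kHz WAV.
function ffmpegArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    "-acodec", "pcm_s16le", // 16-bit signed little-endian PCM
    "-ac", "1",             // mono
    "-ar", "16000",         // 16 kHz sample rate
    "-y",                   // overwrite output if it exists
    outputPath,
  ];
}

// Usage (Node): run the conversion before uploading to Deepgram, e.g.
//   import { execFile } from "node:child_process";
//   execFile("ffmpeg", ffmpegArgs("meeting.mp3", "meeting.wav"), cb);
```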
Create a pool of Deepgram clients (min 2, max 10) with acquire timeout and idle timeout. Use execute() pattern to auto-acquire and release connections.
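One way to sketch such a pool, generic over the client type (idle-timeout reaping is omitted for brevity; the factory would be whatever your Deepgram SDK setup provides):

```typescript
type Factory<T> = () => T;

// Minimal connection-pool sketch: pre-creates `min` clients, grows to
// `max`, and times out acquires when every client is busy.
class ClientPool<T> {
  private idle: T[] = [];
  private inUse = 0;
  private waiters: Array<(c: T) => void> = [];

  constructor(
    private factory: Factory<T>,
    private opts = { min: 2, max: 10, acquireTimeoutMs: 5000 },
  ) {
    for (let i = 0; i < opts.min; i++) this.idle.push(factory());
  }

  private acquire(): Promise<T> {
    if (this.idle.length > 0) {
      this.inUse++;
      return Promise.resolve(this.idle.pop()!);
    }
    if (this.inUse < this.opts.max) {
      this.inUse++;
      return Promise.resolve(this.factory());
    }
    // All clients busy: wait for a release, or fail after the timeout.
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        const i = this.waiters.indexOf(handoff);
        if (i >= 0) this.waiters.splice(i, 1);
        reject(new Error("acquire timeout"));
      }, this.opts.acquireTimeoutMs);
      const handoff = (c: T) => { clearTimeout(timer); resolve(c); };
      this.waiters.push(handoff);
    });
  }

  private release(client: T): void {
    const waiter = this.waiters.shift();
    if (waiter) { waiter(client); return; } // hand off directly
    this.inUse--;
    this.idle.push(client);
  }

  // execute() auto-acquires a client and guarantees release, even on error.
  async execute<R>(fn: (client: T) => Promise<R>): Promise<R> {
    const client = await this.acquire();
    try {
      return await fn(client);
    } finally {
      this.release(client);
    }
  }
}
```

Callers only ever see `pool.execute((client) => client.transcribe(...))`, so a leaked connection requires actively working around the API.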
Choose Nova-2 for best accuracy/speed balance. Use Base model for cost-sensitive batch jobs. Match model to priority: accuracy, speed, or cost.
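A sketch of that mapping as a helper (model identifiers are assumptions based on the names above; verify them against Deepgram's current model list):

```typescript
type Priority = "accuracy" | "speed" | "cost";

// Pick a Deepgram model by job priority. Nova-2 covers both the
// accuracy and speed cases; Base trades accuracy for the lowest cost.
function pickModel(priority: Priority): string {
  switch (priority) {
    case "accuracy":
    case "speed":
      return "nova-2";
    case "cost":
      return "base";
  }
}
```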
Use the live transcription WebSocket for files over 60 seconds. Stream file data in 1MB chunks and collect the final transcripts.
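The chunking side of this is simple to sketch; in the real loop each chunk would be sent over the Deepgram live WebSocket and final transcript events collected as they arrive:

```typescript
const CHUNK_SIZE = 1024 * 1024; // 1 MB per WebSocket send

// Yield fixed-size views over the audio buffer for streaming.
function* chunked(
  data: Uint8Array,
  size: number = CHUNK_SIZE,
): Generator<Uint8Array> {
  for (let offset = 0; offset < data.length; offset += size) {
    yield data.subarray(offset, offset + size);
  }
}

// Streaming is worth the switch once the audio exceeds ~60 seconds.
function shouldStream(durationSeconds: number): boolean {
  return durationSeconds > 60;
}
```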
Use p-limit to process multiple audio files concurrently (default 5). Track per-file timing and total throughput.
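With p-limit the pattern is `const limit = pLimit(5); await Promise.all(files.map((f) => limit(() => transcribe(f))))`. If you'd rather avoid the dependency, the same worker-pool idea fits in a few lines (sketch; `fn` stands in for your transcription call):

```typescript
// Run fn over items with at most `limit` in flight, preserving order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until items run out.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Per-file timing drops in naturally by wrapping `fn` with `performance.now()` before and after each call; total throughput is files divided by wall-clock time for the whole batch.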
Hash audio URL + options as cache key. Store in Redis with configurable TTL. Return cached results for repeated requests.
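A sketch of the key derivation, with option keys sorted so that insertion order doesn't split the cache (the `dg:transcript:` prefix is an arbitrary namespace choice):

```typescript
import { createHash } from "node:crypto";

// Build a deterministic cache key from the audio URL plus the
// transcription options, so repeated requests map to the same entry.
function cacheKey(audioUrl: string, options: Record<string, unknown>): string {
  // Canonicalize options: sorted keys => order-insensitive hashing.
  const canonical = Object.keys(options)
    .sort()
    .map((k) => `${k}=${JSON.stringify(options[k])}`)
    .join("&");
  const digest = createHash("sha256")
    .update(`${audioUrl}|${canonical}`)
    .digest("hex");
  return `dg:transcript:${digest}`;
}
```

With node-redis, storage with a configurable TTL would look roughly like `await redis.set(cacheKey(url, opts), JSON.stringify(result), { EX: ttlSeconds })`.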
See detailed implementation for advanced patterns.

| Issue | Cause | Solution |
|---|---|---|
| Slow transcription | Wrong audio format | Preprocess to 16kHz mono WAV |
| Connection exhaustion | No pooling | Use connection pool |
| High latency | Large files | Switch to streaming |
| Redundant API calls | No caching | Enable transcription cache |

| Factor | Impact | Optimization |
|---|---|---|
| Audio Format | High | 16-bit PCM, mono, 16kHz |
| File Size | High | Stream large files |
| Model Choice | High | Balance accuracy vs speed |
| Concurrency | Medium | Pool connections |
| Network Latency | Medium | Use closest region |