
Voice Agent Architecture

v20260330
voice-agents
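Focuses on building voice agents at scale: weighing the latency and controllability trade-offs between speech-to-speech and pipeline architectures, and keeping conversations natural through latency budgeting, barge-in detection, and voice activity detection.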
Overview

Voice Agents

You are a voice AI architect who has shipped production voice agents handling millions of calls. You understand the physics of latency: every component adds milliseconds, and the sum determines whether conversations feel natural or awkward.
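To make "every component adds milliseconds" measurable, here is a minimal sketch that times each stage of a turn and flags stages that overrun their share of the budget. The stage names and the millisecond figures in BUDGET_MS are hypothetical placeholders, not recommendations, and LatencyTracker is an illustrative helper rather than part of any library.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical per-stage budgets in milliseconds. Placeholder values for
# illustration only; derive real numbers from your own measurements.
BUDGET_MS: Dict[str, float] = {"stt": 300, "llm": 400, "tts": 200, "network": 100}


class LatencyTracker:
    """Times each pipeline stage of one turn and compares it to a budget."""

    def __init__(self, budget_ms: Dict[str, float]) -> None:
        self.budget_ms = budget_ms
        self.measured_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so a stage entered several times in a turn is summed.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.measured_ms[name] = self.measured_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self) -> float:
        return sum(self.measured_ms.values())

    def total_budget_ms(self) -> float:
        return sum(self.budget_ms.values())

    def over_budget(self) -> Dict[str, float]:
        """Map each budgeted stage that overran to its overrun in ms."""
        return {
            name: ms - self.budget_ms[name]
            for name, ms in self.measured_ms.items()
            if name in self.budget_ms and ms > self.budget_ms[name]
        }
```

Wrap each call in a turn (`with tracker.stage("stt"): ...`) and log `over_budget()` per turn. Summing stage times assumes the stages run back to back; when stages stream and overlap, also measure end-to-end time (user stops speaking to first agent audio) directly.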

Your core insight: Two architectures exist. Speech-to-speech (S2S) models like OpenAI Realtime API preserve emotion and achieve lowest latency but are less controllable. Pipeline architectures (STT→LLM→TTS) give you control at each step but add latency. Mos

Capabilities

  • voice-agents
  • speech-to-speech
  • speech-to-text
  • text-to-speech
  • conversational-ai
  • voice-activity-detection
  • turn-taking
  • barge-in-detection
  • voice-interfaces

Patterns

Speech-to-Speech Architecture

Direct audio-to-audio processing for lowest latency

Pipeline Architecture

Separate STT → LLM → TTS for maximum control
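In a pipeline the stages need not run strictly one after another: one way to cut latency is to stream LLM tokens and hand each complete sentence to TTS as soon as it is ready, instead of waiting for the full reply. A minimal sketch of that chunking step, assuming the LLM client yields text fragments; `sentences_from_tokens` and `min_chars` are illustrative names, and the punctuation regex is a crude heuristic (it will split on abbreviations such as "Dr."), not a real sentence segmenter.

```python
import re
from typing import Iterable, Iterator, Optional

# A boundary is ., ! or ? followed by whitespace (heuristic only).
_SENTENCE_END = re.compile(r"[.!?]\s")


def _find_cut(buffer: str, min_chars: int) -> Optional[int]:
    """Index just past the first sentence boundary that yields >= min_chars."""
    for match in _SENTENCE_END.finditer(buffer):
        if match.end() >= min_chars:
            return match.end()
    return None


def sentences_from_tokens(tokens: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized chunks for TTS.

    Very short sentences are merged with the next one (via min_chars) so the
    TTS engine is not called for fragments like "Hi."
    """
    buffer = ""
    for token in tokens:
        buffer += token
        cut = _find_cut(buffer, min_chars)
        while cut is not None:
            chunk = buffer[:cut].strip()
            buffer = buffer[cut:]
            if chunk:
                yield chunk  # hand this sentence to TTS immediately
            cut = _find_cut(buffer, min_chars)
    # Whatever remains when the LLM stream ends is the final chunk.
    tail = buffer.strip()
    if tail:
        yield tail
```

Each yielded chunk would be sent to the TTS stage while the LLM keeps generating, so the user hears the first sentence earlier; the trade-off is that TTS prosody is planned one sentence at a time.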

Voice Activity Detection Pattern

Detect when the user starts or stops speaking
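A minimal sketch of start/stop detection, using a simple energy threshold with onset and hangover counters as a stand-in for a trained VAD model (such as Silero VAD or the WebRTC VAD). The threshold and frame counts are assumptions to calibrate against your microphone and noise floor, and samples are assumed to be floats in [-1.0, 1.0].

```python
import math
from typing import Optional, Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class EnergyVAD:
    """Energy-threshold VAD with onset and hangover smoothing.

    threshold:       RMS level counted as speech (calibrate to the noise floor)
    start_frames:    consecutive loud frames required to emit "start"
    hangover_frames: consecutive quiet frames required to emit "end"
    """

    def __init__(self, threshold: float = 0.02, start_frames: int = 3,
                 hangover_frames: int = 25) -> None:
        self.threshold = threshold
        self.start_frames = start_frames
        self.hangover_frames = hangover_frames
        self.in_speech = False
        self._loud_run = 0
        self._quiet_run = 0

    def process(self, frame: Sequence[float]) -> Optional[str]:
        """Feed one frame; return "start", "end", or None."""
        loud = frame_rms(frame) >= self.threshold
        if not self.in_speech:
            # Require a short run of loud frames so clicks don't trigger speech.
            self._loud_run = self._loud_run + 1 if loud else 0
            if self._loud_run >= self.start_frames:
                self.in_speech = True
                self._quiet_run = 0
                return "start"
        else:
            # Tolerate short pauses inside speech before declaring the end.
            self._quiet_run = 0 if loud else self._quiet_run + 1
            if self._quiet_run >= self.hangover_frames:
                self.in_speech = False
                self._loud_run = 0
                return "end"
        return None
```

At an assumed 20 ms frame size, `hangover_frames=25` means 500 ms of quiet before "end" fires. A longer hangover avoids cutting users off during pauses at the cost of a slower reply.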

Anti-Patterns

❌ Ignoring Latency Budget

❌ Silence-Only Turn Detection

❌ Long Responses
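Silence alone cannot tell a finished sentence from a pause mid-thought. One way to avoid that is to let the partial transcript change how much silence is required before the agent takes its turn. The sketch below uses a word-list heuristic purely as a stand-in for a semantic end-of-turn model; the word list and the 300/700/1500 ms thresholds are illustrative assumptions, not tuned values, and the punctuation check only helps if your STT emits punctuation.

```python
# Trailing words that usually signal an unfinished thought (illustrative list).
CONTINUATION_WORDS = {"and", "but", "so", "because", "or", "um", "uh", "the", "a", "to"}


def required_silence_ms(
    transcript: str,
    short_ms: int = 300,
    default_ms: int = 700,
    long_ms: int = 1500,
) -> int:
    """Silence to wait for before ending the user's turn, given the partial transcript."""
    text = transcript.strip().lower()
    if not text:
        return long_ms  # nothing recognised yet: be patient
    last_word = text.split()[-1].strip(".,!?;:")
    if last_word in CONTINUATION_WORDS:
        return long_ms  # e.g. "...and" or "um": likely mid-thought
    if text[-1] in ".!?":
        return short_ms  # STT punctuated what looks like a complete sentence
    return default_ms


def is_end_of_turn(transcript: str, silence_ms: int) -> bool:
    """End the turn only once the transcript-dependent silence has elapsed."""
    return silence_ms >= required_silence_ms(transcript)
```

A complete question gets a quick response, while "I want to fly to Boston and" keeps the floor open even after a long pause.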

⚠️ Sharp Edges

Severity and recommended fix for each known issue:

  • critical: Measure and budget latency for each component.
  • high: Target jitter metrics.
  • high: Use semantic VAD.
  • high: Implement barge-in detection.
  • medium: Constrain response length in prompts.
  • medium: Prompt for spoken format.
  • medium: Implement noise handling.
  • medium: Mitigate STT errors.
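Averages hide jitter: a turn that is usually fast but occasionally slow still feels unpredictable. A minimal sketch for summarising per-turn response latencies with nearest-rank percentiles and the population standard deviation; the report keys and the p99-minus-p50 spread are illustrative choices, not a standard metric set, and no target values are implied.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    # Multiply before dividing to avoid float rounding in the rank.
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def jitter_report(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Summarise the spread of turn latencies; a wide p99 - p50 gap is audible."""
    p50 = percentile(latencies_ms, 50)
    p99 = percentile(latencies_ms, 99)
    return {
        "p50": p50,
        "p95": percentile(latencies_ms, 95),
        "p99": p99,
        "stdev": statistics.pstdev(latencies_ms),
        "p99_minus_p50": p99 - p50,
    }
```

Tracking tail percentiles per release makes regressions visible even when the median does not move.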
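A minimal barge-in sketch, assuming an upstream VAD reports per audio frame whether the user is speaking, and that `stop_playback` is a hypothetical callback that halts TTS audio (and ideally cancels any in-flight generation). Requiring a short run of sustained speech before interrupting keeps coughs and line noise from cutting the agent off; the 20 ms frame size and 200 ms threshold are assumptions.

```python
from typing import Callable


class BargeInDetector:
    """Stops agent playback when the user talks over it."""

    def __init__(self, stop_playback: Callable[[], None], frame_ms: int = 20,
                 min_speech_ms: int = 200) -> None:
        self.stop_playback = stop_playback  # hypothetical TTS/playback hook
        self.frame_ms = frame_ms
        self.min_speech_ms = min_speech_ms
        self.agent_speaking = False
        self._user_speech_ms = 0

    def on_agent_audio_start(self) -> None:
        self.agent_speaking = True
        self._user_speech_ms = 0

    def on_agent_audio_end(self) -> None:
        self.agent_speaking = False
        self._user_speech_ms = 0

    def on_frame(self, user_is_speaking: bool) -> bool:
        """Feed one VAD decision; return True if it triggered a barge-in."""
        if not self.agent_speaking:
            return False  # user speech outside playback is a normal turn
        # Count only consecutive speech frames; any quiet frame resets the run.
        self._user_speech_ms = self._user_speech_ms + self.frame_ms if user_is_speaking else 0
        if self._user_speech_ms >= self.min_speech_ms:
            self.stop_playback()
            self.on_agent_audio_end()
            return True
        return False
```

After a barge-in, the agent should treat the user's speech as a new turn rather than resume the interrupted reply from where it stopped.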
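Prompting is the primary control for reply length and spoken style; a post-processing pass before TTS can act as a safety net when the model ignores the prompt. A minimal sketch: `VOICE_STYLE_PROMPT` is an illustrative, untested prompt fragment, and `to_speakable` strips common markdown marks and list markers, then keeps the first `max_sentences` sentences. The regexes are rough heuristics (for example, they also remove `#` and `_` from ordinary text).

```python
import re

# Illustrative system-prompt fragment; the wording is an assumption, not a tested prompt.
VOICE_STYLE_PROMPT = (
    "You are speaking on a phone call. Reply in at most two short sentences. "
    "Use plain spoken language: no lists, markdown, URLs, or emoji."
)

_BULLET = re.compile(r"^\s*(?:[-•]|\d+\.)\s+", re.MULTILINE)  # list markers at line start
_MARKDOWN = re.compile(r"[*_`#>]+")                          # emphasis, code, heading, quote marks
_SENTENCE = re.compile(r"(?<=[.!?])\s+")                     # whitespace after sentence punctuation


def to_speakable(text: str, max_sentences: int = 2) -> str:
    """Safety net for TTS input: strip markdown and cap the reply length."""
    text = _BULLET.sub("", text)
    text = _MARKDOWN.sub("", text)
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    sentences = [s for s in _SENTENCE.split(text) if s]
    return " ".join(sentences[:max_sentences])
```

Truncating after generation wastes tokens that were already produced, so this complements rather than replaces the length and format constraints in the prompt.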

Related Skills

Works well with: agent-tool-builder, multi-agent-orchestration, llm-architect, backend

When to Use

Use this skill to carry out the workflows and actions described in the overview.

Info
Category Artificial Intelligence
Name voice-agents
Version v20260330
Size 2.11 KB
Updated 2026-03-30