Real-Time Transcription and AI Analysis

v20260423

assemblyai-core-workflow-b

This workflow provides comprehensive audio processing capabilities, covering both real-time streaming transcription (for live captions and voice agents) and advanced post-processing using LeMUR. Use it for summarizing long meetings, answering questions based on transcripts, extracting action items, and running custom LLM analyses on recorded audio.

AssemblyAI AI Streaming Speech-to-Text LLM LeMUR Real-Time Transcription

Get Skill

416 downloads

Overview

AssemblyAI Core Workflow B — Streaming & LeMUR

Overview

Two advanced workflows: (1) real-time streaming transcription via WebSocket for live captioning and voice agents, and (2) LeMUR for applying LLMs to transcripts — summarization, Q&A, action items, and custom tasks.

Prerequisites

assemblyai package installed (npm install assemblyai)
API key configured in ASSEMBLYAI_API_KEY
For streaming: microphone or audio stream source

Part 1: Real-Time Streaming Transcription

Step 1: Basic Streaming Setup

import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
});

const transcriber = client.streaming.createService({
  // Model options: 'nova-3' (default), 'nova-3-pro' (highest accuracy)
  speech_model: 'nova-3',
  sample_rate: 16000,
});

transcriber.on('open', ({ sessionId }) => {
  console.log('Session opened:', sessionId);
});

transcriber.on('transcript', (message) => {
  // message_type: 'PartialTranscript' or 'FinalTranscript'
  if (message.message_type === 'FinalTranscript') {
    console.log('[Final]', message.text);
  } else {
    process.stdout.write(`\r[Partial] ${message.text}`);
  }
});

transcriber.on('error', (error) => {
  console.error('Streaming error:', error);
});

transcriber.on('close', (code, reason) => {
  console.log('Session closed:', code, reason);
});

await transcriber.connect();

// Send audio chunks (16-bit PCM, 16kHz mono)
// transcriber.sendAudio(audioBuffer);

// When done:
// await transcriber.close();

Step 2: Stream from Microphone (Node.js)

import { AssemblyAI } from 'assemblyai';
import { spawn } from 'child_process';

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY!,
});

const transcriber = client.streaming.createService({
  speech_model: 'nova-3',
  sample_rate: 16000,
});

transcriber.on('transcript', (msg) => {
  if (msg.message_type === 'FinalTranscript' && msg.text) {
    console.log(msg.text);
  }
});

await transcriber.connect();

// Use SoX to capture microphone audio as raw PCM
const mic = spawn('sox', [
  '-d',                  // default audio device
  '-t', 'raw',           // raw PCM output
  '-b', '16',            // 16-bit
  '-r', '16000',         // 16kHz sample rate
  '-c', '1',             // mono
  '-e', 'signed-integer',
  '-',                   // pipe to stdout
]);

mic.stdout.on('data', (chunk: Buffer) => {
  transcriber.sendAudio(chunk);
});

mic.on('close', async () => {
  await transcriber.close();
});

// Handle Ctrl+C
process.on('SIGINT', async () => {
  mic.kill();
  await transcriber.close();
  process.exit(0);
});

Step 3: Browser-Safe Temporary Token

// Server-side: generate a short-lived token for the browser
const token = await client.streaming.createTemporaryToken({
  expires_in_seconds: 300, // 5 minutes
});

// Send `token` to your frontend
// Client-side uses token instead of API key:
// const transcriber = new StreamingTranscriber({ token: receivedToken });

Step 4: Streaming with Word Boost and Speaker Labels

const transcriber = client.streaming.createService({
  speech_model: 'nova-3-pro',
  sample_rate: 16000,
  word_boost: ['AssemblyAI', 'LeMUR', 'transcription'],
  enable_extra_session_information: true,
});

transcriber.on('turn', (turn) => {
  // Speaker-labeled turns (available with nova-3-pro)
  console.log(`Speaker ${turn.speaker}: ${turn.transcript}`);
});

Part 2: LeMUR — LLM-Powered Audio Analysis

Step 5: Summarize a Transcript

// First transcribe (or use an existing transcript_id)
const transcript = await client.transcripts.transcribe({
  audio: 'https://example.com/meeting.mp3',
});

// Summarize with LeMUR
const { response } = await client.lemur.summary({
  transcript_ids: [transcript.id],
  context: 'This is a weekly engineering standup meeting.',
  answer_format: 'bullet points',
});

console.log('Summary:', response);

Step 6: Ask Questions About Audio

const { response: answers } = await client.lemur.questionAnswer({
  transcript_ids: [transcript.id],
  questions: [
    { question: 'What decisions were made?', answer_format: 'list' },
    { question: 'Were there any blockers discussed?', answer_format: 'short sentence' },
    { question: 'Who owns the next action items?', answer_format: 'list' },
  ],
});

for (const qa of answers) {
  console.log(`Q: ${qa.question}`);
  console.log(`A: ${qa.answer}\n`);
}

Step 7: Extract Action Items

const { response: actionItems } = await client.lemur.actionItems({
  transcript_ids: [transcript.id],
  context: 'This is a product planning meeting.',
  answer_format: 'Each action item should include the owner and deadline.',
});

console.log('Action Items:', actionItems);

Step 8: Custom LeMUR Task

const { response } = await client.lemur.task({
  transcript_ids: [transcript.id],
  prompt: `Analyze this customer support call and provide:
    1. Customer sentiment (positive/neutral/negative)
    2. Issue category
    3. Resolution status
    4. CSAT prediction (1-5)
    Format as JSON.`,
});

const analysis = JSON.parse(response);
console.log(analysis);

Step 9: Multi-Transcript Analysis

// LeMUR can analyze up to 100 hours of audio in a single request
const transcriptIds = [
  'transcript-1', 'transcript-2', 'transcript-3',
];

const { response } = await client.lemur.task({
  transcript_ids: transcriptIds,
  prompt: 'Compare themes across these three customer interviews. What patterns emerge?',
});

console.log(response);

Streaming Specifications

Spec	Value
Audio format	16-bit PCM, mono
Sample rates	8000, 16000, 22050, 44100, 48000 Hz
Latency (P50)	~300ms
Max concurrent streams (free)	5 new/min
Max concurrent streams (paid)	100 new/min, auto-scales 10%/60s
Languages	99+ (with Universal-3)
Models	`nova-3` (default), `nova-3-pro` (highest accuracy)

Output

Real-time partial and final transcripts via WebSocket
Speaker-labeled streaming turns (nova-3-pro)
LeMUR summaries, Q&A responses, action items
Custom LLM analysis with structured output

Error Handling

Error	Cause	Solution
`Session limit reached`	Too many concurrent streams	Wait or upgrade plan
`Invalid audio encoding`	Wrong PCM format	Use 16-bit signed integer, mono
`WebSocket disconnected`	Network interruption	Implement reconnection logic
`LeMUR context too long`	>100 hours of audio	Split into smaller batches
`transcript not found`	Invalid transcript_id	Verify ID exists via `client.transcripts.get()`

Resources

Next Steps

For error troubleshooting, see assemblyai-common-errors.

Info

Category Artificial Intelligence

Name assemblyai-core-workflow-b

Version v20260423

Size 7.66KB

Source jeremylongshore/claude-code-plugins-plus-skills

Updated At 2026-04-28