Primary workflow for Speak: AI-powered conversation practice with real-time pronunciation feedback and adaptive tutoring. Speak uses GPT-4o for conversation generation and OpenAI's Realtime API for speech processing, delivering sub-second response times.
speak-install-auth setupimport { SpeakClient } from '@speak/language-sdk';
const client = new SpeakClient({
apiKey: process.env.SPEAK_API_KEY!,
appId: process.env.SPEAK_APP_ID!,
language: 'es',
});
// Start a restaurant ordering scenario in Spanish
const session = await client.startConversation({
scenario: 'ordering-food',
language: 'es',
level: 'intermediate',
nativeLanguage: 'en',
maxTurns: 10,
feedbackDetail: 'phoneme', // 'word' or 'phoneme'
});
console.log('Session started:', session.id);
console.log('AI Tutor:', session.firstPrompt.text);
// "Bienvenido al restaurante. Soy tu camarero. Que le gustaria ordenar?"
// Submit audio for pronunciation scoring
const turn1 = await client.sendTurn(session.id, {
audioPath: './recordings/student-response-1.wav',
});
console.log('Tutor:', turn1.tutorText);
console.log('Pronunciation:', turn1.pronunciationScore); // 0-100
console.log('Grammar:', turn1.corrections);
// [{original: "yo quiero", suggestion: "quisiera", note: "More polite form for ordering"}]
console.log('Vocabulary:', turn1.vocabularyNotes);
// ["camarero = waiter", "ordenar = to order"]
// Or submit text (skips pronunciation scoring)
const turn2 = await client.sendTurn(session.id, {
text: 'Quisiera una ensalada y un vaso de agua, por favor.',
});
async function runConversationLesson(
client: SpeakClient,
scenario: string,
language: string,
level: string,
) {
const session = await client.startConversation({
scenario, language, level, nativeLanguage: 'en',
});
const turns: TurnResult[] = [];
let isComplete = false;
while (!isComplete && turns.length < 10) {
// Display tutor prompt
const prompt = turns.length === 0
? session.firstPrompt.text
: turns[turns.length - 1].tutorText;
console.log(`\nTutor: ${prompt}`);
// Get student audio (mic input or file)
const audioPath = await recordStudentAudio();
// Submit and get feedback
const turn = await client.sendTurn(session.id, { audioPath });
turns.push(turn);
// Show feedback
if (turn.pronunciationScore < 60) {
console.log(`Pronunciation needs work: ${turn.pronunciationScore}/100`);
console.log('Try again with this phrase.');
}
if (turn.corrections.length > 0) {
console.log('Grammar notes:', turn.corrections.map(c => c.note).join('; '));
}
isComplete = turn.sessionComplete;
}
// End session and get summary
const summary = await client.endSession(session.id);
return summary;
}
const topics = ['greetings', 'directions', 'ordering-food', 'shopping'];
const results: SessionSummary[] = [];
for (const topic of topics) {
console.log(`\n=== ${topic.toUpperCase()} ===`);
const summary = await runConversationLesson(client, topic, 'es', 'intermediate');
results.push(summary);
console.log(`Score: ${summary.avgPronunciationScore}/100`);
}
// Overall progress report
console.log('\n=== Session Report ===');
console.table(results.map(r => ({
topic: r.scenario,
pronunciation: r.avgPronunciationScore,
grammar: r.grammarAccuracy + '%',
newWords: r.newWords.length,
duration: r.durationMinutes + 'min',
})));
| Category | Scenarios | Level |
|---|---|---|
| Daily Life | greetings, introductions, weather | Beginner |
| Travel | directions, hotel, airport, transport | Beginner-Intermediate |
| Food & Drink | ordering-food, grocery, cooking | Intermediate |
| Business | meeting, presentation, negotiation | Intermediate-Advanced |
| Social | party, dating, opinions, debate | Advanced |
| Error | Cause | Solution |
|---|---|---|
| Session timeout | Exceeded 30 min | Auto-end with summary, start new session |
| Audio processing failed | Invalid format | Convert to WAV 16kHz mono |
| Tutor not responding | API latency | Implement 10s timeout with retry |
| Recognition failed | Poor audio quality | Prompt user to re-record in quiet environment |
For pronunciation-focused training, see speak-core-workflow-b.
Quick test: Start a greetings scenario with level: 'beginner', send 3 text responses, end session, and review the summary scores.
Full lesson: Run 4 topics in sequence, track pronunciation improvement across topics, and generate a progress report.