Complete checklist for deploying Perplexity Sonar API integrations to production. Perplexity-specific concerns: every API call performs a live web search (variable latency), citations link to third-party sites (must validate), and costs scale per-request plus per-token.
- `PERPLEXITY_API_KEY` in secret manager (not env file)
- Key starts with `pplx-` and has credits loaded
- Base URL is `https://api.perplexity.ai` (not localhost/proxy)
- Model: `sonar` for fast, `sonar-pro` for deep
- `max_tokens` set on all requests (prevents runaway costs)
- `search_domain_filter` used where appropriate (reduces search time)
- Simple queries routed to `sonar`, complex to `sonar-pro`
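- `max_tokens` capped per endpoint

Fallback from `sonar-pro` to `sonar` on rate limits and server errors:

```typescript
import OpenAI from "openai";

// Assumption: `perplexity` is the OpenAI SDK pointed at Perplexity's base URL.
const perplexity = new OpenAI({
  apiKey: process.env.PERPLEXITY_API_KEY, // injected at runtime from the secret manager
  baseURL: "https://api.perplexity.ai",
});

async function searchWithFallback(query: string) {
  try {
    // Primary: sonar-pro for deep answers
    return await perplexity.chat.completions.create({
      model: "sonar-pro",
      messages: [{ role: "user", content: query }],
      max_tokens: 2048,
    });
  } catch (err: any) {
    // Only rate limits (429) and server errors (5xx) trigger the fallback
    if (err.status === 429 || err.status >= 500) {
      // Fallback: sonar for faster, cheaper response
      return await perplexity.chat.completions.create({
        model: "sonar",
        messages: [{ role: "user", content: query }],
        max_tokens: 512,
      });
    }
    // Auth and validation errors are rethrown unchanged
    throw err;
  }
}
```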
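
The routing and `max_tokens` items above can be kept in one place as per-model request profiles. The following is a minimal sketch, not a definitive implementation: `RequestProfile`, `PROFILES`, `isComplexQuery`, and `selectProfile` are hypothetical names, the complexity heuristic is an assumption, and the timeout values are illustrative starting points rather than figures from Perplexity documentation. The token caps mirror the ones used in the fallback function.

```typescript
type SonarModel = "sonar" | "sonar-pro";

interface RequestProfile {
  model: SonarModel;
  maxTokens: number; // hard cap to bound per-request cost
  timeoutMs: number; // client-side timeout covering the live web search
}

// Illustrative values only; tune them against observed latency and spend.
const PROFILES: Record<SonarModel, RequestProfile> = {
  sonar: { model: "sonar", maxTokens: 512, timeoutMs: 15_000 },
  "sonar-pro": { model: "sonar-pro", maxTokens: 2048, timeoutMs: 30_000 },
};

// Crude heuristic (an assumption): long or comparative/analytical
// questions count as complex, everything else as simple.
function isComplexQuery(query: string): boolean {
  const words = query.trim().split(/\s+/).filter(Boolean).length;
  const analytical = /\b(compare|versus|vs|analyze|explain why)\b/i.test(query);
  return words > 25 || analytical;
}

// Simple queries go to sonar, complex ones to sonar-pro.
function selectProfile(query: string): RequestProfile {
  return isComplexQuery(query) ? PROFILES["sonar-pro"] : PROFILES.sonar;
}
```

A caller would pass `profile.model` and `profile.maxTokens` into the request; how `timeoutMs` is applied depends on the client library in use.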
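
Health check endpoint:

```typescript
// Assumes an existing Express-style `app` and the `perplexity` client.
// Each probe is a real, billed API call, so poll it sparingly.
app.get("/health/perplexity", async (req, res) => {
  const start = Date.now();
  try {
    const response = await perplexity.chat.completions.create({
      model: "sonar",
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 5, // keep the probe as cheap as possible
    });
    res.json({
      status: "healthy",
      latencyMs: Date.now() - start,
      model: response.model,
    });
  } catch (err: any) {
    // 503 tells load balancers and monitors that the dependency is down
    res.status(503).json({
      status: "unhealthy",
      error: err.status || err.message,
      latencyMs: Date.now() - start,
    });
  }
});
```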
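
The health endpoint catches failures at runtime; the key and base URL items from the checklist can also be checked once at startup, before any request is sent. A minimal sketch, assuming the configuration is passed in as a plain object (`PerplexityConfig` and `validateConfig` are hypothetical names, not part of any SDK):

```typescript
interface PerplexityConfig {
  apiKey: string | undefined;
  baseUrl: string;
}

const EXPECTED_BASE_URL = "https://api.perplexity.ai";

// Returns a list of problems; an empty list means the config passed.
function validateConfig(config: PerplexityConfig): string[] {
  const problems: string[] = [];

  if (!config.apiKey) {
    problems.push("PERPLEXITY_API_KEY is missing");
  } else if (!config.apiKey.startsWith("pplx-")) {
    problems.push("PERPLEXITY_API_KEY does not start with pplx-");
  }

  // Ignore trailing slashes so "https://api.perplexity.ai/" still passes.
  if (config.baseUrl.replace(/\/+$/, "") !== EXPECTED_BASE_URL) {
    problems.push(`base URL is ${config.baseUrl}, expected ${EXPECTED_BASE_URL}`);
  }

  return problems;
}
```

This only checks the shape of the configuration. Whether the key is valid and has credits loaded still has to be confirmed by a real request, such as the health check.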

Recommended alerts:

| Alert | Condition | Severity |
|---|---|---|
| API Unreachable | Health check fails 3x | P1 |
| High Error Rate | 429/5xx > 5% over 5min | P2 |
| High Latency | p95 > 15s for sonar | P2 |
| Budget Exceeded | Monthly cost > 80% cap | P2 |
| Auth Failure | Any 401/402 error | P1 |
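
The threshold-based conditions in the alert table can be evaluated from per-window request statistics. A minimal sketch, assuming metrics are collected into a plain object every five minutes; `WindowStats`, `percentile`, and `firingAlerts` are hypothetical names, and a real deployment would more likely express these rules in its monitoring system:

```typescript
interface WindowStats {
  totalRequests: number;
  failedRequests: number; // 429 and 5xx responses in the window
  latenciesMs: number[]; // sonar request latencies in the window
  monthlyCostUsd: number;
  monthlyCapUsd: number;
}

// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// Returns the names of the table's threshold alerts that are firing.
function firingAlerts(stats: WindowStats): string[] {
  const alerts: string[] = [];
  const errorRate =
    stats.totalRequests === 0 ? 0 : stats.failedRequests / stats.totalRequests;

  if (errorRate > 0.05) alerts.push("High Error Rate");
  if (percentile(stats.latenciesMs, 95) > 15_000) alerts.push("High Latency");
  if (stats.monthlyCostUsd > 0.8 * stats.monthlyCapUsd) alerts.push("Budget Exceeded");

  return alerts;
}
```

The API Unreachable and Auth Failure alerts are event-driven rather than threshold-based, so they are left out of this sketch.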

Common issues:

| Issue | Cause | Solution |
|---|---|---|
| Variable latency | Web search per request | Set appropriate timeouts per model |
| Broken citations | Source pages changed | Validate citation URLs before displaying |
| Cost overrun | No model routing | Route simple queries to sonar |
| Rate limit spikes | Burst traffic | Queue requests with p-queue |
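
For the broken-citations row, one way to validate citation URLs before display is sketched below. It assumes citations arrive as an array of URL strings; `sanitizeCitations` and `filterReachable` are hypothetical helpers, and the liveness check relies on the global `fetch` and `AbortSignal.timeout` available in recent Node.js versions.

```typescript
// Keep only citations that parse as absolute http(s) URLs, deduplicated.
function sanitizeCitations(citations: string[]): string[] {
  const seen = new Set<string>();
  const valid: string[] = [];
  for (const raw of citations) {
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      continue; // not an absolute URL
    }
    // Reject javascript:, data:, and other non-web schemes.
    if (url.protocol !== "https:" && url.protocol !== "http:") continue;
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    valid.push(url.href);
  }
  return valid;
}

// Optional liveness check: HEAD each URL with a short timeout and
// drop the ones that fail or return a non-2xx status.
async function filterReachable(urls: string[], timeoutMs = 3000): Promise<string[]> {
  const checks = urls.map(async (url) => {
    try {
      const res = await fetch(url, {
        method: "HEAD",
        signal: AbortSignal.timeout(timeoutMs),
      });
      return res.ok ? url : null;
    } catch {
      return null;
    }
  });
  return (await Promise.all(checks)).filter((u): u is string => u !== null);
}
```

Some sites reject `HEAD` requests, so a failed liveness check is better treated as a reason to flag a citation than as proof that the page is gone. The syntactic pass is cheap enough to run on every response, while the network check adds latency and may be better done asynchronously.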

For version upgrades, see `perplexity-upgrade-migration`.