Deployment architectures for the Perplexity Sonar search API at three scales. Sonar's search-augmented generation fits patterns ranging from a simple search widget to a full research automation pipeline.

Direct Widget: best for adding AI search to an app, under 500 queries/day.
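```python
import os
from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
# Assumed setup: the OpenAI SDK pointed at Perplexity's OpenAI-compatible endpoint
pplx_client = OpenAI(api_key=os.environ["PERPLEXITY_API_KEY"], base_url="https://api.perplexity.ai")

@app.route('/ask')
def ask():
    response = pplx_client.chat.completions.create(
        model="sonar", messages=[{"role": "user", "content": request.args["q"]}]
    )
    return jsonify({
        "answer": response.choices[0].message.content,
        "citations": response.citations
    })
```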
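
The route reads `request.args["q"]` directly, so an empty or oversized query still costs an API call. A minimal sketch of a validation step that could run first; the helper name `validate_query` and the 500-character limit are assumptions for illustration, not part of the Perplexity API:

```python
from typing import Optional

MAX_QUERY_CHARS = 500  # assumed limit, not a Perplexity constraint


def validate_query(raw: Optional[str], max_chars: int = MAX_QUERY_CHARS) -> str:
    """Return a cleaned query, or raise ValueError with a user-facing message."""
    if raw is None:
        raise ValueError("missing query parameter 'q'")
    # Collapse runs of whitespace so "  what   is x " becomes "what is x"
    cleaned = " ".join(raw.split())
    if not cleaned:
        raise ValueError("query is empty")
    if len(cleaned) > max_chars:
        raise ValueError(f"query exceeds {max_chars} characters")
    return cleaned
```

In a Flask handler, the `ValueError` message could be returned with a 400 status before any request is sent upstream.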

Cached Layer: best for repeated queries and research tools, 500-5K queries/day.
```python
import hashlib
import json

class CachedResearch:
    def __init__(self, client, cache, ttl=1800):  # ttl in seconds (1800 = 30 minutes)
        self.client = client
        self.cache = cache  # Redis-style client exposing get() and setex()
        self.ttl = ttl

    def search(self, query: str, model: str = "sonar"):
        # Include the model in the key so answers from different models never collide
        key = f"pplx:{model}:{hashlib.sha256(query.encode()).hexdigest()}"
        cached = self.cache.get(key)
        if cached:
            return json.loads(cached)
        result = self.client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": query}]
        )
        data = {"answer": result.choices[0].message.content, "citations": result.citations}
        self.cache.setex(key, self.ttl, json.dumps(data))
        return data
```
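
`CachedResearch` only calls `get` and `setex` on its cache, so it can be exercised without a Redis server. The sketch below is a hypothetical in-memory stand-in (the class name `InMemoryTTLCache` and the injectable `clock` parameter are assumptions, not from the source) that mimics those two methods with lazy expiry:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryTTLCache:
    """Test double exposing the two Redis-style methods the cached layer uses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # key -> (expiry timestamp, stored value)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: evict lazily on read
            del self._store[key]
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        # Same argument order as Redis SETEX: key, TTL in seconds, value
        self._store[key] = (self._clock() + ttl_seconds, value)
```

Passing a fake clock such as `clock=lambda: now[0]` lets a test advance time past the TTL without sleeping.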

Research Pipeline: best for automated research and report generation, 5K+ queries/day.
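```python
import asyncio

class ResearchPipeline:
    # Assumes self.client (a sync OpenAI-compatible client) plus the coroutines
    # search_with_cache() and synthesize() are defined elsewhere; the source omits them.
    async def research_topic(self, topic: str) -> dict:
        # Decompose into sub-questions
        sub_questions = await self.decompose(topic)
        # Run parallel searches
        results = await asyncio.gather(*[
            self.search_with_cache(q) for q in sub_questions
        ])
        # Synthesize into report
        report = await self.synthesize(topic, results)
        return {"topic": topic, "sections": results, "synthesis": report}

    async def decompose(self, topic: str) -> list[str]:
        # Run the blocking SDK call in a worker thread so the event loop stays free
        r = await asyncio.to_thread(
            self.client.chat.completions.create,
            model="sonar", messages=[
                {"role": "system", "content": "Break this topic into 3-5 specific research questions."},
                {"role": "user", "content": topic}
            ])
        # Drop blank lines the model may emit between questions
        return [q.strip() for q in r.choices[0].message.content.splitlines() if q.strip()]
```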
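
The decomposition step splits the model's reply on line breaks, but replies often arrive as numbered or bulleted lists. A small sketch of a parser that strips those markers before the questions are searched; `parse_sub_questions` and its `limit` parameter are hypothetical names, and the marker pattern is an assumption about typical output rather than a documented format:

```python
import re
from typing import List

# Matches leading list markers such as "1. ", "2) ", "- ", "* ", or "• "
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_sub_questions(text: str, limit: int = 5) -> List[str]:
    """Turn a multi-line model reply into at most `limit` clean questions."""
    questions = []
    for line in text.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned:  # skip blank lines between items
            questions.append(cleaned)
    return questions[:limit]
```

A preamble line such as "Here are the questions:" would still pass through, so a stricter system prompt or an extra filter may be needed.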

Decision matrix for choosing a variant:

| Factor | Direct Widget | Cached Layer | Research Pipeline |
|---|---|---|---|
| Volume | < 500/day | 500-5K/day | 5K+/day |
| Use Case | Quick answers | Repeated queries | Deep research |
| Latency | 2-5 s | ~50 ms on a cache hit | 10-30 s |
| Model | sonar | sonar | sonar-pro |
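
The volume thresholds in the matrix can be written down as a simple selection rule. This is only a sketch: `Variant`, `choose_variant`, and the `deep_research` flag are illustrative names, and real decisions would also weigh latency and cost.

```python
from enum import Enum


class Variant(Enum):
    DIRECT_WIDGET = "direct widget"
    CACHED_LAYER = "cached layer"
    RESEARCH_PIPELINE = "research pipeline"


def choose_variant(queries_per_day: int, deep_research: bool = False) -> Variant:
    """Map expected daily volume (and use case) to one of the three variants."""
    if deep_research or queries_per_day >= 5000:
        return Variant.RESEARCH_PIPELINE
    if queries_per_day >= 500:
        return Variant.CACHED_LAYER
    return Variant.DIRECT_WIDGET
```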

Common issues:

| Issue | Cause | Solution |
|---|---|---|
| Slow in UI | No caching | Cache repeated queries |
| High cost | sonar-pro everywhere | Route by complexity |
| Stale answers | Long cache TTL | Reduce TTL for current events |
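
Two of these fixes, routing by complexity and shortening the TTL for current events, can be approximated with keyword heuristics. The sketch below is illustrative only: the keyword lists, the 30-word threshold, and the 300-second TTL are assumptions, and a production router would likely need something more robust.

```python
# Assumed keyword lists; tune them to the application's actual traffic
CURRENT_EVENT_HINTS = ("today", "latest", "breaking", "this week", "news")
COMPLEX_HINTS = ("compare", "analyze", "trade-off", "step by step")


def choose_model(query: str, long_query_words: int = 30) -> str:
    """Send long or analytical queries to sonar-pro, everything else to sonar."""
    q = query.lower()
    is_complex = len(q.split()) >= long_query_words or any(h in q for h in COMPLEX_HINTS)
    return "sonar-pro" if is_complex else "sonar"


def choose_ttl(query: str, default_ttl: int = 1800, fresh_ttl: int = 300) -> int:
    """Use a shorter cache TTL (seconds) when the query looks time-sensitive."""
    q = query.lower()
    return fresh_ttl if any(h in q for h in CURRENT_EVENT_HINTS) else default_ttl
```

Both functions return values that plug into the `model` and `ttl` arguments the cached layer already accepts.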

Basic usage: start with the Direct Widget and the default `sonar` model; no cache or extra configuration is needed.
Advanced scenario: in production, put the Cached Layer in front of the Research Pipeline, route only complex queries to `sonar-pro`, and shorten the cache TTL for current-events topics.