Together AI provides OpenAI-compatible inference, fine-tuning, and batch processing across 100+ open-source models (Llama, Mixtral, Qwen, FLUX). Common errors include model-not-available failures when requesting deprecated or gated models, token limit violations that differ per model architecture, and fine-tune job failures from dataset formatting issues. The API is compatible with any OpenAI client library at base_url = 'https://api.together.xyz/v1'. Model IDs use the full namespace format (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct) and must match exactly. This reference covers inference, fine-tuning, and deployment errors.
| Code | Message | Cause | Fix |
|---|---|---|---|
| 401 | Unauthorized | Invalid or missing TOGETHER_API_KEY | Verify key at api.together.xyz > Settings |
| 400 | Model not found | Wrong model ID or model deprecated | Use client.models.list() to get valid model IDs |
| 400 | Token limit exceeded | Input + max_tokens exceeds model context | Reduce input length or lower max_tokens parameter |
| 400 | Invalid fine-tune dataset | JSONL format errors or missing required fields | Each line must be valid JSON with a messages array |
| 402 | Insufficient credits | Account balance depleted | Add credits at api.together.xyz > Billing |
| 404 | Fine-tune job not found | Invalid job ID or job expired | List active jobs with client.fine_tuning.list() |
| 429 | Rate limit exceeded | Too many concurrent requests | Implement backoff; use batch API for 50% cost reduction |
| 500 | Model overloaded | High demand on specific model | Retry with backoff; try alternative model of same family |
```typescript
interface TogetherError {
  code: number;
  message: string;
  category: "auth" | "rate_limit" | "validation" | "billing";
}

// Map an HTTP status and response body onto a coarse error category,
// so callers can decide between re-authenticating, backing off,
// topping up credits, or fixing their input.
function classifyTogetherError(status: number, body: string): TogetherError {
  if (status === 401) {
    return { code: 401, message: body, category: "auth" };
  }
  if (status === 402) {
    return { code: 402, message: body, category: "billing" };
  }
  if (status === 429) {
    return { code: 429, message: "Rate limit exceeded", category: "rate_limit" };
  }
  // Everything else (400, 404, 500) falls through to the validation bucket.
  return { code: status, message: body, category: "validation" };
}
```
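As context for how the classifier might be used, here is a hedged sketch of a request loop: a hypothetical callTogether helper that retries rate-limit responses with exponential backoff. The endpoint and payload shape follow the OpenAI-compatible chat completions format noted above; the retry policy itself is an assumption, not a Together requirement.

```typescript
// Hypothetical wrapper: retries rate_limit responses with exponential
// backoff and surfaces everything else as a classified TogetherError.
async function callTogether(payload: object, maxRetries = 3): Promise<any> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch("https://api.together.xyz/v1/chat/completions", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.TOGETHER_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    });
    if (res.ok) return res.json();

    const err = classifyTogetherError(res.status, await res.text());
    // Only rate limits are worth retrying automatically; auth, billing,
    // and validation failures need a human or a code change.
    if (err.category !== "rate_limit" || attempt === maxRetries) throw err;
    await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
  }
  throw new Error("unreachable");
}
```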
Together uses Bearer token authentication. Pass TOGETHER_API_KEY via Authorization: Bearer header or set it in the client constructor. Keys do not expire but can be revoked. If using the OpenAI client library, set base_url='https://api.together.xyz/v1' and pass the Together key as api_key.
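A minimal setup sketch using the openai npm package; the model ID shown is one of the Llama IDs from this page, and any valid Together model ID works in its place:

```typescript
import OpenAI from "openai";

// Point the standard OpenAI client at Together's OpenAI-compatible endpoint.
const client = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: "https://api.together.xyz/v1",
});

const completion = await client.chat.completions.create({
  model: "meta-llama/Meta-Llama-3.1-8B-Instruct", // must match exactly
  messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);
```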
Rate limits vary by plan tier and are enforced per-key. Free tier allows 5 requests/second; paid tiers scale higher. Use the batch inference API (/v1/batch) for non-real-time workloads at 50% cost reduction. Check X-RateLimit-Remaining header to monitor quota.
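To monitor quota, read the header off the raw response. A sketch with fetch, assuming only the X-RateLimit-Remaining header named above; the threshold and pause duration are arbitrary choices, not documented limits:

```typescript
// Check remaining quota before deciding whether to slow down.
const res = await fetch("https://api.together.xyz/v1/models", {
  headers: { Authorization: `Bearer ${process.env.TOGETHER_API_KEY}` },
});
const remaining = Number(res.headers.get("X-RateLimit-Remaining") ?? "0");
if (remaining < 2) {
  // Assumption: a one-second pause gives enough headroom against the
  // free tier's 5 req/s limit; tune for your plan.
  await new Promise((r) => setTimeout(r, 1000));
}
```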
Model IDs must match exactly (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct). Use client.models.list() to enumerate available models. Token limits vary per model: Llama 3.1 supports 128K context, while older models may support only 4K. Fine-tune datasets must be JSONL, with each line containing a messages array in chat format. Empty messages arrays or missing role fields cause silent validation failures, so validate each JSONL line independently before uploading.
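A validation sketch for the dataset check described above; the field names follow the chat format on this page, but the helper itself is hypothetical:

```typescript
import { readFileSync } from "node:fs";

// Validate each JSONL line independently before uploading a fine-tune dataset.
function validateJsonl(path: string): string[] {
  const errors: string[] = [];
  const lines = readFileSync(path, "utf8").split("\n").filter((l) => l.trim());
  lines.forEach((line, i) => {
    try {
      const row = JSON.parse(line);
      if (!Array.isArray(row.messages) || row.messages.length === 0) {
        errors.push(`line ${i + 1}: missing or empty messages array`);
      } else if (row.messages.some((m: any) => !m.role || !m.content)) {
        errors.push(`line ${i + 1}: message missing role or content`);
      }
    } catch {
      errors.push(`line ${i + 1}: invalid JSON`);
    }
  });
  return errors;
}
```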
| Scenario | Pattern | Recovery |
|---|---|---|
| Model deprecated | 400 with "not found" | Check model list; migrate to successor model |
| Token limit exceeded | 400 on long prompts | Truncate input or use model with larger context window |
| Fine-tune dataset rejected | JSONL validation errors | Validate each line independently; fix and re-upload |
| Credits depleted mid-batch | 402 after N successful calls | Add credits, resume from last successful request |
| Model overloaded at peak | 500 on popular models | Fall back to alternative model in same family |
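One recovery pattern from the table, sketched using the callTogether helper from earlier: falling back to an alternative model in the same family when the preferred one is overloaded. The fallback model choice is an assumption; pick whichever successor fits your workload.

```typescript
// Try the preferred model first; on failure, fall back within the family.
async function completeWithFallback(prompt: string): Promise<any> {
  const models = [
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "meta-llama/Meta-Llama-3.1-70B-Instruct", // assumed alternative
  ];
  let lastErr: unknown;
  for (const model of models) {
    try {
      return await callTogether({
        model,
        messages: [{ role: "user", content: prompt }],
      });
    } catch (err) {
      // callTogether classifies 500s into the validation bucket; treat any
      // failure on this model as a cue to try the next one in the family.
      lastErr = err;
    }
  }
  throw lastErr;
}
```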
```bash
# Verify API connectivity and list available models
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  https://api.together.xyz/v1/models
```
See together-debug-bundle.