
Using Together AI for Model Inference

v20260423
together-hello-world
This guide provides a comprehensive tutorial on running various AI inference tasks using the Together AI API, which is compatible with the OpenAI standard. It demonstrates key functionalities including chat completions, streaming responses, image generation, and creating embeddings. This is ideal for developers who need to test open-source models, compare performance across different LLMs (like Llama or Mixtral), or integrate powerful generative AI features into their applications using Python or Node.js.

Together AI Hello World

Overview

Run chat completions with open-source models via Together AI's OpenAI-compatible API. Supports Llama, Mixtral, Qwen, and 100+ models. Key endpoints: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/images/generations.

Instructions

Step 1: Chat Completions

from together import Together

# The client reads the TOGETHER_API_KEY environment variable by default
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"},
    ],
    max_tokens=500,
    temperature=0.7,
    top_p=0.9,
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")

Step 2: Streaming

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
    max_tokens=200,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
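If you need the full reply as a single string rather than printing deltas as they arrive, the chunks can be accumulated. This `collect_stream` helper is a sketch, not part of the Together SDK; it works with any iterable of chunks shaped like the SDK's stream objects:

```python
def collect_stream(stream):
    """Accumulate the text deltas of a streaming chat completion into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) carry no content
            parts.append(delta)
    return "".join(parts)

# e.g.: full_text = collect_stream(stream)
```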

Step 3: Image Generation

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A sunset over mountains, digital art style",
    width=1024, height=768,
    n=1,
)
print(f"Image URL: {response.data[0].url}")

Step 4: Embeddings

response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-8k-retrieval",
    input=["Hello world", "Together AI is great"],
)
print(f"Embedding dim: {len(response.data[0].embedding)}")
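A common next step with embeddings is measuring how similar two texts are. The cosine similarity below is a minimal stdlib-only sketch (for production workloads a vectorized library such as NumPy is more typical):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Compare the two sentences embedded above, e.g.:
# score = cosine_similarity(response.data[0].embedding,
#                           response.data[1].embedding)
```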

Step 5: Node.js with OpenAI Client

import OpenAI from 'openai';

const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: 'https://api.together.xyz/v1',
});

const chat = await together.chat.completions.create({
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(chat.choices[0].message.content);

Output

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Tokens: 28 in, 45 out

Error Handling

Error            Cause               Solution
---------------  ------------------  --------------------------------------------
Model not found  Wrong model ID      Check docs.together.ai/docs/inference-models
Empty response   max_tokens too low  Increase max_tokens
429 rate limit   Too many requests   Implement backoff
Slow response    Large model         Try a Turbo variant or a smaller model
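The backoff suggested for 429s can be sketched as a small retry wrapper. `with_backoff` is a hypothetical helper, not part of the Together SDK; it deliberately catches all exceptions for brevity, which you would narrow to rate-limit errors in real code:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus jitter.

    Sleeps base_delay, 2*base_delay, 4*base_delay, ... between attempts,
    each scaled by a random factor in [1, 2) to spread out retries.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch only rate-limit errors here
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt * (1 + random.random()))

# Usage, e.g.:
# response = with_backoff(lambda: client.chat.completions.create(...))
```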

Next Steps

Proceed to together-local-dev-loop for development workflow.

Info

Name: together-hello-world
Version: v20260423
Size: 3.41 KB
Updated At: 2026-04-28