
Groq API Chat Completions Guide

v20260423
groq-hello-world
This guide covers chat completions on Groq's high-speed LPU. It includes TypeScript and Python examples for basic chat interaction, streaming response handling, and model selection across multimodal and performance tiers, aimed at developers building high-performance, low-latency AI applications.

Groq Hello World

Overview

Build a minimal chat completion with Groq's LPU inference API. Groq uses an OpenAI-compatible endpoint, so the API shape is familiar -- but responses arrive 10-50x faster than GPU-based providers.
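Because the endpoint is OpenAI-compatible, the request itself is plain HTTP against `https://api.groq.com/openai/v1`. A minimal sketch of assembling that request by hand (the `buildChatRequest` helper is illustrative, not part of any SDK):

```typescript
// Groq's OpenAI-compatible base URL; any OpenAI-style client can be
// repointed at it the same way.
const GROQ_BASE_URL = "https://api.groq.com/openai/v1";

// Build the pieces of a chat completion request. This only constructs
// the request; sending it (e.g. via fetch) is left to the caller.
function buildChatRequest(model: string, userMessage: string) {
  return {
    url: `${GROQ_BASE_URL}/chat/completions`,
    method: "POST" as const,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userMessage }],
    }),
  };
}
```

This is the same wire format the groq-sdk produces under the hood, which is why OpenAI client libraries work against Groq with only a base-URL change.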

Prerequisites

  • groq-sdk installed (npm install groq-sdk)
  • GROQ_API_KEY environment variable set
  • Completed groq-install-auth setup

Instructions

Step 1: Basic Chat Completion (TypeScript)

import Groq from "groq-sdk";

const groq = new Groq();

async function main() {
  const completion = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is Groq's LPU and why is it fast?" },
    ],
  });

  console.log(completion.choices[0].message.content);
  console.log(`Tokens: ${completion.usage?.total_tokens}`);
}

main().catch(console.error);

Step 2: Streaming Response

async function streamExample() {
  const stream = await groq.chat.completions.create({
    model: "llama-3.3-70b-versatile",
    messages: [
      { role: "user", content: "Explain quantum computing in 3 sentences." },
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    process.stdout.write(content);
  }
  console.log(); // newline
}
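When you need the full text as well as incremental output, the chunks can be accumulated as they arrive. A small sketch, assuming only the chunk shape shown above (`collectStream` is a hypothetical helper, not part of groq-sdk):

```typescript
// Shape of one streamed event, matching the delta access in Step 2.
type StreamChunk = { choices: { delta?: { content?: string } }[] };

// Drain an async iterable of chunks into a single string. Works on
// the SDK's stream object or any compatible async iterable.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

You can combine both behaviors by writing each delta to stdout inside the loop while still appending to the accumulator.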

Step 3: Python Equivalent

from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Groq's LPU and why is it fast?"},
    ],
)

print(completion.choices[0].message.content)
print(f"Tokens: {completion.usage.total_tokens}")

Step 4: Try Different Models

// Speed tier -- fastest responses (~560 tok/s)
const fast = await groq.chat.completions.create({
  model: "llama-3.1-8b-instant",
  messages: [{ role: "user", content: "Hello!" }],
});

// Quality tier -- best reasoning (~280 tok/s)
const quality = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Explain monads in Haskell." }],
});

// Vision tier -- multimodal understanding
const vision = await groq.chat.completions.create({
  model: "meta-llama/llama-4-scout-17b-16e-instruct",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Describe this image." },
      { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
    ],
  }],
});

Available Models (Current)

| Model ID | Params | Context | Speed | Best For |
|---|---|---|---|---|
| llama-3.1-8b-instant | 8B | 128K | ~560 tok/s | Classification, extraction, fast tasks |
| llama-3.3-70b-versatile | 70B | 128K | ~280 tok/s | General purpose, reasoning, code |
| llama-3.3-70b-specdec | 70B | 128K | Faster | Same quality, speculative decoding |
| meta-llama/llama-4-scout-17b-16e-instruct | 17Bx16E | 128K | ~460 tok/s | Vision, multimodal |
| meta-llama/llama-4-maverick-17b-128e-instruct | 17Bx128E | 128K | | Best multimodal quality |
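One way to keep these IDs out of call sites is a small lookup keyed by intent. A sketch; the tier names are ours, not Groq's:

```typescript
// Encode the tier table above as a constant lookup, so call sites
// pick a model by intent rather than by raw ID string.
const MODEL_TIERS = {
  speed: "llama-3.1-8b-instant",
  quality: "llama-3.3-70b-versatile",
  vision: "meta-llama/llama-4-scout-17b-16e-instruct",
} as const;

function modelFor(tier: keyof typeof MODEL_TIERS): string {
  return MODEL_TIERS[tier];
}
```

Centralizing the IDs also gives you one place to update when a model is deprecated.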

Response Structure

interface ChatCompletion {
  id: string;                    // "chatcmpl-xxx"
  object: "chat.completion";
  created: number;               // Unix timestamp
  model: string;                 // Actual model used
  choices: [{
    index: number;
    message: { role: "assistant"; content: string };
    finish_reason: "stop" | "length" | "tool_calls";
  }];
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    queue_time: number;          // Groq-specific: seconds in queue
    prompt_time: number;         // Groq-specific: seconds for prompt
    completion_time: number;     // Groq-specific: seconds for completion
    total_time: number;          // Groq-specific: total processing seconds
  };
}
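The Groq-specific timing fields make effective throughput easy to compute from a response. A sketch, assuming the usage shape above (`tokensPerSecond` is a hypothetical helper, not an SDK function):

```typescript
// Subset of the Groq usage object relevant to throughput.
interface GroqUsage {
  completion_tokens: number;
  completion_time: number; // seconds spent generating the completion
  queue_time: number;      // seconds spent waiting in queue
  total_time: number;      // total processing seconds
}

// Effective generation speed in tokens per second.
function tokensPerSecond(usage: GroqUsage): number {
  // Guard against zero or missing timing values.
  if (usage.completion_time <= 0) return 0;
  return usage.completion_tokens / usage.completion_time;
}

const usage: GroqUsage = {
  completion_tokens: 280,
  completion_time: 1.0,
  queue_time: 0.02,
  total_time: 1.1,
};
console.log(tokensPerSecond(usage)); // → 280
```

Comparing `queue_time` against `completion_time` is also a quick way to tell whether latency came from load or from generation.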

Error Handling

| Error | Cause | Solution |
|---|---|---|
| 401 Invalid API Key | Key not set or invalid | Check GROQ_API_KEY env var |
| model_not_found | Typo in model ID or deprecated model | Check model list at console.groq.com/docs/models |
| 429 Rate limit | Free tier: 30 RPM on large models | Wait for the retry-after header value |
| context_length_exceeded | Prompt + max_tokens > model context | Reduce prompt size or set a lower max_tokens |
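For 429s specifically, a thin retry wrapper can honor a retry-after hint before falling back to exponential backoff. A sketch, not production code; the error shape (`status`, `retryAfterSeconds`) is an assumption for illustration, so adapt it to how your client surfaces rate-limit errors:

```typescript
// Retry `fn` on 429 errors, sleeping for the server's retry-after
// hint when present, otherwise backing off exponentially (1s, 2s, 4s...).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const e = err as { status?: number; retryAfterSeconds?: number };
      // Only retry rate limits, and never after the final attempt.
      if (e.status !== 429 || attempt === maxAttempts - 1) throw err;
      const delayMs = (e.retryAfterSeconds ?? 2 ** attempt) * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Usage: `await withRetry(() => groq.chat.completions.create({...}))`.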


Next Steps

Proceed to groq-local-dev-loop for development workflow setup.
