本地AI API开发循环

v20260423

together-local-dev-loop

这是一个为 Together AI 设计的本地开发环境，用于模拟其兼容 OpenAI 格式的 API 调用。它提供了聊天补全、嵌入和模型列表等核心功能的Mock服务，让开发者可以在不消耗真实 API 额度的情况下，快速构建和迭代 AI 应用，实现高效的本地开发流程。

AI 开发本地 API 模拟 OpenAI Together

获取技能

230 次下载

概览

Together AI Local Dev Loop

Overview

Local development workflow for Together AI inference API integration. Provides a fast feedback loop with mock chat completions, embeddings, and model listing endpoints so you can build AI-powered applications without consuming live API credits. Together AI is OpenAI-compatible, so the same client libraries work with both. Toggle between mock mode for rapid iteration and live mode for model evaluation.

Environment Setup

cp .env.example .env
# Set your credentials:
# TOGETHER_API_KEY=tog_xxxxxxxxxxxx
# TOGETHER_BASE_URL=https://api.together.xyz/v1
# MOCK_MODE=true
npm install express axios dotenv tsx typescript @types/node
npm install -D vitest supertest @types/express
# Or for Python: pip install together openai httpx pytest

Dev Server

// src/dev/server.ts
import express from "express";
import { createProxyMiddleware } from "http-proxy-middleware";
const app = express();
app.use(express.json());
const MOCK = process.env.MOCK_MODE === "true";
if (!MOCK) {
  app.use("/v1", createProxyMiddleware({
    target: process.env.TOGETHER_BASE_URL,
    changeOrigin: true,
    headers: { Authorization: `Bearer ${process.env.TOGETHER_API_KEY}` },
  }));
} else {
  const { mountMockRoutes } = require("./mocks");
  mountMockRoutes(app);
}
app.listen(3009, () => console.log(`Together dev server on :3009 [mock=${MOCK}]`));

Mock Mode

// src/dev/mocks.ts — OpenAI-compatible mock responses for inference
export function mountMockRoutes(app: any) {
  app.post("/v1/chat/completions", (req: any, res: any) => res.json({
    id: "chatcmpl-mock-001", object: "chat.completion", model: req.body.model || "meta-llama/Llama-3-70b-chat-hf",
    choices: [{ index: 0, message: { role: "assistant", content: "This is a mock response from Together AI." }, finish_reason: "stop" }],
    usage: { prompt_tokens: 25, completion_tokens: 12, total_tokens: 37 },
  }));
  app.post("/v1/embeddings", (req: any, res: any) => res.json({
    object: "list", model: req.body.model || "togethercomputer/m2-bert-80M-8k-retrieval",
    data: [{ object: "embedding", index: 0, embedding: Array(768).fill(0).map(() => Math.random() * 2 - 1) }],
  }));
  app.get("/v1/models", (_req: any, res: any) => res.json({
    data: [
      { id: "meta-llama/Llama-3-70b-chat-hf", type: "chat", context_length: 8192 },
      { id: "mistralai/Mixtral-8x22B-Instruct-v0.1", type: "chat", context_length: 65536 },
      { id: "togethercomputer/m2-bert-80M-8k-retrieval", type: "embedding", context_length: 8192 },
    ],
  }));
}

Testing Workflow

npm run dev:mock &                    # Start mock server in background
npm run test                          # Unit tests with vitest
npm run test -- --watch               # Watch mode for rapid iteration
MOCK_MODE=false npm run test:integration  # Integration test against real API

Debug Tips

Together AI is OpenAI-compatible — set base_url to http://localhost:3009/v1 for local dev
Use /v1/models to discover available model IDs instead of hardcoding them
Monitor usage.total_tokens in responses to estimate costs before switching to live mode
Batch inference (/v1/batch) runs at 50% cost but is async — poll for completion
Set max_tokens explicitly to avoid unexpectedly large responses and costs

Error Handling

Issue	Cause	Fix
`401 Unauthorized`	Invalid API key	Regenerate at api.together.xyz dashboard
`404 Model not found`	Wrong model ID string	Use `client.models.list()` to verify
`429 Rate Limited`	Too many requests per minute	Implement exponential backoff
`500 Server Error`	Model overloaded or cold start	Retry with backoff after 5s
`ECONNREFUSED :3009`	Dev server not running	Run `npm run dev:mock` first

Resources

Next Steps

See together-debug-bundle.

信息

Category 编程开发

Name together-local-dev-loop

版本 v20260423

大小 4.42KB

Source jeremylongshore/claude-code-plugins-plus-skills

更新时间 2026-04-28