Local development workflow for Together AI inference API integration. Provides a fast feedback loop with mock chat completions, embeddings, and model listing endpoints so you can build AI-powered applications without consuming live API credits. Together AI is OpenAI-compatible, so the same client libraries work with both. Toggle between mock mode for rapid iteration and live mode for model evaluation.
cp .env.example .env
# Set your credentials:
# TOGETHER_API_KEY=tog_xxxxxxxxxxxx
# TOGETHER_BASE_URL=https://api.together.xyz/v1
# MOCK_MODE=true
npm install express axios dotenv http-proxy-middleware tsx typescript @types/node
npm install -D vitest supertest @types/express
# Or for Python: pip install together openai httpx pytest
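The `npm run` commands used later in this guide (`dev:mock`, `test`, `test:integration`) need matching package.json scripts; the exact commands below are an assumption based on the tooling installed above, so adjust paths to your layout.

// package.json — scripts excerpt (illustrative)
"scripts": {
  "dev": "tsx src/dev/server.ts",
  "dev:mock": "MOCK_MODE=true tsx src/dev/server.ts",
  "test": "vitest run",
  "test:integration": "vitest run src/integration"
}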
// src/dev/server.ts
import "dotenv/config"; // load .env before reading process.env
import express from "express";
import { createProxyMiddleware } from "http-proxy-middleware";
import { mountMockRoutes } from "./mocks";

const app = express();
const MOCK = process.env.MOCK_MODE === "true";

if (MOCK) {
  // Parse JSON bodies only for the mock routes; running express.json() in front
  // of the proxy would consume the request stream and hang the upstream call.
  app.use(express.json());
  mountMockRoutes(app);
} else {
  app.use("/v1", createProxyMiddleware({
    target: process.env.TOGETHER_BASE_URL,
    changeOrigin: true,
    // Strip the mount path so the target's /v1 isn't doubled to /v1/v1.
    pathRewrite: { "^/v1": "" },
    // Inject the API key server-side so local clients never hold the real credential.
    headers: { Authorization: `Bearer ${process.env.TOGETHER_API_KEY}` },
  }));
}

app.listen(3009, () => console.log(`Together dev server on :3009 [mock=${MOCK}]`));
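With the dev server running, any OpenAI-compatible client works by pointing its base URL at the proxy; no API key is needed locally because the server injects it. A minimal sketch using the axios dependency installed earlier (file path and prompt are illustrative):

// src/examples/chat.ts — same request shape in mock and live mode
import axios from "axios";

async function main() {
  const { data } = await axios.post("http://localhost:3009/v1/chat/completions", {
    model: "meta-llama/Llama-3-70b-chat-hf",
    messages: [{ role: "user", content: "Summarize mock-first development in one line." }],
    max_tokens: 128, // bound response size and cost once MOCK_MODE=false
  });
  console.log(data.choices[0].message.content);
  console.log("total tokens:", data.usage.total_tokens);
}

main().catch(console.error);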
// src/dev/mocks.ts — OpenAI-compatible mock responses for inference
import type { Express, Request, Response } from "express";

export function mountMockRoutes(app: Express) {
  // Chat completions: canned assistant reply echoing the requested model.
  app.post("/v1/chat/completions", (req: Request, res: Response) => res.json({
    id: "chatcmpl-mock-001",
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model: req.body.model || "meta-llama/Llama-3-70b-chat-hf",
    choices: [{ index: 0, message: { role: "assistant", content: "This is a mock response from Together AI." }, finish_reason: "stop" }],
    usage: { prompt_tokens: 25, completion_tokens: 12, total_tokens: 37 },
  }));

  // Embeddings: one 768-dimensional random vector in [-1, 1).
  app.post("/v1/embeddings", (req: Request, res: Response) => res.json({
    object: "list",
    model: req.body.model || "togethercomputer/m2-bert-80M-8k-retrieval",
    data: [{ object: "embedding", index: 0, embedding: Array.from({ length: 768 }, () => Math.random() * 2 - 1) }],
  }));

  // Model listing: a small fixed catalog for local model discovery.
  app.get("/v1/models", (_req: Request, res: Response) => res.json({
    data: [
      { id: "meta-llama/Llama-3-70b-chat-hf", type: "chat", context_length: 8192 },
      { id: "mistralai/Mixtral-8x22B-Instruct-v0.1", type: "chat", context_length: 65536 },
      { id: "togethercomputer/m2-bert-80M-8k-retrieval", type: "embedding", context_length: 8192 },
    ],
  }));
}
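One caveat with the mock above: `Math.random()` embeddings change on every call, so snapshot tests against them are unstable. If you need determinism, deriving the vector from a hash of the input text is one option; this helper is a hypothetical sketch, not part of the Together API.

// src/dev/deterministic.ts — stable mock embedding derived from the input text
export function deterministicEmbedding(text: string, dim = 768): number[] {
  let h = 2166136261; // FNV-1a offset basis
  return Array.from({ length: dim }, (_, i) => {
    h ^= text.charCodeAt(i % Math.max(text.length, 1)) || 0;
    h = Math.imul(h, 16777619); // FNV-1a prime
    return ((h >>> 0) / 0xffffffff) * 2 - 1; // map hash state to [-1, 1]
  });
}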
npm run dev:mock & # Start mock server in background
npm run test # Unit tests with vitest
npm run test -- --watch # Watch mode for rapid iteration
MOCK_MODE=false npm run test:integration # Integration test against real API
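Unit tests can exercise the mock routes in-process, without binding port 3009, by mounting them on a throwaway Express app and driving it with supertest. A sketch (file name and assertions are illustrative):

// src/dev/mocks.test.ts
import express from "express";
import request from "supertest";
import { describe, expect, it } from "vitest";
import { mountMockRoutes } from "./mocks";

describe("mock Together AI endpoints", () => {
  const app = express();
  app.use(express.json());
  mountMockRoutes(app);

  it("returns an OpenAI-shaped chat completion", async () => {
    const res = await request(app)
      .post("/v1/chat/completions")
      .send({ model: "meta-llama/Llama-3-70b-chat-hf", messages: [{ role: "user", content: "hi" }] });
    expect(res.status).toBe(200);
    expect(res.body.choices[0].message.role).toBe("assistant");
    expect(res.body.usage.total_tokens).toBeGreaterThan(0);
  });

  it("lists the mock models", async () => {
    const res = await request(app).get("/v1/models");
    expect(res.body.data.map((m: { id: string }) => m.id)).toContain("meta-llama/Llama-3-70b-chat-hf");
  });
});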
- Point your client's `base_url` to `http://localhost:3009/v1` for local dev
- Query `/v1/models` to discover available model IDs instead of hardcoding them
- Track `usage.total_tokens` in responses to estimate costs before switching to live mode
- Batch inference (`/v1/batch`) runs at 50% cost but is async; poll for completion
- Set `max_tokens` explicitly to avoid unexpectedly large responses and costs

| Issue | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid API key | Regenerate at the api.together.xyz dashboard |
| 404 Model not found | Wrong model ID string | Use `client.models.list()` to verify |
| 429 Rate Limited | Too many requests per minute | Implement exponential backoff (sketch below) |
| 500 Server Error | Model overloaded or cold start | Retry with backoff after 5s |
| ECONNREFUSED :3009 | Dev server not running | Run `npm run dev:mock` first |
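For the 429 and 500 rows, exponential backoff means doubling the wait between retries; a minimal helper sketch (delays and attempt count are arbitrary choices, not Together-recommended values):

// src/lib/backoff.ts — retry 429/5xx responses with exponential delays
import axios from "axios";

async function withBackoff<T>(fn: () => Promise<T>, attempts = 4): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      const status = err?.response?.status;
      const retryable = status === 429 || status >= 500;
      if (!retryable || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i)); // 1s, 2s, 4s, ...
    }
  }
}

// Example: const models = await withBackoff(() => axios.get("http://localhost:3009/v1/models"));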
See together-debug-bundle.