An engineer wires OpenTelemetry expecting to see prompts and responses in
Honeycomb. The traces land — but only timing, model name, and token counts
appear. The prompt body is blank. This is not a bug: it's the OTEL GenAI
semantic-conventions privacy-safe default (P27), where
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT is off. The instinct is to
flip it on and move on. On a multi-tenant workload that flip is a leak — the
next engineer to search traces for Tenant A sees Tenant B's PII in the results,
because redaction was supposed to happen upstream and never did.
A second trap lives inside LangGraph. A BaseCallbackHandler attached to the
parent runnable never fires on inner agent tool calls, because LangGraph
creates a child runtime per subgraph and callbacks do not inherit (P28). Spans
inside subgraphs appear orphaned in the waterfall — or they do not appear at
all — and SLO dashboards under-count latency on the exact calls that matter
most: the nested agent loops.
This skill wires LangChain 1.0 / LangGraph 1.0 into an OTEL-native backend
(Jaeger, Honeycomb, Grafana Tempo, Datadog) with a correct content-capture
policy, subgraph-aware span propagation, and five LLM-specific SLOs (p95 / p99
latency, error rate, cost-per-request, TTFT) with burn-rate alerts. Pin:
langchain-core 1.0.x, langgraph 1.0.x,
opentelemetry-instrumentation-langchain >= 0.33, OTEL GenAI semconv as of
2026-04. Pain-catalog anchors: P27, P28 (and cross-references P04, P34, P37).
Compatible versions: langchain-core >= 1.0, < 2.0; langgraph >= 1.0, < 2.0. Cross-reference
pack siblings langchain-security-basics and langchain-middleware-patterns. Per-backend
environment config (OTLP_ENDPOINT, API keys) lives in the Backend Setup Matrix. Install the
instrumentation packages:
pip install \
opentelemetry-api \
opentelemetry-sdk \
opentelemetry-exporter-otlp-proto-http \
"opentelemetry-instrumentation-langchain>=0.33"
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
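
# _parse_headers (used in the exporter config below) is assumed here as a minimal
# parser for the OTLP "key1=val1,key2=val2" header format:
def _parse_headers(raw: str) -> dict:
    return dict(pair.split("=", 1) for pair in raw.split(",") if "=" in pair)
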
resource = Resource.create({
"service.name": "my-langchain-app",
"service.version": "1.0.0",
"deployment.environment": os.getenv("ENV", "dev"),
})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(
OTLPSpanExporter(
endpoint=os.environ["OTLP_ENDPOINT"], # per-backend; see matrix
headers=_parse_headers(os.getenv("OTLP_HEADERS", "")),
),
max_queue_size=2048, # spans buffered before drop; raise for high volume
max_export_batch_size=512, # batched export keeps per-span overhead under 1ms
))
trace.set_tracer_provider(provider)
LangchainInstrumentor().instrument() # emits gen_ai.* attrs on every run
BatchSpanProcessor keeps per-span overhead well under 1 ms. Use
SimpleSpanProcessor only in local dev — it blocks the call path per span.
Per-backend OTLP_ENDPOINT and header config lives in
Backend Setup Matrix — Jaeger,
Honeycomb, Grafana Tempo, Datadog.
Trigger one call and inspect what landed in the backend. LangChain 1.0 emits
these gen_ai.* attributes natively on every chat-model span:
| Attribute | Example |
|---|---|
| gen_ai.system | anthropic |
| gen_ai.request.model | claude-sonnet-4-6 |
| gen_ai.request.temperature | 0.0 |
| gen_ai.usage.input_tokens | 1234 |
| gen_ai.usage.output_tokens | 567 |
| gen_ai.response.finish_reasons | ["stop"] |
Missing anything? Likely a stale instrumentor version or an outdated provider
package. The full emitted-vs-custom matrix plus LangGraph's span taxonomy
(LangGraph.invoke → LangGraph.node.* → LangGraph.subgraph.*) is in
GenAI Semantic Conventions.
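To verify locally before pointing at a real backend, a quick smoke test with a console
exporter works. A minimal sketch, assuming langchain-anthropic is installed and
ANTHROPIC_API_KEY is set; the model name is simply the example from the table above:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from langchain_anthropic import ChatAnthropic

# Local-dev only: SimpleSpanProcessor + ConsoleSpanExporter print spans to stdout.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
LangchainInstrumentor().instrument()

# One call, then look in the printed span for gen_ai.system, gen_ai.request.model,
# gen_ai.usage.input_tokens / output_tokens, and gen_ai.response.finish_reasons.
ChatAnthropic(model="claude-sonnet-4-6", temperature=0.0).invoke("ping")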
The engineer's instinct is to flip the capture flag to see prompts. Before flipping it, classify the workload into one of these buckets:
| Workload | Flag | Notes |
|---|---|---|
| Dev / staging with synthetic inputs | true | Fine. Do not copy these traces to prod. |
| Single-tenant internal tool | true | Fine if RBAC on backend is tight. |
| Single-tenant product, signed compliance artifacts | true | BAA / DPIA in place; retention policy matches log retention. |
| Multi-tenant SaaS, no upstream redaction | false | Hard no. Fix redaction first. |
| Multi-tenant SaaS, with upstream redaction | true | Safe — the span sees the already-redacted text. |
| Healthcare / finance / legal without legal sign-off | false | Hard no. |
# trusted single-tenant ONLY
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
export TRACELOOP_TRACE_CONTENT=true # OpenLLMetry alias; set both to be safe
Leave unset (default) anywhere else. To capture bodies in a multi-tenant
system, wire redaction middleware upstream of the model call first — see
Prompt Content Policy and cross-reference
pack siblings langchain-security-basics (PII redaction middleware pattern,
P34) and langchain-middleware-patterns (middleware order: redact → cache →
model, P24). Failure pattern P27 — prompts missing from traces because
capture was never opted in — is the #1 first-day OTEL complaint; make the
decision explicit instead of surprise-flipping the flag in prod.
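The real redactor belongs in the pack siblings above; purely to illustrate the ordering
(redact before the model, so any captured span content is already clean), here is a minimal
sketch with placeholder regexes, not the langchain-security-basics implementation:
import re
from langchain_core.runnables import RunnableLambda

# Placeholder patterns only; the real redactor (langchain-security-basics, P34) is broader.
PATTERNS = [re.compile(r"sk-[A-Za-z0-9]{16,}"),       # API-key-shaped strings
            re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]     # SSN-shaped numbers

def redact(messages):
    # messages: list of (role, content) tuples; returns the same shape, redacted
    out = []
    for role, content in messages:
        for pat in PATTERNS:
            content = pat.sub("[REDACTED]", content)
        out.append((role, content))
    return out

# Redaction runs before the model call, so captured span content is already clean.
safe_llm = RunnableLambda(redact) | llm   # llm is the chat model from the setup above
safe_llm.invoke([("user", "my key is sk-abc123def456ghi789")])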
LangGraph creates a child runtime per subgraph. Callbacks bound to the parent at definition time do not inherit:
# WRONG — subagent spans orphaned or missing (P28)
agent = create_react_agent(model=llm, tools=tools).with_config(
callbacks=[my_handler] # bound at definition time; children do not see it
)
agent.invoke({"messages": [...]})
# RIGHT — pass callbacks at invocation via config; they propagate down
agent.invoke(
{"messages": [...]},
config={"callbacks": [my_handler]} # invocation-time; inherited by children
)
The same rule applies to custom attribute handlers (e.g. the
CostAttributeHandler in the semantic-conventions reference that stamps
gen_ai.usage.cost_usd on each model span). Attach via
config["callbacks"], never via .with_config(). Failure pattern P28
symptom: SLO dashboards show low latency because the slow nested spans are
missing entirely, not because the nested calls are fast.
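The reference implementation lives in the semantic-conventions reference; as a rough sketch
of its shape, with placeholder prices and provider-dependent metadata fields (usage_metadata
and the model-name key vary by integration):
from langchain_core.callbacks import BaseCallbackHandler
from opentelemetry import trace

# Placeholder USD prices per million tokens; substitute your provider's price sheet.
PRICING = {"claude-sonnet-4-6": {"input": 3.00, "output": 15.00}}

class CostAttributeHandler(BaseCallbackHandler):
    """Stamps gen_ai.usage.cost_usd on whichever span is current when the model call ends."""

    def on_llm_end(self, response, **kwargs):
        try:
            msg = response.generations[0][0].message           # chat-model path
            usage = msg.usage_metadata or {}
            model = (msg.response_metadata or {}).get("model_name", "")
        except (AttributeError, IndexError):
            return
        price = PRICING.get(model)
        if not price or not usage:
            return
        cost = (usage.get("input_tokens", 0) * price["input"] +
                usage.get("output_tokens", 0) * price["output"]) / 1_000_000
        span = trace.get_current_span()
        if span.is_recording():
            span.set_attribute("gen_ai.usage.cost_usd", round(cost, 6))

# Attach at invocation time so subgraph model calls inherit it:
# agent.invoke(inputs, config={"callbacks": [CostAttributeHandler()]})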
Five SLIs matter from day one. All five derive from gen_ai.* span attributes
— no second pipeline required:
| SLI | Target example | Why |
|---|---|---|
| p95 latency (top-level chat) | < 5 s for chat UI | Provider variance dominates |
| p99 latency | < 15 s | Tail matters on chat; agents with tools live here |
| Error rate | < 0.5% | Includes 429s + finish_reason IN ("length","content_filter") |
| Cost per request (p95) | < $0.05 | Catches haiku→opus regressions |
| TTFT p95 (streaming) | < 2 s | Perceived latency, not total duration |
Concrete Honeycomb / PromQL / Datadog queries for each SLI, plus multi-window multi-burn-rate alerts (14.4× / 1h fast burn, 6× / 6h slow burn), are in LLM SLO Dashboards.
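For context on where those multipliers come from (standard multi-window burn-rate arithmetic over a 30-day SLO window): a 14.4× burn sustained for 1 h consumes 14.4 × 1/720 ≈ 2% of the monthly error budget, and a 6× burn for 6 h consumes 6 × 6/720 = 5%; against the 0.5% error-rate target above, the fast-burn alert therefore fires when the 1 h error rate exceeds 14.4 × 0.5% = 7.2%.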
Defaults are wrong for two ends of the volume spectrum:
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
# Low/medium volume — keep everything for debuggability
# (< ~100 req/s) — SDK default 100% is fine
# High volume — head-sample, but carve out errors + slow spans via tail sampling
# at the OTEL Collector (see references/llm-slo-dashboards.md)
provider = TracerProvider(
resource=resource,
sampler=TraceIdRatioBased(0.10), # 10% head sample
)
Watch out: head sampling at 10% means 90% of p99 outliers are discarded
before they reach the backend — p99 metrics become noisy and biased toward
the median. For tail-latency SLOs, move sampling to a Collector with
tailsamplingprocessor so errors and slow spans (latency > 5000ms) are
always kept while the rest is probabilistically sampled at 10%. Typical trace
overhead with BatchSpanProcessor at the 512-span batch size: under 1 ms
per span; recommended sampling rate for high-volume production is 1-10%.
At this point you should have:
- opentelemetry-instrumentation-langchain emitting gen_ai.* attrs on every LangChain and LangGraph span
- config["callbacks"] passed at invocation time so subgraph spans nest correctly under their parent node
| Symptom | Cause | Fix |
|---|---|---|
| Traces land but prompt and completion bodies are empty | OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT unset (P27 — privacy-safe default) | Set to true only for the workload buckets in Step 3; for multi-tenant, wire upstream redaction first |
| Subgraph / tool-call spans orphaned or missing | Callbacks bound via .with_config() at definition time (P28) | Pass via config["callbacks"] at invocation time so children inherit |
| gen_ai.usage.cache_read_input_tokens resets every call | Per-call usage, aggregation is your job (P04) | Custom callback summing across calls keyed by session.id; see langchain-model-inference |
| p99 dashboard looks noisy and median-biased | 10% head sampling drops outliers before backend | Move to Collector tailsamplingprocessor — always keep errors and latency > 5000ms |
| Traces never appear | OTLPSpanExporter endpoint uses the wrong protocol (gRPC on 4317 vs HTTP on 4318) | Verify with curl -v $OTLP_ENDPOINT; swap to the proto-grpc exporter package if your backend expects gRPC |
| Cost attribute missing from spans | LangChain 1.0 does not emit gen_ai.usage.cost_usd natively | Add a BaseCallbackHandler that computes cost from tokens × pricing; see semantic-conventions reference |
| PR review flags sk-... in trace attributes | Secrets in prompts captured via gen_ai.prompt.content (P37-adjacent) | Upstream redactor must strip API-key patterns before the model call; audit via 0.1% sampler |
| Exporter dropping spans silently | Queue overflow at high volume | Increase max_queue_size to 4096+; add a Collector between SDK and backend |
Spin up Jaeger in Docker, point the SDK at http://localhost:4318/v1/traces,
leave content capture on (it's dev, inputs are synthetic). You get a generic
span waterfall — no LLM-specific UX, but good for verifying the instrumentor
emits what you expect before paying for a SaaS backend.
See Backend Setup Matrix for the
docker run command and SDK config.
Honeycomb's BubbleUp over gen_ai.request.model, gen_ai.usage.input_tokens,
and tool call count is the fastest path from "p95 spiked at 14:00" to "one
specific tool took 20 s because the vectorstore was slow." It requires keeping
content capture off by default so you can turn the team loose on search
without PII-leak worries.
See LLM SLO Dashboards for the exact Honeycomb query shape.
Register two BatchSpanProcessors — one to LangSmith's OTLP endpoint, one to
Tempo. Run both for two weeks, compare waterfalls, cut over. LangSmith handles
LLM-specific analytics; Tempo handles unified trace search across LLM and
non-LLM services in your Grafana stack.
See Backend Setup Matrix dual-export section.
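A sketch of that wiring, reusing the imports and resource from the setup code earlier; the
LangSmith endpoint and header name shown are assumptions to confirm against the matrix:
# Two processors on one provider: every span goes to both backends.
provider = TracerProvider(resource=resource)

provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
    endpoint="https://api.smith.langchain.com/otel/v1/traces",   # assumed; check matrix
    headers={"x-api-key": os.environ["LANGSMITH_API_KEY"]},      # assumed header name
)))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
    endpoint="http://tempo:4318/v1/traces",                      # your Tempo distributor
)))

trace.set_tracer_provider(provider)
LangchainInstrumentor().instrument()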
Pack siblings: langchain-security-basics (redaction, P34),
langchain-middleware-patterns (order: redact → cache → model, P24),
langchain-model-inference (cost callback pattern, P04).
Pain catalog: docs/pain-catalog.md — P27 (content-capture default),
P28 (subgraph callback propagation), P04 (cache token aggregation),
P34 (prompt injection), P37 (secrets in env / prompts).