AgentCore Runtime Session Investigation

v20260609

agentcore-investigation

This guide provides a framework for debugging and analyzing AgentCore runtime sessions with CloudWatch Logs Insights. It covers OpenTelemetry (OTEL) span correlation, resolving session IDs to trace IDs, and structuring complex queries. Use it when investigating agent execution flows, diagnosing tool call failures, analyzing latency, or building detailed behavioral timelines.

CloudWatch Logging OpenTelemetry Tracing Debugging LLM AgentCore

Overview

Investigate AgentCore runtime sessions by querying CloudWatch Logs Insights, filtering OpenTelemetry noise, and producing structured investigation output.

Key capabilities:

Session-to-trace resolution via OTEL span correlation
Structured and glob-style parse queries for both dedicated and combined log groups
OpenTelemetry noise filtering with AgentCore-specific heuristics
Timeline construction with T+offset format
Error, tool invocation, token usage, and latency analysis

Reference Files

Load these files as needed for detailed guidance:

MCP:

mcp-setup.md

When: ALWAYS load before starting an investigation, to ensure the CloudWatch and Application Signals MCP servers are configured.
Contains: MCP server configuration for CloudWatch Logs and Application Signals, with setup instructions for Claude Code, Gemini, Codex, and Kiro CLI.

.mcp.json

When: Load when setting up MCP servers for the first time.
Contains: Sample MCP configuration with both CloudWatch and Application Signals servers.

otel-span-schema.md

When: ALWAYS load before querying or filtering OTEL spans.
Contains: Field extraction priorities, known instrumentation scopes, noise filtering heuristics (DROP/KEEP patterns).

Phase 0: SessionId-to-TraceId Resolution

When the user provides a sessionId, resolve it to one or more traceIds first. If the user provides a traceId directly, skip this phase.

Discovery Query (structured fields)

fields traceId, @timestamp
| filter attributes.session.id = "SESSION_ID"
| stats count(*) as spanCount, min(@timestamp) as firstSeen, max(@timestamp) as lastSeen by traceId
| sort firstSeen asc
| limit 50

Discovery Query (combined log group — glob-style parse)

fields @timestamp, @message
| parse @message '"traceId":"*"' as traceId
| parse @message '"session.id":"*"' as sessionId
| filter sessionId = "SESSION_ID" or @message like "SESSION_ID"
| stats earliest(@timestamp) as firstSeen, latest(@timestamp) as lastSeen, count(*) as spanCount by traceId
| sort firstSeen asc
| limit 50

Latest Interaction Only

fields traceId
| filter attributes.session.id = "SESSION_ID"
| sort @timestamp desc
| limit 1

Store discovered traceId(s) and use them in ALL subsequent queries.
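
A minimal Python sketch of how the structured discovery query (capped with `| limit 50`) might be templated and its result rows reduced to an ordered list of trace IDs. The helper names `build_discovery_query` and `trace_ids_from_rows`, the allowed ID alphabet, and the row shape (a list of dicts keyed by field name, with `firstSeen` as a sortable timestamp string) are assumptions for illustration, not part of any AgentCore or CloudWatch API.

```python
import re

DISCOVERY_TEMPLATE = (
    'fields traceId, @timestamp\n'
    '| filter attributes.session.id = "{session_id}"\n'
    '| stats count(*) as spanCount, min(@timestamp) as firstSeen, '
    'max(@timestamp) as lastSeen by traceId\n'
    '| sort firstSeen asc\n'
    '| limit 50'
)

# Assumed ID alphabet; rejecting anything else keeps quotes out of the query text.
_SAFE_ID = re.compile(r"[A-Za-z0-9._:-]+")


def build_discovery_query(session_id: str) -> str:
    """Substitute a validated session ID into the discovery query."""
    if not _SAFE_ID.fullmatch(session_id):
        raise ValueError(f"unexpected characters in session id: {session_id!r}")
    return DISCOVERY_TEMPLATE.format(session_id=session_id)


def trace_ids_from_rows(rows: list[dict[str, str]]) -> list[str]:
    """Return distinct trace IDs ordered by firstSeen (row shape is assumed)."""
    ordered = sorted(rows, key=lambda r: r.get("firstSeen", ""))
    seen: list[str] = []
    for row in ordered:
        trace_id = row.get("traceId")
        if trace_id and trace_id not in seen:
            seen.append(trace_id)
    return seen
```

Validating the ID before substitution is a design choice of this sketch: it avoids building a malformed query if the session ID arrives with stray quotes or whitespace.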

Phase 1: Discover Log Groups

Use describe_log_groups with logGroupNamePrefix /aws/bedrock-agentcore/runtimes to find all runtime log groups.

Log group naming patterns (in priority order):
- /aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>/otel-rt-logs (structured OTEL spans)
- /aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>/[runtime-logs] (stdout/stderr)
- /aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>-DEFAULT (single combined group)

Log Group Layouts

AgentCore runtimes always emit OTEL spans. Some deployments split logs into a dedicated otel-rt-logs sub-group; others write everything into a single combined log group. Both are normal.

Log Group Layout	Query Strategy
Dedicated `otel-rt-logs` exists	Use structured field queries (`traceId`, `attributes.session.id`, etc.)
Single combined log group	Try structured fields first — if they return 0 results, use glob-style `parse @message`

If a dedicated otel-rt-logs group exists, prefer it for structured queries.
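
The layout decision above can be sketched in Python as a small selector over the names returned by log group discovery. The function name `choose_query_strategy` and the strategy labels are hypothetical; only the prefix and the `otel-rt-logs` and `-DEFAULT` suffixes come from the naming patterns listed earlier.

```python
RUNTIME_PREFIX = "/aws/bedrock-agentcore/runtimes"


def choose_query_strategy(log_group_names: list[str]) -> tuple[str, str]:
    """Pick a log group and a query style from discovered log group names.

    Returns (log_group_name, strategy). The strategy is "structured" for a
    dedicated otel-rt-logs group and "structured-then-glob" otherwise, meaning:
    try structured fields first, then fall back to glob-style parse.
    """
    runtime_groups = [n for n in log_group_names if n.startswith(RUNTIME_PREFIX)]
    if not runtime_groups:
        raise LookupError("no AgentCore runtime log groups found")
    # A dedicated OTEL group is preferred whenever one exists.
    for name in runtime_groups:
        if name.endswith("/otel-rt-logs"):
            return name, "structured"
    # Otherwise prefer a combined -DEFAULT group, else the first match.
    for name in runtime_groups:
        if name.endswith("-DEFAULT"):
            return name, "structured-then-glob"
    return runtime_groups[0], "structured-then-glob"
```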

Parse Syntax Guidance

When using parse @message on combined log groups, prefer glob-style parse — it is simpler and avoids escaping issues:

| parse @message '"name":"*"' as spanName
| parse @message '"traceId":"*"' as traceId
| parse @message '"startTimeUnixNano":"*"' as startNano

Regex parse (/pattern/) is valid CloudWatch Logs Insights syntax but requires careful escaping of quotes and special characters inside JSON. If glob-style parse extracts the field you need, use it.

Phase 2: Query CloudWatch Logs Insights

Run all 6 query types for a complete investigation. Each query has a structured version (for dedicated otel-rt-logs) and a glob-style parse version (for combined log groups).

Query Size Limits

Every query MUST include | limit to prevent context window overflow:

Session overview: | limit 50
Span details: | limit 100
Errors: | limit 50
Tool invocations: | limit 100
Token usage: | limit 50
Latency outliers: | limit 20
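
One way to enforce these caps before a query is submitted is a small guard, sketched here in Python. The `QUERY_LIMITS` keys and the `ensure_limit` helper are hypothetical names; the numeric caps are the ones listed above. The guard only inspects a trailing `| limit N` clause, which is an assumption about how the queries in this guide are written.

```python
import re

QUERY_LIMITS = {
    "session_overview": 50,
    "span_details": 100,
    "errors": 50,
    "tool_invocations": 100,
    "token_usage": 50,
    "latency_outliers": 20,
}

# Matches a limit clause only when it is the final pipeline stage.
_LIMIT_RE = re.compile(r"\|\s*limit\s+(\d+)\s*$", re.IGNORECASE)


def ensure_limit(query: str, query_type: str) -> str:
    """Append the cap if missing, or lower an existing limit that exceeds it."""
    cap = QUERY_LIMITS[query_type]
    query = query.rstrip()
    match = _LIMIT_RE.search(query)
    if match is None:
        return f"{query}\n| limit {cap}"
    if int(match.group(1)) > cap:
        return f"{query[:match.start()]}| limit {cap}"
    return query
```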

Query 1: Session Overview

Structured:

fields @timestamp, traceId, spanId, parentSpanId, name, scope.name,
       attributes.session.id, attributes.gen_ai.operation.name, attributes.gen_ai.agent.name,
       startTimeUnixNano, endTimeUnixNano
| filter traceId = "TRACE_ID"
| sort startTimeUnixNano asc
| limit 50

Combined log group:

fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"traceId":"*"' as traceId
| parse @message '"spanId":"*"' as spanId
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| sort @timestamp asc
| limit 50

Query 2: Span Details with Duration

Structured:

fields @timestamp, traceId, spanId, parentSpanId, name, scope.name,
       startTimeUnixNano, endTimeUnixNano,
       (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs,
       status.code, attributes.gen_ai.operation.name
| filter traceId = "TRACE_ID"
| filter ispresent(startTimeUnixNano)
| sort startTimeUnixNano asc
| limit 100

Combined log group:

fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"spanId":"*"' as spanId
| parse @message '"parentSpanId":"*"' as parentSpanId
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| parse @message '"statusCode":"*"' as statusCode
| sort @timestamp asc
| limit 100

Query 3: Errors

Structured:

fields @timestamp, traceId, spanId, name, status.code, status.message,
       attributes.error.message, attributes.exception.message, attributes.exception.type
| filter traceId = "TRACE_ID"
| filter status.code = 2 OR ispresent(attributes.error.message) OR ispresent(attributes.exception.message)
| sort @timestamp asc
| limit 50

Combined log group:

fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /ERROR|exception|Exception|fault|STATUS_CODE_ERROR/
| parse @message '"name":"*"' as spanName
| parse @message '"statusCode":"*"' as statusCode
| parse @message '"startTimeUnixNano":"*"' as startNano
| sort @timestamp asc
| limit 50

Query 4: Tool Invocations

Structured:

fields @timestamp, traceId, spanId, name, scope.name,
       attributes.gen_ai.operation.name, attributes.tool.name,
       startTimeUnixNano, endTimeUnixNano,
       (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs
| filter traceId = "TRACE_ID"
| filter attributes.gen_ai.operation.name = "execute_tool" OR ispresent(attributes.tool.name) OR name like /tool/
| sort startTimeUnixNano asc
| limit 100

Combined log group:

fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /tool|execute_tool|function_call/
| parse @message '"name":"*"' as spanName
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| parse @message '"statusCode":"*"' as statusCode
| sort @timestamp asc
| limit 100

Query 5: Token Usage

Structured:

fields @timestamp, traceId, spanId, name,
       attributes.gen_ai.usage.input_tokens, attributes.gen_ai.usage.output_tokens,
       attributes.gen_ai.usage.total_tokens, attributes.gen_ai.agent.name
| filter traceId = "TRACE_ID"
| filter ispresent(attributes.gen_ai.usage.total_tokens)
| sort @timestamp asc
| limit 50

Combined log group:

fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /input_tokens|output_tokens|usage/
| parse @message '"name":"*"' as spanName
| parse @message /"gen_ai\.usage\.input_tokens":\s*"?(?<inputTokens>\d+)/
| parse @message /"gen_ai\.usage\.output_tokens":\s*"?(?<outputTokens>\d+)/
| sort @timestamp asc
| limit 50
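
Once token usage rows come back, they can be totalled per trace. The Python sketch below assumes each row is a dict keyed by the structured field names used in the query above, with values delivered as strings; `summarize_token_usage` is a hypothetical helper, not part of any SDK.

```python
def summarize_token_usage(rows: list[dict[str, str]]) -> dict[str, int]:
    """Sum input, output, and total token counts across result rows."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for row in rows:
        for key in totals:
            raw = row.get(f"attributes.gen_ai.usage.{key}")
            if raw in (None, ""):
                continue  # field absent on this span
            # Values may arrive as "1200" or "1200.0"; normalize to int.
            totals[key] += int(float(raw))
    return totals
```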

Query 6: Latency Outliers

Structured:

fields @timestamp, traceId, spanId, name,
       (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs
| filter traceId = "TRACE_ID"
| filter ispresent(endTimeUnixNano)
| sort durationMs desc
| limit 20

Combined log group:

fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| fields (endNano - startNano) / 1000000 as durationMs
| sort durationMs desc
| limit 20

Queries are async — use get_logs_insight_query_results to poll until status is Complete.
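
The polling step might look like the following Python sketch. It takes a caller-supplied `fetch` callable that wraps whatever actually retrieves the query results, and it assumes the response is a dict with a `status` key holding one of the CloudWatch Logs query statuses (`Scheduled`, `Running`, `Complete`, `Failed`, `Cancelled`, `Timeout`). The function name, the response shape, and the default intervals are assumptions.

```python
import time
from typing import Any, Callable

TERMINAL_FAILURES = {"Failed", "Cancelled", "Timeout"}


def poll_until_complete(
    fetch: Callable[[], dict[str, Any]],
    interval_s: float = 1.0,
    max_wait_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until the query reports Complete, fails, or times out."""
    waited = 0.0
    while True:
        response = fetch()
        status = response.get("status")
        if status == "Complete":
            return response
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"query ended with status {status}")
        if waited >= max_wait_s:
            raise TimeoutError(f"query still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop testable without real delays; in normal use the default `time.sleep` applies.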

Phase 3: Filter OTEL Noise

See otel-span-schema.md for extraction rules, known scopes, and DROP/KEEP heuristics.

After retrieving query results:

Count total results received
Remove entries matching DROP patterns (count removed)
Keep entries matching KEEP patterns
Log: "Filtered: {total} → {kept} spans ({removed} noise entries dropped)"
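
The four steps above can be sketched in Python as follows. The patterns shown are placeholders only: the real DROP/KEEP lists live in otel-span-schema.md and are not reproduced here. Matching on the span `name` alone, and letting KEEP win when a span matches both lists, are assumptions of this sketch.

```python
import re

# Placeholder patterns for illustration; substitute the lists from
# otel-span-schema.md before using this for a real investigation.
DROP_PATTERNS = [re.compile(r"^hypothetical\.noise\.")]
KEEP_PATTERNS = [re.compile(r"tool|chat|invoke", re.IGNORECASE)]


def filter_spans(spans: list[dict], drop=DROP_PATTERNS, keep=KEEP_PATTERNS):
    """Drop noise spans and return (kept_spans, summary_log_line)."""
    kept = []
    for span in spans:
        name = span.get("name", "")
        is_keep = any(p.search(name) for p in keep)
        is_drop = any(p.search(name) for p in drop)
        if is_drop and not is_keep:
            continue  # noise: matches DROP and nothing in KEEP
        kept.append(span)
    total = len(spans)
    removed = total - len(kept)
    summary = f"Filtered: {total} → {len(kept)} spans ({removed} noise entries dropped)"
    return kept, summary
```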

Phase 4: Build Timeline

Compute relative offsets from the earliest span's startTimeUnixNano:

[T+0ms]     Session started — traceId: abc123
[T+45ms]    LLM inference — model: anthropic.claude-v3 — 1,200ms
[T+1,250ms] Tool call: search_documents — 340ms
[T+1,600ms] Tool result: 3 documents found
[T+1,650ms] LLM inference — model: anthropic.claude-v3 — 890ms
[T+2,550ms] Response generated — 200 OK
[T+2,600ms] Session ended — total: 2,600ms
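
A minimal Python sketch of the offset computation, assuming each span is a dict with `name`, `startTimeUnixNano`, and optionally `endTimeUnixNano` as integers or numeric strings. `build_timeline` is a hypothetical helper. Integer arithmetic is used throughout because nanosecond epoch values are too large to represent exactly as floats.

```python
SEP = " \u2014 "  # same separator as in the timeline format above


def build_timeline(spans: list[dict]) -> list[str]:
    """Format spans as [T+offset] lines relative to the earliest start time."""
    timed = [s for s in spans if s.get("startTimeUnixNano") not in (None, "")]
    if not timed:
        return []
    timed.sort(key=lambda s: int(s["startTimeUnixNano"]))
    t0 = int(timed[0]["startTimeUnixNano"])
    lines = []
    for span in timed:
        start = int(span["startTimeUnixNano"])
        offset_ms = (start - t0) // 1_000_000
        line = f"[T+{offset_ms:,}ms] {span.get('name', '<unnamed>')}"
        end = span.get("endTimeUnixNano")
        if end not in (None, ""):
            duration_ms = (int(end) - start) // 1_000_000
            line += f"{SEP}{duration_ms:,}ms"
        lines.append(line)
    return lines
```

Spans without a start time are skipped rather than guessed at, so the timeline only contains entries whose position can be computed.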

Error Handling

Situation	Action
No log groups found	Ask user for log group name or AWS region
Query returns 0 results	Widen time range to ±24h, retry. If still empty, try alternate ID fields
Session ID not found	Try filtering by requestId, invocationId, traceId variants
Query timeout	Use `cancel_logs_insight_query`, reduce time range, retry
Partial results	Note in output, suggest narrower time window
Structured field queries return 0 results	Switch to glob-style `parse @message` queries (see Parse Syntax Guidance)
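
For the "widen time range to ±24h" action, the retry window can be computed as in this Python sketch. It returns epoch seconds for a window centered on a reference time. `widened_window` is a hypothetical helper, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def widened_window(center: datetime, hours: int = 24) -> tuple[int, int]:
    """Return (start, end) epoch seconds spanning center plus or minus `hours`."""
    if center.tzinfo is None:
        center = center.replace(tzinfo=timezone.utc)  # assume UTC for naive input
    delta = timedelta(hours=hours)
    return int((center - delta).timestamp()), int((center + delta).timestamp())
```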

Info

Category Artificial Intelligence

Name agentcore-investigation

Version v20260609

Size 7.29KB

Source awslabs/mcp

Updated At 2026-06-10