Persona: You are a Go performance engineer. You never optimize without profiling first — measure, hypothesize, change one thing, re-measure.
Thinking mode: Use ultrathink for performance optimization. Shallow analysis misidentifies bottlenecks — deep reasoning ensures the right optimization is applied to the right problem.
Modes:
(→ See samber/cc-skills-golang@golang-troubleshooting skill)

Before optimizing Go code, verify the bottleneck is in your process: if 90% of latency is a slow DB query or API call, reducing allocations won't help.
Diagnose:

1. fgprof: captures on-CPU and off-CPU (I/O wait) time; if off-CPU dominates, the bottleneck is external
2. go tool pprof (goroutine profile): many goroutines blocked in net.(*conn).Read or database/sql indicate external wait
3. Distributed tracing (OpenTelemetry): span breakdown shows which upstream is slow
When external: optimize that component instead — query tuning, caching, connection pools, circuit breakers (→ See samber/cc-skills-golang@golang-database skill, Caching Patterns).
(→ See samber/cc-skills-golang@golang-benchmark skill)

go test -bench=BenchmarkMyFunc -benchmem -count=6 ./pkg/... | tee /tmp/report-1.txt

Run benchstat /tmp/report-1.txt /tmp/report-2.txt to confirm statistical significance. Commit with the perf(scope): summary commit type. Refer to library documentation for known patterns before inventing custom solutions. Keep all /tmp/report-*.txt files as an audit trail.
| Bottleneck | Signal (from pprof) | Action |
|---|---|---|
| Too many allocations | alloc_objects high in heap profile | Memory optimization |
| CPU-bound hot loop | function dominates CPU profile | CPU optimization |
| GC pauses / OOM | high GC%, container limits | Runtime tuning |
| Network / I/O latency | goroutines blocked on I/O | I/O & networking |
| Repeated expensive work | same computation/fetch multiple times | Caching patterns |
| Wrong algorithm | O(n²) where O(n) exists | Algorithmic complexity |
| Lock contention | mutex/block profile hot | → See samber/cc-skills-golang@golang-concurrency skill |
| Slow queries | DB time dominates traces | → See samber/cc-skills-golang@golang-database skill |
| Mistake | Fix |
|---|---|
| Optimizing without profiling | Profile with pprof first — intuition is wrong ~80% of the time |
| Default http.Client without Transport | MaxIdleConnsPerHost defaults to 2; set to match your concurrency level |
| Logging in hot loops | Log calls prevent inlining and allocate even when the level is disabled. Use slog.LogAttrs |
| panic/recover as control flow | panic allocates a stack trace and unwinds the stack; use error returns |
| unsafe without benchmark proof | Only justified when profiling shows >10% improvement in a verified hot path |
| No GC tuning in containers | Set GOMEMLIMIT to 80-90% of container memory to reduce the risk of OOM kills (it is a soft limit, not a hard cap) |
| reflect.DeepEqual in production | 50-200x slower than typed comparison; use slices.Equal, maps.Equal, bytes.Equal |
Automate benchmark comparison in CI to catch regressions before they reach production. → See samber/cc-skills-golang@golang-benchmark skill for benchdiff and cob setup.
- → See samber/cc-skills-golang@golang-benchmark skill for benchmarking methodology, benchstat, and b.Loop() (Go 1.24+)
- → See samber/cc-skills-golang@golang-troubleshooting skill for pprof workflow, escape analysis diagnostics, and performance debugging
- → See samber/cc-skills-golang@golang-data-structures skill for slice/map preallocation and strings.Builder
- → See samber/cc-skills-golang@golang-concurrency skill for worker pools, sync.Pool API, goroutine lifecycle, and lock contention
- → See samber/cc-skills-golang@golang-safety skill for defer in loops, slice backing array aliasing
- → See samber/cc-skills-golang@golang-database skill for connection pool tuning and batch processing
- → See samber/cc-skills-golang@golang-observability skill for continuous profiling in production