Log Investigation Techniques
Good log investigation is about narrowing the problem fast: identify the time window, find the failing request, extract the stacktrace, remove noise, then group repeating patterns.
Core Workflow
- Start with the time window and service name.
- Search for ERROR, Exception, request ID, transaction ID, username or endpoint.
- Use context lines around the match to understand the sequence.
- Summarize duplicate failures to find the main issue.
- Correlate with service restarts, deployments, traffic spikes or database/network events.
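The first steps of this workflow can be sketched as one small shell function. This is only an illustration under stated assumptions: find_in_window is a hypothetical helper name, and it assumes every log entry starts with a timestamp such as "2026-03-08 14:05:11", so a prefix like "2026-03-08 14:0" selects the minutes 14:00 to 14:09.

```shell
# Hypothetical helper, not a standard tool.
# Assumes every log entry starts with a timestamp like "2026-03-08 14:05:11".
find_in_window() {
    log_file=$1      # e.g. app.log
    time_prefix=$2   # e.g. "2026-03-08 14:0" covers 14:00-14:09
    pattern=$3       # extended regex, e.g. "ERROR|Exception|requestId=abc123"

    # Match only entries that start inside the window, but take the context
    # lines from the whole file so stack trace lines without a timestamp survive.
    grep -n -E -B 2 -A 5 -- "^${time_prefix}.*(${pattern})" "$log_file"
}

# Usage: find_in_window app.log "2026-03-08 14:0" "ERROR|Exception"
```

In the output, grep marks matching lines as "N:" and context lines as "N-", which makes it easy to tell the hit from its surroundings.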
Most Useful Commands
grep -n 'ERROR' app.log
grep -n 'Exception' app.log
grep -A 20 -B 5 'NullPointerException' app.log
tail -f app.log | grep -E 'ERROR|WARN'
sed -n '1200,1260p' app.log
awk '/ERROR|WARN|Exception/' app.log
zgrep -n 'ERROR' app.log.gz
Practical Patterns
1. Search huge logs safely
less app.log
grep -n 'OutOfMemoryError' big.log
2. Show the stacktrace area
grep -n -A 30 -B 5 'Exception' app.log
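A fixed -A 30 either cuts long traces short or drags in unrelated lines. One possible alternative is to print each matching line plus only the continuation lines that belong to it. The sketch below assumes Java-style traces (frames indented and starting with "at ", nested causes starting with "Caused by:"); stacktraces is a hypothetical helper name.

```shell
# Hypothetical helper; assumes Java-style traces where continuation lines
# look like "    at ...", "Caused by: ..." or "    ... N more".
stacktraces() {
    log_file=$1
    pattern=${2:-Exception}   # regex that marks the first line of a trace

    awk -v pat="$pattern" '
        $0 ~ pat                                { in_trace = 1; print; next }  # trace header
        in_trace && /^[ \t]+at /                { print; next }                # stack frame
        in_trace && /^Caused by:/               { print; next }                # nested cause
        in_trace && /^[ \t]+\.\.\. [0-9]+ more/ { print; next }                # elided frames
        { in_trace = 0 }                                                       # trace ended
    ' "$log_file"
}

# Usage: stacktraces app.log 'NullPointerException'
```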
3. Count repeated errors
grep 'ERROR' app.log | sort | uniq -c | sort -nr | head
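The pipeline above counts identical lines, so when every line carries its own timestamp or request ID each count comes out as 1. A minimal sketch of one way around that is to strip the variable parts before counting. The timestamp layout ("YYYY-MM-DD HH:MM:SS" at the start of the line) and the requestId=... field are assumptions about the log format, and count_errors is a hypothetical helper name.

```shell
# Hypothetical helper; assumes lines start with "YYYY-MM-DD HH:MM:SS" and
# carry IDs as requestId=<value>. Adjust both expressions to your log format.
count_errors() {
    grep 'ERROR' "$1" \
        | sed -E \
            -e 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}[ T][0-9:.,]+ *//' \
            -e 's/requestId=[^ ]+/requestId=<id>/g' \
        | sort | uniq -c | sort -nr | head
}

# Usage: count_errors app.log
```

With the variable parts removed, lines that describe the same failure collapse into one row and the most frequent message rises to the top.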
4. Focus on one request or correlation ID
grep 'requestId=abc123' app.log
5. Investigate compressed archives
zgrep -n 'timeout' app.log.2.gz
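When the failure may span a log rotation, it can help to search the live log and every archive in one pass. The sketch below assumes rotated files sit next to the live log and are named like app.log.1.gz, app.log.2.gz; search_all_logs is a hypothetical helper name, and the files are visited in glob order, not chronological order.

```shell
# Hypothetical helper; assumes rotated files are named <log>.<n>.gz next to
# the live log, e.g. app.log, app.log.1.gz, app.log.2.gz.
search_all_logs() {
    pattern=$1
    base=$2

    for f in "$base" "$base".*.gz; do
        [ -e "$f" ] || continue            # skip the unexpanded glob if no archives exist
        case $f in
            *.gz) gzip -cd -- "$f" ;;      # decompress to stdout, archive stays untouched
            *)    cat -- "$f" ;;
        esac | grep -n -- "$pattern" | awk -v name="$f" '{ print name ":" $0 }'
    done
}

# Usage: search_all_logs 'timeout' app.log
```

Each hit is printed as file:line:text, so it is clear which archive a match came from.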
6. See only today's lines
grep '2026-03-08' app.log
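Two variations on this pattern may be useful: taking today's date from the system instead of typing it, and cutting out a window narrower than a day. Both sketches assume lines start with a "YYYY-MM-DD HH:MM" timestamp; today_lines and lines_in_window are hypothetical helper names. Note that date uses the local time zone, which may differ from the zone the log was written in.

```shell
# Hypothetical helpers; both assume lines start with "YYYY-MM-DD HH:MM".

# Today's lines without hard-coding the date.
today_lines() {
    grep -- "^$(date +%Y-%m-%d)" "$1"
}

# Lines from start (inclusive) to end (exclusive), for example:
#   lines_in_window app.log "2026-03-08 14:00" "2026-03-08 14:10"
# Lines without a timestamp (stack frames) follow the entry before them.
lines_in_window() {
    awk -v start="$2" -v end="$3" '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
            ts = substr($0, 1, 16)                 # "YYYY-MM-DD HH:MM"
            in_window = (ts >= start && ts < end)  # string order equals time order here
        }
        in_window
    ' "$1"
}
```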
What to Look For
- First error before the cascade of follow-up errors
- Retry loops, timeouts and connection refused messages
- Memory or GC pressure before slow responses
- Restart events before or after the failure window
- Common thread names, request IDs or host names
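Two of these checks are easy to approximate from the command line: finding the first error, and seeing when the cascade began. The sketch below assumes lines start with "YYYY-MM-DD HH:MM:SS"; first_error and errors_per_minute are hypothetical helper names.

```shell
# Hypothetical helpers; assume lines start with "YYYY-MM-DD HH:MM:SS".

# First error in the file, with its line number: usually closer to the root
# cause than the errors that follow it.
first_error() {
    grep -n -m 1 -E 'ERROR|Exception' "$1"
}

# Number of ERROR lines per minute, in chronological order, to show when the
# cascade started and how fast it grew.
errors_per_minute() {
    grep 'ERROR' "$1" | cut -c 1-16 | sort | uniq -c
}
```

A sudden jump in the per-minute count marks the start of the failure window, which is the natural place to look for a restart, deployment or timeout just before it.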