Sed for Log Analysis: Extract Errors, Filter by Time, Find Patterns

Summary
Late one night I was staring at a massive log file from a payment processing node. A new billing service had just rolled out, and a small percentage of transactions were silently failing. The logs were on a bastion host with no Splunk, no Datadog forwarder, and a strict “no installing tools” policy. Just the GNU coreutils that shipped with the AMI.
I had grep. I had awk. And I had sed. By morning I had the root cause — a mis-configured retry loop hitting a stale DNS cache — and the fix was deployed. Every pattern in this article is one I used that night, or one I refined in the weeks after when I rebuilt the same workflow as a reusable script.
The thing nobody tells you about log analysis is that grep runs out of road quickly. As soon as your problem touches a multi-line stack trace, a time window, or PII redaction before sharing logs with a vendor, you need stream editing — and sed is the tool that’s already there. This is a companion to the Sed Cheat Sheet: 30 One-Liners from Real Production Logs, but focused entirely on log work.
1. Extract Errors from Mixed-Severity Logs
The simplest log task: pull only the ERROR lines from a file that mixes INFO, WARN, DEBUG, and ERROR.
The naive approach is grep ERROR app.log. That works until you need something grep can’t do: combining extraction with edits in a single pass. No re-piping, no buffering twice.
Input (app.log):
2024-10-22 14:02:11 INFO request_id=ab12 status=200 path=/health
2024-10-22 14:02:14 ERROR request_id=cd34 db connection refused (host=db-primary.local)
2024-10-22 14:02:14 INFO heartbeat ok
2024-10-22 14:02:18 WARN request_id=ef56 slow query 1240ms
2024-10-22 14:02:22 ERROR heartbeat probe failed (host=svc-b.local)
2024-10-22 14:02:25 ERROR request_id=gh78 payment gateway 502
Command:
sed -n '/ERROR/p' app.log
Output:
2024-10-22 14:02:14 ERROR request_id=cd34 db connection refused (host=db-primary.local)
2024-10-22 14:02:22 ERROR heartbeat probe failed (host=svc-b.local)
2024-10-22 14:02:25 ERROR request_id=gh78 payment gateway 502
The -n suppresses default printing, and /ERROR/p prints only matching lines. Now the bug: heartbeat probe failures are noise. A flaky internal health check fires constantly and drowns out the real signal.
Fixed command (exclude noisy errors):
sed -n '/ERROR/{/heartbeat/!p;}' app.log
The {...} groups commands. Inside, /heartbeat/!p means “if the line does not match heartbeat, print it.” So you get ERROR lines that aren’t heartbeat noise.
Output:
2024-10-22 14:02:14 ERROR request_id=cd34 db connection refused (host=db-primary.local)
2024-10-22 14:02:25 ERROR request_id=gh78 payment gateway 502
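The same grouped form extends to more than one severity level. A minimal sketch using -E for extended-regex alternation (supported by both GNU and BSD sed), with printf standing in for app.log so it runs anywhere:

```shell
# Pull ERROR and WARN lines in one pass, still dropping heartbeat noise.
printf '%s\n' \
  '2024-10-22 14:02:14 ERROR request_id=cd34 db connection refused' \
  '2024-10-22 14:02:18 WARN request_id=ef56 slow query 1240ms' \
  '2024-10-22 14:02:22 ERROR heartbeat probe failed' \
  '2024-10-22 14:02:30 INFO heartbeat ok' \
| sed -nE '/ERROR|WARN/{/heartbeat/!p;}'
```

This prints the cd34 and ef56 lines and drops the rest; the semicolon before the closing brace keeps BSD sed happy.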
For incident triage, I usually want a count of distinct errors per minute. Sed pairs with cut and uniq to do this without awk:
sed -n '/ERROR/{/heartbeat/!p;}' app.log \
| cut -c1-16 \
| uniq -c \
| sort -rn \
| head -20
That gives you the top 20 minutes by error volume, which is usually the first thing on-call wants to see.
Use this when: you need extraction plus inline filtering in one pass, or when grep’s -v would chain awkwardly with other transforms.
2. Filter Logs by Time Range
The incident I opened with started in the small hours of the morning. I didn’t need the entire log — I needed a narrow window around the failure.
Sed handles this with an address range using two regex patterns. The first match opens the range, the second closes it. Everything between (inclusive) prints.
Command:
sed -n '/2024-10-22 02:00/,/2024-10-22 02:20/p' app.log > window.log
Input (excerpt):
2024-10-22 01:58:42 INFO cron job_id=4101 done
2024-10-22 02:00:11 INFO request_id=zz01 status=200
2024-10-22 02:08:14 ERROR request_id=zz77 retry budget exhausted
2024-10-22 02:14:02 ERROR request_id=zz92 dns lookup failed: payments.local
2024-10-22 02:20:00 INFO cron job_id=4102 start
2024-10-22 02:25:11 INFO request_id=ab44 status=200
Output (window.log):
2024-10-22 02:00:11 INFO request_id=zz01 status=200
2024-10-22 02:08:14 ERROR request_id=zz77 retry budget exhausted
2024-10-22 02:14:02 ERROR request_id=zz92 dns lookup failed: payments.local
2024-10-22 02:20:00 INFO cron job_id=4102 start
The first pattern that matches opens the range; the first match after that closes it. If the closing pattern never matches, sed reads to EOF — useful when you want “everything since 02:00” and don’t have a known end.
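That open-ended case can also be written explicitly, using sed’s $ address (last line) as the closer. A self-contained sketch, with printf standing in for the log file:

```shell
# "Everything since 02:14" -- $ closes the range at the last line of input.
printf '%s\n' \
  '2024-10-22 02:08:14 ERROR retry budget exhausted' \
  '2024-10-22 02:14:02 ERROR dns lookup failed' \
  '2024-10-22 02:20:00 INFO cron start' \
| sed -n '/02:14/,$p'
```

Everything from the first 02:14 match through end of file prints; the 02:08 line does not.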
The bug I hit on this pattern: timestamp formats vary across services. The billing API logged ISO 8601 with millisecond precision (2024-10-22T02:08:14.331Z), the legacy worker logged syslog format (Oct 22 02:08:14), and Kubernetes events used RFC3339 with timezone (2024-10-22T02:08:14-06:00). A naive range pattern matches one and misses the others.
For mixed sources, anchor on the time portion only and accept any prefix:
sed -n '/02:0[0-9]:/,/02:2[0-9]:/p' merged.log
The character classes 0[0-9] and 2[0-9] match minutes 00-09 and 20-29 respectively. Less precise, but resilient when you can’t normalize timestamps upstream.
For live tailing within a known window, pipe tail -f into sed but use line buffering or you’ll wait forever for the buffer to flush:
tail -f app.log | stdbuf -oL sed -n '/14:00/,/15:00/p'
stdbuf -oL forces line-buffered output so each match prints immediately. Without it, sed buffers before flushing — which means in a slow log stream, your “live” filter is anything but.
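GNU sed can also do this without stdbuf via its -u (--unbuffered) flag; BSD/macOS sed spells the line-buffered variant -l. A sketch with a deliberately slow two-line producer standing in for tail -f:

```shell
# With -u, the first match prints immediately instead of waiting for the
# pipe buffer to fill. (GNU sed flag; on BSD/macOS use `sed -l` instead.)
(echo '2024-10-22 14:00:01 ERROR boom'; sleep 1; echo '2024-10-22 15:00:00 INFO done') \
| sed -u -n '/14:00/,/15:00/p'
```

I still default to stdbuf in pipelines because it fixes buffering for any tool in the pipe, not just sed.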
Use this when: you have a known incident window and want to scope analysis to it, or when a downstream tool (less, awk, jq) would otherwise have to scan the entire file.
3. Parse Multi-Line Stack Traces
This is where sed earns its keep. A Java exception spans many lines. A Python traceback spans several. Grep treats every line independently, so when you grep for NullPointerException, you get the message line — but not the stack frames you need to see.
The classic approach is sed’s N command, which appends the next line to the pattern space, joined by an embedded newline. With a pattern that loops N while the next line looks like a continuation, you can collapse a whole stack trace onto one line.
Input (error.log, Java):
2024-10-22 02:14:14 ERROR request_id=cd34 NullPointerException at PaymentService
at com.example.PaymentService.charge(PaymentService.java:142)
at com.example.BillingHandler.process(BillingHandler.java:88)
at com.example.RequestRouter.dispatch(RequestRouter.java:201)
Caused by: java.lang.IllegalStateException: connection pool exhausted
at com.example.db.Pool.acquire(Pool.java:55)
2024-10-22 02:14:15 INFO request_id=ef56 status=200
Command:
sed -e ':a' -e '$!N' -e '/\n[[:space:]]*at \|\nCaused by:/{s/\n/ | /;ba}' -e 'P;D' error.log
Breaking this down:
- :a defines a label called a
- $!N appends the next line to the pattern space (skipped on the last line, so sed doesn’t run off the end of input)
- /\n[[:space:]]*at \|\nCaused by:/ tests the newly appended line — after N, the only newline in the pattern space sits just before it. Anchoring on ^ instead would test the start of the whole pattern space (the original timestamp line), which is why the \n form is needed here
- {s/\n/ | /;ba} joins the continuation onto the previous line with a | separator and branches back to a
- P;D prints everything up to the remaining newline, deletes it, and restarts the cycle with the line that broke the loop (the next log entry)
I use [[:space:]] here instead of \s because \s is a GNU extension; on macOS or Alpine busybox the script will silently fail to match, while the portable POSIX class works on every sed. The \| alternation is a GNU extension too — POSIX BRE has no alternation at all — so on BSD sed you’d switch to -E and a plain |.
Output:
2024-10-22 02:14:14 ERROR request_id=cd34 NullPointerException at PaymentService | at com.example.PaymentService.charge(PaymentService.java:142) | at com.example.BillingHandler.process(BillingHandler.java:88) | at com.example.RequestRouter.dispatch(RequestRouter.java:201) | Caused by: java.lang.IllegalStateException: connection pool exhausted | at com.example.db.Pool.acquire(Pool.java:55)
2024-10-22 02:14:15 INFO request_id=ef56 status=200
Now grep NullPointerException returns the whole stack trace as one line. Pipe it into log aggregators or feed it to sort | uniq -c to find duplicate exceptions.
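A sketch of that aggregation step, run on already-collapsed lines (the sample lines are illustrative stand-ins mirroring the output above):

```shell
# Count how often each exception class appears across collapsed traces.
# grep -o pulls just the class names; sort | uniq -c tallies them.
printf '%s\n' \
  '02:14:14 ERROR NullPointerException at PaymentService | Caused by: java.lang.IllegalStateException: pool exhausted' \
  '02:15:02 ERROR NullPointerException at PaymentService | Caused by: java.lang.IllegalStateException: pool exhausted' \
| grep -oE '[A-Za-z.]+Exception' \
| sort | uniq -c | sort -rn
```

On the two sample lines this tallies each class twice, which is exactly the “which exception dominates the window” view you want during triage.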
For Python tracebacks, the continuation pattern is different. Python uses Traceback (most recent call last): followed by indented File "..." lines:
Input (Python):
2024-10-22 02:14:14 ERROR request_id=cd34
Traceback (most recent call last):
  File "/srv/app/billing.py", line 142, in charge
    response = api.post(url, json=payload)
  File "/srv/app/api.py", line 88, in post
    raise ConnectionError("dns lookup failed")
ConnectionError: dns lookup failed
2024-10-22 02:14:15 INFO request_id=ef56 status=200
Command:
sed -e ':a' -e '$!N' -e '/\nTraceback\|\n[[:space:]]\|\n[A-Za-z_.]*Error:/{s/\n/ | /;ba}' -e 'P;D' error.log
This keeps appending while the newly read line starts with Traceback, with indentation (the File "..." headers and the source lines beneath them), or with an exception class ending in Error: (the final line of the traceback); exceptions whose names don’t end in Error need their own alternative in the pattern. The output collapses the whole traceback onto one line, ready for grep or aggregation.
For deeper coverage of these patterns including hold-space tricks, see Sed Multiline Patterns: How to Match Across Lines.
Use this when: you need to grep, count, or alert on exceptions and the message line alone isn’t enough — you need the frames too.
4. Anonymize PII Before Sharing Logs
When you handle real payment data, you periodically have to share log excerpts with a vendor for debugging a third-party SDK issue. Vendor support contracts are not the place to leak customer email addresses, card numbers, or source IPs.
The pattern: sed substitution with capture groups, applied as a redaction pass before the file ever leaves your laptop.
Input (raw.log):
2024-10-22 14:02:14 INFO user=alice.smith@acme.example.com login from 142.103.18.55
2024-10-22 14:02:18 INFO charge attempted card=4532-1488-0343-6467 amount=$129.00
2024-10-22 14:02:22 ERROR user=bob_42@gmail.com 3ds verification failed from 99.21.4.211
Email redaction:
sed -E 's/[A-Za-z0-9._+-]+@([A-Za-z0-9.-]+)/***@\1/g' raw.log
The capture group \1 keeps the domain (useful for support: they can still see “the bug only hits gmail addresses”) while erasing the local part.
Credit card redaction (16-digit, dashed or spaced):
sed -E 's/([0-9]{4})[ -]?([0-9]{4})[ -]?([0-9]{4})[ -]?([0-9]{4})/XXXX-XXXX-XXXX-\4/g' raw.log
PCI-DSS allows the last four digits to remain; this preserves them for support correlation while masking the rest.
IP redaction (keep first two octets for ASN/region context):
sed -E 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.x.x/g' raw.log
Combined into a single redaction pass:
sed -E -e 's/[A-Za-z0-9._+-]+@([A-Za-z0-9.-]+)/***@\1/g' \
-e 's/([0-9]{4})[ -]?([0-9]{4})[ -]?([0-9]{4})[ -]?([0-9]{4})/XXXX-XXXX-XXXX-\4/g' \
-e 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.x.x/g' \
raw.log > redacted.log
Output:
2024-10-22 14:02:14 INFO user=***@acme.example.com login from 142.103.x.x
2024-10-22 14:02:18 INFO charge attempted card=XXXX-XXXX-XXXX-6467 amount=$129.00
2024-10-22 14:02:22 ERROR user=***@gmail.com 3ds verification failed from 99.21.x.x
One portability gotcha: BSD sed (the macOS default) doesn’t always handle -E the same way GNU sed does, and word boundaries differ (\b on GNU, [[:<:]] and [[:>:]] on BSD). If you’re sharing redaction scripts across a team that mixes Linux and macOS, run them with --posix on GNU sed to catch compatibility issues early:
sed --posix -E -e '...' raw.log
For pipelines that must work on both, use [[:space:]] instead of \s, avoid \b, and stick to BRE (basic regex) when possible.
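As a concrete example of the BRE fallback, here is the email redaction rewritten with POSIX \{1,\} intervals and \(...\) groups — more punctuation, but no -E and no GNU extensions. A sketch; verify on your own targets:

```shell
# POSIX BRE version of the email redaction: \{1,\} replaces the ERE +,
# \(...\) replaces (...). Runs on GNU, BSD, and busybox sed alike.
printf 'user=alice.smith@acme.example.com login ok\n' \
| sed 's/[A-Za-z0-9._+-]\{1,\}@\([A-Za-z0-9.-]\{1,\}\)/***@\1/g'
```

The local part is masked and the captured domain survives, same as the ERE version.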
Use this when: you’re about to attach logs to a support ticket, paste them into a Slack channel, or store them in a system with broader access than the source.
5. Pre-Clean Logs for grep/awk Pipelines
A real chunk of “my pipeline doesn’t work” bugs come down to two things: ANSI color codes from a tool that thought it was talking to a terminal, and inconsistent whitespace.
When you tail a Docker container’s logs through a CI runner, then pipe them to a log aggregator, you’ll often see this:
Input (ci.log, with hidden ANSI codes):
2024-10-22 14:02:14 \x1b[31mERROR\x1b[0m build failed
2024-10-22 14:02:15 \x1b[33mWARN\x1b[0m deprecation: foo()
2024-10-22 14:02:16 INFO step complete
The escape sequences (shown here as \x1b[31m for red) are invisible in your terminal but real bytes in the file. When you pipe to grep, your “exact match” on ERROR works. When you pipe to awk and split on whitespace, the third field is \x1b[31mERROR\x1b[0m instead of ERROR, and your downstream conditions silently never match.
ANSI strip:
sed 's/\x1b\[[0-9;]*m//g' ci.log
\x1b is the escape character (ESC, 0x1B), followed by [, then any number of digits and semicolons, then m. That’s the SGR (Select Graphic Rendition) escape sequence. The g flag handles multiple per line.
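The \x1b escape is itself a GNU extension. For a version that also works on BSD/macOS sed, splice a real ESC byte into the script from the shell — a sketch using printf command substitution:

```shell
# Portable ANSI strip: build the sed script around a literal ESC byte (0x1B)
# instead of relying on GNU sed's \x1b escape.
ESC=$(printf '\033')
printf '14:02:14 \033[31mERROR\033[0m build failed\n' \
| sed "s/${ESC}\[[0-9;]*m//g"
```

The double quotes let the shell expand ${ESC} while leaving the rest of the regex intact.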
Trim leading and trailing whitespace:
sed -E 's/^[[:space:]]+//;s/[[:space:]]+$//' ci.log
Two substitutions in one sed call, separated by ;. The first strips leading whitespace; the second strips trailing.
Normalize tabs to single spaces (so awk’s default FS works predictably):
sed 's/\t/ /g' ci.log
The reusable normalizer I keep in ~/bin/log-normalize.sed:
# log-normalize.sed - clean log streams before pipelining
s/\x1b\[[0-9;]*m//g
s/\r$//
s/^[[:space:]]+//
s/[[:space:]]+$//
s/\t/ /g
s/ +/ /g
Used as:
kubectl logs deploy/billing-api --tail=10000 \
| sed -E -f ~/bin/log-normalize.sed \
| grep ERROR \
| awk '{print $3, $4}'
Now grep and awk work on a known, clean stream. The afternoon I spent writing this normalizer paid back many times over in “why is my pipeline returning empty?” debugging sessions saved.
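A quick way to smoke-test the normalizer without the script file is to inline the same expressions and feed them a deliberately messy line (printf emits a real ESC byte, a tab, and a CRLF ending; GNU sed assumed for \x1b and \t):

```shell
# Messy input: leading blanks, ANSI color, a tab, doubled spaces, CRLF.
printf '  \033[31mERROR\033[0m\tdisk  full  \r\n' \
| sed -E -e 's/\x1b\[[0-9;]*m//g' \
         -e 's/\r$//' \
         -e 's/^[[:space:]]+//' \
         -e 's/[[:space:]]+$//' \
         -e 's/\t/ /g' \
         -e 's/ +/ /g'
```

Everything collapses to the single clean token stream "ERROR disk full", which is what the downstream grep and awk stages expect.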
For a deeper look at when sed beats grep and when it doesn’t, see Sed vs Awk vs Grep. The short version: if you need to transform before filtering, sed comes first.
Use this when: your downstream tool gives empty or inconsistent results despite the input “looking right” in your terminal.
Real Story: The Late-Night Pipeline
The incident I opened with involved a large log set spanning multiple services; the failure window was unknown, the customer-impact window was a narrow band of minutes, and the bug hit only a small fraction of transactions.
Here’s the pipeline I built that night, with patterns 1-5 stacked:
# Step 1: scope to the customer-impact window
sed -n '/2024-10-22 02:0[0-9]:/,/2024-10-22 02:3[0-9]:/p' app.log > window.log
# Step 2: collapse Java stack traces to one line each
sed -e ':a' -e '$!N' -e '/\n[[:space:]]*at \|\nCaused by:/{s/\n/ | /;ba}' -e 'P;D' window.log > collapsed.log
# Step 3: extract errors, drop heartbeat noise
sed -n '/ERROR/{/heartbeat/!p;}' collapsed.log > errors.log
# Step 4: redact PII before sharing with the upstream API vendor
sed -E -e 's/[A-Za-z0-9._+-]+@([A-Za-z0-9.-]+)/***@\1/g' \
-e 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.x.x/g' \
errors.log > shareable.log
# Step 5: count distinct exception types
grep -oE 'Caused by: [A-Za-z.]+Exception' shareable.log \
| sort | uniq -c | sort -rn
That last sort | uniq -c produced output dominated by a single exception type — java.net.UnknownHostException — clustered in the impact window, against a hostname that should have been cached. That was the breadcrumb. The retry loop in the new billing service was bypassing the system resolver and hammering an upstream DNS server that had rate-limited us.
A large log set scoped down to one number that pointed straight at the bug. From sed -n '/.../,/.../p' to root cause in a fraction of the time line-by-line grepping would have taken. If I’d been grepping manually, I’d still have been there at sunrise.
This kind of work is also why getting these patterns right in Sed in CI/CD Pipelines: Safe Patterns matters — the line between “useful log filter” and “accidental data corruption” is one missing escape character.
Related
- Sed Cheat Sheet: 30 One-Liners from Real Production Logs — the pillar reference with all 30 patterns indexed by use case
- Sed Multiline Patterns: How to Match Across Lines — deeper coverage of N, P, D, and hold space for stack traces
- Sed in CI/CD Pipelines: Safe Patterns — when in-place sed edits are safe and when they corrupt your repo
- Sed vs Awk vs Grep — picking the right tool when your pipeline could use any of them

