Sed Gotchas: GNU vs BSD, In-Place Backup, and Safety Patterns

Summary
The script worked on my MacBook. It worked when the dev who wrote it ran it locally. It even worked the first few times in CI. Then a junior on the team cut a release late one night and the build failed with the kind of error that makes you wonder if you’ve forgotten how to read a stack trace:
sed: -i may not be used with stdin
sed: 1: "config.json": invalid command code c
The pipeline ran on Alpine. The dev’s machine ran macOS. My machine ran Ubuntu. Three different sed implementations, three different sets of rules, one script that pretended they were the same tool.
I’ve been doing DevOps in Calgary for years now, and this exact failure mode has burned me on multiple teams. The fix is never hard once you know the trick. Knowing the trick is the whole problem. Here are the differences that matter when your sed script runs across Linux, macOS, and busybox containers — and the patterns I now use by default so I don’t get paged for a regex difference at 2 AM.
GNU vs BSD vs busybox: Three Different Tools
When a tutorial says “sed does X,” the honest answer is “which sed?” The big three you’ll run into are GNU sed (most Linux distros), BSD sed (macOS, FreeBSD), and busybox sed (Alpine, embedded systems, scratch CI containers). They share a POSIX core and then disagree on most of the things you’d want to use.
| Feature | GNU (Linux) | BSD (macOS) | busybox (Alpine) |
|---|---|---|---|
| -i (in-place) | -i | -i '' (mandatory empty arg) | -i |
| -E (extended regex) | yes | yes (since 10.13) | partial |
| \b word boundary | yes | no (use [[:<:]]/[[:>:]]) | no |
| \d digit shortcut | no | no | no |
| Multi-letter flags | freely combinable | order-sensitive | varies |
| Q (quit + exit code) | yes | no | no |
A few of these deserve a sentence each because they look minor and aren’t.
The -i difference is the headline. GNU treats -i as a flag that takes an optional suffix glued to it (-i.bak means “edit in place, save backup as .bak”, -i alone means “no backup”). BSD insists -i always takes a suffix as a separate argument, even when you want no backup, in which case you must pass an empty string: sed -i '' 's/x/y/' f. Forget the empty string on BSD and the next argument (your script body) gets interpreted as the backup suffix, which gives you the unhelpful error invalid command code because BSD sed then tries to parse your filename as a script.
The \b word boundary thing has bitten me more than any other regex difference. I had a deploy script that replaced release with prod only when it was a whole word: \brelease\b. Worked fine on Ubuntu CI. The first time someone ran it on a Mac to test locally, the substitution silently did nothing, because BSD sed doesn’t treat \b as a word boundary and the pattern never matched. No error. No warning. Just a config file that didn’t get the change it needed.
The \d confusion is a JavaScript/PCRE thing leaking in. Nobody’s sed has it. Use [0-9] or [[:digit:]]. Always.
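If a script needs to know which of the three it is talking to, one rough way is to look at what sed --version prints. This is a sketch, not a guarantee: sed_flavor is a hypothetical helper name, and it assumes GNU sed prints a banner containing "GNU", busybox prints a banner saying it is not GNU sed, and BSD sed rejects the option outright.

```shell
# Hypothetical helper: guess the sed family from the --version banner.
# Assumptions: GNU prints "... (GNU sed) ...", busybox prints a line
# containing "not GNU", BSD sed fails on --version and prints nothing
# to stdout.
sed_flavor() {
    banner=$(sed --version 2>/dev/null | head -n 1)
    case "$banner" in
        *"not GNU"*) echo busybox ;;
        *GNU*)       echo gnu ;;
        *)           echo bsd-or-other ;;
    esac
}

sed_flavor
```

I would treat the result as a hint for logging or for skipping a test, not as a reason to fork the script into three code paths. The patterns that follow are meant to make that unnecessary.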
The Portability Patterns
These four patterns are what I write by default now. They’re slightly less elegant than the GNU-only versions and a lot more likely to still work next year on a platform I haven’t predicted.
1. The Universal -i Wrapper
The most useful trick I know here: pass a backup suffix to -i so the syntax works on both GNU and BSD. Then delete the backup after.
# Works on Linux, macOS, busybox
sed -i.bak 's/foo/bar/' config.conf && rm config.conf.bak
sed -i.bak is parsed identically by GNU and busybox (suffix glued to flag) and is acceptable to BSD (BSD only needs the suffix as a separate argument when it’s empty). The trailing rm removes the backup once the substitution succeeds. The && matters — if sed fails, we don’t delete the backup, because the original file may already be partially mangled.
The pure BSD form, when you genuinely want no backup, is:
# BSD-only
sed -i '' 's/foo/bar/' config.conf
That empty string '' is mandatory and is the difference between a working command and a screen full of invalid command code errors. I’ve never seen this form survive a port to GNU without breakage, so I avoid it in shared scripts and use the -i.bak-then-rm pattern instead.
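If you use the -i.bak-then-rm pattern in more than one place, it can be wrapped in a small function. This is a minimal sketch: sed_inplace is a hypothetical name, it takes the sed script first and the files after it, and it leaves the backup in place whenever sed reports a failure.

```shell
# Hypothetical wrapper around the portable -i.bak form.
# Usage: sed_inplace SCRIPT FILE...
sed_inplace() {
    script=$1
    shift
    for f in "$@"; do
        # -i.bak (suffix glued to the flag) is accepted by GNU, BSD, and busybox.
        sed -i.bak "$script" "$f" || return 1
        # Only reached when sed succeeded, so the backup is no longer needed.
        rm -f "$f.bak"
    done
}
```

Usage would look like sed_inplace 's/foo/bar/' config.conf other.conf. Quoting "$f" inside the loop keeps filenames with spaces intact.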
2. Avoid \b — Use Bracket Expressions or Anchors
\b is GNU sed’s word-boundary escape. BSD doesn’t know about it. busybox doesn’t know about it. The closest thing BSD offers is [[:<:]] and [[:>:]] for left and right word boundaries, which look like they were designed by someone who lost a bet — and they don’t exist on GNU.
There’s no escape that means “word boundary on every sed.” So I just don’t use word boundaries. I rewrite around them.
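# Brittle: GNU-only
sed 's/\brelease\b/prod/g' config.conf
# Portable: match the non-word characters around the word explicitly,
# with separate commands for start of line, end of line, and a line that is only the word
sed 's/\([^a-zA-Z0-9_]\)release\([^a-zA-Z0-9_]\)/\1prod\2/g; s/^release\([^a-zA-Z0-9_]\)/prod\1/; s/\([^a-zA-Z0-9_]\)release$/\1prod/; s/^release$/prod/' config.conf
# Often best: just use awk, when the word is a whole whitespace-separated field
# (assigning to $i makes awk rebuild the line with single spaces between fields)
awk '{
    for (i = 1; i <= NF; i++) if ($i == "release") $i = "prod"
    print
}' config.conf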
The portable sed version is ugly. That ugliness is a signal — when sed forces you to write capture groups around character classes that are basically negated word characters, you’re at the threshold where awk is the better tool. I’ll come back to this in the “When to Just Use Python” section, but for word boundaries, awk’s $i == "word" field comparison is the cleaner cross-platform answer.
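If you do stay in sed, one way to cover the awkward cases (word at the start of the line, at the end, or two matches separated by a single character) is to pad each line with a space on both sides, run the substitution twice, and strip the padding again. This is a sketch under assumptions: replace_release_word is a hypothetical name, and "word character" is taken to mean letters, digits, and underscore.

```shell
# Hypothetical helper: replace the whole word "release" with "prod" on stdin,
# using only POSIX BRE features.
replace_release_word() {
    sed -e 's/^/ /' -e 's/$/ /' \
        -e 's/\([^A-Za-z0-9_]\)release\([^A-Za-z0-9_]\)/\1prod\2/g' \
        -e 's/\([^A-Za-z0-9_]\)release\([^A-Za-z0-9_]\)/\1prod\2/g' \
        -e 's/^ //' -e 's/ $//'
}

printf 'release release-notes prerelease release\n' | replace_release_word
# prints: prod prod-notes prerelease prod
```

The substitution runs twice because each match consumes the separator after it, so a word that shares that separator with the previous match is only picked up on the second pass.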
3. Always Use -E Explicitly for Extended Regex
The default sed regex flavor is BRE (basic regular expressions), where +, ?, and {n,m} are literal characters. POSIX BRE only gives you the interval form, escaped as \{n,m\}. The escaped forms \+ and \? are GNU extensions that other seds may not accept. ERE (extended) flips this: quantifiers work without backslashes, and you escape them to make them literal.
GNU sed is lenient here because of those extensions. BSD historically isn’t. busybox is somewhere in between depending on the build.
# Wrong in BRE: + is a literal plus sign, so this matches a digit followed by "+"
sed 's/[0-9]+/N/g' input.txt
# Explicit ERE: works everywhere that supports -E
sed -E 's/[0-9]+/N/g' input.txt
# Explicit BRE: portable, but harder to read
sed 's/[0-9][0-9]*/N/g' input.txt
I default to -E and write quantifiers without backslashes. The script is more readable and there’s less ambiguity about what flavor the regex engine is parsing. The one tradeoff: -E was an extension rather than part of POSIX for most of its life. It comes from the BSD side, and GNU sed, which originally spelled the option -r, accepts -E too. On macOS 10.13 and later it’s there. On busybox it’s there in most modern builds. If you’re targeting truly ancient systems (Solaris, AIX older than a teenager), you’ll need to fall back to plain BRE such as [0-9][0-9]*.
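To see the difference between the flavors, it helps to run the same input through each form. The sample string below is made up for illustration. It contains a literal + so the BRE behavior is visible.

```shell
input='1+1 and 22'

# BRE, unescaped +: matches a digit followed by a literal plus sign.
literal_plus=$(printf '%s\n' "$input" | sed 's/[0-9]+/N/g')

# ERE: + means "one or more".
ere=$(printf '%s\n' "$input" | sed -E 's/[0-9]+/N/g')

# Plain POSIX BRE spellings of "one or more digits".
bre_star=$(printf '%s\n' "$input" | sed 's/[0-9][0-9]*/N/g')
bre_interval=$(printf '%s\n' "$input" | sed 's/[0-9]\{1,\}/N/g')

printf '%s\n' "$literal_plus" "$ere" "$bre_star" "$bre_interval"
# prints:
# N1 and 22
# N+N and N
# N+N and N
# N+N and N
```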
4. Escape & in Replacement Even When You Don’t Need To
In a sed substitution’s replacement string, & is shorthand for “the entire matched text.” Most of the time this doesn’t bite you because your replacement string doesn’t contain a literal ampersand. Then one day you’re rewriting a URL with query parameters and & shows up and your output is a mess.
# Replacement contains literal &, sed thinks "insert whole match"
echo 'foo' | sed 's/foo/A & B/'
# Output: A foo B ← probably not what you wanted
# Always escape & in replacements
echo 'foo' | sed 's/foo/A \& B/'
# Output: A & B
The cost of escaping & is one backslash. The cost of not escaping it is a config file with https://api.example.com/path?q=1foopath?q=2 instead of https://api.example.com/path?q=1&path?q=2, found well after deploy when traffic flatlined. I escape every & in every replacement now, even when the surrounding text makes it obviously safe.
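When the replacement text comes from a variable, the escaping can be done by another sed pass. The sketch below assumes the substitution uses | as its delimiter and that the value contains no newlines. escape_replacement is a hypothetical helper name.

```shell
# Hypothetical helper: backslash-escape the characters that are special in a
# sed replacement when "|" is the delimiter: backslash, ampersand, and "|".
escape_replacement() {
    printf '%s\n' "$1" | sed 's/[\\&|]/\\&/g'
}

url='https://api.example.com/path?q=1&r=2'
safe=$(escape_replacement "$url")

printf 'endpoint=PLACEHOLDER\n' | sed "s|PLACEHOLDER|$safe|"
# prints: endpoint=https://api.example.com/path?q=1&r=2
```

If you use a different delimiter, put that character in the bracket expression in place of |.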
In-Place Editing Safety
In-place editing is the single most dangerous mode of sed. It rewrites a file with no second chance. Here’s how I use it without losing sleep.
The Cardinal Rule: Never -i Without a Backup
This is the 2 AM story I owe you. I had a sed command meant to delete commented-out lines from an nginx config: sed -i '/^#/d' /etc/nginx/sites-enabled/api.conf. Looked harmless. Tested it on a sample file the day before. Worked fine.
The actual production file had a section header # === API endpoints === followed by a long stretch of location blocks. The header started with #. Got deleted. Fine, that’s just a comment. But the version of the file I tested against had blank lines separating sections. Production didn’t. Without the comment header acting as a visual anchor, when something broke at 2 AM I couldn’t tell which file had been the source of truth, because I’d run the command across the fleet in parallel and none of them had backups. Restoring took the rest of that morning, correlating ansible facts with config history.
The defense is one suffix:
# Always carry a parachute
sed -i.bak '/^#/d' /etc/nginx/sites-enabled/api.conf
Now api.conf.bak exists. If the substitution destroyed something, I mv api.conf.bak api.conf and I’m back to the previous state. Disk is cheap. The few extra characters it takes to write .bak are the cheapest insurance you’ll ever buy.
The only time I omit the backup is when the file is itself generated from a source of truth (like a Helm-rendered manifest about to be kubectl apply’d) and I literally don’t care about the previous version. Even then I usually keep it for the duration of the operation and clean up afterward.
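The backup is only useful if something actually puts it back when the edit goes wrong. A minimal sketch of that idea follows, assuming you have some command that validates the edited file. edit_with_rollback is a hypothetical name, and everything after the file argument is run as the check.

```shell
# Hypothetical helper: edit in place with a backup, run a check command,
# and restore the backup if the check fails.
# Usage: edit_with_rollback SCRIPT FILE CHECK [ARGS...]
edit_with_rollback() {
    script=$1
    file=$2
    shift 2
    sed -i.bak "$script" "$file" || return 1
    if "$@"; then
        return 0    # the backup stays around until you decide to remove it
    fi
    # Check failed: put the previous version back.
    mv -f "$file.bak" "$file"
    return 1
}
```

For the nginx case this could be called as edit_with_rollback '/^#/d' /etc/nginx/sites-enabled/api.conf nginx -t, so a config that no longer parses never stays in place.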
Atomic In-Place via Temp + mv
Some shops disallow -i entirely, and they have a point. -i hides what it does: GNU sed writes the new content to a temp file and renames it over the original, so the file ends up with a new inode and any hard links to the old one are left behind. If a tool is holding the file open by inode (a tail-following daemon, a config watcher), it loses track of the file. You also get no chance to inspect the result before it replaces the original.
The portable alternative is to do it yourself, explicitly:
sed 's/old/new/' config.conf > config.conf.tmp \
&& mv config.conf.tmp config.conf
The mv is atomic on the same filesystem. If sed fails, the && short-circuits and config.conf.tmp is left as evidence — the original is untouched. Add a trap 'rm -f config.conf.tmp' EXIT if you want cleanup on early exit.
The trade-off: this changes the file’s inode, the same as sed -i does. If you have a process watching the original inode (some log aggregators, some filesystem-watch tools), it won’t notice the change. The redirect also creates config.conf.tmp with your default ownership and permissions, so check those if the original was restricted. For config files this is almost always fine. For files that some daemon is actively mmaping, think harder.
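The temp-and-mv idea can be packaged so the temp file lives next to the target (same filesystem, so the rename is atomic) and starts out with the original's permissions. This is a sketch under assumptions: atomic_sed is a hypothetical name, and unlike the two-liner above it removes the temp file on failure instead of leaving it as evidence.

```shell
# Hypothetical helper: apply a sed script to FILE via a temp file and mv.
# Usage: atomic_sed SCRIPT FILE
atomic_sed() {
    script=$1
    file=$2
    # Create the temp file beside the target so mv is a same-filesystem rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    # cp -p carries over the mode (and ownership where permitted); the
    # redirect then overwrites the contents without changing the mode.
    if cp -p "$file" "$tmp" && sed "$script" "$file" > "$tmp"; then
        mv -f "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Called as atomic_sed 's/old/new/' config.conf, it either replaces the file in one rename or leaves the original untouched.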
The Read-Only Audit Pattern
The pattern I now use for any sed-driven change that touches more than a handful of files: build a wrapper that runs in dry-run mode by default and only mutates with an explicit flag.
#!/bin/bash
# fix-config.sh: replace stage URLs with prod across all configs
# Dry run by default; pass --apply to modify files.
set -euo pipefail
APPLY=0
[ "${1:-}" = "--apply" ] && APPLY=1
for f in /etc/myapp/*.conf; do
    if [ "$APPLY" -eq 1 ]; then
        sed -i.bak 's|stage-api\.internal|api.internal|g' "$f"
    else
        # diff exits 1 when the files differ, so don't let set -e abort the loop
        diff -u "$f" <(sed 's|stage-api\.internal|api.internal|g' "$f") || true
    fi
done
Run without --apply, you get a unified diff per file showing exactly what would change. Run with --apply, you mutate. Compose this with git in a config-as-code repo and you have a third undo layer (revert the commit). The cost is a handful of lines of bash. The benefit is that the script becomes safe to copy-paste into a Slack thread without anyone accidentally trashing prod.
The Top 10 Most Common Bugs
These are the ones I’ve debugged repeatedly enough that I now scan for them on sight.
1. Using \d for digits. Doesn’t exist in any sed. Use [0-9] or [[:digit:]].
2. Forgetting BSD wants -i ''. On macOS, sed -i 's/x/y/' f will eat your script as the backup suffix and complain. Either pass -i '' (BSD-only) or -i.bak (universal).
3. sed -i.bak without quoting filenames. If the filename has spaces or glob characters, the shell splits and expands it first and sed sees the wrong arguments. Always quote: sed -i.bak 's/x/y/' "$file".
4. Using & in replacement without escaping. & means “whole match.” Replace foo with Mr. & Mrs. and you’ll get Mr. foo Mrs. Escape it: \&.
5. Variable expansion with single quotes. sed 's/x/$VAR/' writes the literal $VAR, not the value. Single quotes never expand. Use double quotes when you need substitution.
6. Variable expansion with double quotes and special characters. sed "s/x/$VAR/" looks fine until $VAR contains a / or & or \. Switching delimiters, as in sed "s|x|$VAR|", takes care of the / only. For the rest, sanitize the variable first. For untrusted input, escape it explicitly with another sed pass.
7. Forgetting -E for +, ?, {n,m}. Without -E, those are literal characters in basic regex. sed 's/[0-9]+/N/' matches a digit followed by a literal +, not “one or more digits.” Either add -E or write the BRE form: [0-9][0-9]*.
8. Misreading the g flag. g in s/x/y/g means “all occurrences on each line,” not “all occurrences in the file.” Sed always processes the whole file; g only changes per-line behavior. Without g, only the first match per line is replaced.
9. Confusing address range 1,$ with regex. sed '1,$s/x/y/' is “lines 1 through last, do substitution.” sed '/1$/,/end/s/x/y/' is “from a line ending in 1 to the next line containing ‘end’, do substitution.” Different worlds. Read the address before the command.
10. Multi-line edits without N. Sed processes one line at a time by default. To match across two lines, you need to pull the next line into the pattern space with N. To match across many, you need a loop with labels (:a; N; $!ba). If you’re writing this loop, see the next section.
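A few of these are easiest to believe once you have seen the output. The snippet below is a small demonstration with made-up inputs. It covers the g flag, single versus double quotes, and joining two lines with N.

```shell
# Without g: only the first match on each line changes.
first_only=$(printf 'x x\nx x\n' | sed 's/x/y/')

# With g: every match on each line changes.
every=$(printf 'x x\nx x\n' | sed 's/x/y/g')

VAR=prod
# Single quotes: the shell never expands $VAR, so sed inserts it literally.
single=$(echo 'env=x' | sed 's/x/$VAR/')
# Double quotes: the shell expands $VAR before sed sees the script.
double=$(echo 'env=x' | sed "s/x/$VAR/")

# N appends the next line to the pattern space, so \n can be matched.
joined=$(printf 'foo\nbar\n' | sed 'N; s/foo\nbar/foo bar/')

printf '%s\n' "$first_only" "$every" "$single" "$double" "$joined"
# prints:
# y x
# y x
# y y
# y y
# env=$VAR
# env=prod
# foo bar
```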
When to Just Use Python
The honest threshold for switching tools, calibrated by years of regretted sed scripts:
- Branching logic. If your transformation says “if line contains X then do Y, else do Z,” you’re past sed’s comfort zone. Python’s if/elif is clearer than sed’s b and t branches.
- Multiple multi-line operations. One N-loop is fine. Two stacked together with conditional substitutions is unmaintainable. Python or awk.
- Anything you need to test. Sed scripts have no good unit-testing story. You can pipe inputs and diff outputs, but for non-trivial logic you want pytest, not bash + diff.
- Anything that no longer fits on a screen. This is the rule that has saved me the most time. If the sed script doesn’t fit on a screen, by tomorrow morning you won’t remember how it works. A few weeks from now nobody will. Rewrite it as a short Python script that reads top to bottom.
I love sed. I keep a cheat sheet of patterns from real production logs and use most of them regularly. But I’ve also rewritten enough clever sed pipelines as small Python scripts after the original author left to know that the cleverness has a half-life. Sed’s job is line-oriented substitution and deletion. The moment you’re reaching for hold space, branch labels, or multi-line slurps, you’re using sed as a programming language, and there are better programming languages.

