/user/kayd @ devops :~$ cat sed-vs-awk-vs-grep.md

Sed vs Awk vs Grep: When to Use Which (with Decision Matrix)

Karandeep Singh
• 10 minutes

Summary

A practical comparison of sed, awk, and grep with a decision tree, performance observations on real log data, and ten side-by-side examples of the same task in all three tools.

I have spent years writing log-processing pipelines in Calgary, and the question I get most from junior engineers is always the same: “Should I use sed, awk, or grep here?” Their pipelines work; they are just usually much longer than they need to be, and meaningfully slower.

I used to chain grep | sed | sed | awk for everything because I never sat down and learned which tool did which job. The day I processed a large log file and watched a sluggish pipeline get rewritten as a single awk one-liner that finished many times faster, I stopped guessing.

This is the cheat sheet I wish someone had handed me. A decision matrix, a three-line rule, ten head-to-head examples, and observations from real-sized files. No academic detours.

The Decision Matrix

| Task | grep | sed | awk |
|---|---|---|---|
| Find lines matching a pattern | ✅ | ⚠️ | ⚠️ |
| Substitute text | ❌ | ✅ | ⚠️ |
| Work with fields/columns | ❌ | ❌ | ✅ |
| Multi-line patterns | ❌ | ⚠️ | ✅ |
| Conditional logic | ❌ | ⚠️ | ✅ |
| Stream editing in place | ❌ | ✅ | ⚠️ |
| Stateful processing | ❌ | ⚠️ | ✅ |
| Math / aggregation | ❌ | ❌ | ✅ |
| Recursive search | ✅ | ❌ | ❌ |

A few things the table flattens:

  • grep is a filter. It reads lines and keeps the ones that match. It does not transform anything. The -r flag for recursive directory search is its single biggest superpower.
  • sed can find lines too, but using sed to do what grep does is wasted typing. Sed earns its keep on substitution and on stateful tricks like the hold space. The ⚠️ on multi-line means yes, sed can do it, but the syntax (:a;N;$!ba;) is not something you want to read at 2 AM.
  • awk is a programming language with a regex front door. It can do what grep and sed do. The cost is a slightly heavier startup and more characters to type for trivial tasks. The ⚠️ on substitute and stream-edit means awk has gsub and gensub, but you don’t want awk’s -i inplace quirks unless you already need awk for the rest of the job.
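To make the multi-line ⚠️ concrete, here is the classic sed line-joining idiom next to an awk version of the same job. A minimal sketch using GNU sed syntax (BSD sed wants the label and branch split across `-e` expressions):

```shell
# sed: label a, append the next line (N), branch to a until the last
# line ($!ba), then replace every embedded newline with a space.
printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/ /g'

# awk: same job, readable — print a separator before every line but
# the first, then a final newline.
printf 'a\nb\nc\n' | awk '{printf "%s%s", sep, $0; sep=" "} END{print ""}'
```

Both commands print `a b c`. The sed version works, but the awk version is the one you can still parse at 2 AM.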

The Rule of Thumb

  • grep when you only need to find lines.
  • sed when you need to transform text but don’t care about fields.
  • awk when you need fields, math, or state across lines.

That is the whole rule. Tape it to your monitor. The next ten examples are just this rule applied.

Side-by-Side Examples

For each task: same input, three solutions where applicable, then I declare a winner.

1. Count error lines in a log file

# grep
grep -c ERROR app.log

# sed
sed -n '/ERROR/p' app.log | wc -l

# awk
awk '/ERROR/{c++} END{print c+0}' app.log

Winner: grep. The -c flag is built for counting and avoids spawning a second process. On large log files, grep finishes ahead of the sed pipeline (which adds a wc -l process) and noticeably ahead of awk for this kind of pure match-and-count. Stop reaching for awk when grep already has a flag for it.

2. Extract the 5th comma-separated field

# grep — not applicable

# sed
sed -E 's/^([^,]*,){4}([^,]*).*/\2/' data.csv

# awk
awk -F, '{print $5}' data.csv

Winner: awk, by a mile. This is what awk was built for. The sed version works but you cannot read it a week later. If the task involves a field number, awk is already the answer.

3. Find all lines starting with a timestamp

# grep
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' app.log

# sed
sed -n -E '/^[0-9]{4}-[0-9]{2}-[0-9]{2}/p' app.log

# awk
awk '/^[0-9]{4}-[0-9]{2}-[0-9]{2}/' app.log

Winner: grep. Pure filtering. None of the other tools offer anything grep does not, and grep is faster for this case (see the benchmarks).

4. Replace IP addresses with ***

# grep — not applicable, grep does not transform

# sed
sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/***/g' app.log

# awk
awk '{gsub(/[0-9]{1,3}(\.[0-9]{1,3}){3}/, "***"); print}' app.log

Winner: sed. Substitution is sed’s primary job. Awk’s gsub works but you pay for awk’s record-splitting overhead on every line for no gain. For in-place anonymization across a directory of log files, sed -i is one flag away. With awk you are negotiating with gawk -i inplace.

5. Sum the bytes column in nginx access logs

The bytes field is column 10 in the standard combined log format.

# grep — not applicable
# sed — not applicable in any sane way

# awk
awk '{s += $10} END{print s}' access.log

Winner: awk. Don’t argue. This is the entire reason awk exists. Even on large access logs it finishes quickly because the field-splitter and the running sum are doing exactly one pass. Doing this in sed would require pulling out the hold space and writing line-by-line arithmetic — at which point you should just open Python.
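For reference, this is how awk's default whitespace splitting numbers a combined-format line (the log line itself is made up):

```shell
# $1=ip  $2=ident  $3=user  $4/$5=timestamp  $6-$8=quoted request
# $9=status  $10=bytes  — the quotes and brackets count as part of
# the fields, which is why the request spans $6 through $8.
line='127.0.0.1 - bob [10/Oct/2025:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 5326 "-" "curl/8.0"'
echo "$line" | awk '{print $9, $10}'
```

This prints `200 5326`: status in `$9`, bytes in `$10`, which is also why the 5xx search later in this article tests `$9`.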

6. Print only the line BEFORE a match

# grep
grep -B 1 ERROR app.log | grep -v ERROR | grep -v '^--$'

# sed
sed -n '/ERROR/{x;p;d;};h' app.log

# awk
awk 'NR > 1 && /ERROR/{print prev} {prev=$0}' app.log

Winner: awk. The grep version works using -B 1 (lines before) but you have to filter out the match itself and the -- separators grep adds. The sed version uses the hold space — clever, but write-only code. The awk version reads exactly like the English description: keep the previous line, and when a match comes, print the previous line. (The NR > 1 guard avoids printing a blank line if the very first line matches.)

7. Print 3 lines after each match

# grep
grep -A 3 ERROR app.log

# sed
sed -n '/ERROR/{p;n;p;n;p;n;p;}' app.log

# awk
awk '/ERROR/{c=4} c-->0' app.log

Winner: grep. This is what -A was built for. The awk version is a fun party trick (the post-decrement c-- tests the old value, then counts down), but for production code, the flag wins.

8. Convert a CSV to TSV

Assume no embedded commas inside quoted fields. (If you have embedded commas, you are out of bash territory anyway.)

# grep — not applicable

# sed
sed 's/,/\t/g' data.csv

# awk
awk -F, 'BEGIN{OFS="\t"} {$1=$1; print}' data.csv

Winner: sed. Pure character replacement, no field logic needed. (One caveat: \t in the replacement is a GNU sed extension; on BSD/macOS sed, use a literal tab character instead.) The awk version is more “correct” in the sense that it goes through the field-splitting machinery, but it does more work for the same output. Use sed and move on.

9. Find duplicate lines within a 100-line window

# grep — not applicable

# sed — not really applicable; you would be reinventing awk inside the hold space

# awk
awk '{
  if (seen[$0] && NR - seen[$0] <= 100) print
  seen[$0] = NR
}' app.log

Winner: awk. Anything that requires “remember what you saw N lines ago” needs an associative array, and only awk gives you one. This is the “stateful processing” row in the matrix made concrete.

10. Print unique lines, preserving order

The classic sort -u discards order. sort | uniq does too. You want first-occurrence order.

# grep — not applicable

# sed — possible with hold space, but unreadable

# awk
awk '!seen[$0]++' app.log

Winner: awk. This one-liner shows up in roughly half my deployment scripts. Eleven characters of payload, an associative array doing the bookkeeping, and the post-increment evaluating to 0 on first sight. If awk had a hall of fame, this is the entry.

Performance Observations

Across real log data — a mix of nginx access logs and Java application logs at production scale — the relative performance of these tools is consistent enough that you can plan for it. The exact wall-clock numbers shift with hardware, locale settings, and warm-vs-cold cache, but the ordering does not.

Substitution-heavy task: anonymize all IP addresses

sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/***/g' app.log > /dev/null
awk '{gsub(/[0-9]{1,3}(\.[0-9]{1,3}){3}/, "***"); print}' app.log > /dev/null

Sed is meaningfully faster than awk on pure substitution. Awk pays for its record-splitting on every line even when you don’t use the fields. If all you’re doing is substitution, use sed.

Field extraction: pull the 7th column

grep -oE '"[A-Z]+ [^"]+"' access.log | wc -l    # different job (extracts the quoted request); shown for shape only
awk '{print $7}' access.log > /dev/null

Awk crushes sed on field extraction because the field-splitter is C code with a single pass per line. A sed equivalent with a backreference regex does far more work for the same output.
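For the curious, a sed equivalent of `$7` extraction might look like this. A sketch only, assuming fields separated by single spaces, and not something to ship:

```shell
# Group 1 eats six "field plus trailing space" units; group 2 captures
# the seventh field; .* swallows the rest of the line.
echo 'a b c d e f g h' | sed -E 's/^([^ ]+ ){6}([^ ]+).*/\2/'
```

This prints `g`. Note that the field you want lands in \2, not \1, because the repeated group is group 1 — exactly the kind of off-by-one trap that makes awk the better tool here.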

Multi-line pattern: collect Java stack traces

Stack frames begin with \tat and follow an exception line. The job is to print the exception line plus all its frames.

sed -n '/Exception/,/^[^[:space:]]/p' app.log > /dev/null
awk '/Exception/{p=1} /^[^[:space:]]/{p=0} p' app.log > /dev/null

Awk tended to edge out sed here in my runs, though the two are close. More importantly, the awk version makes the state explicit with a flag variable, which pays off the moment the range logic grows a second condition. For range extraction over many lines, awk takes the lead.

Pure search: find all 5xx responses

grep -E ' 5[0-9]{2} ' access.log | wc -l
awk '$9 ~ /^5[0-9]{2}$/' access.log | wc -l

Grep wins on pure search by a wide margin, which makes sense — grep does nothing else. For “is this string in this file” questions on large data, grep is the answer.

The shape of the results matters more than any exact timing: grep is fastest at finding, sed is fastest at substituting, and awk is fastest the moment fields or state enter the picture.

When Each Wins

grep wins on pure searching and recursive directory search. grep -r 'TODO' . across a Go monorepo is unbeatable — sed and awk have nothing comparable. It also wins on counting (grep -c), inverted search (grep -v), and showing context (-A, -B, -C). If your problem ends with “find lines that say X,” stop reading and use grep.

sed wins on stream editing where the unit is the line and not the field. The -i.bak flag is the cleanest in-place edit in bash — I have used it to rotate API keys across many gigabytes of logs in a single pass (see the sed cheat sheet for that war story). Sed also shines on the line-range form /start/,/end/ for extracting blocks. If the problem is “rewrite this text,” sed is the answer.
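The /start/,/end/ range form in a minimal sketch (the BEGIN/END marker strings are illustrative):

```shell
# -n suppresses auto-printing; the address range prints everything from
# the first line matching BEGIN through the next line matching END,
# both inclusive.
printf 'x\nBEGIN\na\nb\nEND\ny\n' | sed -n '/BEGIN/,/END/p'
```

This prints the four lines `BEGIN`, `a`, `b`, `END` and skips everything outside the block.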

awk wins the moment you say “field,” “column,” “sum,” or “remember.” Field extraction ($5), aggregation ({s+=$10}), associative arrays (seen[$0]++), and stateful flags (/start/{p=1}) are awk’s territory. Anything where the input is roughly tabular and the output involves arithmetic or memory across lines belongs to awk. For me that covers a sizable share of all log-processing work in production.

When to Use a Real Language Instead

There is a point where one-liners stop earning their keep, and you need to spot it before you ship something nobody can debug.

My rule: when the bash one-liner crosses ~80 characters, or you start writing if/else branches inside awk, switch to Python or Go. Specific signals:

  • You are about to write a second BEGIN block in awk. Stop.
  • You are chaining more than three sed -e expressions. Stop.
  • You need to handle JSON, XML, or YAML properly. Stop. Use jq, yq, or a real parser. (I have written about the emergency exception for sed and JSON, and the lesson is: emergency exception, not strategy.)
  • The same pipeline runs in production every day and someone other than you has to maintain it.

For production tooling at the Calgary shop, the rough split: bash one-liners for ad-hoc analysis at the terminal, Python for anything scheduled or shared, Go for anything that needs to scale or run as a long-lived service. The bash-to-Python jump usually pays for itself the first time you have to add error handling.

The thing nobody tells you: the most experienced engineers I work with reach for grep, sed, and awk less often than the juniors do. Not because the tools are bad, but because they spot the threshold sooner. They write the awk one-liner, see it grow past 80 characters, and rewrite it in twenty lines of Python before lunch. That is the move.
