
Sed Multiline Patterns: How to Match Across Lines

Karandeep Singh
• 12 minutes

Summary

A walkthrough of sed multiline editing — pattern space, hold space, N/D/P commands, address ranges — built up step by step with real DevOps examples (stack traces, YAML blocks, SQL).

The first time I hit the multiline wall was during a late-night payment-service incident in Calgary. Our log parser was choking on Java stack traces: every error dragged a long run of at com.example... frames behind it, and grep ERROR only ever showed me the first line. I needed each exception collapsed onto a single line so our SIEM could ingest them.

I reached for sed the way I always do. The way every script in /usr/local/bin/ reaches for sed.

sed 's/ERROR.*/&/' app.log

That matches one line at a time. Sed reads a line, runs your script, prints, moves on. The newline between “ERROR:” and the first at frame is a wall. Plain s/// cannot see across it.

This article is about breaking that wall. Pattern space, hold space, the four commands you need, then five patterns I’ve used in production: Java stack traces, Kubernetes YAML blocks, SQL statements, paired log entries, and the grep -A trick in pure sed.

The Multi-Line Wall

Most sed tutorials show this:

echo "hello world" | sed 's/world/sed/'
# hello sed

One line in, one line out. The mental model is “search and replace, line by line.” That model is correct, but incomplete. Sed has two buffers, not one. It also has a tiny set of commands that let you stitch lines together, swap them, stash them, and replay them.

Real text doesn’t respect line boundaries:

  • Java stack traces span many lines
  • YAML blocks live under indented keys
  • SQL statements wrap across SELECT, FROM, WHERE, GROUP BY
  • Multi-line log entries with header + payload formats
  • Configuration files with start/end markers

The moment you try to grep for these, you get the first line and lose the context. Plain sed substitution doesn’t help either, because by the time s/// runs on line 2, line 1 has already been printed and forgotten.

The fix isn’t more regex. It’s understanding what sed does on every cycle.

Mental Model: Two Buffers

Every sed cycle has two buffers. If you remember nothing else from this article, remember these.

Pattern space is the working buffer. Sed reads a line from input, drops it into pattern space, runs your script against it, and (unless you suppressed output with -n) prints pattern space at the end. Then the cycle restarts.

Hold space is your stash. It starts empty. You put things in it explicitly with h, H, or x. You pull things out with g, G, or x. Sed never touches hold space on its own.

Here’s the data flow for a normal cycle:

input file:  line 1
             line 2     <-- next to read
             line 3

pattern space:  [ line 1 ]    <-- current cycle
hold space:     [ empty   ]

After s/// on line 1, pattern space gets printed, the buffer empties, and sed reads line 2. That’s the loop you’ve always been in.

Multiline work changes the loop. The N command reads the next input line and appends it to pattern space without starting a new cycle:

Before N:
  pattern space: [ line 1 ]
  next input:    line 2

After N:
  pattern space: [ line 1\nline 2 ]
  next input:    line 3

Now your s/// can match across that embedded \n. That’s the whole trick.
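
Here is a quick way to see the difference for yourself. It's a minimal sketch on a two-line sample I made up, running the same substitution with and without N:

```shell
# Two-line input: an error header followed by one stack frame (made-up sample).
input='ERROR: NPE
    at Foo.java:42'

# Without N, each line is processed alone, so \n never appears in pattern
# space and the substitution does not fire.
without_n=$(printf '%s\n' "$input" | sed 's/\n *at / | at /')

# With N, both lines share pattern space and the same s/// matches.
with_n=$(printf '%s\n' "$input" | sed 'N;s/\n *at / | at /')

printf '%s\n---\n%s\n' "$without_n" "$with_n"
```

The first result comes back unchanged; the second is a single line, ERROR: NPE | at Foo.java:42.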

The hold/pattern dance for stashing context looks like this:

Start:      pattern: [ line 1 ]    hold: [ empty   ]
After h:    pattern: [ line 1 ]    hold: [ line 1  ]
Read line 2: pattern: [ line 2 ]    hold: [ line 1  ]
After G:    pattern: [ line 2\nline 1 ]   hold: [ line 1 ]

G appends hold to pattern with a \n between them. g overwrites pattern with hold. x swaps them. Five letters — h H g G x — and you can carry context across as many lines as you need.
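
The diagram above can be replayed as a one-liner. The sketch below first reproduces the h-then-G trace on two lines, then applies the same two commands to every line, which happens to reverse the input. The sample inputs are made up for illustration.

```shell
# Trace of the diagram: stash line 1 with h, then append it to line 2 with G.
stash=$(printf 'line 1\nline 2\n' | sed -n '1h;2{G;p;}')
printf '%s\n' "$stash"      # line 2, then line 1

# The same two commands on every line reverse a whole input (like tac):
#   1!G  append hold space to every line except the first
#   h    copy the growing result back into hold space
#   $p   print only once the last line has been read
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"   # c, b, a
```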

The Four Multiline Commands

More than four sed commands deal with multiline work, but these are the ones you’ll reach for. Learn these and you’ll handle 95% of multiline tasks.

N — Append Next Line to Pattern Space

N reads the next input line and appends it to pattern space, separated by an embedded \n. The cycle does not restart, so any commands after N see the joined buffer.

Pattern space before N: [ ERROR: NPE ]
                          ^read next^
Pattern space after N:  [ ERROR: NPE\n    at Foo.java:42 ]

Simple example — collapse pairs of lines:

Input (pairs.txt):

header-1
value-1
header-2
value-2

Command:

sed 'N;s/\n/ = /' pairs.txt

Output:

header-1 = value-1
header-2 = value-2

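N joined each pair into pattern space, then s/\n/ = / replaced the embedded newline with “ = ”. If N runs on the last line and there is no next line to read, GNU sed prints whatever is in pattern space (unless -n is set) and exits without running the rest of the script. On GNU sed that is usually what you want, but POSIX lets other implementations exit without printing, so an odd trailing line can vanish there.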
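
To see what happens with an odd number of lines, here is a small sketch using a made-up three-line input. It uses $!N ("run N on every line except the last") instead of a bare N, which I believe is the safer spelling if the script might run on a non-GNU sed:

```shell
# Three lines, so the last line has no partner to join with.
# $!N skips N on the final line, so the orphan falls through to the
# normal end-of-cycle print instead of depending on how N handles EOF.
joined=$(printf 'header-1\nvalue-1\norphan\n' | sed '$!N;s/\n/ = /')
printf '%s\n' "$joined"
```

The pair is joined as before and the leftover line is printed untouched.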

P — Print First Line of Pattern Space

P (capital) prints pattern space up to and including the first embedded \n. Compare with lowercase p, which prints the entire pattern space.

Pattern space: [ line A\nline B ]
After P:       prints "line A" (the part before \n)

This is mostly useful inside loops with N and D, where pattern space has accumulated multiple lines and you want to flush only the first one.
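
A side-by-side of p and P on the same two-line pattern space (made-up input) makes the difference concrete:

```shell
# N puts both lines in pattern space; then compare lowercase p and capital P.
both=$(printf 'line A\nline B\n' | sed -n 'N;p')    # whole pattern space
first=$(printf 'line A\nline B\n' | sed -n 'N;P')   # only up to the first \n

printf 'p printed:\n%s\n\nP printed:\n%s\n' "$both" "$first"
```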

D — Delete First Line of Pattern Space, Restart

D (capital) deletes pattern space up to and including the first \n, then restarts the script without reading new input. Compare with lowercase d, which deletes everything and reads a fresh line. If pattern space contains no newline, D behaves exactly like d.

Pattern space: [ line A\nline B ]
After D:       [ line B ]    <-- script restarts here

D is what makes the N;P;D loop work. You append a line with N, print the first half with P, drop the first half with D, and re-enter the script with the second half still in pattern space — ready to grow again with another N.

That triple is the canonical “sliding window” of two lines:

sed -n 'N;P;D' file.txt

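With -n, this prints every line except the last: the final N finds no next line, sed exits, and -n suppresses the automatic print of the leftover line. Without -n, GNU sed prints that last line too. It’s a building block more than a useful command on its own.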
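
To show the window doing real work, here is a sketch of the well-known "poor man's uniq" built on the same N/P/D triple. The input is made up, and the back-reference in the address (\1) is something I'm relying on GNU sed for:

```shell
# Drop consecutive duplicate lines with a two-line sliding window:
#   $!N               pull the next line into the window (except at EOF)
#   /^\(.*\)\n\1$/!P  print the first line unless both lines are identical
#   D                 drop the first line and restart with the second
deduped=$(printf 'a\na\nb\nc\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"   # a, b, c
```

The only change from the bare N;P;D loop is the condition in front of P, which is the general shape of most sliding-window scripts: decide what to do with the older line while the newer one is still visible.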

x — Swap Pattern and Hold Space

x exchanges the contents of pattern space and hold space. Hold space starts out empty, so if nothing has been stashed yet, the first x leaves pattern space empty.

Before x:  pattern: [ current   ]   hold: [ stashed  ]
After x:   pattern: [ stashed   ]   hold: [ current  ]

A classic use: print the line before every match. You stash the previous line in hold space, and when the current line matches, you swap and print:

sed -n '/MATCH/{x;p;x;d};h' file.txt

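This says: for matching lines, swap (now pattern has the previous line), print, swap back, delete; for non-matching lines, copy current to hold for next time. Two edge cases: a match on the very first line prints an empty line, because hold space is still empty, and back-to-back matches both print the last non-matching line, because d ends the cycle before h can run.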
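
Here is that command run against a small made-up input, so you can check the behavior before pointing it at a real log:

```shell
# Print the line that precedes each MATCH line.
# (Written with "d;}" rather than "d}" so non-GNU seds accept it too.)
before=$(printf 'a\nMATCH 1\nb\nc\nMATCH 2\n' | sed -n '/MATCH/{x;p;x;d;};h')
printf '%s\n' "$before"   # a, then c
```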

That’s the toolkit. N to read forward, P to flush half, D to restart with leftovers, x to juggle context. Combined with the address ranges you already know (/start/,/end/), every multiline pattern below is a recombination of these.

Real Patterns

1. Collapse Java Stack Trace Into One Line

This is the original Calgary problem. Each Java exception spans many lines:

Input (error.log):

2026-05-01 09:15:32 ERROR NullPointerException
    at com.example.Foo.bar(Foo.java:42)
    at com.example.Baz.qux(Baz.java:17)
    at com.example.Main.run(Main.java:8)
2026-05-01 09:15:33 INFO request completed
2026-05-01 09:15:35 ERROR IllegalStateException
    at com.example.Cache.get(Cache.java:104)
    at com.example.Service.fetch(Service.java:55)

The goal: one line per error, stack frames joined by |.

Command:

sed ':a;N;$!ba;s/\n[[:space:]]\+at /  |  at /g' error.log

Breaking that down:

  • :a defines a label called a
  • N appends the next line to pattern space
  • $!ba says: if not the last line, branch back to label a
  • After the loop, pattern space holds the entire file
  • s/\n[[:space:]]\+at /  |  at /g replaces every newline-followed-by-whitespace-followed-by-at with “  |  at ” (two spaces on each side of the bar). \+ is a GNU extension; the portable spelling is \{1,\}

Output:

2026-05-01 09:15:32 ERROR NullPointerException  |  at com.example.Foo.bar(Foo.java:42)  |  at com.example.Baz.qux(Baz.java:17)  |  at com.example.Main.run(Main.java:8)
2026-05-01 09:15:33 INFO request completed
2026-05-01 09:15:35 ERROR IllegalStateException  |  at com.example.Cache.get(Cache.java:104)  |  at com.example.Service.fetch(Service.java:55)

The :a;N;$!ba idiom slurps the entire file into pattern space. After that, you can run any substitution as if newlines were ordinary characters. The cost is that sed holds the whole input in memory and emits nothing until it has read the last line. On a real-sized log it ran in seconds, not minutes: fine for batch jobs, painful for streaming. For streaming, use awk with a custom record separator.
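
If you want to stay in sed but avoid the slurp, one option is to join only while the next line is a stack frame and flush each record as soon as it is complete. This is a sketch of that idea, not what I ran during the incident; it reuses the error.log sample from above and splits the script into -e pieces so the label parses on any sed.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
2026-05-01 09:15:32 ERROR NullPointerException
    at com.example.Foo.bar(Foo.java:42)
    at com.example.Baz.qux(Baz.java:17)
    at com.example.Main.run(Main.java:8)
2026-05-01 09:15:33 INFO request completed
2026-05-01 09:15:35 ERROR IllegalStateException
    at com.example.Cache.get(Cache.java:104)
    at com.example.Service.fetch(Service.java:55)
EOF

#   :a      loop label
#   $!N     append the next line (unless we are already on the last one)
#   s///    if that line is an "at" frame, glue it onto the current record
#   ta      a frame was glued on: loop and look at the following line
#   P;D     otherwise print the finished record and restart with the new line
collapsed=$(sed -e ':a' \
                -e '$!N' \
                -e 's/\n[[:space:]]\{1,\}at /  |  at /' \
                -e 'ta' \
                -e 'P' \
                -e 'D' "$log")
printf '%s\n' "$collapsed"
rm -f "$log"
```

Pattern space never holds more than one record plus the line after it, so output starts flowing as soon as the first record ends.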

2. Extract Multi-Line YAML Block

Kubernetes manifests are full of indented blocks you need to pull out for inspection. Common task: grab the containers: block from a Deployment without dragging along the rest.

Input (deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api
        image: registry/api:v1.4
        ports:
        - containerPort: 8080
      - name: sidecar
        image: registry/sidecar:v0.2
      volumes:
      - name: config
        configMap:
          name: api-config

Goal: extract everything from containers: up to (but not including) the next sibling key (volumes:).

Command:

sed -n '/^      containers:/,/^      [a-z]/{/^      [a-z]/{/^      containers:/!d;};p;}' deployment.yaml

Walking through it:

  • -n suppresses default printing
  • /^      containers:/,/^      [a-z]/ selects the range from containers: to the next line that starts with the same six spaces of indentation followed by a lowercase letter
  • The inner block deletes the closing line of the range (the next sibling key) unless it’s the opening containers: itself
  • p prints what survives

Output:

      containers:
      - name: api
        image: registry/api:v1.4
        ports:
        - containerPort: 8080
      - name: sidecar
        image: registry/sidecar:v0.2

This relies on the indentation level being consistent — exactly six spaces in this manifest. YAML’s indentation-based scoping is what makes this work and what makes it brittle. If your manifest mixes 2-space and 4-space indentation, switch to a YAML-aware tool (yq).
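
Since the six spaces are the fragile part, it can help to make them a parameter. The wrapper below is a hypothetical helper of my own (extract_block is not a standard tool), and it assumes the key contains no regex metacharacters and that sibling keys start with a lowercase letter, same as the one-liner:

```shell
# Hypothetical helper: print the block under KEY at a given indent width.
# usage: extract_block KEY INDENT FILE
extract_block() {
  key=$1
  pad=$(printf '%*s' "$2" '')   # a string of INDENT spaces
  sed -n "/^${pad}${key}:/,/^${pad}[a-z]/{
    /^${pad}[a-z]/{
      /^${pad}${key}:/!d
    }
    p
  }" "$3"
}

manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api
        image: registry/api:v1.4
        ports:
        - containerPort: 8080
      - name: sidecar
        image: registry/sidecar:v0.2
      volumes:
      - name: config
        configMap:
          name: api-config
EOF

block=$(extract_block containers 6 "$manifest")
printf '%s\n' "$block"
rm -f "$manifest"
```

The sed script is the same one as above, only spread over several lines with the indentation interpolated, so it inherits the same brittleness.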

3. Find SQL Statement Spanning Multiple Lines

Audit logs sometimes capture SQL across multiple lines. You need to extract a complete statement up to its trailing ;.

Input (audit.log):

2026-05-02 14:11 user=admin action=query
SELECT u.id, u.email
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
  AND o.status = 'shipped'
ORDER BY u.id;
2026-05-02 14:12 user=admin action=disconnect

Goal: pull out the complete SELECT ... ; statement as one block.

Command:

sed -n '/^SELECT/,/;[[:space:]]*$/p' audit.log

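The address range /^SELECT/,/;[[:space:]]*$/ selects from a line starting with SELECT to the next line ending with ; (allowing trailing whitespace). The default behavior of address ranges is to include both endpoints, which is exactly what we want. One caveat: sed only starts looking for the end address on the line after the start, so a statement that fits on a single line keeps the range open until the next line that ends in ;.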

Output:

SELECT u.id, u.email
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
  AND o.status = 'shipped'
ORDER BY u.id;

If you want to flatten it onto one line for ingestion into another tool:

sed -n '/^SELECT/,/;[[:space:]]*$/p' audit.log | sed ':a;N;$!ba;s/\n/ /g'

The second sed slurps the extracted block and replaces every \n with a space.
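
The two-stage pipe can also be folded into one sed that accumulates each statement and prints it flattened. This is a sketch assuming GNU sed and the audit.log sample above. Unlike the pipe, it emits one line per statement if the log contains several, and it copes with a statement that already fits on one line.

```shell
audit=$(mktemp)
cat > "$audit" <<'EOF'
2026-05-02 14:11 user=admin action=query
SELECT u.id, u.email
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
  AND o.status = 'shipped'
ORDER BY u.id;
2026-05-02 14:12 user=admin action=disconnect
EOF

flat=$(sed -n '
/^SELECT/{
  :a
  # Keep appending lines until pattern space ends with ";".
  /;[[:space:]]*$/!{
    N
    ba
  }
  # Turn each newline plus any leading indentation into a single space.
  s/\n[[:space:]]*/ /g
  p
}' "$audit")
printf '%s\n' "$flat"
rm -f "$audit"
```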

4. Match Pattern X Followed By Pattern Y on Next Line

Some log formats emit a header line and a payload line as a pair:

Input (pairs.log):

[2026-05-03 09:01] REQUEST GET /api/users
{"trace_id":"abc123","duration_ms":42}
[2026-05-03 09:01] REQUEST POST /api/login
{"trace_id":"def456","duration_ms":118}
[2026-05-03 09:02] HEALTH check ok
{"trace_id":"ghi789","duration_ms":3}

Goal: print only the request/payload pairs (skip HEALTH lines and their payloads).

Command:

sed -n '/^\[.*REQUEST/{N;p;}' pairs.log

This matches lines beginning with [...] REQUEST, runs N to append the next line (the JSON payload), then p prints the joined pattern space.

Output:

[2026-05-03 09:01] REQUEST GET /api/users
{"trace_id":"abc123","duration_ms":42}
[2026-05-03 09:01] REQUEST POST /api/login
{"trace_id":"def456","duration_ms":118}

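A subtler variant: match a pattern only when the next line also matches a different pattern. For example, only flag REQUEST lines whose JSON payload contains "duration_ms":1[0-9][0-9], which in this sample means a duration of 100 to 199 ms standing in for “slow” (the pattern is not anchored, so four-digit values starting with 1 would match too):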

sed -n '/^\[.*REQUEST/{N;/duration_ms":1[0-9][0-9]/p;}' pairs.log

N joins the two lines, then the address /duration_ms":1[0-9][0-9]/ matches against the combined pattern space.

Output:

[2026-05-03 09:01] REQUEST POST /api/login
{"trace_id":"def456","duration_ms":118}
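
One thing {N;p;} quietly assumes is that every header really is followed by its payload. If a payload goes missing, N swallows the next header instead. A sliding window avoids that. The sketch below uses a made-up variant of pairs.log where the first request has lost its payload, and assumes GNU sed, where . also matches the embedded newline.

```shell
pairs=$(mktemp)
cat > "$pairs" <<'EOF'
[2026-05-03 09:01] REQUEST GET /api/users
[2026-05-03 09:01] REQUEST POST /api/login
{"trace_id":"def456","duration_ms":118}
[2026-05-03 09:02] HEALTH check ok
{"trace_id":"ghi789","duration_ms":3}
EOF

# Pair-at-a-time: header 1 gets glued to header 2, and the real pair is lost.
naive=$(sed -n '/^\[.*REQUEST/{N;p;}' "$pairs")

# Sliding window: look at every adjacent pair of lines, print the pair only
# when a REQUEST header is directly followed by a line starting with "{",
# then drop the older line and slide forward by one.
window=$(sed -n '$!N;/^\[.*REQUEST.*\n{/p;D' "$pairs")

printf 'naive:\n%s\n\nwindow:\n%s\n' "$naive" "$window"
rm -f "$pairs"
```

The naive version prints the two header lines; the windowed version prints the POST header with its payload.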

5. Print N Lines After Match

grep -A 3 ERROR prints each match plus the next 3 lines. Sometimes grep isn’t available — minimal containers, embedded systems, or you’re already inside a sed pipeline. Here’s how to do it in pure sed.

Input (server.log):

INFO 09:00 startup ok
ERROR 09:01 db timeout
  endpoint: postgres://primary
  retry: 3
  giving up
INFO 09:02 fallback to read-replica
INFO 09:03 healthy
ERROR 09:05 cache miss
  key: user:42
  source: redis

Goal: for each ERROR, print the matching line plus the next 3 lines.

Command:

sed -n '/ERROR/{p;n;p;n;p;n;p;}' server.log

/ERROR/ matches the line. The brace block runs p;n;p;n;p;n;p. p prints pattern space. n reads the next input line into pattern space, replacing the current one. So: print the match, advance, print, advance, print, advance, print. Match plus three more lines.

Output:

ERROR 09:01 db timeout
  endpoint: postgres://primary
  retry: 3
  giving up
ERROR 09:05 cache miss
  key: user:42
  source: redis

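The catch is twofold. If fewer than three lines follow a match, n runs out of input and sed exits on the spot (harmless with -n, as the last block above shows, but anything after that n in the script never runs). And an ERROR that falls inside the three trailing lines is printed as context without starting a new window of its own. For a version that handles both cleanly, use awk: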

awk '/ERROR/{c=4} c-->0' server.log

That’s shorter than the sed version, and it covers end-of-file and overlapping matches. Which is foreshadowing.
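
To make the difference visible, here is a sketch that runs both on a made-up input where two ERROR lines sit back to back. The awk line is the same counter trick with the window size lifted into a variable (n is my name for it):

```shell
sample='ERROR a
ERROR b
x
y
z
w'

# sed: the second ERROR is consumed as context for the first,
# so the window ends three lines after "ERROR a".
sed_out=$(printf '%s\n' "$sample" | sed -n '/ERROR/{p;n;p;n;p;n;p;}')

# awk: every ERROR resets the counter, so the window extends
# three lines past the most recent match.
awk_out=$(printf '%s\n' "$sample" | awk -v n=3 '/ERROR/{c=n+1} c-->0')

printf 'sed:\n%s\n\nawk:\n%s\n' "$sed_out" "$awk_out"
```

sed stops after y; awk also prints z, which is what grep -A 3 would do.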

When to Stop Using sed for Multiline Work

I have a rule. Three multiline operations chained in one sed script and I switch tools. Here’s the threshold and what to switch to.

One multiline operation — sed is fine. N;s/\n/ / to join two lines is clearer than any alternative.

Two operations — sed is still fine but starting to look weird. :a;N;$!ba;s/.../.../g is recognizable as the slurp-and-replace idiom.

Three or more operations — you’re writing a parser, and sed isn’t a parser. Switch to one of:

  • awk with custom record separators. awk 'BEGIN{RS="";FS="\n"} ...' treats blank-line-separated paragraphs as records, which is what most “multiline sed” really wants.
  • Python with re.MULTILINE and re.DOTALL. re.findall(r'^SELECT.*?;', text, re.DOTALL | re.MULTILINE) is one line and explicit.
  • ripgrep with --multiline and --multiline-dotall. rg -UP --multiline-dotall 'ERROR.*?(?=^[A-Z])' finds variable-length error blocks, typically faster than a sed equivalent. The look-ahead is only supported by ripgrep’s PCRE2 engine, hence the -P.

Concrete example. To extract all multi-line stack traces, count them, and group by exception type, you could write a sed pipeline. You’d hate it. The awk version is:

awk 'BEGIN{RS=""} /Exception/{
  match($0, /[A-Z][a-zA-Z]+Exception/);
  e=substr($0, RSTART, RLENGTH);
  c[e]++
} END{for (k in c) print c[k], k}' error.log

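Records are separated by blank lines (RS=""), so each stack trace is one $0, provided the log really does put a blank line between entries. The error.log sample from earlier does not, so there awk would see the whole file as a single record and count only the first exception name. Match the exception name, count, print at end. That’s the kind of thing sed can do but shouldn’t.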
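
Here is the same awk run end to end on a made-up log that has a blank line between entries, which is the layout RS="" depends on. The third trace is invented so there is something to count, and the sort at the end is only there to make the output order predictable:

```shell
traces=$(mktemp)
cat > "$traces" <<'EOF'
2026-05-01 09:15:32 ERROR NullPointerException
    at com.example.Foo.bar(Foo.java:42)

2026-05-01 09:15:33 INFO request completed

2026-05-01 09:15:35 ERROR IllegalStateException
    at com.example.Cache.get(Cache.java:104)

2026-05-01 09:15:40 ERROR NullPointerException
    at com.example.Main.run(Main.java:8)
EOF

# Paragraph mode: each blank-line-separated block is one record.
counts=$(awk 'BEGIN{RS=""} /Exception/{
  match($0, /[A-Z][a-zA-Z]+Exception/)      # first exception name in the record
  c[substr($0, RSTART, RLENGTH)]++          # tally by exception type
} END{for (k in c) print c[k], k}' "$traces" | sort)
printf '%s\n' "$counts"
rm -f "$traces"
```

This prints 1 IllegalStateException and 2 NullPointerException. Strip the blank lines from the input and the same script reports a single NullPointerException.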

The honest answer: most “multiline sed” code I’ve seen in production should have been awk. Sed is a stream editor. It excels at line-oriented transformations. The moment your unit of work isn’t a line, you’re fighting the tool.

I still write multiline sed all the time. For one-off pipelines, for SSH’d-into bastions where awk feels heavy, for the muscle memory of :a;N;$!ba. But in scripts that other people will read and debug, I default to awk now.

The four commands — N, P, D, x — plus h H g G for hold-space juggling are enough to get you out of any one-shot multiline jam. Beyond that, you’re just typing extra characters to feel clever.
