Bash String Functions: Trimming, Case, and Reversal

Summary
Whitespace from vendor CSV exports and inconsistent log formats is a frequent source of silent data corruption. These trim and reversal functions clean strings reliably across mixed whitespace, with parameter-expansion approaches that avoid spawning external processes.
LTRIM: The First Attempt (That Failed)
A naive first solution looks like this:
# First attempt - doesn't work properly
function ltrim_broken {
echo "$1" | sed 's/^ *//'
}
This works for spaces but fails on tabs and other whitespace. When a CSV has mixed tabs and spaces, the pattern stops matching at the first tab and leaves everything from that point on untouched.
Here’s what went wrong:
$ ltrim_broken $'\t  data' # Tab + spaces + text
<TAB>  data # Nothing was removed: the leading tab stops the match
The sed pattern ^ * only matches spaces, not tabs or other whitespace characters. A robust solution needs to handle all whitespace types.
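To see the difference concretely, the following sketch runs the space-only pattern and a POSIX character class pattern over the same input. The variable names are illustrative and not taken from the original script.

```bash
# Build an input of tab + two spaces + text (printf keeps the tab explicit).
input="$(printf '\t  data')"

# Space-only pattern: it matches zero spaces at the start, so nothing is removed.
spaces_only="$(printf '%s\n' "$input" | sed 's/^ *//')"

# POSIX class pattern: it matches tabs, spaces, and other whitespace.
all_space="$(printf '%s\n' "$input" | sed 's/^[[:space:]]*//')"

printf 'space-only pattern:  [%s]\n' "$spaces_only"
printf 'POSIX class pattern: [%s]\n' "$all_space"
```

The first result should still contain the tab and both spaces, while the second should contain only the text.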
LTRIM: Production Solution with Performance Testing
A reliable approach uses Bash parameter expansion:
function ltrim {
    # printf avoids echo's option parsing for inputs such as "-n"
    printf '%s\n' "${1#"${1%%[![:space:]]*}"}"
}
This uses two parameter expansion operations:
${1%%[![:space:]]*} - strips everything from the first non-whitespace character onward, leaving only the leading whitespace
${1#...} - removes that leading whitespace from the start of the string
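The two steps can be run separately to inspect the intermediate value. This is a minimal sketch with illustrative variable names (str, leading, trimmed) that are not part of the original function.

```bash
# Tab + two spaces + text + two trailing spaces.
str="$(printf '\t  data  ')"

# Step 1: delete the longest suffix that starts at a non-whitespace character.
# What remains is exactly the leading whitespace.
leading="${str%%[![:space:]]*}"

# Step 2: delete that whitespace from the front of the original string.
trimmed="${str#"$leading"}"

printf 'leading whitespace chars: %d\n' "${#leading}"
printf 'after ltrim: [%s]\n' "$trimmed"
```

Quoting "$leading" inside the second expansion matters: it makes Bash treat the whitespace as a literal string instead of a glob pattern. The trailing spaces are left alone, which is the job of rtrim.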
Here is a performance comparison over 100,000 lines. The exact numbers below come from a rough local benchmark and will vary with your hardware and shell, so treat them as illustrative of the relative ordering, not precise measurements:
# Test file: 100K lines with leading whitespace
seq 1 100000 | awk '{print " "$0}' > test_data.txt
# Method 1: ltrim with parameter expansion
# IFS= and -r stop read from stripping the whitespace itself, so ltrim does the work
time while IFS= read -r line; do ltrim "$line" > /dev/null; done < test_data.txt
# Real: ~8s
# Method 2: sed approach (spawns a process per line)
time while IFS= read -r line; do echo "$line" | sed 's/^[[:space:]]*//'; done < test_data.txt > /dev/null
# Real: ~40s
# Method 3: awk approach (single process)
time awk '{sub(/^[[:space:]]+/, ""); print}' test_data.txt > /dev/null
# Real: ~1s
The parameter expansion approach is several times faster than the per-line sed loop but slower than a single awk pass. For bulk files, prefer the single awk pass; for trimming individual strings inside a script, the function approach is more maintainable and avoids starting a process per call.
RTRIM: Log Format Normalization
A common need for rtrim comes from log aggregation. Different microservices use different logging formats, some adding trailing spaces, some adding newlines. This can break a log parsing regex.
Here’s the kind of bug this causes:
# Microservice A log format (no trailing space)
echo "2024-01-15 ERROR UserService failed"
# Microservice B log format (trailing spaces)
log_line="2024-01-15 ERROR PaymentService failed "
# The regex pattern expected no trailing whitespace
if [[ $log_line =~ ^([0-9-]+)\ ([A-Z]+)\ (.+)$ ]]; then
    # The third group captures "PaymentService failed " with the trailing space,
    # which breaks downstream JSON generation
    message="${BASH_REMATCH[3]}"
fi
The rtrim function fixes this:
function rtrim {
    # printf avoids echo's option parsing for inputs such as "-n"
    printf '%s\n' "${1%"${1##*[![:space:]]}"}"
}
This mirrors ltrim but works from the right side:
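${1##*[![:space:]]} - strips everything up to and including the last non-whitespace character, leaving only the trailing whitespace
${1%...} - removes that trailing whitespace from the end of the string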
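A few edge cases are worth checking before relying on this in a parser. The sketch below redefines rtrim locally so it runs on its own; the sample inputs are made up for illustration.

```bash
rtrim() {
    printf '%s\n' "${1%"${1##*[![:space:]]}"}"
}

tab="$(printf '\t')"

# Trailing spaces followed by a tab are all removed.
printf '[%s]\n' "$(rtrim "failed  ${tab}")"

# Interior whitespace is preserved; only the right edge is trimmed.
printf '[%s]\n' "$(rtrim "a  b ")"

# An all-whitespace input collapses to the empty string.
printf '[%s]\n' "$(rtrim "   ")"
```

The all-whitespace case works because the inner expansion finds no non-whitespace character, returns the whole string, and the outer expansion then removes all of it.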
Example usage in a log parser:
while IFS= read -r line; do
    clean_line=$(rtrim "$line")
    if [[ $clean_line =~ ^([0-9-]+)\ ([A-Z]+)\ (.+)$ ]]; then
        timestamp="${BASH_REMATCH[1]}"
        level="${BASH_REMATCH[2]}"
        message="${BASH_REMATCH[3]}"
        # Generate JSON for log aggregator
        echo "{\"ts\":\"$timestamp\",\"level\":\"$level\",\"msg\":\"$message\"}"
    fi
done < service.log
This normalizes log lines across services, removing the whitespace inconsistencies that can cause logs to fail parsing.
TRIM: A Workhorse Function
The trim function is a workhorse, called frequently in CSV processing:
function trim {
    # rtrim already prints its result, so no outer echo is needed
    rtrim "$(ltrim "$1")"
}
This simply combines both operations. Why not use a single sed regex? Because chaining the two parameter expansions is faster than sed for individual calls. In a rough local benchmark, the parameter expansion version finished roughly an order of magnitude faster than shelling out to sed in a loop (exact times depend on your machine):
# Benchmark: 10,000 trim operations
time for i in {1..10000}; do
    trim " test data " > /dev/null
done
# Real: a few seconds
# Versus sed approach
time for i in {1..10000}; do
    echo " test data " | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' > /dev/null
done
# Real: roughly 10x longer
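The parameter expansion approach wins by a wide margin because it never starts an external program. It still forks a subshell for each command substitution, but that is much cheaper than launching sed.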
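The nested command substitutions in trim each fork a subshell. One possible variation, sketched here and not taken from the original script, applies both expansions to a local variable inside a single function so that no command substitution is needed internally. The name trim_single is hypothetical.

```bash
# trim_single is a hypothetical name used only for this sketch.
trim_single() {
    local s="$1"
    s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
    s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
    printf '%s\n' "$s"
}

trim_single "   test data   "
```

Callers that capture the result with $(...) still pay for one subshell, so whether this is measurably faster than the two-function version depends on how it is called.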
Here’s an example CSV import script that uses trim:
#!/bin/bash
# Process vendor CSV with inconsistent whitespace
# Runs daily at 2 AM processing 50K+ rows
source string_functions.sh
while IFS=',' read -r id email status; do
    # Trim all fields
    clean_id=$(trim "$id")
    clean_email=$(trim "$email")
    clean_status=$(trim "$status")
    # Validate and insert
    # NOTE: status is not validated here; validate or escape it before running the SQL
    if [[ $clean_id =~ ^[0-9]+$ ]] && [[ $clean_email =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
        echo "INSERT INTO users (id, email, status) VALUES ($clean_id, '$clean_email', '$clean_status');"
    else
        echo "WARN: Skipped invalid row: id=$id email=$email" >&2
    fi
done < vendor_export.csv > import.sql
Trim functions like these guard against the whitespace issues that cause imports to fail silently.
REVERSE: Palindrome Detection
The reverse function can solve a specific problem in a data validation pipeline: detecting palindromic transaction IDs that might be flagged as potential duplicates.
Here’s the function:
function reverse {
    local str="$1"
    local reversed=""
    local len=${#str}
    local i  # keep the loop counter out of the caller's scope
    for ((i = len - 1; i >= 0; i--)); do
        reversed="$reversed${str:i:1}"
    done
    echo "$reversed"
}
This uses a C-style for loop to iterate backwards through the string. It’s slower than the rev command but more portable (rev isn’t available in all environments).
Example usage:
# Check if transaction ID is palindrome (potential duplicate)
function is_palindrome {
    local str="$1"
    local rev
    rev=$(reverse "$str")
    [[ "$str" == "$rev" ]]
}
# Process transaction file (comma-separated, so split fields on commas)
while IFS=',' read -r txn_id amount status; do
    if is_palindrome "$txn_id"; then
        echo "WARN: Palindrome transaction ID $txn_id - flagging for review" >&2
    fi
    # Process transaction...
done < transactions.csv
Performance note: This function is slow for long strings because it walks the string character by character in pure Bash. As a rough illustration, processing 100,000 10-character strings, the character-by-character Bash loop is dramatically slower than delegating to a compiled tool:
# Bash loop approach: slowest by far (tens of seconds)
# rev command: fastest (well under a second)
# awk approach: in between (around a second or two)
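This function is useful for portability in containerized environments where rev might not be available. For performance-critical code, use rev, ideally by piping the whole input through a single rev invocation, because a per-call wrapper like the one below still pays for a process spawn on every string: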
function reverse_fast {
echo "$1" | rev
}
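The awk approach mentioned in the comparison is not shown in the article. One possible shape, offered as an assumption and not as the benchmarked code, is a helper that reverses every line on stdin inside a single awk process. The name reverse_lines is hypothetical.

```bash
# reverse_lines is a hypothetical helper: it reverses each line read from
# stdin, using one awk process for the whole input.
reverse_lines() {
    awk '{
        out = ""
        # Walk the line from its last character to its first.
        for (i = length($0); i >= 1; i--) {
            out = out substr($0, i, 1)
        }
        print out
    }'
}

printf '%s\n' "abc123" "12321" | reverse_lines
```

Because the whole file goes through one process, this avoids both the per-character Bash loop and the per-string process spawn, and it does not depend on rev being installed.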
This is part of the Advanced Bash String Operations series.
What string manipulation challenges have you encountered in production data pipelines?

