Bash String Functions: Trimming, Case, and Reversal

Karandeep Singh

Feb 20, 2023 • 5 minutes

Summary

Bash string functions for stripping leading and trailing whitespace (ltrim, rtrim, trim) and reversing strings, with performance comparisons against sed and awk.

Whitespace from vendor CSV exports and inconsistent log formats is a frequent source of silent data corruption. These trim and reversal functions clean strings reliably across mixed whitespace, with parameter-expansion approaches that avoid spawning external processes.

LTRIM: The First Attempt (That Failed)

A naive first solution looks like this:

# First attempt - doesn't work properly
function ltrim_broken {
    echo "$1" | sed 's/^ *//'
}

This works for spaces but fails on tabs and other whitespace. When a CSV field starts with a tab, the pattern matches zero spaces at the start of the line, so nothing is stripped at all.

Here’s what went wrong:

$ ltrim_broken "	  data"    # Tab + spaces + text
	  data                      # Unchanged: the tab and both spaces remain

The sed pattern ^ * only matches spaces, not tabs or other whitespace characters. A robust solution needs to handle all whitespace types.
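Running both patterns on the same input makes the difference visible. This is a small sketch, assuming a sed that supports POSIX character classes (GNU and BSD sed both do); the name ltrim_sed is illustrative and not part of the original function set.

```shell
#!/usr/bin/env bash
# The article's broken version: only strips literal spaces.
ltrim_broken() { echo "$1" | sed 's/^ *//'; }

# Illustrative variant: [[:space:]] also covers tabs and other whitespace.
ltrim_sed() { echo "$1" | sed 's/^[[:space:]]*//'; }

input=$'\t  data'   # tab + two spaces + text

printf 'broken: [%s]\n' "$(ltrim_broken "$input")"
printf 'fixed:  [%s]\n' "$(ltrim_sed "$input")"
```

The first line prints the input unchanged inside the brackets, while the second prints only data.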

LTRIM: Production Solution with Performance Testing

A reliable approach uses Bash parameter expansion:

function ltrim {
    echo "${1#"${1%%[![:space:]]*}"}"
}

This uses two parameter expansion operations:

${1%%[![:space:]]*} - deletes everything from the first non-whitespace character onward, leaving only the leading whitespace
${1#...} - removes that leading whitespace from the start of the string
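To see what each expansion contributes, the two steps can be run separately. This is a minimal sketch; the variable names input, leading, and result are illustrative only.

```shell
#!/usr/bin/env bash
# Walk through ltrim's two expansions one at a time.
# The variable names here are illustrative only.

input=$'\t  data  '   # tab + two spaces + "data" + two trailing spaces

# Step 1: delete the longest suffix that starts at the first
# non-whitespace character; what remains is the leading whitespace.
leading="${input%%[![:space:]]*}"

# Step 2: strip that exact prefix from the original string.
# The inner quotes make the prefix match literally, not as a pattern.
result="${input#"$leading"}"

printf 'leading whitespace: %d chars\n' "${#leading}"
printf 'result: [%s]\n' "$result"
```

Here leading holds three characters (the tab and two spaces), and result is data followed by its two trailing spaces, which ltrim deliberately leaves alone.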

The following comparison processes 100,000 lines. The exact numbers come from a rough local benchmark and will vary with your hardware and shell, so treat them as illustrative of the relative ordering, not precise measurements:

# Test file: 100K lines with leading whitespace
seq 1 100000 | awk '{print "   "$0}' > test_data.txt

# Method 1: ltrim with parameter expansion
# IFS= and -r stop read from stripping the leading whitespace itself
time while IFS= read -r line; do ltrim "$line" > /dev/null; done < test_data.txt
# Real: ~8s

# Method 2: sed approach (spawns a process per line)
time while IFS= read -r line; do echo "$line" | sed 's/^[[:space:]]*//'; done < test_data.txt > /dev/null
# Real: ~40s

# Method 3: awk approach (single process)
time awk '{sub(/^[[:space:]]+/, ""); print}' test_data.txt > /dev/null
# Real: ~1s

The parameter expansion approach is several times faster than the per-line sed loop but slower than a single awk pass for bulk processing. However, for individual string operations in scripts, the function approach is more maintainable.

RTRIM: Log Format Normalization

A common need for rtrim comes from log aggregation. Different microservices use different logging formats, some adding trailing spaces, some adding newlines. This can break a log parsing regex.

Here’s the kind of bug this causes:

# Microservice A log format (no trailing space)
echo "2024-01-15 ERROR UserService failed"

# Microservice B log format (trailing spaces)
echo "2024-01-15 ERROR PaymentService failed  "

# A regex that expects no trailing whitespace
if [[ $log_line =~ ^([0-9-]+)\ ([A-Z]+)\ (.+)$ ]]; then
    # BASH_REMATCH[3] is "PaymentService failed  ", trailing spaces included,
    # which breaks downstream JSON generation
    :   # no-op so the if body is valid Bash
fi

The rtrim function fixes this:

function rtrim {
    echo "${1%"${1##*[![:space:]]}"}"
}

This mirrors ltrim but works from the right side:

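${1##*[![:space:]]} - deletes everything up to and including the last non-whitespace character, leaving only the trailing whitespace
${1%...} - removes that trailing whitespace from the end of the string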
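The same walkthrough works for rtrim, and it is worth checking a few edge cases. The sketch below repeats the function so it runs on its own; the sample line with a trailing space, tab, and carriage return is an assumed input, chosen because the [:space:] class covers all three.

```shell
#!/usr/bin/env bash
# rtrim as defined above, repeated so this sketch is self-contained.
function rtrim {
    echo "${1%"${1##*[![:space:]]}"}"
}

# Assumed sample: a log line ending in space + tab + carriage return.
line=$'2024-01-15 ERROR PaymentService failed \t\r'

# Step 1 in isolation: what is left after the last non-whitespace character.
trailing="${line##*[![:space:]]}"
printf 'trailing whitespace: %d chars\n' "${#trailing}"

# Full function on the same line.
printf '[%s]\n' "$(rtrim "$line")"

# Edge cases: all-whitespace and empty input both come back empty.
printf '[%s]\n' "$(rtrim $'  \t ')"
printf '[%s]\n' "$(rtrim '')"
```

When the input contains no non-whitespace character, the inner pattern fails to match, the whole string counts as trailing whitespace, and the outer expansion removes all of it.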

Example usage in a log parser:

while IFS= read -r line; do
    clean_line=$(rtrim "$line")
    if [[ $clean_line =~ ^([0-9-]+)\ ([A-Z]+)\ (.+)$ ]]; then
        timestamp="${BASH_REMATCH[1]}"
        level="${BASH_REMATCH[2]}"
        message="${BASH_REMATCH[3]}"
        # Generate JSON for log aggregator
        echo "{\"ts\":\"$timestamp\",\"level\":\"$level\",\"msg\":\"$message\"}"
    fi
done < service.log

This normalizes log lines across services, removing the whitespace inconsistencies that can cause logs to fail parsing.

TRIM: A Workhorse Function

The trim function is a workhorse, called frequently in CSV processing:

function trim {
    rtrim "$(ltrim "$1")"
}

This simply combines both operations. Why not use a single sed regex? Because for individual calls, chaining the two parameter expansions is faster than shelling out to sed. In a rough local benchmark, the parameter expansion version finished roughly an order of magnitude faster than a sed loop (exact times depend on your machine):

# Benchmark: 10,000 trim operations
time for i in {1..10000}; do
    trim "  test data  " > /dev/null
done
# Real: a few seconds

# Versus sed approach
time for i in {1..10000}; do
    echo "  test data  " | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' > /dev/null
done
# Real: roughly 10x longer

The parameter expansion approach wins by a wide margin because it doesn’t spawn external processes.
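Capturing the result with $(trim ...) still forks a subshell for the command substitution, even though no external program runs. One possible variant avoids that fork by writing the result into a caller-named variable with printf -v. The helper name trim_into and its calling convention are assumptions for illustration, not part of the original function set.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the article): trim "$2" and store the
# result in the variable whose name is passed as "$1".
# Caveat: the target variable must not be named _s, which is used locally.
function trim_into {
    local _s="$2"
    _s="${_s#"${_s%%[![:space:]]*}"}"   # same expansion as ltrim
    _s="${_s%"${_s##*[![:space:]]}"}"   # same expansion as rtrim
    printf -v "$1" '%s' "$_s"           # assign without a subshell
}

trim_into cleaned $'  \ttest data  '
printf '[%s]\n' "$cleaned"
```

This prints [test data]. The trade-off is a less natural calling style, so it is probably only worth it inside tight loops.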

Here’s an example CSV import script that uses it:

#!/bin/bash
# Process a vendor CSV with inconsistent whitespace

source string_functions.sh

while IFS=',' read -r id email status; do
    # Trim all fields
    clean_id=$(trim "$id")
    clean_email=$(trim "$email")
    clean_status=$(trim "$status")

    # Validate and insert
    if [[ $clean_id =~ ^[0-9]+$ ]] && [[ $clean_email =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
        echo "INSERT INTO users (id, email, status) VALUES ($clean_id, '$clean_email', '$clean_status');"
    else
        echo "WARN: Skipped invalid row: id=$id email=$email" >&2
    fi
done < vendor_export.csv > import.sql

Trim functions like these guard against the whitespace issues that cause imports to fail silently.

REVERSE: Palindrome Detection

The reverse function can solve a specific problem in a data validation pipeline: detecting palindromic transaction IDs that might be flagged as potential duplicates.

Here’s the function:

function reverse {
  local str="$1"
  local reversed=""
  local len=${#str}
  local i   # keep the loop counter out of the caller's scope
  for ((i=len-1; i>=0; i--))
  do
    reversed="$reversed${str:$i:1}"
  done
  echo "$reversed"
}

This uses a C-style for loop to iterate backwards through the string. It’s slower than the rev command but more portable (rev isn’t available in all environments).

Example usage:

# Check if transaction ID is palindrome (potential duplicate)
function is_palindrome {
    local str="$1"
    local rev=$(reverse "$str")
    [[ "$str" == "$rev" ]]
}

# Process transaction file
while IFS=',' read -r txn_id amount status; do
    if is_palindrome "$txn_id"; then
        echo "WARN: Palindrome transaction ID $txn_id - flagging for review" >&2
    fi
    # Process transaction...
done < transactions.csv

Performance note: This function is slow for long strings because it walks the string character by character in pure Bash. As a rough illustration, processing 100,000 10-character strings, the character-by-character Bash loop is dramatically slower than delegating to a compiled tool:

# Bash loop approach: slowest by far (tens of seconds)
# rev command: fastest (well under a second)
# awk approach: in between (around a second or two)

This function is useful for portability in containerized environments where rev might not be available, but for performance-critical code, use rev:

function reverse_fast {
    echo "$1" | rev
}
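Since rev may or may not be installed, one option is to pick the implementation at call time. The following is a hedged sketch: the wrapper name reverse_auto is an assumption, and the fallback is the same character-by-character loop shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (the name reverse_auto is an assumption):
# use rev when it is installed, otherwise fall back to the Bash loop.
function reverse_auto {
    local str="$1" reversed="" i
    if command -v rev > /dev/null 2>&1; then
        printf '%s\n' "$str" | rev
        return
    fi
    # Pure-Bash fallback: walk the string from the last character to the first.
    for ((i=${#str}-1; i>=0; i--)); do
        reversed+="${str:i:1}"
    done
    printf '%s\n' "$reversed"
}

reverse_auto "TXN-12321"
```

Either branch prints 12321-NXT for this input, so callers do not need to know which one ran.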

This is part of the Advanced Bash String Operations series.

Question

What string manipulation challenges have you encountered in production data pipelines?
