
Bash String Validation, Generation & a Library

Karandeep Singh

Feb 20, 2023 • 9 minutes

Summary

Bash functions for random string generation, input sanitization, CSV parsing, password-strength checks, and slug generation, plus a complete production string library to source.

This part covers the heavier-duty string utilities: generating unique identifiers, sanitizing untrusted input, parsing CSV files, scoring password strength, and building URL slugs. It closes with a complete, sourceable string-functions library and the practical lessons that come with using functions like these in real scripts.

Additional Utility Functions

Here are additional helper functions that solve specific problems:

REPEAT: Generate Test Data

repeat() {
    local str="$1" count="$2" i
    # Print the string count times on one line, then end the line
    for ((i = 1; i <= count; i++)); do printf '%s' "$str"; done
    echo
}

# Generate separator lines in reports
repeat "=" 80  # Outputs 80 equal signs
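For long separator lines the loop can be avoided entirely. The following is a minimal sketch of a hypothetical repeat_fast variant (the name is not part of the original library): it pads a variable with spaces using printf, then swaps every space for the string. It assumes the string does not contain "&", which bash 5.2 and later treat specially in replacements.

```bash
# Hypothetical loop-free variant of repeat()
repeat_fast() {
    local str="$1" count="$2" pad
    printf -v pad '%*s' "$count" ''   # pad now holds $count spaces
    printf '%s\n' "${pad// /$str}"    # replace every space with $str
}

repeat_fast "=" 80    # one 80-character separator line
repeat_fast "-+" 3    # prints: -+-+-+
```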

String Case Conversion

# CamelCase to snake_case (for API field mapping)
camel_to_snake_case() {
    # Insert "_" at each lowercase/digit-to-uppercase boundary, then lowercase the result
    echo "$1" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}

# Example: UserId -> user_id
$ camel_to_snake_case "UserId"
user_id

Word Operations

# Count words (for content analysis)
count_words() {
    echo "$1" | wc -w
}

# Reverse word order (for RTL language processing)
reverse_words() {
    # Join words with single spaces and no trailing space
    echo "$1" | awk '{ for (i=NF; i>0; i--) printf("%s%s", $i, (i>1 ? OFS : ORS)) }'
}

HTML/Special Character Handling

# Strip HTML tags (for plain text email generation)
strip_html_tags() {
    echo "$1" | sed -e 's/<[^>]*>//g'
}

# Remove special characters (for filename generation)
remove_special_chars() {
    echo "$1" | tr -d '[:punct:]'
}

These utility functions handle edge cases in content processing, particularly when generating plain-text email notifications from HTML templates.

RANDOM_STRING: Generating Unique Identifiers

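The random_string function generates random hexadecimal strings from OpenSSL's cryptographically secure generator, suitable for unique IDs:

random_string() {
  local len="$1"
  local random_bytes
  # openssl emits two hex characters per byte, so keep only the first len characters
  random_bytes="$(openssl rand -hex "$len" | tr -d '\n')"
  echo "${random_bytes:0:len}"
}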

Example usage in a session management system:

# Generate unique session tokens for user authentication
function create_session {
    local user_id="$1"
    local session_token=$(random_string 32)
    local expires_at=$(date -d '+24 hours' '+%Y-%m-%d %H:%M:%S')

    # Store session in Redis
    redis-cli SETEX "session:$session_token" 86400 "$user_id" > /dev/null

    echo "$session_token"
}

# Generate temporary file paths
temp_file="/tmp/upload_$(random_string 16).tmp"

This is preferable to using $RANDOM, which returns only 15-bit values (0-32767) from a non-cryptographic generator, so tokens built from it are predictable and prone to collisions. The openssl-based approach draws from a stronger random source.
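Where openssl is not installed, the kernel's random device is another strong source. The sketch below is a hypothetical random_alnum helper (not part of the original library) that assumes /dev/urandom and a head that supports -c are available. It filters the random byte stream down to letters and digits and stops after the requested length.

```bash
# Hypothetical alternative to random_string: alphanumeric output from /dev/urandom
random_alnum() {
    local len="${1:-16}"
    # LC_ALL=C makes tr treat the input as raw bytes; head ends the stream after $len characters
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
    echo
}

token=$(random_alnum 24)
echo "${#token}"   # 24
```

Because head closes the pipe early, tr exits on SIGPIPE. That is harmless here, but scripts running with `set -o pipefail` would see a non-zero pipeline status.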

SANITIZE: Input Validation

The sanitize function removes potentially dangerous characters from user input:

sanitize() {
  local str="$1"
  local allowed="$2"   # extra characters to keep; put "-" last so it is not read as a range
  local sanitized
  sanitized=$(printf '%s\n' "$str" | sed "s/[^[:alnum:]$allowed]//g")
  echo "$sanitized"
}

Used in filename generation from user input to prevent directory traversal:

# User uploads file, we need to create safe filename
user_provided_name="../../etc/passwd"  # Malicious input

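# Sanitize allowing only alphanumeric, dash, underscore, dot
safe_filename=$(sanitize "$user_provided_name" "._-")
# Result: "....etcpasswd" (slashes removed; dots are in the allowed set)

# Strip leading dots so the name is never ".", "..", or a hidden file
safe_filename="${safe_filename#"${safe_filename%%[!.]*}"}"
# Result: "etcpasswd"

# Generate final filename with random prefix
final_filename="$(random_string 8)_${safe_filename}"
# Result: e.g. "a7f2d9e1_etcpasswd"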

upload_path="/var/uploads/$final_filename"

Sanitizing input this way guards a file upload endpoint against path traversal, where unsanitized filenames could otherwise escape the intended directory.
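Rewriting a name is one option; rejecting suspicious names outright is another. The following is a minimal sketch of a hypothetical is_safe_filename check (the name and the exact rules are assumptions, not part of the original library). It accepts a name only if it is non-empty, does not start with a dot, contains no slash, and uses only letters, digits, dot, underscore, and dash.

```bash
# Hypothetical validator: returns 0 for an acceptable filename, 1 otherwise
is_safe_filename() {
    local name="$1"
    case "$name" in
        ''|.*) return 1 ;;    # empty, ".", "..", and hidden files
        */*)   return 1 ;;    # anything containing a path separator
    esac
    case "$name" in
        *[!A-Za-z0-9._-]*) return 1 ;;   # any character outside the allowed set
    esac
    return 0
}

for name in "report_2024.csv" "../../etc/passwd" ".." ".bashrc" ""; do
    if is_safe_filename "$name"; then
        echo "accept: '$name'"
    else
        echo "reject: '$name'"
    fi
done
```

Only the first name in the loop is accepted. Rejecting instead of rewriting keeps two different uploads from silently collapsing into the same sanitized name.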

PARSE_CSV: Production CSV Processing

The parse_csv function processes CSV files with custom delimiters:

parse_csv() {
  local file="$1"
  local delimiter="${2:-,}"
  local line_num=0
  local i fields

  # Splits on the delimiter only; quoted fields that contain the delimiter are not supported
  while IFS="$delimiter" read -ra fields; do
    line_num=$((line_num + 1))

    # Skip header row
    if [ "$line_num" -eq 1 ]; then
        continue
    fi

    # Trim all fields (trim comes from string_functions.sh)
    for i in "${!fields[@]}"; do
        fields[$i]=$(trim "${fields[$i]}")
    done

    # Process fields (example: insert into database)
    echo "Line $line_num: ${#fields[@]} fields -> ${fields[*]}"
  done < "$file"
}

Example usage in a daily ETL pipeline:

#!/bin/bash
# Example: scheduled vendor data import

source /opt/scripts/string_functions.sh

for csv_file in /data/imports/*.csv; do
    echo "Processing: $csv_file"

    line_count=0
    error_count=0

    while IFS=',' read -r id email status balance; do
        line_count=$((line_count + 1))

        # Skip header
        if [ $line_count -eq 1 ]; then
            continue
        fi

        # Trim and normalize (lowercase and uppercase take their input as an argument, not on stdin)
        id=$(trim "$id")
        email=$(lowercase "$(trim "$email")")
        status=$(uppercase "$(trim "$status")")
        balance=$(trim "$balance")

        # Validate required fields
        if [ -z "$id" ] || [ -z "$email" ]; then
            echo "ERROR: Line $line_count missing required fields" >&2
            error_count=$((error_count + 1))
            continue
        fi

        # Generate SQL (values are interpolated verbatim, so validate or escape them before executing)
        echo "INSERT INTO customers (id, email, status, balance) VALUES ($id, '$email', '$status', $balance) ON CONFLICT (id) DO UPDATE SET email='$email', status='$status', balance=$balance;"

    done < "$csv_file" > "/tmp/import_$(basename "$csv_file" .csv).sql"

    echo "Processed $line_count lines from $csv_file ($error_count errors)"
done

A pipeline like this can be adapted to vendor data from multiple sources, each with a slightly different CSV format (some pipe-delimited, some tab-delimited), by changing the IFS value used for each source. The trim and normalization functions help keep the imported data clean across sources.
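When the delimiter varies by vendor, it can be guessed from the header row instead of being hard-coded. The sketch below is a hypothetical detect_delimiter helper (not part of the original library) that assumes the real delimiter is whichever of comma, pipe, tab, or semicolon appears most often in the header.

```bash
# Hypothetical helper: pick the most frequent candidate delimiter in a header line
detect_delimiter() {
    local header="$1" best="," best_count=0 d stripped count
    for d in ',' '|' $'\t' ';'; do
        stripped="${header//"$d"/}"                # header with this candidate removed
        count=$(( ${#header} - ${#stripped} ))     # how many times the candidate occurred
        if (( count > best_count )); then
            best="$d"
            best_count=$count
        fi
    done
    printf '%s' "$best"
}

header='id|email|status|balance'
delim=$(detect_delimiter "$header")
IFS="$delim" read -r id email status balance <<< '42|a@example.com|active|10.50'
echo "$email"   # a@example.com
```

One caveat: tab counts as IFS whitespace, so read collapses consecutive tabs and empty tab-delimited fields are lost. Files with empty columns may need awk or a different approach.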

CHECK_PASSWORD_STRENGTH: User Account Security

The check_password_strength function validates passwords during user registration:

check_password_strength() {
  local password="$1"
  local length=${#password}
  # printf rather than echo, so a password such as "-n" is not parsed as an echo option
  local upper=$(printf '%s\n' "$password" | grep -o "[A-Z]" | sort -u | wc -l)
  local lower=$(printf '%s\n' "$password" | grep -o "[a-z]" | sort -u | wc -l)
  local digits=$(printf '%s\n' "$password" | grep -o "[0-9]" | sort -u | wc -l)
  local special=$(printf '%s\n' "$password" | grep -o "[^a-zA-Z0-9]" | sort -u | wc -l)

  # Score based on password requirements
  local score=0

  # Length check (minimum 12 characters)
  if [ $length -ge 12 ]; then
    score=$((score + 3))
  elif [ $length -ge 8 ]; then
    score=$((score + 1))
  fi

  # Character variety
  [ $upper -gt 0 ] && score=$((score + 1))
  [ $lower -gt 0 ] && score=$((score + 1))
  [ $digits -gt 0 ] && score=$((score + 1))
  [ $special -gt 0 ] && score=$((score + 2))

  # Return score and recommendation
  if [ $score -lt 4 ]; then
    echo "WEAK|Password must be at least 12 characters with uppercase, lowercase, digit, and special character"
    return 1
  elif [ $score -lt 6 ]; then
    echo "MODERATE|Consider adding more character variety"
    return 0
  else
    echo "STRONG|Password meets security requirements"
    return 0
  fi
}

Example usage in a user registration script:

# User registration validation
read -rsp "Enter password: " password
echo

result=$(check_password_strength "$password")
status="${result%%|*}"
message="${result##*|}"

if [ "$status" = "WEAK" ]; then
    echo "ERROR: $message" >&2
    exit 1
fi

if [ "$status" = "MODERATE" ]; then
    echo "WARNING: $message" >&2
    read -rp "Continue anyway? (yes/no): " confirm
    if [ "$confirm" != "yes" ]; then
        exit 1
    fi
fi

# Password accepted, proceed with account creation
echo "Password strength: $status"

Enforcing password strength at registration reduces the risk of accounts being compromised through guessable or easily brute-forced passwords.
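The same scoring rules can be expressed with bash pattern matching alone, which avoids the grep, sort, and wc subprocesses. The following is a sketch of a hypothetical password_score helper (not part of the original library). It assumes POSIX character classes are an acceptable stand-in for the [A-Z]-style ranges and prints only the numeric score.

```bash
# Hypothetical subprocess-free scorer using the same point values as check_password_strength
password_score() {
    local password="$1" score=0
    local length=${#password}

    # Length: 3 points for 12+ characters, 1 point for 8-11
    if (( length >= 12 )); then
        score=3
    elif (( length >= 8 )); then
        score=1
    fi

    # Character variety: 1 point each for upper, lower, digit; 2 for any other character
    [[ "$password" == *[[:upper:]]* ]] && score=$((score + 1))
    [[ "$password" == *[[:lower:]]* ]] && score=$((score + 1))
    [[ "$password" == *[[:digit:]]* ]] && score=$((score + 1))
    [[ "$password" == *[^[:alnum:]]* ]] && score=$((score + 2))

    echo "$score"
}

password_score "password"                       # 2 (WEAK: below 4)
password_score "Tr0ub4dor&3"                    # 6 (STRONG: 6 or more)
password_score "correct-horse-battery-staple"   # 6 (STRONG: 6 or more)
```

The last example shows a property of the scheme worth knowing: a long lowercase passphrase with one separator character already reaches the STRONG threshold.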

GENERATE_SLUG: URL Generation for Dynamic Content

The generate_slug function creates SEO-friendly URLs from user content:

generate_slug() {
  local string="$1"
  local slug=$(echo "$string" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | tr '[:upper:]' '[:lower:]' | tr -s '-' | sed 's/-$//' | sed 's/^-//')
  echo "$slug"
}

Example usage in a content management system:

#!/bin/bash
# Generate blog post from user input

read -rp "Enter blog post title: " title
slug=$(generate_slug "$title")

# Check for slug collisions
counter=1
final_slug="$slug"
while [ -f "/var/www/blog/posts/${final_slug}.html" ]; do
    final_slug="${slug}-${counter}"
    counter=$((counter + 1))
done

# Create blog post file
cat > "/var/www/blog/posts/${final_slug}.html" <<EOF
<!DOCTYPE html>
<html>
<head>
    <title>$title</title>
    <link rel="canonical" href="https://example.com/blog/${final_slug}" />
</head>
<body>
    <h1>$title</h1>
    <!-- Content here -->
</body>
</html>
EOF

echo "Blog post created: https://example.com/blog/${final_slug}"

Example inputs and outputs:

# Input: "How to Deploy Python Apps with Docker & Kubernetes"
# Output slug: "how-to-deploy-python-apps-with-docker-kubernetes"

# Input: "10 Best Practices for AWS Security (2024 Edition)"
# Output slug: "10-best-practices-for-aws-security-2024-edition"

# Input: "Understanding CPU vs. I/O Bound Operations"
# Output slug: "understanding-cpu-vs-io-bound-operations"

This function generates consistent, SEO-friendly URLs for blog posts and documentation pages across a content library.

REPLACE

This function replaces all occurrences of a specified substring with another substring.

replace () {
    local original="$1"
    local replacement="$2"
    local input="$3"
    # Quote the search string so it is matched literally rather than as a glob pattern
    echo "${input//"$original"/$replacement}"
}

#Usage
result=$(replace "apple" "banana" "I like apple and apple pie.") 
echo "$result"
#Output: "I like banana and banana pie."
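Whether the search string is quoted inside the expansion matters as soon as it contains glob characters such as * or ?. The two hypothetical functions below (named only for this illustration) differ in nothing but that quoting, and they give different results for the same input.

```bash
# Unquoted search string: bash treats it as a glob pattern
replace_glob() {
    local search="$1" replacement="$2" text="$3"
    echo "${text//$search/$replacement}"
}

# Quoted search string: bash matches it character for character
replace_literal() {
    local search="$1" replacement="$2" text="$3"
    echo "${text//"$search"/$replacement}"
}

replace_glob    '5*' 'X' 'price: 5*3'   # price: X   (5* matched "5*3")
replace_literal '5*' 'X' 'price: 5*3'   # price: X3  (only the literal "5*" matched)
```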

COUNT_WORDS

This function counts the number of words in a given string.

count_words(){
    local input="$1"
    local word_count=$(echo "$input" | wc -w)
    echo "$word_count"
}
 
count=$(count_words "Hello, how are you?") 
echo "Word count: $count"
# Output: "Word count: 4"

REMOVE_SPECIAL_CHARS

This function removes all special characters from a string.

remove_special_chars (){
    local input="$1"
    local sanitized
    sanitized=$(echo "$input" | tr -d '[:punct:]')
    echo "$sanitized"
}

#Usage
clean_string=$(remove_special_chars "Hello, @world!")
echo "$clean_string"
#Output: "Hello world"

REVERSE_WORDS

This function reverses the order of words in a string.

reverse_words(){
    local input="$1"
    local reversed
    # Join words with single spaces and no trailing space
    reversed=$(echo "$input" | awk '{ for (i=NF; i>0; i--) printf("%s%s", $i, (i>1 ? OFS : ORS)) }')
    echo "$reversed"
}

#Usage
reversed_sentence=$(reverse_words "This is a sentence.")
echo "$reversed_sentence"
#Output: "sentence. a is This"

STRIP_HTML_TAGS

This function removes HTML tags from a given string.

strip_html_tags(){
    local input="$1"
    local cleaned
    cleaned=$(echo "$input" | sed -e 's/<[^>]*>//g')
    echo "$cleaned"
}

#Usage
text_without_tags=$(strip_html_tags "<p>This is <b>bold</b> text.</p>")
echo "$text_without_tags"
#Output: "This is bold text."

CAMEL_TO_SNAKE_CASE

This function converts a string from CamelCase to snake_case.

camel_to_snake_case() {
    local input="$1"
    local snake_case
    # Insert "_" at each lowercase/digit-to-uppercase boundary, then lowercase the result
    snake_case=$(echo "$input" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]')
    echo "$snake_case"
}
# Usage
snake_case_str=$(camel_to_snake_case "camelCaseString")
echo "$snake_case_str"  # Output: "camel_case_string"

COUNT_OCCURRENCES

This function counts the occurrences of a substring within a larger string.

count_occurrences() {
    local substring="$1"
    local input="$2"
    # -F matches the substring literally instead of as a regular expression
    printf '%s\n' "$input" | grep -oF -- "$substring" | wc -l
}
# Usage
count=$(count_occurrences "apple" "I like apple and apple pie.")
echo "Occurrences: $count"  # Output: "Occurrences: 2"

Production Function Library

Here’s a complete string functions library. Save this as string_functions.sh and source it in your scripts:

#!/bin/bash
# string_functions.sh - Reusable string manipulation library
# Author: Karandeep Singh
# Last Updated: 2026-02-20

# Whitespace trimming
ltrim() { echo "${1#"${1%%[![:space:]]*}"}"; }
rtrim() { echo "${1%"${1##*[![:space:]]}"}"; }
trim() { echo "$(rtrim "$(ltrim "$1")")"; }

# Case conversion
uppercase() { echo "$1" | tr '[:lower:]' '[:upper:]'; }
lowercase() { echo "$1" | tr '[:upper:]' '[:lower:]'; }
# capitalize needs GNU sed: \b and \u are GNU extensions
capitalize() { echo "$1" | sed 's/\b\([a-z]\)/\u\1/g'; }

# String info
len() { echo "${#1}"; }

# String transformation
reverse() {
    local str="$1" reversed="" i
    for ((i = ${#str} - 1; i >= 0; i--)); do
        reversed="$reversed${str:i:1}"
    done
    echo "$reversed"
}

substitute_bash() { echo "${1//$2/$3}"; }
truncate() {
    local str="$1" len="$2"
    [ "${#str}" -gt "$len" ] && echo "${str:0:$len}..." || echo "$str"
}

# String extraction
substring() { echo "${1:$2:$3}"; }
split() {
    local IFS="$2" arr
    read -ra arr <<< "$1"
    echo "${arr[@]}"
}

# Utility functions
rot13() { echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m'; }
random_string() {
    local len="$1"
    openssl rand -hex "$len" | tr -d '\n' | cut -c1-"$len"
}
generate_slug() {
    echo "$1" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | \
    tr '[:upper:]' '[:lower:]' | tr -s '-' | sed 's/-$//' | sed 's/^-//'
}

sanitize() {
    local str="$1" allowed="$2"
    echo "$str" | sed "s/[^[:alnum:]$allowed]//g"
}

count() { echo "$1" | awk -v FS="$2" '{print NF-1}'; }

Usage example:

#!/bin/bash
source /opt/scripts/string_functions.sh

# Process vendor CSV import
while IFS=',' read -r id email status; do
    id=$(trim "$id")
    email=$(lowercase "$(trim "$email")")
    status=$(uppercase "$(trim "$status")")

    [ "$(len "$email")" -gt 255 ] && continue

    echo "INSERT INTO users VALUES ($id, '$email', '$status');"
done < vendor_data.csv > import.sql

Lessons Learned

Key takeaways from using functions like these in data-cleanup and log-processing scripts:

Performance Matters

Bash parameter expansion can be 5-10x faster than sed for simple operations, because it runs inside the shell instead of starting a new process
For bulk processing (100K+ lines), use awk instead of while loops
Avoid spawning external processes in tight loops
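The cost of spawning processes is easy to measure. The sketch below strips a file extension two ways, once through sed and once with parameter expansion, and times 500 calls of each. The function names and the sample filename are illustrative only, and the actual timings depend on the machine.

```bash
# Same result two ways: one forks sed on every call, the other stays inside bash
strip_ext_sed()   { echo "$1" | sed 's/\.[^.]*$//'; }
strip_ext_param() { echo "${1%.*}"; }

# Call the named function 500 times, discarding its output
bench() {
    local fn="$1" i
    for ((i = 0; i < 500; i++)); do
        "$fn" "report_2024.csv" > /dev/null
    done
}

time bench strip_ext_sed
time bench strip_ext_param
```

Both functions print report_2024 for this input. The difference shows up only in the timings reported by the bash time keyword.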

Error Handling is Critical

Always validate input before string operations
Check string length before substring extraction
Handle empty strings explicitly
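These three checks can be combined in one guard around substring extraction. The following is a minimal sketch of a hypothetical safe_substring wrapper (not part of the original library). It assumes that an empty string or an out-of-range start should be reported with a non-zero status rather than silently returning nothing.

```bash
# Hypothetical guarded substring: status 2 for bad arguments, 1 for empty or out-of-range input
safe_substring() {
    local str="$1" start="$2" length="$3"

    # Validate input: both numbers must be non-negative integers
    if ! [[ "$start" =~ ^[0-9]+$ && "$length" =~ ^[0-9]+$ ]]; then
        echo "safe_substring: start and length must be non-negative integers" >&2
        return 2
    fi
    # Force base 10 so values such as "08" are not read as octal
    start=$((10#$start)) length=$((10#$length))

    # Handle empty strings and check the length before extracting
    if [ -z "$str" ] || (( start >= ${#str} )); then
        return 1
    fi

    printf '%s\n' "${str:start:length}"
}

safe_substring "deployment" 0 6                      # deploy
safe_substring "abc" 10 2 || echo "out of range"     # out of range
```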

Security Considerations

Never use unsanitized user input in SQL
Be careful with substitute_bash: the search string is treated as a glob pattern, so *, ?, and [ are not matched literally unless quoted or escaped
ROT13 is obfuscation, not encryption
Use strong random sources (openssl, not $RANDOM)
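For scripts that generate SQL text, the first point above can be approached by quoting string values and validating numeric ones before interpolation. The sketch below uses two hypothetical helpers, sql_quote and is_numeric (neither is part of the original library). It assumes the target database follows the standard SQL rule of doubling single quotes inside string literals. Parameterized queries in a real database client remain the safer choice where they are available.

```bash
# Hypothetical helper: wrap a value in single quotes, doubling any embedded single quote
sql_quote() {
    local value="$1" q="'"
    printf "'%s'" "${value//$q/$q$q}"
}

# Hypothetical helper: succeed only for plain integers or decimals
is_numeric() {
    [[ "$1" =~ ^-?[0-9]+(\.[0-9]+)?$ ]]
}

id="42"
email="o'brien@example.com"

if is_numeric "$id"; then
    echo "INSERT INTO users (id, email) VALUES ($id, $(sql_quote "$email"));"
fi
# INSERT INTO users (id, email) VALUES (42, 'o''brien@example.com');
```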

Where These Functions Help

Applied consistently, these string functions can:

Speed up ETL processing: parameter expansion avoids spawning external processes
Reduce import errors: trimming and validation catch whitespace and field-length issues
Harden file handling: sanitizing input guards against path traversal
Improve data quality: case normalization avoids duplicate accounts

The key insight: simple string manipulation functions, when applied consistently across data pipelines, eliminate entire classes of data quality problems.

This is part of the Advanced Bash String Operations series.

Question

What string manipulation challenges have you encountered in production data pipelines?
