
Bash String Validation, Generation & a Library

Karandeep Singh
• 9 minutes

Summary

Bash functions for random string generation, input sanitization, CSV parsing, password-strength checks, and slug generation, plus a complete production string library to source.

This part covers the heavier-duty string utilities: generating unique identifiers, sanitizing untrusted input, parsing CSV files, scoring password strength, and building URL slugs. It closes with a complete, sourceable string-functions library and the lessons learned from running these in production.

Additional Utility Functions

Here are additional helper functions that solve specific problems:

REPEAT: Generate Test Data

repeat() {
    local str="$1" count="$2" i
    # Keep the loop counter local so it cannot clobber a caller's $i
    for ((i = 0; i < count; i++)); do printf '%s' "$str"; done
    echo
}

# Generate separator lines in reports
repeat "=" 80  # Outputs 80 equal signs

String Case Conversion

# CamelCase to snake_case (for API field mapping)
camel_to_snake_case() {
    # tr does the lowercasing, so the GNU-only \L escape is not needed
    echo "$1" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}

# Example: UserId -> user_id
$ camel_to_snake_case "UserId"
user_id

Word Operations

# Count words (for content analysis)
count_words() {
    echo "$1" | wc -w
}

# Reverse word order (for RTL language processing)
reverse_words() {
    # Separate words with single spaces and avoid a trailing space
    echo "$1" | awk '{ for (i=NF; i>0; i--) printf("%s%s", $i, (i>1 ? " " : "\n")) }'
}

HTML/Special Character Handling

# Strip HTML tags (for plain text email generation)
strip_html_tags() {
    echo "$1" | sed -e 's/<[^>]*>//g'
}

# Remove special characters (for filename generation)
remove_special_chars() {
    echo "$1" | tr -d '[:punct:]'
}

These utility functions handle edge cases in content processing, particularly when generating plain-text email notifications from HTML templates.

RANDOM_STRING: Generating Unique Identifiers

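The random_string function generates random hexadecimal strings from OpenSSL's random number generator, suitable for unique IDs:

random_string() {
  local len="$1" random_hex
  # Each random byte becomes two hex characters, so request (len + 1) / 2 bytes
  random_hex="$(openssl rand -hex "$(( (len + 1) / 2 ))")" || return 1
  echo "${random_hex:0:len}"
}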

Example usage in a session management system:

# Generate unique session tokens for user authentication
function create_session {
    local user_id="$1"
    local session_token
    # Declare first, assign second, so a failing random_string is not masked by `local`
    session_token=$(random_string 32) || return 1

    # Store session in Redis; the 86400-second TTL expires it after 24 hours
    redis-cli SETEX "session:$session_token" 86400 "$user_id" > /dev/null

    echo "$session_token"
}

# Generate temporary file paths
temp_file="/tmp/upload_$(random_string 16).tmp"

This is preferable to using $RANDOM, which yields only a 15-bit value (0 to 32767) from a non-cryptographic generator, so tokens built from it are predictable and can collide. The openssl-based approach draws from a much stronger random source.
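
Where openssl is not installed, or a token needs letters as well as digits, a similar helper can read the kernel's random device directly. The sketch below is a swapped-in alternative rather than the article's method: random_alnum is a hypothetical name, and it assumes /dev/urandom and head -c are available, as they are on typical Linux and macOS systems.

```shell
# Hypothetical helper (not part of the original library):
# build an alphanumeric string of the requested length from /dev/urandom.
random_alnum() {
    local len="${1:-16}" out="" chunk
    while (( ${#out} < len )); do
        # Read a fixed chunk of random bytes and keep only A-Z, a-z, 0-9.
        chunk=$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
        out+="$chunk"
    done
    printf '%s\n' "${out:0:len}"
}

token=$(random_alnum 24)
echo "${#token}"   # prints 24
```

Reading a bounded chunk per iteration, instead of piping an endless stream into head, keeps every pipeline stage exiting normally, which matters in scripts that enable pipefail.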

SANITIZE: Input Validation

The sanitize function removes potentially dangerous characters from user input:

sanitize() {
  local str="$1"
  local allowed="$2"
  local sanitized=$(echo "$str" | sed "s/[^[:alnum:]$allowed]//g")
  echo "$sanitized"
}

Used in filename generation from user input to prevent directory traversal:

# User uploads file, we need to create safe filename
user_provided_name="../../etc/passwd"  # Malicious input

# Sanitize, allowing only alphanumerics, dot, underscore, and dash
safe_filename=$(sanitize "$user_provided_name" "._-")
# Result: "....etcpasswd" (slashes are removed; dots survive because "." is allowed)

# Generate final filename with random prefix
final_filename="$(random_string 8)_${safe_filename}"
# Result: e.g. "a7f2d9e1_....etcpasswd"

upload_path="/var/uploads/$final_filename"

Sanitizing input this way guards a file upload endpoint against path traversal, where unsanitized filenames could otherwise escape the intended directory.
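
Removing the slashes is what stops the escape, but an allowlist that includes "." still lets leading dots through, and an input made only of disallowed characters leaves an empty name. A minimal sketch of a stricter variant follows; to_safe_filename and the "unnamed" fallback are hypothetical choices for illustration, not part of the original library.

```shell
# Hypothetical helper: keep only [A-Za-z0-9._-], drop leading dots,
# and fall back to a placeholder when nothing usable remains.
to_safe_filename() {
    local LC_ALL=C          # make the A-Z / a-z ranges byte-based
    local name="$1"
    # Delete every character outside the allowlist (no external process).
    name="${name//[^A-Za-z0-9._-]/}"
    # Strip leading dots so the result is never ".", "..", or a hidden file.
    while [[ "$name" == .* ]]; do
        name="${name#.}"
    done
    printf '%s\n' "${name:-unnamed}"
}

to_safe_filename "../../etc/passwd"        # prints "etcpasswd"
to_safe_filename "my report (final).pdf"   # prints "myreportfinal.pdf"
to_safe_filename ".."                      # prints "unnamed"
```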

PARSE_CSV: Production CSV Processing

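The parse_csv function processes simple delimited files with a configurable delimiter. It splits on every delimiter character, so it does not handle quoted fields that contain the delimiter: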

parse_csv() {
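  local file="$1"
  local delimiter="${2:-,}"
  local line_num=0
  local i fields

  # The extra test keeps a final line that lacks a trailing newline
  while IFS="$delimiter" read -ra fields || [ "${#fields[@]}" -gt 0 ]; do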
    line_num=$((line_num + 1))

    # Skip header row
    if [ $line_num -eq 1 ]; then
        continue
    fi

    # Trim all fields
    for i in "${!fields[@]}"; do
        fields[$i]=$(trim "${fields[$i]}")
    done

    # Process fields (example: insert into database)
    echo "Line $line_num: ${#fields[@]} fields -> ${fields[*]}"
  done < "$file"
}

Example usage in a daily ETL pipeline:

#!/bin/bash
# Daily vendor data import - runs at 2 AM via cron
# Processes 50,000+ rows from multiple vendors

source /opt/scripts/string_functions.sh

for csv_file in /data/imports/*.csv; do
    echo "Processing: $csv_file"

    line_count=0
    error_count=0

    while IFS=',' read -r id email status balance; do
        line_count=$((line_count + 1))

        # Skip header
        if [ $line_count -eq 1 ]; then
            continue
        fi

        # Trim and normalize (lowercase/uppercase take their input as an argument, not on stdin)
        id=$(trim "$id")
        email=$(lowercase "$(trim "$email")")
        status=$(uppercase "$(trim "$status")")
        balance=$(trim "$balance")

        # Validate required fields
        if [ -z "$id" ] || [ -z "$email" ]; then
            echo "ERROR: Line $line_count missing required fields" >&2
            error_count=$((error_count + 1))
            continue
        fi

        # Generate SQL (values are interpolated as-is; escape or validate them first if the data is untrusted)
        echo "INSERT INTO customers (id, email, status, balance) VALUES ($id, '$email', '$status', $balance) ON CONFLICT (id) DO UPDATE SET email='$email', status='$status', balance=$balance;"

    done < "$csv_file" > "/tmp/import_$(basename "$csv_file" .csv).sql"

    echo "Processed $line_count lines from $csv_file ($error_count errors)"
done

A pipeline like this handles vendor data from multiple sources, each with slightly different CSV formats (some with pipe delimiters, some with tabs). The trim and normalization functions help ensure clean data entry across sources.
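
Because the generated statements embed field values directly, a value containing a single quote would break the SQL or allow injection. The sketch below shows one way to guard the text and numeric columns before interpolation. sql_quote and is_numeric are hypothetical helpers, not part of the original library, and a parameterized query through the database client is preferable where that is an option.

```shell
# Hypothetical helpers (not part of the original library).

# Wrap a value in single quotes, doubling any embedded single quote,
# which is how standard SQL escapes a quote inside a string literal.
sql_quote() {
    local value="$1" q="'"
    printf "'%s'\n" "${value//$q/$q$q}"
}

# Succeed only if the value looks like a plain integer or decimal number.
is_numeric() {
    [[ "$1" =~ ^-?[0-9]+(\.[0-9]+)?$ ]]
}

email="o'brien@example.com"
balance="12.50"
if is_numeric "$balance"; then
    echo "UPDATE customers SET balance=$balance WHERE email=$(sql_quote "$email");"
fi
# prints: UPDATE customers SET balance=12.50 WHERE email='o''brien@example.com';
```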

CHECK_PASSWORD_STRENGTH: User Account Security

The check_password_strength function validates passwords during user registration:

check_password_strength() {
  local password="$1"
  local length=${#password}
  # printf avoids echo misreading passwords such as "-n" or "-e" as options
  local upper=$(printf '%s\n' "$password" | grep -o "[A-Z]" | sort -u | wc -l)
  local lower=$(printf '%s\n' "$password" | grep -o "[a-z]" | sort -u | wc -l)
  local digits=$(printf '%s\n' "$password" | grep -o "[0-9]" | sort -u | wc -l)
  local special=$(printf '%s\n' "$password" | grep -o "[^a-zA-Z0-9]" | sort -u | wc -l)

  # Score based on password requirements
  local score=0

  # Length check (minimum 12 characters)
  if [ $length -ge 12 ]; then
    score=$((score + 3))
  elif [ $length -ge 8 ]; then
    score=$((score + 1))
  fi

  # Character variety
  [ $upper -gt 0 ] && score=$((score + 1))
  [ $lower -gt 0 ] && score=$((score + 1))
  [ $digits -gt 0 ] && score=$((score + 1))
  [ $special -gt 0 ] && score=$((score + 2))

  # Return score and recommendation
  if [ $score -lt 4 ]; then
    echo "WEAK|Password must be at least 12 characters with uppercase, lowercase, digit, and special character"
    return 1
  elif [ $score -lt 6 ]; then
    echo "MODERATE|Consider adding more character variety"
    return 0
  else
    echo "STRONG|Password meets security requirements"
    return 0
  fi
}

Example usage in a user registration script:

# User registration validation
read -rsp "Enter password: " password  # -r keeps backslashes in the password intact
echo

result=$(check_password_strength "$password")
status="${result%%|*}"
message="${result##*|}"

if [ "$status" = "WEAK" ]; then
    echo "ERROR: $message" >&2
    exit 1
fi

if [ "$status" = "MODERATE" ]; then
    echo "WARNING: $message" >&2
    read -rp "Continue anyway? (yes/no): " confirm
    if [ "$confirm" != "yes" ]; then
        exit 1
    fi
fi

# Password accepted, proceed with account creation
echo "Password strength: $status"

Enforcing password strength at registration makes accounts harder to compromise through guessing or brute-force attacks.
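
The character-variety checks can also be done without spawning grep, sort, and wc for each class. A minimal sketch using bash's built-in regex matching follows; count_char_classes is a hypothetical helper that only reports how many of the four classes are present and does not reproduce the weighted score above.

```shell
# Hypothetical helper: count how many of the four character classes
# (uppercase, lowercase, digit, other) appear in a password.
count_char_classes() {
    local password="$1" classes=0
    if [[ "$password" =~ [[:upper:]] ]]; then classes=$((classes + 1)); fi
    if [[ "$password" =~ [[:lower:]] ]]; then classes=$((classes + 1)); fi
    if [[ "$password" =~ [[:digit:]] ]]; then classes=$((classes + 1)); fi
    if [[ "$password" =~ [^[:alnum:]] ]]; then classes=$((classes + 1)); fi
    echo "$classes"
}

count_char_classes 'correct horse'   # prints 2 (lowercase plus the space)
count_char_classes 'Tr0ub4dor&3'     # prints 4
```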

GENERATE_SLUG: URL Generation for Dynamic Content

The generate_slug function creates SEO-friendly URLs from user content:

generate_slug() {
  local string="$1"
  local slug=$(echo "$string" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | tr '[:upper:]' '[:lower:]' | tr -s '-' | sed 's/-$//' | sed 's/^-//')
  echo "$slug"
}

Example usage in a content management system:

#!/bin/bash
# Generate blog post from user input

read -rp "Enter blog post title: " title
slug=$(generate_slug "$title")

# Check for slug collisions
counter=1
final_slug="$slug"
while [ -f "/var/www/blog/posts/${final_slug}.html" ]; do
    final_slug="${slug}-${counter}"
    counter=$((counter + 1))
done

# Create blog post file
cat > "/var/www/blog/posts/${final_slug}.html" <<EOF
<!DOCTYPE html>
<html>
<head>
    <title>$title</title>
    <link rel="canonical" href="https://example.com/blog/${final_slug}" />
</head>
<body>
    <h1>$title</h1>
    <!-- Content here -->
</body>
</html>
EOF

echo "Blog post created: https://example.com/blog/${final_slug}"

Example inputs and outputs:

# Input: "How to Deploy Python Apps with Docker & Kubernetes"
# Output slug: "how-to-deploy-python-apps-with-docker-kubernetes"

# Input: "10 Best Practices for AWS Security (2024 Edition)"
# Output slug: "10-best-practices-for-aws-security-2024-edition"

# Input: "Understanding CPU vs. I/O Bound Operations"
# Output slug: "understanding-cpu-vs-io-bound-operations"

This function generates consistent, SEO-friendly URLs for blog posts and documentation pages across a content library.
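
The slug is safe to place in a URL, but the CMS example also writes the raw title into the page's title and h1 elements, where characters such as < and & would be interpreted as markup. A minimal sketch of escaping the title first is shown below; html_escape is a hypothetical helper, not part of the original library.

```shell
# Hypothetical helper: escape the characters that are special in HTML text
# and attribute values. The ampersand is handled first so the entities
# produced by the later rules are not escaped a second time.
html_escape() {
    printf '%s\n' "$1" | sed \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g'
}

title='Docker & <Kubernetes> "Tips"'
echo "<h1>$(html_escape "$title")</h1>"
# prints: <h1>Docker &amp; &lt;Kubernetes&gt; &quot;Tips&quot;</h1>
```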

REPLACE

This function replaces all occurrences of a specified substring with another substring.

replace() {
    local original="$1"
    local replacement="$2"
    local input="$3"
    # Quote the search text so glob characters (*, ?, [) are matched literally
    echo "${input//"$original"/$replacement}"
}

# Usage
result=$(replace "apple" "banana" "I like apple and apple pie.")
echo "$result"
# Output: "I like banana and banana pie."

COUNT_WORDS

This function counts the number of words in a given string.

count_words(){
    local input="$1"
    local word_count=$(echo "$input" | wc -w)
    echo "$word_count"
}
 
count=$(count_words "Hello, how are you?") 
echo "Word count: $count"
# Output: "Word count: 4"

REMOVE_SPECIAL_CHARS

This function removes all special characters from a string.

remove_special_chars() {
    local input="$1"
    local sanitized   # keep the result out of the global scope
    sanitized=$(echo "$input" | tr -d '[:punct:]')
    echo "$sanitized"
}

# Usage
clean_string=$(remove_special_chars "Hello, @world!")
echo "$clean_string"
# Output: "Hello world"

REVERSE_WORDS

This function reverses the order of words in a string.

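reverse_words() {
    local input="$1"
    local reversed   # keep the result out of the global scope
    # Join words with single spaces so the output has no trailing space
    reversed=$(echo "$input" | awk '{ for (i=NF; i>0; i--) printf("%s%s", $i, (i>1 ? " " : "\n")) }')
    echo "$reversed"
}

# Usage
reversed_sentence=$(reverse_words "This is a sentence.")
echo "$reversed_sentence"
# Output: "sentence. a is This"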

STRIP_HTML_TAGS

This function removes HTML tags from a given string.

strip_html_tags() {
    local input="$1"
    local cleaned   # keep the result out of the global scope
    cleaned=$(echo "$input" | sed -e 's/<[^>]*>//g')
    echo "$cleaned"
}

# Usage
text_without_tags=$(strip_html_tags "<p>This is <b>bold</b> text.</p>")
echo "$text_without_tags"
# Output: "This is bold text."

CAMEL_TO_SNAKE_CASE

This function converts a string from CamelCase to snake_case.

camel_to_snake_case() {
    local input="$1"
    local snake_case
    # Insert underscores with sed, then lowercase with tr (no GNU-only \L needed)
    snake_case=$(echo "$input" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]')
    echo "$snake_case"
}

# Usage
snake_case_str=$(camel_to_snake_case "camelCaseString")
echo "$snake_case_str"  # Output: "camel_case_string"

COUNT_OCCURRENCES

This function counts the occurrences of a substring within a larger string.

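count_occurrences() {
    local substring="$1"
    local input="$2"
    # -F matches the substring literally instead of as a regular expression;
    # -- stops a substring that starts with "-" from being read as an option
    echo "$input" | grep -oF -- "$substring" | wc -l
}

# Usage
count=$(count_occurrences "apple" "I like apple and apple pie.")
echo "Occurrences: $count"  # Output: "Occurrences: 2"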

Production Function Library

Here’s a complete string functions library. Save this as string_functions.sh and source it in your scripts:

#!/bin/bash
# string_functions.sh - Reusable string manipulation library
# Author: Karandeep Singh
# Last Updated: 2026-02-20

# Whitespace trimming
ltrim() { echo "${1#"${1%%[![:space:]]*}"}"; }
rtrim() { echo "${1%"${1##*[![:space:]]}"}"; }
trim() { echo "$(rtrim "$(ltrim "$1")")"; }

# Case conversion
uppercase() { echo "$1" | tr '[:lower:]' '[:upper:]'; }
lowercase() { echo "$1" | tr '[:upper:]' '[:lower:]'; }
capitalize() { echo "$1" | sed 's/\b\([a-z]\)/\u\1/g'; }  # \b and \u require GNU sed

# String info
len() { echo "${#1}"; }

# String transformation
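reverse() {
    local str="$1" reversed="" i
    # Read ${#str} here rather than in the `local` line: all words of a single
    # `local` statement are expanded before any of its assignments take effect
    for ((i = ${#str} - 1; i >= 0; i--)); do
        reversed+="${str:i:1}"
    done
    echo "$reversed"
}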

# The search text is quoted so glob characters (*, ?, [) are matched literally
substitute_bash() { echo "${1//"$2"/$3}"; }
truncate() {
    local str="$1" len="$2"
    [ "${#str}" -gt "$len" ] && echo "${str:0:$len}..." || echo "$str"
}

# String extraction
substring() { echo "${1:$2:$3}"; }
split() {
    local IFS="$2" arr
    read -ra arr <<< "$1"
    echo "${arr[@]}"
}

# Utility functions
rot13() { echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m'; }
random_string() {
    local len="$1"
    # Each random byte yields two hex characters, so request (len + 1) / 2 bytes
    openssl rand -hex "$(( (len + 1) / 2 ))" | cut -c1-"$len"
}
generate_slug() {
    echo "$1" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | \
    tr '[:upper:]' '[:lower:]' | tr -s '-' | sed 's/-$//' | sed 's/^-//'
}

sanitize() {
    local str="$1" allowed="$2"
    echo "$str" | sed "s/[^[:alnum:]$allowed]//g"
}

count() { echo "$1" | awk -v FS="$2" '{print NF-1}'; }

Usage example:

#!/bin/bash
source /opt/scripts/string_functions.sh

# Process vendor CSV import
while IFS=',' read -r id email status; do
    id=$(trim "$id")
    email=$(lowercase "$(trim "$email")")
    status=$(uppercase "$(trim "$status")")

    [ $(len "$email") -gt 255 ] && continue

    echo "INSERT INTO users VALUES ($id, '$email', '$status');"
done < vendor_data.csv > import.sql

Lessons Learned

Key takeaways from using these functions in ETL pipelines and log processing:

Performance Matters

  • Bash parameter expansion is 5-10x faster than sed for simple operations
  • For bulk processing (100K+ lines), use awk instead of while loops
  • Avoid spawning external processes in tight loops
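
To illustrate the awk recommendation, here is a minimal sketch that trims and normalizes a whole file in a single awk process instead of calling trim, lowercase, and uppercase once per row. normalize_csv is a hypothetical helper; it assumes comma-separated input without quoted fields, with the email in column 2 and the status in column 3.

```shell
# Hypothetical helper: trim every field, lowercase column 2, uppercase column 3.
# Reads CSV on stdin and writes CSV on stdout; assumes no quoted fields.
normalize_csv() {
    awk -F',' -v OFS=',' '
        NR == 1 { print; next }                 # pass the header through
        {
            for (i = 1; i <= NF; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", $i) # trim leading and trailing blanks
            }
            $2 = tolower($2)
            $3 = toupper($3)
            print
        }'
}

printf 'id,email,status\n 7 , Ann@Example.COM , active \n' | normalize_csv
# prints:
# id,email,status
# 7,ann@example.com,ACTIVE
```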

Error Handling is Critical

  • Always validate input before string operations
  • Check string length before substring extraction
  • Handle empty strings explicitly
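
These checks can be wrapped around the library's substring logic. The following is a minimal sketch under stated assumptions: safe_substring is a hypothetical name, and treating an empty string or an out-of-range start as an error (rather than returning an empty result) is one possible policy.

```shell
# Hypothetical helper: validate arguments before extracting a substring.
safe_substring() {
    local str="$1" start="$2" length="$3"

    # Offsets must be non-negative integers.
    if ! [[ "$start" =~ ^[0-9]+$ && "$length" =~ ^[0-9]+$ ]]; then
        echo "safe_substring: start and length must be non-negative integers" >&2
        return 2
    fi
    # Force base 10 so values such as "08" are not parsed as octal.
    start=$((10#$start))
    length=$((10#$length))

    # Treat an empty string or an out-of-range start as an explicit error.
    if (( start >= ${#str} )); then
        echo "safe_substring: start $start is outside a ${#str}-character string" >&2
        return 1
    fi

    printf '%s\n' "${str:start:length}"
}

safe_substring "hello world" 6 5                        # prints "world"
safe_substring "" 0 3 2>/dev/null || echo "rejected"    # prints "rejected"
```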

Security Considerations

  • Never use unsanitized user input in SQL
  • Be careful with substitute - it doesn’t escape regex by default
  • ROT13 is obfuscation, not encryption
  • Use strong random sources (openssl, not $RANDOM)

Where These Functions Help

Applied consistently, these string functions can:

  • Speed up ETL processing: parameter expansion avoids spawning external processes
  • Reduce import errors: trimming and validation catch whitespace and field-length issues
  • Harden file handling: sanitizing input guards against path traversal
  • Improve data quality: case normalization avoids duplicate accounts

The key insight: simple string manipulation functions, when applied consistently across data pipelines, eliminate entire classes of data quality problems.

This is part of the Advanced Bash String Operations series.

Question

What string manipulation challenges have you encountered in production data pipelines?
