Bash String Validation, Generation & a Library

Summary
This part covers the heavier-duty string utilities: generating unique identifiers, sanitizing untrusted input, parsing CSV files, scoring password strength, and building URL slugs. It closes with a complete, sourceable string-functions library and the lessons learned from running these in production.
Additional Utility Functions
Here are additional helper functions that solve specific problems:
REPEAT: Generate Test Data
repeat() {
local str="$1" count="$2" i
for ((i=1; i<=count; i++)); do printf '%s' "$str"; done
echo
}
# Generate separator lines in reports
repeat "=" 80 # Outputs 80 equal signs
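repeat runs one loop iteration per copy, which is fine for an 80-character separator. The sketch below avoids the loop. It pads an empty string to the requested width with a single printf call and then swaps each space for the string. The name repeat_fast is hypothetical and is not part of the library below.

```bash
# repeat_fast STRING COUNT: hypothetical loop-free variant of repeat
repeat_fast() {
  local str="$1" count="$2" pad
  # printf -v writes COUNT spaces into $pad without forking a subshell
  printf -v pad '%*s' "$count" ''
  # Replace every space with STRING; the quoted replacement stays literal,
  # so an "&" in STRING is not treated as the matched text on bash 5.2+
  printf '%s\n' "${pad// /"$str"}"
}

repeat_fast "=" 20
repeat_fast "-+" 5
```

printf -v and the %*s width argument are bash builtin features, so this version starts no external command.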
String Case Conversion
# CamelCase to snake_case (for API field mapping)
camel_to_snake_case() {
# tr lowercases everything afterwards, so the portable \1_\2 replaces GNU sed's \L
echo "$1" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}
# Example: UserId -> user_id
$ camel_to_snake_case "UserId"
user_id
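camel_to_snake_case only inserts an underscore between a lowercase letter or digit and an uppercase letter. Runs of capitals such as acronyms are therefore not split, so HTTPServer becomes httpserver. The sketch below adds a first rule that splits an acronym from the word that follows it. The name camel_to_snake_acronyms is hypothetical.

```bash
# camel_to_snake_acronyms: hypothetical variant that also splits acronym runs.
# Rule 1: "HTTPServer" -> "HTTP_Server" (last capital of a run starts a new word)
# Rule 2: "userId"     -> "user_Id"     (lowercase/digit followed by a capital)
camel_to_snake_acronyms() {
  printf '%s\n' "$1" |
    sed -E -e 's/([A-Z]+)([A-Z][a-z])/\1_\2/g' \
           -e 's/([a-z0-9])([A-Z])/\1_\2/g' |
    tr '[:upper:]' '[:lower:]'
}

camel_to_snake_acronyms "HTTPServerError"   # -> http_server_error
camel_to_snake_acronyms "getHTTPResponse"   # -> get_http_response
camel_to_snake_acronyms "UserId"            # -> user_id
```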
Word Operations
# Count words (for content analysis)
count_words() {
echo "$1" | wc -w
}
# Reverse word order
reverse_words() {
echo "$1" | awk '{ for (i=NF; i>1; i--) printf("%s ", $i); print $1 }'
}
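count_words and reverse_words each start external processes (wc and awk). When you call them once per row in a loop, a builtin-only version can help. The sketch below lets read -ra split the string into an array. The function names are hypothetical, and only the first line of the input is used.

```bash
# Hypothetical builtin-only variants of count_words and reverse_words.
# read -ra splits on IFS (default: spaces, tabs, newlines) into an array.
count_words_builtin() {
  local -a words
  read -ra words <<< "$1"
  echo "${#words[@]}"
}

reverse_words_builtin() {
  local -a words reversed=()
  local i
  read -ra words <<< "$1"
  for ((i = ${#words[@]} - 1; i >= 0; i--)); do
    reversed+=("${words[i]}")
  done
  # "${reversed[*]}" joins elements with the first character of IFS (a space by default)
  echo "${reversed[*]}"
}

count_words_builtin "Hello, how are you?"
reverse_words_builtin "This is a sentence."
```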
HTML/Special Character Handling
# Strip HTML tags (for plain text email generation)
strip_html_tags() {
echo "$1" | sed -e 's/<[^>]*>//g'
}
# Remove punctuation ([:punct:] also strips '.', '-' and '_', so file extensions are lost)
remove_special_chars() {
echo "$1" | tr -d '[:punct:]'
}
These utility functions handle edge cases in content processing, particularly when generating plain-text email notifications from HTML templates.
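strip_html_tags removes tags but leaves HTML entities such as &amp; and &lt; in the text, so they show up literally in a plain-text email. The sketch below strips tags and then decodes a handful of common named entities. It does not cover every entity or numeric reference, and the name html_to_text is hypothetical.

```bash
# html_to_text: hypothetical helper that strips tags, then decodes a few
# common entities. &amp; is decoded last so "&amp;lt;" becomes "&lt;"
# rather than being decoded twice into "<".
html_to_text() {
  printf '%s\n' "$1" |
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&#39;/'/g" \
        -e 's/&amp;/\&/g'
}

html_to_text '<p>Fish &amp; Chips &lt;3</p>'
```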
RANDOM_STRING: Generating Unique Identifiers
The random_string function generates cryptographically random hexadecimal strings for unique IDs:
random_string() {
local len="$1" hex
# openssl prints 2 hex chars per byte, so request ceil(len/2) bytes and trim
hex=$(openssl rand -hex "$(( (len + 1) / 2 ))") || return 1
echo "${hex:0:len}"
}
Example usage in a session management system:
# Generate unique session tokens for user authentication
function create_session {
local user_id="$1"
local session_token
session_token=$(random_string 32)
# Store session in Redis; SETEX expires the key after 86400 seconds (24 hours)
redis-cli SETEX "session:$session_token" 86400 "$user_id" > /dev/null
echo "$session_token"
}
# Generate temporary file paths
temp_file="/tmp/upload_$(random_string 16).tmp"
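This is preferable to using $RANDOM. $RANDOM only returns values from 0 to 32767 (15 bits) and comes from a non-cryptographic generator, so tokens built from it are predictable and far more likely to collide. openssl rand draws from a cryptographically secure generator instead.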
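If you need an alphanumeric token rather than hex, a common alternative reads the kernel's random device directly. The sketch below assumes a system that provides /dev/urandom, and the name random_alnum is hypothetical. Each alphanumeric character carries about 5.95 bits (log2 62) of randomness, versus 4 bits per hex character.

```bash
# random_alnum LEN: hypothetical helper, assumes /dev/urandom exists
random_alnum() {
  local len="$1"
  # LC_ALL=C makes tr work on raw bytes; -dc deletes every byte outside
  # the set, which leaves a uniform stream of letters and digits
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
  echo
}

random_alnum 22
```

Twenty-two alphanumeric characters carry roughly 131 bits, slightly more than the 128 bits in random_string 32.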
SANITIZE: Input Validation
The sanitize function strips every character that is neither alphanumeric nor listed in an allow-list, so anything unexpected in user input is removed:
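sanitize() {
local str="$1"
# Extra characters to keep besides [:alnum:]. They are placed inside a sed
# bracket expression, so put '-' last and avoid '/', ']' and '\'
local allowed="$2"
local sanitized
sanitized=$(echo "$str" | sed "s/[^[:alnum:]$allowed]//g")
echo "$sanitized"
}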
Used in filename generation from user input to prevent directory traversal:
# User uploads file, we need to create safe filename
user_provided_name="../../etc/passwd" # Malicious input
# Sanitize allowing only alphanumeric, dash, underscore, dot
safe_filename=$(sanitize "$user_provided_name" "._-")
# Result: "....etcpasswd" (the dots are allowed, but every "/" is removed)
# Generate final filename with random prefix
final_filename="$(random_string 8)_${safe_filename}"
# Result: e.g. "a7f2d9e1_....etcpasswd"
upload_path="/var/uploads/$final_filename"
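Sanitizing input this way guards a file upload endpoint against path traversal. Because "/" is not in the allow-list, the sanitized name cannot contain a path separator and cannot escape /var/uploads. The random prefix also means a leftover run of dots can never become "." or ".." on its own.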
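sanitize keeps dots, so a name made entirely of dots, or one starting with a dot, survives. Only the random prefix stops it from being treated as "." or ".." or becoming a hidden file. The pure-bash sketch below applies the same allow-list with a bash pattern and then also drops leading dots. The function name and the "unnamed" fallback are hypothetical.

```bash
# safe_upload_name NAME: hypothetical stand-alone filename sanitizer
safe_upload_name() {
  local name="$1"
  # Keep only letters, digits, ".", "_" and "-" (the allow-list used above)
  name=${name//[^[:alnum:]._-]/}
  # Strip leading dots so the result is never ".", ".." or a hidden file
  while [[ $name == .* ]]; do
    name=${name#.}
  done
  # Fall back to a placeholder when nothing survives (hypothetical default)
  printf '%s\n' "${name:-unnamed}"
}

safe_upload_name "../../etc/passwd"   # -> etcpasswd
safe_upload_name "report 2024.pdf"    # -> report2024.pdf
safe_upload_name "..."                # -> unnamed
```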
PARSE_CSV: Production CSV Processing
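The parse_csv function processes CSV files with custom delimiters. It splits each line on the delimiter without understanding quotes, so a quoted field that contains the delimiter (such as "Smith, John") is split into two fields: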
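parse_csv() {
local file="$1"
local delimiter="${2:-,}"
local line_num=0 line i
local -a fields
# The "|| [ -n ... ]" test also processes a final line with no trailing newline
while IFS= read -r line || [ -n "$line" ]; do
line_num=$((line_num + 1))
# Skip header row
if [ "$line_num" -eq 1 ]; then
continue
fi
# Split on the delimiter, then trim all fields (trim comes from string_functions.sh)
IFS="$delimiter" read -ra fields <<< "$line"
for i in "${!fields[@]}"; do
fields[$i]=$(trim "${fields[$i]}")
done
# Process fields (example: insert into database)
echo "Line $line_num: ${#fields[@]} fields -> ${fields[*]}"
done < "$file"
}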
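parse_csv splits each line with IFS. When the delimiter is a tab or another whitespace character, bash treats a run of delimiters as one separator. An empty field between two tabs therefore disappears and later columns shift left. The sketch below keeps empty fields by splitting with parameter expansion only. The name split_keep_empty is hypothetical.

```bash
# split_keep_empty LINE DELIM: hypothetical splitter that preserves empty fields
split_keep_empty() {
  local line="$1" delim="$2"
  local -a out=()
  [[ -n $delim ]] || return 1
  # Peel off one field at a time up to the next delimiter
  while [[ $line == *"$delim"* ]]; do
    out+=("${line%%"$delim"*}")   # text before the first delimiter
    line=${line#*"$delim"}        # drop that field and its delimiter
  done
  out+=("$line")                  # the last field
  printf '[%s]\n' "${out[@]}"
}

split_keep_empty $'a\t\tc' $'\t'   # prints [a], [], [c] on three lines

# For comparison, IFS splitting collapses the two tabs into one separator
IFS=$'\t' read -ra ifs_fields <<< $'a\t\tc'
echo "IFS split: ${#ifs_fields[@]} fields"
```

With a non-whitespace delimiter such as a comma or pipe, IFS splitting already keeps empty fields. This approach matters mainly for tab-separated files.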
Example usage in a daily ETL pipeline:
#!/bin/bash
# Daily vendor data import - runs at 2 AM via cron
# Processes 50,000+ rows from multiple vendors
source /opt/scripts/string_functions.sh
for csv_file in /data/imports/*.csv; do
# Skip the literal pattern when no CSV files exist
[ -e "$csv_file" ] || continue
echo "Processing: $csv_file"
line_count=0
error_count=0
while IFS=',' read -r id email status balance; do
line_count=$((line_count + 1))
# Skip header
if [ "$line_count" -eq 1 ]; then
continue
fi
# Trim and normalize (lowercase/uppercase take an argument; they don't read stdin)
id=$(trim "$id")
email=$(lowercase "$(trim "$email")")
status=$(uppercase "$(trim "$status")")
balance=$(trim "$balance")
# Validate required fields
if [ -z "$id" ] || [ -z "$email" ]; then
echo "ERROR: Line $line_count missing required fields" >&2
error_count=$((error_count + 1))
continue
fi
# Generate SQL. Values are interpolated as-is, so escape quotes before using untrusted data.
# An empty balance becomes NULL instead of producing invalid SQL.
echo "INSERT INTO customers (id, email, status, balance) VALUES ($id, '$email', '$status', ${balance:-NULL}) ON CONFLICT (id) DO UPDATE SET email='$email', status='$status', balance=${balance:-NULL};"
done < "$csv_file" > "/tmp/import_$(basename "$csv_file" .csv).sql"
echo "Processed $line_count lines from $csv_file ($error_count errors)"
done
A pipeline like this handles vendor data from multiple sources, each with slightly different CSV formats (some with pipe delimiters, some with tabs). The script above assumes commas. For the other vendors, change IFS or call parse_csv with the matching delimiter. The trim and normalization functions help keep data entry clean across sources.
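The ETL script interpolates field values straight into SQL. An email such as o'brien@example.com therefore breaks the statement, and a crafted value could inject SQL. If you must generate SQL text from the shell, wrap each value with a quoting helper that doubles single quotes, the standard way to escape them inside a SQL string literal.

This sketch makes a few assumptions. sql_quote is a hypothetical name, and it assumes a database that treats backslashes literally in ordinary string literals (for example PostgreSQL with standard_conforming_strings on). A bulk loader or a parameterized client remains the safer choice.

```bash
# sql_quote VALUE: hypothetical helper that prints VALUE as a SQL string literal
sql_quote() {
  local s="$1"
  # Double every single quote ('' is an escaped quote inside a SQL literal)
  s=${s//\'/\'\'}
  printf "'%s'" "$s"
}

email="o'brien@example.com"
echo "INSERT INTO customers (email) VALUES ($(sql_quote "$email"));"
```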
CHECK_PASSWORD_STRENGTH: User Account Security
The check_password_strength function validates passwords during user registration:
check_password_strength() {
local password="$1"
local length=${#password}
# Score based on password requirements
local score=0
# Length check (minimum 12 characters)
if [ "$length" -ge 12 ]; then
score=$((score + 3))
elif [ "$length" -ge 8 ]; then
score=$((score + 1))
fi
# Character variety: [[ =~ ]] tests each class in-process instead of
# spawning grep, sort and wc for every check
[[ $password =~ [[:upper:]] ]] && score=$((score + 1))
[[ $password =~ [[:lower:]] ]] && score=$((score + 1))
[[ $password =~ [[:digit:]] ]] && score=$((score + 1))
[[ $password =~ [^[:alnum:]] ]] && score=$((score + 2))
# Return score and recommendation
if [ "$score" -lt 4 ]; then
echo "WEAK|Password must be at least 12 characters with uppercase, lowercase, digit, and special character"
return 1
elif [ "$score" -lt 6 ]; then
echo "MODERATE|Consider adding more character variety"
return 0
else
echo "STRONG|Password meets security requirements"
return 0
fi
}
Example usage in a user registration script:
# User registration validation
read -rsp "Enter password: " password  # -r keeps backslashes in the password
echo
result=$(check_password_strength "$password")
status="${result%%|*}"
message="${result##*|}"
if [ "$status" = "WEAK" ]; then
echo "ERROR: $message" >&2
exit 1
fi
if [ "$status" = "MODERATE" ]; then
echo "WARNING: $message" >&2
read -p "Continue anyway? (yes/no): " confirm
if [ "$confirm" != "yes" ]; then
exit 1
fi
fi
# Password accepted, proceed with account creation
echo "Password strength: $status"
Enforcing password strength at registration reduces the risk of accounts being compromised through password guessing and brute-force attacks.
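The score mixes a length bonus with variety bonuses, so it helps to see where real inputs land. The hypothetical password_score helper below repeats the scoring rules of check_password_strength: a length bonus, one point each for uppercase, lowercase and digits, and two for a special character. It prints only the number, so you can try it on sample inputs.

```bash
# password_score PASSWORD: hypothetical helper, same rules as check_password_strength
password_score() {
  local p="$1" score=0
  if (( ${#p} >= 12 )); then
    ((score += 3))
  elif (( ${#p} >= 8 )); then
    ((score += 1))
  fi
  [[ $p =~ [[:upper:]] ]] && ((score += 1))
  [[ $p =~ [[:lower:]] ]] && ((score += 1))
  [[ $p =~ [[:digit:]] ]] && ((score += 1))
  [[ $p =~ [^[:alnum:]] ]] && ((score += 2))
  echo "$score"
}

for p in 'password' 'Tr0ub4dor&3' 'correcthorsebatterystaple' 'Correct-Horse-42'; do
  printf '%-28s %s\n' "$p" "$(password_score "$p")"
done
```

The thresholds above treat a score under 4 as WEAK and under 6 as MODERATE. 'password' scores 2 (WEAK). The 11-character 'Tr0ub4dor&3' scores 6 (STRONG) despite being shorter than 12 characters. The 25-character lowercase passphrase scores only 4 (MODERATE). Character variety therefore outweighs length in this scheme.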
GENERATE_SLUG: URL Generation for Dynamic Content
The generate_slug function creates SEO-friendly URLs from user content:
generate_slug() {
local string="$1"
local slug=$(echo "$string" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | tr '[:upper:]' '[:lower:]' | tr -s '-' | sed 's/-$//' | sed 's/^-//')
echo "$slug"
}
Example usage in a content management system:
#!/bin/bash
# Generate blog post from user input
read -rp "Enter blog post title: " title
slug=$(generate_slug "$title")
if [ -z "$slug" ]; then
echo "ERROR: title must contain at least one letter or digit" >&2
exit 1
fi
# Escape the title before embedding it in HTML
safe_title=$(printf '%s' "$title" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
# Check for slug collisions
counter=1
final_slug="$slug"
while [ -f "/var/www/blog/posts/${final_slug}.html" ]; do
final_slug="${slug}-${counter}"
counter=$((counter + 1))
done
# Create blog post file
cat > "/var/www/blog/posts/${final_slug}.html" <<EOF
<!DOCTYPE html>
<html>
<head>
<title>$safe_title</title>
<link rel="canonical" href="https://example.com/blog/${final_slug}" />
</head>
<body>
<h1>$safe_title</h1>
<!-- Content here -->
</body>
</html>
EOF
echo "Blog post created: https://example.com/blog/${final_slug}"
Example inputs and outputs:
# Input: "How to Deploy Python Apps with Docker & Kubernetes"
# Output slug: "how-to-deploy-python-apps-with-docker-kubernetes"
# Input: "10 Best Practices for AWS Security (2024 Edition)"
# Output slug: "10-best-practices-for-aws-security-2024-edition"
# Input: "Understanding CPU vs. I/O Bound Operations"
# Output slug: "understanding-cpu-vs-io-bound-operations"
This function generates consistent, SEO-friendly URLs for blog posts and documentation pages across a content library.
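generate_slug pipes through four tr calls and two sed calls. The bash 4+ sketch below uses only parameter expansion and produces the same slugs for the three examples above. The name slugify is hypothetical. One likely difference: bash patterns are locale-aware, so in a UTF-8 locale accented letters such as é typically match [:alnum:] and are kept. GNU tr works byte by byte and drops them.

```bash
# slugify TITLE: hypothetical builtin-only version of generate_slug (bash 4+)
slugify() {
  local s="${1,,}"                   # lowercase
  s=${s//[^[:alnum:][:space:]]/}     # drop punctuation
  s=${s//[[:space:]]/-}              # whitespace -> hyphen
  while [[ $s == *--* ]]; do         # squeeze repeated hyphens
    s=${s//--/-}
  done
  s=${s#-}                           # trim one leading hyphen
  s=${s%-}                           # trim one trailing hyphen
  printf '%s\n' "$s"
}

slugify "How to Deploy Python Apps with Docker & Kubernetes"
```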
REPLACE
This function replaces all occurrences of a specified substring with another substring.
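replace () {
local original="$1"
local replacement="$2"
local input="$3"
local result
# Quoting both parts makes them literal: unquoted, *, ? and [ in $original act
# as glob characters (and since bash 5.2 an unquoted & in the replacement means the match)
result=${input//"$original"/"$replacement"}
echo "$result"
}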
#Usage
result=$(replace "apple" "banana" "I like apple and apple pie.")
echo "$result"
#Output: "I like banana and banana pie."
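Inside ${var//pattern/replacement}, an unquoted pattern is a glob, not a literal string. The two hypothetical helpers below make the difference visible. Use the quoted form when the search text comes from data.

```bash
# Hypothetical helpers contrasting glob and literal substitution
replace_glob() {
  local out
  out=${3//$1/$2}          # unquoted: $1 is a glob pattern
  printf '%s\n' "$out"
}
replace_literal() {
  local out
  out=${3//"$1"/"$2"}      # quoted: $1 and $2 are plain text
  printf '%s\n' "$out"
}

replace_glob    "*" "x" "5*3"   # "*" matches the whole string -> x
replace_literal "*" "x" "5*3"   # -> 5x3
replace_glob    "?" "_" "ab"    # "?" matches any single character -> __
replace_literal "?" "_" "ab"    # -> ab
```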
COUNT_WORDS
This function counts the number of words in a given string.
count_words(){
local input="$1"
local word_count=$(echo "$input" | wc -w)
echo "$word_count"
}
count=$(count_words "Hello, how are you?")
echo "Word count: $count"
# Output: "Word count: 4"
REMOVE_SPECIAL_CHARS
This function removes all punctuation characters (the [:punct:] class) from a string; letters, digits, and whitespace are kept.
remove_special_chars (){
local input="$1"
local sanitized
sanitized=$(echo "$input" | tr -d '[:punct:]')
echo "$sanitized"
}
#Usage
clean_string=$(remove_special_chars "Hello, @world!")
echo "$clean_string"
#Output: "Hello world"
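The [:punct:] class that tr uses also works inside a bash pattern, so the same cleanup can be done without starting tr. The exact set of characters matched depends on the locale, as it does for tr. The sketch below uses the hypothetical name remove_punct_builtin.

```bash
# remove_punct_builtin STRING: delete every [:punct:] character in-process
remove_punct_builtin() {
  local out
  out=${1//[[:punct:]]/}
  printf '%s\n' "$out"
}

remove_punct_builtin "Hello, @world!"   # -> Hello world
```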
REVERSE_WORDS
This function reverses the order of words in a string.
reverse_words(){
local input="$1" reversed
# Print fields from last to second, then the first one, so no trailing space is left
reversed=$(echo "$input" | awk '{ for (i=NF; i>1; i--) printf("%s ", $i); print $1 }')
echo "$reversed"
}
#Usage
reversed_sentence=$(reverse_words "This is a sentence.")
echo "$reversed_sentence"
#Output: "sentence. a is This"
STRIP_HTML_TAGS
This function removes HTML tags from a given string.
strip_html_tags(){
local input="$1" cleaned
# sed works line by line, so a tag split across two lines is not removed
cleaned=$(echo "$input" | sed -e 's/<[^>]*>//g')
echo "$cleaned"
}
#Usage
text_without_tags=$(strip_html_tags "<p>This is <b>bold</b> text.</p>")
echo "$text_without_tags"
#Output: "This is bold text."
CAMEL_TO_SNAKE_CASE
This function converts a string from CamelCase to snake_case.
camel_to_snake_case() {
local input="$1" snake_case
# \1_\2 is portable; tr lowercases the result, so GNU sed's \L is not needed
snake_case=$(echo "$input" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]')
echo "$snake_case"
}
# Usage
snake_case_str=$(camel_to_snake_case "camelCaseString")
echo "$snake_case_str" # Output: "camel_case_string"
COUNT_OCCURRENCES
This function counts the occurrences of a substring within a larger string.
count_occurrences() {
local substring="$1"
local input="$2"
# -F matches the substring literally (not as a regex); -- guards against a leading "-"
echo "$input" | grep -oF -- "$substring" | wc -l
}
# Usage
count=$(count_occurrences "apple" "I like apple and apple pie.")
echo "Occurrences: $count" # Output: "Occurrences: 2"
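count_occurrences hands the substring to grep, which reads it as a regular expression unless -F is given. A needle like "." would then match every character. Each call also starts two processes. The pure-bash sketch below deletes every literal occurrence and divides the change in length by the needle's length. The name count_literal is hypothetical, and like grep -o it counts non-overlapping matches.

```bash
# count_literal NEEDLE HAYSTACK: hypothetical helper counting
# non-overlapping literal occurrences without grep or wc
count_literal() {
  local needle="$1" haystack="$2" stripped
  if [[ -z $needle ]]; then echo 0; return; fi
  # Delete every occurrence (quoted, so the needle is literal text)
  stripped=${haystack//"$needle"/}
  # Each deleted occurrence shortened the string by ${#needle} characters
  echo $(( (${#haystack} - ${#stripped}) / ${#needle} ))
}

count_literal "apple" "I like apple and apple pie."   # -> 2
count_literal "." "a.b.c"                             # -> 2
```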
Production Function Library
Here’s a complete string functions library. Save this as string_functions.sh and source it in your scripts:
#!/bin/bash
# string_functions.sh - Reusable string manipulation library
# Author: Karandeep Singh
# Last Updated: 2026-02-20
# Whitespace trimming
ltrim() { echo "${1#"${1%%[![:space:]]*}"}"; }
rtrim() { echo "${1%"${1##*[![:space:]]}"}"; }
trim() { rtrim "$(ltrim "$1")"; }
# Case conversion (these take their input as an argument, not from stdin)
uppercase() { echo "$1" | tr '[:lower:]' '[:upper:]'; }
lowercase() { echo "$1" | tr '[:upper:]' '[:lower:]'; }
# \b and \u are GNU sed extensions; BSD/macOS sed does not support them
capitalize() { echo "$1" | sed 's/\b\([a-z]\)/\u\1/g'; }
# String info
len() { echo "${#1}"; }
# String transformation
reverse() {
local str="$1" reversed="" len=${#str} i
for ((i=len-1; i>=0; i--)); do
reversed="$reversed${str:$i:1}"
done
echo "$reversed"
}
# $2 is a glob pattern here (*, ? and [ are special)
substitute_bash() { echo "${1//$2/$3}"; }
# Note: this shadows the coreutils truncate command in scripts that source this file
truncate() {
local str="$1" len="$2"
if [ "${#str}" -gt "$len" ]; then
echo "${str:0:$len}..."
else
echo "$str"
fi
}
# String extraction
substring() { echo "${1:$2:$3}"; }
# split STRING DELIMITER: prints one field per line (collect them with readarray -t)
split() {
local IFS="$2"
local -a arr
read -ra arr <<< "$1"
printf '%s\n' "${arr[@]}"
}
# Utility functions
rot13() { echo "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m'; }
random_string() {
local len="$1" hex
# openssl prints 2 hex chars per byte, so request ceil(len/2) bytes and trim
hex=$(openssl rand -hex "$(( (len + 1) / 2 ))") || return 1
echo "${hex:0:len}"
}
generate_slug() {
echo "$1" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '-' | \
tr '[:upper:]' '[:lower:]' | tr -s '-' | sed 's/-$//' | sed 's/^-//'
}
sanitize() {
local str="$1" allowed="$2"
echo "$str" | sed "s/[^[:alnum:]$allowed]//g"
}
# count STRING CHAR: occurrences of a single character (a multi-character
# FS is treated as a regex, and FS=" " splits on runs of whitespace)
count() { echo "$1" | awk -v FS="$2" '{print (NF ? NF - 1 : 0)}'; }
Usage example:
#!/bin/bash
source /opt/scripts/string_functions.sh
# Process vendor CSV import (values go into SQL unescaped: trusted input only)
while IFS=',' read -r id email status; do
id=$(trim "$id")
email=$(lowercase "$(trim "$email")")
status=$(uppercase "$(trim "$status")")
# Skip emails longer than 255 characters
[ "$(len "$email")" -gt 255 ] && continue
echo "INSERT INTO users VALUES ($id, '$email', '$status');"
done < vendor_data.csv > import.sql
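The one-line ltrim and rtrim definitions in the library each nest two parameter expansions, which is hard to read. Below they are unrolled into named steps on a sample string; the variable names are only for illustration. ltrim corresponds to steps 1 and 2, and rtrim to steps 3 and 4.

```bash
s=$'  \t hello world  '
# 1) Remove the longest suffix that starts at a non-space character;
#    what remains is exactly the leading whitespace
leading=${s%%[![:space:]]*}
# 2) Remove that leading run from the front (quoted so it is matched literally)
no_lead=${s#"$leading"}
# 3) Mirror image: remove the longest prefix ending in a non-space character,
#    leaving only the trailing whitespace
trailing=${no_lead##*[![:space:]]}
# 4) Remove the trailing run from the end
trimmed=${no_lead%"$trailing"}
printf '[%s]\n' "$trimmed"   # -> [hello world]
```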
Lessons Learned
Key takeaways from using these functions in ETL pipelines and log processing:
Performance Matters
- Bash parameter expansion is 5-10x faster than sed for simple operations, because it runs inside the shell instead of starting a new process
- For bulk processing (100K+ lines), use awk instead of while loops
- Avoid spawning external processes in tight loops
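The awk advice can be sketched as a single awk process that does the same trimming and case normalization as the while loop in the usage example above. The loop starts several subshells per row; awk handles every row in one process. normalize_csv is a hypothetical name, and like the shell loop it assumes a header row and splits naively on commas.

```bash
# normalize_csv [FILE]: hypothetical one-pass version of the trim/case loop
# Assumes a header row and three comma-separated columns: id,email,status
normalize_csv() {
  awk -F',' -v OFS=',' '
    NR == 1 { next }                          # skip the header row
    {
      for (i = 1; i <= NF; i++)
        gsub(/^[ \t]+|[ \t]+$/, "", $i)       # trim every field
      $2 = tolower($2)                        # normalize email
      $3 = toupper($3)                        # normalize status
      print
    }' "${1:--}"                              # read stdin when no file is given
}

printf 'id,email,status\n 1 , Bob@Example.COM , active \n' | normalize_csv
```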
Error Handling is Critical
- Always validate input before string operations
- Check string length before substring extraction
- Handle empty strings explicitly
Security Considerations
- Never use unsanitized user input in SQL
- Be careful with substitute_bash: its search argument is a glob pattern, not a literal string, so *, ? and [ are special
- ROT13 is obfuscation, not encryption
- Use strong random sources (openssl, not $RANDOM)
Where These Functions Help
Applied consistently, these string functions can:
- Speed up ETL processing: parameter expansion avoids spawning external processes
- Reduce import errors: trimming and validation catch whitespace and field-length issues
- Harden file handling: sanitizing input guards against path traversal
- Improve data quality: case normalization avoids duplicate accounts
The key insight: simple string manipulation functions, when applied consistently across data pipelines, eliminate entire classes of data quality problems.
This is part of the Advanced Bash String Operations series.
What string manipulation challenges have you encountered in production data pipelines?