In 2021, I joined a Calgary-based infrastructure team where deployments took 4 hours and involved 23 manual steps. By 2024, deployments took 12 minutes and were fully automated with bash scripts. This article is that three-year journey condensed: every mistake I made, every trap I hit, every technique I learned.
No theory. Just the actual progression from echo "hello world" to production-grade automation.
Step 1: The Echo Script (Everyone Starts Here)
Your first bash script is always the same:
#!/bin/bash
echo "hello world"
Save as hello.sh:
chmod +x hello.sh
./hello.sh
Output:
hello world
Great. Now make it slightly useful. Print the current user:
#!/bin/bash
echo "hello world"
echo "Current user: $(whoami)"
echo "Current directory: $(pwd)"
./hello.sh
Output:
hello world
Current user: john
Current directory: /home/john
$(command) runs the command and substitutes its output. This is command substitution. You’ll use it constantly.
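Command substitution is not limited to echo. As a small sketch (the variable names below are illustrative, not from the original scripts), you can capture output in variables and even nest substitutions:

```shell
#!/bin/bash
# Capture command output in variables (names here are illustrative).
today=$(date +%Y%m%d)
entry_count=$(ls | wc -l)

# Substitutions can nest; each $(...) has its own quoting context.
parent_name=$(basename "$(dirname "$PWD")")

echo "Date stamp: $today"
echo "Entries in current directory: $entry_count"
echo "Parent directory name: $parent_name"
```

Trailing newlines in the captured output are stripped, which is why the values slot cleanly into the middle of a string.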
Step 2: Variables (And the Quoting Bug)
Variables make scripts flexible. Pass a name:
#!/bin/bash
NAME="John"
echo "Hello, $NAME"
Output:
Hello, John
Now accept it as an argument:
#!/bin/bash
NAME=$1
echo "Hello, $NAME"
./hello.sh Sarah
Output:
Hello, Sarah
$1 is the first argument. $2 is the second. $0 is the script name. $# is the count of arguments.
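To see these parameters in action, here is a minimal sketch. The function show_args is a hypothetical helper used only for illustration; inside a function, $1, $2 and $# refer to the function's own arguments.

```shell
#!/bin/bash
# show_args is a hypothetical helper used only for this illustration.
show_args() {
    echo "count: $#"
    echo "first: ${1:-<none>}"     # ${1:-x} falls back to x when $1 is unset or empty
    echo "second: ${2:-<none>}"
}

show_args Sarah
show_args Sarah Calgary
```

The first call reports a count of 1 and prints the fallback for the second argument; the second call reports a count of 2.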
Now try a name with a space:
./hello.sh "John Smith"
Output:
Hello, John Smith
Works. But remove the quotes and pass a name with extra spaces:
#!/bin/bash
NAME=$1
echo Hello, $NAME
./hello.sh "John    Smith"
Output:
Hello, John Smith
The extra spaces are gone. Why? When you write echo Hello, $NAME without quotes, bash expands $NAME and then splits the result on whitespace. echo receives three separate arguments (Hello, and John and Smith) and joins them with single spaces, so the original spacing is lost. With echo the damage is cosmetic. With commands like cp or rm, an unquoted variable holding a path with spaces turns into several wrong arguments. The issue is that the unquoted variable causes word splitting.
The fix: always quote variables:
echo "Hello, $NAME"
Rule: Quote every variable unless you specifically want word splitting.
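A quick way to see word splitting is to count the arguments a command actually receives. This is a sketch; count_args is a hypothetical helper, not a standard command.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() {
    echo "$#"
}

NAME="John Smith"
count_args $NAME      # unquoted: split into "John" and "Smith", prints 2
count_args "$NAME"    # quoted: stays one argument, prints 1

FILE="my report.txt"
# Unquoted, a command such as "rm $FILE" would target "my" and "report.txt".
count_args $FILE      # prints 2
```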
Step 3: Error Handling (The set -e Trap)
Your script runs commands. Commands fail. Handle errors:
#!/bin/bash
NAME=$1
if [ -z "$NAME" ]; then
echo "Error: no name provided"
exit 1
fi
echo "Hello, $NAME"
./hello.sh
Output:
Error: no name provided
[ -z "$NAME" ] checks if the string is empty. -z means “zero length”. Exit code 1 indicates error. Exit code 0 indicates success.
Now let’s make the script do something real: create a backup directory and copy files.
#!/bin/bash
set -e
BACKUP_DIR="/tmp/backup-$(date +%Y%m%d)"
SOURCE_DIR="/home/user/documents"
echo "Creating backup directory: $BACKUP_DIR"
mkdir "$BACKUP_DIR"
echo "Copying files from $SOURCE_DIR"
cp -r "$SOURCE_DIR" "$BACKUP_DIR"
echo "Backup complete"
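set -e means “exit immediately if a command fails”. There are exceptions: a command tested by if or while, or one followed by && or ||, does not trigger the exit. It is a common baseline for error handling, not a complete solution.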
Run it:
./backup.sh
Output:
Creating backup directory: /tmp/backup-20260220
Copying files from /home/user/documents
Backup complete
Works. But what if /home/user/documents doesn’t exist?
./backup.sh
Output:
Creating backup directory: /tmp/backup-20260220
cp: cannot stat '/home/user/documents': No such file or directory
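The script exited (because of set -e), but the backup directory was already created. Now /tmp/backup-20260220 is empty. Next time you run it, mkdir will fail because the directory exists, and set -e will stop the script on that line. Using mkdir -p avoids the second failure, because -p does not complain about an existing directory.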
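One way to avoid the leftover directory is to validate the source before creating anything. The following is a sketch under that assumption; safe_backup is a hypothetical name, and the demo runs against throwaway directories from mktemp rather than real paths.

```shell
#!/bin/bash
# Sketch only: safe_backup is a hypothetical name, not part of the original script.
safe_backup() {
    local source_dir=$1
    local backup_dir=$2

    # Validate first, so a bad source never leaves an empty directory behind.
    if [ ! -d "$source_dir" ]; then
        echo "Error: source directory does not exist: $source_dir" >&2
        return 1
    fi

    # -p: succeed even if the directory is left over from an earlier run.
    mkdir -p "$backup_dir" || return 1
    cp -r "$source_dir" "$backup_dir" || return 1
    echo "Backup complete"
}

# Demo against throwaway directories created by mktemp.
work=$(mktemp -d)
mkdir "$work/documents"
echo "notes" > "$work/documents/notes.txt"

safe_backup "$work/documents" "$work/backup"    # succeeds
safe_backup "$work/documents" "$work/backup"    # rerun also succeeds (mkdir -p)
safe_backup "$work/missing" "$work/backup2" || echo "Backup refused, nothing created"
rm -rf "$work"
```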
Step 4: The set -e Bug (Pipelines Fail Silently)
You add logging to the backup script:
#!/bin/bash
set -e
BACKUP_DIR="/tmp/backup-$(date +%Y%m%d)"
SOURCE_DIR="/home/user/documents"
echo "Creating backup directory: $BACKUP_DIR"
mkdir "$BACKUP_DIR"
echo "Copying files from $SOURCE_DIR"
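cp -rv "$SOURCE_DIR" "$BACKUP_DIR" 2>&1 | tee -a backup.log
echo "Backup complete"
tee -a backup.log writes its input to both stdout and the log file. cp -v prints each file it copies, and 2>&1 sends cp's error messages through the pipe as well, so they also end up in the log.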
Test with a non-existent source:
SOURCE_DIR="/nonexistent"
./backup.sh
Output:
Creating backup directory: /tmp/backup-20260220
Copying files from /nonexistent
cp: cannot stat '/nonexistent': No such file or directory
Backup complete
Wait. cp failed, but the script printed “Backup complete”. The script didn’t exit.
Why: the exit status of a pipeline (cmd1 | cmd2) is the exit status of its last command, here tee, and that is all set -e sees. If cp fails but tee succeeds, the pipeline counts as a success. This is one of the most common bash pitfalls.
The fix: Use set -o pipefail.
#!/bin/bash
set -e
set -o pipefail
BACKUP_DIR="/tmp/backup-$(date +%Y%m%d)"
SOURCE_DIR="/home/user/documents"
echo "Creating backup directory: $BACKUP_DIR"
mkdir "$BACKUP_DIR"
echo "Copying files from $SOURCE_DIR"
cp -rv "$SOURCE_DIR" "$BACKUP_DIR" 2>&1 | tee -a backup.log
echo "Backup complete"
Now test:
SOURCE_DIR="/nonexistent"
./backup.sh
Output:
Creating backup directory: /tmp/backup-20260220
Copying files from /nonexistent
cp: cannot stat '/nonexistent': No such file or directory
Script exited. “Backup complete” didn’t print. set -o pipefail makes the pipeline fail if any command in the pipeline fails.
Rule: Always use set -e and set -o pipefail together.
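You can watch the difference without touching any files. This is a minimal sketch using false and true as stand-ins for a failing cp and a succeeding tee; check_pipeline is a hypothetical helper name.

```shell
#!/bin/bash
# check_pipeline is a hypothetical helper: "false" stands in for a failing cp,
# "true" for a succeeding tee.
check_pipeline() {
    if false | true; then
        echo "pipeline succeeded"
    else
        echo "pipeline failed"
    fi
}

set +o pipefail
echo "without pipefail: $(check_pipeline)"    # prints: without pipefail: pipeline succeeded

set -o pipefail
echo "with pipefail: $(check_pipeline)"       # prints: with pipefail: pipeline failed
set +o pipefail

# PIPESTATUS (a bash array) records the exit code of every stage.
false | true
echo "stage exit codes: ${PIPESTATUS[*]}"     # prints: stage exit codes: 1 0
```

PIPESTATUS is handy when you need to know which stage failed, not just that one did.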
Step 5: Debugging (set -x Shows Everything)
Your backup script fails mysteriously. Add set -x:
#!/bin/bash
set -e
set -o pipefail
set -x
BACKUP_DIR="/tmp/backup-$(date +%Y%m%d)"
SOURCE_DIR="/home/user/documents"
echo "Creating backup directory: $BACKUP_DIR"
mkdir "$BACKUP_DIR"
echo "Copying files from $SOURCE_DIR"
cp -r "$SOURCE_DIR" "$BACKUP_DIR"
echo "Backup complete"
Run it:
./backup.sh
Output:
++ date +%Y%m%d
+ BACKUP_DIR=/tmp/backup-20260220
+ SOURCE_DIR=/home/user/documents
+ echo 'Creating backup directory: /tmp/backup-20260220'
Creating backup directory: /tmp/backup-20260220
+ mkdir /tmp/backup-20260220
+ echo 'Copying files from /home/user/documents'
Copying files from /home/user/documents
+ cp -r /home/user/documents /tmp/backup-20260220
+ echo 'Backup complete'
Backup complete
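set -x prints every command before executing it, with variables already expanded. Lines starting with + are the commands being run (++ marks a command run inside $(...)). The trace goes to stderr, so it does not mix with output you capture. This is how you debug bash scripts.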
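Tracing a whole script gets noisy fast. As a sketch, you can trace only the suspicious part and have bash include line numbers by setting PS4, the prefix it prints before each traced command. The variable names below are illustrative.

```shell
#!/bin/bash
# PS4 is the prefix bash prints before each traced command (default "+ ").
# Single quotes delay the expansion of LINENO until each command is traced.
PS4='+ line ${LINENO}: '

echo "not traced"

set -x                           # start tracing
stamp=$(date +%Y%m%d)
target="/tmp/backup-$stamp"
set +x                           # stop tracing

echo "target is $target"
```

Only the lines between set -x and set +x appear in the trace, which keeps the output readable in longer scripts.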
Step 6: Functions (Reusable Code Blocks)
The backup script works. Now you need a deploy script. Copy-paste the backup code? No. Use functions.
#!/bin/bash
set -e
set -o pipefail
backup() {
local source=$1
local dest=$2
echo "Backing up $source to $dest"
mkdir -p "$dest"
cp -r "$source" "$dest"
echo "Backup complete"
}
# Use the function
backup "/home/user/documents" "/tmp/backup-$(date +%Y%m%d)"
backup "/home/user/photos" "/tmp/photos-$(date +%Y%m%d)"
local makes the variable function-scoped. Without it, source and dest are global and can conflict with other variables.
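Here is a small sketch of what goes wrong without local. The functions leaky and scoped are hypothetical, written only to show the difference.

```shell
#!/bin/bash
# Hypothetical functions showing how a missing "local" leaks into the caller.
leaky() {
    dest="/tmp/from-leaky"           # no local: overwrites the caller's variable
}

scoped() {
    local dest="/tmp/from-scoped"    # local: visible only inside this function
}

dest="/tmp/original"
scoped
echo "after scoped: $dest"    # prints: after scoped: /tmp/original
leaky
echo "after leaky: $dest"     # prints: after leaky: /tmp/from-leaky
```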
Step 7: Error Messages (Make Failures Clear)
The backup function fails, but the error is cryptic:
./backup.sh
Output:
Backing up /nonexistent to /tmp/backup-20260220
cp: cannot stat '/nonexistent': No such file or directory
Better error message:
backup() {
local source=$1
local dest=$2
if [ ! -d "$source" ]; then
echo "Error: source directory does not exist: $source" >&2
return 1
fi
echo "Backing up $source to $dest"
mkdir -p "$dest"
cp -r "$source" "$dest"
echo "Backup complete"
}
>&2 sends the error message to stderr instead of stdout. This is important for scripts that parse output.
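The reason this matters shows up as soon as a caller captures a function's output. A minimal sketch, with find_config and its path as hypothetical examples:

```shell
#!/bin/bash
# find_config is a hypothetical function: its result goes to stdout,
# its diagnostics go to stderr.
find_config() {
    echo "Searching for config..." >&2    # diagnostic, not part of the result
    echo "/etc/myapp/config.yml"          # the value callers capture
}

# $(...) captures stdout only, so the diagnostic does not pollute the value.
config_path=$(find_config 2>/dev/null)
echo "Using: $config_path"    # prints: Using: /etc/myapp/config.yml
```

If the diagnostic went to stdout instead, config_path would contain both lines and break whatever uses it next.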
Step 8: The Production Deployment Script
Now build a real deployment script. Requirements:
- Build the application
- Run tests
- Stop the old service
- Deploy new files
- Start the new service
- Health check
- Rollback on failure
#!/bin/bash
set -e
set -o pipefail
APP_NAME="myapp"
DEPLOY_DIR="/opt/$APP_NAME"
BACKUP_DIR="/opt/$APP_NAME.backup"
BUILD_DIR="./build"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}
error() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] ERROR: $*" >&2
}
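build() {
log "Building application..."
# main calls this as "build || exit 1", which makes bash ignore set -e
# inside the function, so each step checks its own exit status.
npm install || return 1
npm run build || return 1
if [ ! -d "$BUILD_DIR" ]; then
error "Build failed: $BUILD_DIR not found"
return 1
fi
log "Build complete"
}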
run_tests() {
log "Running tests..."
npm test || {
error "Tests failed"
return 1
}
log "Tests passed"
}
backup_current() {
log "Backing up current deployment..."
if [ -d "$DEPLOY_DIR" ]; then
# set -e is ignored here because main calls "backup_current || exit 1"
rm -rf "$BACKUP_DIR" || return 1
cp -r "$DEPLOY_DIR" "$BACKUP_DIR" || return 1
log "Backup complete"
else
log "No existing deployment to backup"
fi
}
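deploy() {
log "Deploying new version..."
# main calls this as "deploy || ...", which makes bash ignore set -e
# inside the function, so each critical step returns on failure explicitly.
# Stop service (ignore the error if it is not running)
log "Stopping service..."
sudo systemctl stop "$APP_NAME" || true
# Deploy files
log "Copying files..."
rm -rf "$DEPLOY_DIR" || return 1
mkdir -p "$DEPLOY_DIR" || return 1
cp -r "$BUILD_DIR"/* "$DEPLOY_DIR"/ || return 1
# Start service
log "Starting service..."
sudo systemctl start "$APP_NAME" || return 1
log "Deployment complete"
}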
health_check() {
log "Running health check..."
sleep 5
for i in {1..10}; do
if curl -sf http://localhost:8080/health > /dev/null; then
log "Health check passed"
return 0
fi
log "Health check attempt $i failed, retrying..."
sleep 2
done
error "Health check failed after 10 attempts"
return 1
}
rollback() {
error "Deployment failed, rolling back..."
if [ -d "$BACKUP_DIR" ]; then
log "Restoring previous version..."
sudo systemctl stop "$APP_NAME" || true
rm -rf "$DEPLOY_DIR"
cp -r "$BACKUP_DIR" "$DEPLOY_DIR"
sudo systemctl start "$APP_NAME"
log "Rollback complete"
else
error "No backup found, cannot rollback"
fi
}
main() {
log "Starting deployment of $APP_NAME"
build || exit 1
run_tests || exit 1
backup_current || exit 1
deploy || { rollback; exit 1; }
if ! health_check; then
rollback
exit 1
fi
log "Deployment successful"
}
main "$@"
This script:
- Builds the app
- Runs tests
- Backs up the current version
- Deploys
- Health checks
- Rolls back on failure
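One caveat when calling functions as something || exit 1: bash ignores set -e inside a function that runs on the left side of ||, so a failing command in the middle of the function does not stop it. The sketch below demonstrates this with hypothetical functions step and safe_step standing in for build or deploy.

```shell
#!/bin/bash
# step is a hypothetical function standing in for build or deploy.
step() {
    false                                   # a failing command
    echo "step: still running after the failure"
}

# Called on its own, set -e stops the subshell at "false".
( set -e; step; echo "not reached" )
echo "plain call exit status: $?"           # prints: plain call exit status: 1

# Called on the left of ||, set -e is ignored inside the function:
# the echo runs and the function returns 0, hiding the failure.
( set -e; step || echo "step failed" )

# Explicit checks work regardless of how the function is called.
safe_step() {
    false || return 1
    echo "never printed"
}
( set -e; safe_step || echo "safe_step failed" )    # prints: safe_step failed
```

Appending || return 1 to each critical command inside a function is one way to keep failures visible when the function is used with || or inside if.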
Results:
- Before: 4 hours, 23 manual steps, 30% failure rate
- After: 12 minutes, 1 command, 2% failure rate (only when tests genuinely fail)
The 2% failure rate is expected: when tests fail, the script refuses to deploy.
What We Built
Starting from echo, we built a production deployment script:
- Echo script → Added variables → Encountered quoting bug
- Fixed quoting → Added error handling → Hit the set -e pipeline trap
- Fixed with set -o pipefail → Added debugging → Learned set -x
- Added functions → Made code reusable → Built deployment script
- Production deploy → Reduced deployment time from 4hr to 12min
Cheat Sheet
Script header (always use these):
#!/bin/bash
set -e # Exit on error
set -o pipefail # Exit on pipe failure
set -u # Exit on undefined variable
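set -u has not appeared in the steps above, so here is a minimal sketch of how it behaves. The function describe_target and the variable TARGET_ENV are illustrative names only.

```shell
#!/bin/bash
# describe_target is a hypothetical function; TARGET_ENV is an illustrative name.
# The ( ) body runs in a subshell, so set -u stays local to the function.
describe_target() (
    set -u
    # ${VAR:-default} supplies a fallback, so set -u does not abort here.
    echo "env: ${TARGET_ENV:-staging}"
)

unset TARGET_ENV
describe_target                          # prints: env: staging
TARGET_ENV=production describe_target    # prints: env: production

# Without a fallback, expanding an unset variable under set -u is an error
# and the (sub)shell exits with a non-zero status.
( set -u; unset TARGET_ENV; echo "env: $TARGET_ENV" ) 2>/dev/null \
    || echo "unset variable caught"
```

This catches typos in variable names early, at the cost of having to give optional variables an explicit default.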
Variables:
NAME="value" # Set variable
echo "$NAME" # Use variable (always quote)
echo "${NAME}_suffix" # Concat with string
Arguments:
$0 # Script name
$1 # First argument
$# # Number of arguments
"$@" # All arguments (quote it so each argument stays intact)
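The quotes around $@ matter for the same reason as in Step 2. A short sketch, using hypothetical helper functions, shows what happens when arguments are forwarded with and without them:

```shell
#!/bin/bash
# print_each is a hypothetical helper that prints one argument per line.
print_each() {
    for arg in "$@"; do
        echo "[$arg]"
    done
}

forward_quoted()   { print_each "$@"; }    # preserves each argument as-is
forward_unquoted() { print_each $@; }      # re-splits arguments on whitespace

forward_quoted "John Smith" Sarah      # prints [John Smith] then [Sarah]
forward_unquoted "John Smith" Sarah    # prints [John], [Smith], [Sarah] on separate lines
```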
Conditionals:
if [ -f "file.txt" ]; then # File exists
if [ -d "dir" ]; then # Directory exists
if [ -z "$VAR" ]; then # String is empty
if [ "$A" = "$B" ]; then # Strings equal
if [ "$A" -eq "$B" ]; then # Numbers equal
Functions:
func() {
local var=$1 # Local variable
echo "value"
return 0 # Success
}
result=$(func "arg") # Capture output
Loops:
for file in *.txt; do
echo "$file"
done
for i in {1..10}; do
echo "$i"
done
while IFS= read -r line; do # IFS= keeps leading/trailing spaces, -r keeps backslashes
echo "$line"
done < file.txt
Error handling:
command || {
echo "Command failed"
exit 1
}
Debugging:
set -x # Print commands
set +x # Stop printing
Common Mistakes and Fixes
1. Unquoted variables:
# Wrong
cp $FILE $DEST
# Right
cp "$FILE" "$DEST"
2. Missing error handling:
# Wrong
#!/bin/bash
command1
command2
# Right
#!/bin/bash
set -e
set -o pipefail
command1
command2
3. Using cd without checking:
# Wrong
cd /some/dir
rm -rf *
# Right
cd /some/dir || exit 1
rm -rf *
4. Not using functions:
# Wrong: Copy-pasted code everywhere
# Right: Extract to function
check_file() {
local file=$1
if [ ! -f "$file" ]; then
echo "Error: $file not found" >&2
return 1
fi
}