Task Automation: From Cron and Make to a Go Task Runner

Summary
Every Linux system runs scheduled jobs. Backups, log rotation, health checks, deployments. Most of these start as cron jobs. Some grow into Makefiles. A few become tangled shell scripts that nobody wants to touch.
This article walks through Linux task automation from the ground up. You will use cron, at, make, and systemd timers. At each step, you will build the same feature in Go. By the end, you will have a working task runner that schedules jobs, handles dependencies, retries failures, and prints a status dashboard.
All Go code uses only the standard library. Every code block compiles and runs.
Step 1: Scheduling with Cron
Cron is the oldest task scheduler on Linux. It runs jobs at fixed intervals. The cron daemon reads a table of jobs (the crontab) and executes them on schedule.
Linux: Working with Crontab
List your current cron jobs:
crontab -l
If you have never set one up, you will see no crontab for <username>. Edit the crontab:
crontab -e
This opens a file in your default editor. Each line has six fields: five time fields and a command.
# minute hour day month weekday command
0 2 * * * /home/user/backup.sh
The five time fields are:
- Minute: 0-59
- Hour: 0-23
- Day of month: 1-31
- Month: 1-12
- Day of week: 0-7 (0 and 7 are both Sunday)
Common patterns:
# Every 5 minutes
*/5 * * * * /usr/local/bin/health-check.sh
# Daily at 2:00 AM
0 2 * * * /home/user/backup.sh
# Every Sunday at midnight
0 0 * * 0 /home/user/weekly-report.sh
# First day of every month at 6:00 AM
0 6 1 * * /home/user/monthly-cleanup.sh
List only active jobs (skip comments):
crontab -l | grep -v '^#'
A * means “every value.” The / means “every N.” So */5 in the minute field means every 5 minutes. The value 0 in the minute field means exactly minute zero.
Go: Parsing Cron Expressions
Now build a cron parser in Go. The goal is to take a cron expression like */5 * * * * and check if a given time matches.
Start with a struct to hold a parsed cron schedule:
package main

import (
    "fmt"
    "strconv"
    "strings"
    "time"
)

type CronSchedule struct {
    Minute  []int
    Hour    []int
    Day     []int
    Month   []int
    Weekday []int
}

func expandField(field string, min, max int) []int {
    var values []int
    parts := strings.Split(field, "/")
    step := 1
    if len(parts) == 2 {
        s, err := strconv.Atoi(parts[1])
        if err == nil {
            step = s
        }
    }
    base := parts[0]
    if base == "*" {
        for i := min; i <= max; i += step {
            values = append(values, i)
        }
        return values
    }
    // Single number
    num, err := strconv.Atoi(base)
    if err == nil {
        values = append(values, num)
    }
    return values
}

func parseCron(expr string) CronSchedule {
    fields := strings.Fields(expr)
    return CronSchedule{
        Minute:  expandField(fields[0], 0, 59),
        Hour:    expandField(fields[1], 0, 23),
        Day:     expandField(fields[2], 1, 31),
        Month:   expandField(fields[3], 1, 12),
        Weekday: expandField(fields[4], 0, 6),
    }
}

func contains(slice []int, val int) bool {
    for _, v := range slice {
        if v == val {
            return true
        }
    }
    return false
}

func (cs CronSchedule) Matches(t time.Time) bool {
    return contains(cs.Minute, t.Minute()) &&
        contains(cs.Hour, t.Hour()) &&
        contains(cs.Day, t.Day()) &&
        contains(cs.Month, int(t.Month())) &&
        contains(cs.Weekday, int(t.Weekday()))
}

func main() {
    schedule := parseCron("*/5 * * * *")
    now := time.Now()
    fmt.Printf("Current time: %s\n", now.Format("15:04"))
    fmt.Printf("Matches */5 * * * *: %v\n", schedule.Matches(now))

    twoAM := parseCron("0 2 * * *")
    testTime := time.Date(2025, 1, 15, 2, 0, 0, 0, time.Local)
    fmt.Printf("2:00 AM matches '0 2 * * *': %v\n", twoAM.Matches(testTime))
}
Run it:
go run main.go
Current time: 14:35
Matches */5 * * * *: true
2:00 AM matches '0 2 * * *': true
This works for */5 and for 0. But there is a bug.
The Bug: Missing Range Support
Try parsing * * * * * (every minute). The expandField function splits on /. For the field *, the split produces ["*"]. That has length 1, so step stays at 1. The base is *, so it loops from min to max with step 1. That works.
But now try 5 * * * * (minute 5 only). The split produces ["5"]. The base is "5", not "*". It parses the number 5. That works too.
Now try */5. The split produces ["*", "5"]. Base is "*", step is 5. It generates [0, 5, 10, 15, ...]. Correct.
So where is the bug? Try 1-5 (minutes 1 through 5). Cron supports ranges, but expandField does not handle them. Pass 1-5 and the base is "1-5". The strconv.Atoi call fails. The function returns an empty slice. The schedule never matches.
Test it:
func main() {
    schedule := parseCron("1-5 * * * *")
    testTime := time.Date(2025, 1, 15, 10, 3, 0, 0, time.Local)
    fmt.Printf("Minute 3 matches '1-5 * * * *': %v\n", schedule.Matches(testTime))
}
Minute 3 matches '1-5 * * * *': false
That is wrong. Minute 3 is in the range 1-5.
The Fix: Handle Ranges
Add range parsing to expandField:
func expandField(field string, min, max int) []int {
    var values []int
    parts := strings.Split(field, "/")
    step := 1
    if len(parts) == 2 {
        s, err := strconv.Atoi(parts[1])
        if err == nil {
            step = s
        }
    }
    base := parts[0]
    if base == "*" {
        for i := min; i <= max; i += step {
            values = append(values, i)
        }
        return values
    }
    // Range: "1-5"
    if strings.Contains(base, "-") {
        rangeParts := strings.Split(base, "-")
        low, err1 := strconv.Atoi(rangeParts[0])
        high, err2 := strconv.Atoi(rangeParts[1])
        if err1 == nil && err2 == nil {
            for i := low; i <= high; i += step {
                values = append(values, i)
            }
        }
        return values
    }
    // Single number
    num, err := strconv.Atoi(base)
    if err == nil {
        values = append(values, num)
    }
    return values
}
Now test again:
func main() {
    schedule := parseCron("1-5 * * * *")
    testTime := time.Date(2025, 1, 15, 10, 3, 0, 0, time.Local)
    fmt.Printf("Minute 3 matches '1-5 * * * *': %v\n", schedule.Matches(testTime))
}
Minute 3 matches '1-5 * * * *': true
Correct. The parser now handles *, single numbers, */N, and ranges.
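One gap remains: cron also accepts comma-separated lists such as 1,15,30. A minimal sketch of how the parser could support them, splitting on commas first and reusing expandField for each part (this helper is not part of the parser above):

// expandList handles comma-separated lists like "1,15,30" or "1-5,30"
// by expanding each part with expandField and merging the results.
// A field with no comma passes through unchanged.
func expandList(field string, min, max int) []int {
    var values []int
    for _, part := range strings.Split(field, ",") {
        values = append(values, expandField(part, min, max)...)
    }
    return values
}

With parseCron switched to call expandList instead of expandField, the expression "1-5,30 * * * *" would expand the minute field to [1 2 3 4 5 30].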
Step 2: One-Off Tasks with at and batch
Cron is for repeating jobs. For one-time tasks, Linux has at.
Linux: Scheduling One-Off Jobs
Schedule a command to run at 2:00 AM:
echo "/home/user/backup.sh" | at 2:00 AM
Schedule a job 30 minutes from now:
echo "/home/user/report.sh" | at now + 30 minutes
List pending jobs:
atq
Output looks like this:
3 Wed Feb 5 02:00:00 2025 a user
4 Wed Feb 5 14:55:00 2025 a user
Remove a job by its number:
atrm 3
The batch command is similar to at, but it holds the job until the system load average drops below a threshold (0.8 by default with Linux atd, adjustable with atd's -l option). This is useful for heavy tasks that should not compete with production workloads:
echo "/home/user/heavy-analysis.sh" | batch
Go: Building a Job Queue
Build a simple job queue that accepts commands with future execution times.
package main

import (
    "context"
    "fmt"
    "os/exec"
    "sync"
    "time"
)

type Job struct {
    ID      int
    Command string
    RunAt   time.Time
    Done    bool
    Output  string
    Error   string
}

type JobQueue struct {
    mu   sync.Mutex
    jobs []*Job
    next int
}

func NewJobQueue() *JobQueue {
    return &JobQueue{next: 1}
}

func (q *JobQueue) Add(command string, runAt time.Time) int {
    q.mu.Lock()
    defer q.mu.Unlock()
    job := &Job{
        ID:      q.next,
        Command: command,
        RunAt:   runAt,
    }
    q.jobs = append(q.jobs, job)
    q.next++
    return job.ID
}

func (q *JobQueue) List() []*Job {
    q.mu.Lock()
    defer q.mu.Unlock()
    pending := make([]*Job, 0)
    for _, j := range q.jobs {
        if !j.Done {
            pending = append(pending, j)
        }
    }
    return pending
}

func (q *JobQueue) Remove(id int) bool {
    q.mu.Lock()
    defer q.mu.Unlock()
    for i, j := range q.jobs {
        if j.ID == id && !j.Done {
            q.jobs = append(q.jobs[:i], q.jobs[i+1:]...)
            return true
        }
    }
    return false
}

func (q *JobQueue) RunPending(ctx context.Context) {
    q.mu.Lock()
    now := time.Now()
    var ready []*Job
    for _, j := range q.jobs {
        if !j.Done && !j.RunAt.After(now) {
            ready = append(ready, j)
        }
    }
    q.mu.Unlock()
    for _, j := range ready {
        select {
        case <-ctx.Done():
            return
        default:
        }
        cmd := exec.CommandContext(ctx, "sh", "-c", j.Command)
        output, err := cmd.CombinedOutput()
        q.mu.Lock()
        j.Done = true
        j.Output = string(output)
        if err != nil {
            j.Error = err.Error()
        }
        q.mu.Unlock()
        fmt.Printf("[Job %d] Completed: %s\n", j.ID, j.Command)
        if len(j.Output) > 0 {
            fmt.Printf(" Output: %s", j.Output)
        }
        if j.Error != "" {
            fmt.Printf(" Error: %s\n", j.Error)
        }
    }
}

func main() {
    queue := NewJobQueue()

    // Schedule jobs
    queue.Add("echo 'backup started' && date", time.Now().Add(1*time.Second))
    queue.Add("echo 'report generated'", time.Now().Add(2*time.Second))
    queue.Add("echo 'this one is removed'", time.Now().Add(3*time.Second))

    // Remove job 3 (like atrm)
    queue.Remove(3)

    // List pending
    fmt.Println("Pending jobs:")
    for _, j := range queue.List() {
        fmt.Printf(" [%d] %s at %s\n", j.ID, j.Command, j.RunAt.Format("15:04:05"))
    }

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Wait and run
    fmt.Println("\nWaiting for jobs...")
    time.Sleep(3 * time.Second)
    queue.RunPending(ctx)
    fmt.Println("\nAll done.")
}
go run main.go
Pending jobs:
[1] echo 'backup started' && date at 14:30:02
[2] echo 'report generated' at 14:30:03
Waiting for jobs...
[Job 1] Completed: echo 'backup started' && date
Output: backup started
Wed Feb 5 14:30:04 2025
[Job 2] Completed: echo 'report generated'
Output: report generated
All done.
The Bug: Goroutine Leak with time.After
A common mistake when scheduling delayed execution is using time.After inside a goroutine without cleanup:
func (q *JobQueue) RunPendingBuggy() {
    q.mu.Lock()
    var pending []*Job
    for _, j := range q.jobs {
        if !j.Done {
            pending = append(pending, j)
        }
    }
    q.mu.Unlock()
    for _, j := range pending {
        job := j
        go func() {
            delay := time.Until(job.RunAt)
            if delay > 0 {
                <-time.After(delay)
            }
            cmd := exec.Command("sh", "-c", job.Command)
            cmd.Run()
            job.Done = true
        }()
    }
}
The problem: time.After creates a timer and a channel that live until the timer fires, and there is no way to cancel them. Each goroutine blocks on its channel for the full delay. If you schedule 1000 jobs that all run in 1 hour, you hold 1000 goroutines and 1000 timers in memory for that entire hour. And if the program exits before the timers fire, those jobs never run. (Go 1.23 made unreferenced timers eligible for garbage collection, but that does not help here: each blocked goroutine still holds its channel and still cannot be cancelled.)
The Fix: Use time.NewTimer with Context
Replace time.After with time.NewTimer and a context:
func (q *JobQueue) RunPendingFixed(ctx context.Context) {
    q.mu.Lock()
    var pending []*Job
    for _, j := range q.jobs {
        if !j.Done {
            pending = append(pending, j)
        }
    }
    q.mu.Unlock()
    var wg sync.WaitGroup
    for _, j := range pending {
        job := j
        wg.Add(1)
        go func() {
            defer wg.Done()
            delay := time.Until(job.RunAt)
            if delay <= 0 {
                delay = 0
            }
            timer := time.NewTimer(delay)
            defer timer.Stop()
            select {
            case <-ctx.Done():
                fmt.Printf("[Job %d] Cancelled\n", job.ID)
                return
            case <-timer.C:
            }
            cmd := exec.CommandContext(ctx, "sh", "-c", job.Command)
            output, err := cmd.CombinedOutput()
            q.mu.Lock()
            job.Done = true
            job.Output = string(output)
            if err != nil {
                job.Error = err.Error()
            }
            q.mu.Unlock()
            fmt.Printf("[Job %d] Completed: %s\n", job.ID, job.Command)
        }()
    }
    wg.Wait()
}
Now when the context is cancelled, the goroutine exits immediately. The timer is stopped and cleaned up. No leaks.
Step 3: Make for Task Dependencies
Cron and at run isolated commands. But real workflows have dependencies. You cannot deploy before you build. You cannot build before you test. Make solves this.
Linux: Makefiles for DevOps
Create a file called Makefile:
# Variables
APP_NAME = myapp
BUILD_DIR = ./build
GO_FILES = $(wildcard *.go)

# Default target
all: build

# Build depends on Go source files
build: $(GO_FILES)
	@echo "Building $(APP_NAME)..."
	go build -o $(BUILD_DIR)/$(APP_NAME) .
	@echo "Done."

# Run the test suite
test:
	@echo "Running tests..."
	go test ./...

# Deploy depends on build
deploy: build
	@echo "Deploying $(APP_NAME)..."
	cp $(BUILD_DIR)/$(APP_NAME) /usr/local/bin/
	@echo "Deployed."

# Clean removes build artifacts
clean:
	@echo "Cleaning..."
	rm -rf $(BUILD_DIR)

# Lint runs go vet
lint:
	@echo "Linting..."
	go vet ./...

# Full pipeline: lint, then test, then build
ci: lint test build

# These targets are not files
.PHONY: all build test deploy clean lint ci
Run targets:
make build # Builds the app
make test # Runs tests
make deploy # Builds first, then deploys
make ci # Lint -> test -> build (in order)
make clean # Remove build artifacts
Make reads the dependency graph. When you run make deploy, it sees that deploy depends on build. It runs build first. If build depends on source files that have not changed, Make skips it. This is the power of dependency-based execution.
The .PHONY declaration tells Make that these targets are not files. Without it, if a file named build exists in the directory, Make would think the target is already up to date and skip it.
The $@ variable is the target name. $< is the first dependency. $^ is all dependencies:
%.o: %.c
	gcc -c $< -o $@
This compiles any .c file into a .o file. $< is the .c file. $@ is the .o output.
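Make's skip-if-unchanged decision is just a modification-time comparison: rebuild when the target is missing or any prerequisite is newer. A sketch of the same check in Go (assumes os and time-free stdlib only; the file names you pass are whatever your pipeline produces):

// needsRebuild reports whether target is missing or older than any source.
func needsRebuild(target string, sources []string) (bool, error) {
    targetInfo, err := os.Stat(target)
    if os.IsNotExist(err) {
        return true, nil // no target yet: always build
    }
    if err != nil {
        return false, err
    }
    for _, src := range sources {
        srcInfo, err := os.Stat(src)
        if err != nil {
            return false, err
        }
        if srcInfo.ModTime().After(targetInfo.ModTime()) {
            return true, nil // a source changed after the last build
        }
    }
    return false, nil
}

The task graph built next skips tasks that already ran in the current invocation; adding a check like this would extend it to skip tasks whose inputs have not changed between invocations, the way Make does.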
Go: Building a Task Dependency Graph
Now build a task graph in Go. Each task has a name, a function to run, and a list of dependencies.
package main

import (
    "fmt"
    "strings"
    "time"
)

type Task struct {
    Name    string
    Deps    []string
    Run     func() error
    Done    bool
    Failed  bool
    elapsed time.Duration
}

type TaskGraph struct {
    tasks map[string]*Task
    order []string
}

func NewTaskGraph() *TaskGraph {
    return &TaskGraph{
        tasks: make(map[string]*Task),
    }
}

func (g *TaskGraph) Add(name string, deps []string, run func() error) {
    g.tasks[name] = &Task{
        Name: name,
        Deps: deps,
        Run:  run,
    }
}

func (g *TaskGraph) resolve(name string, resolved *[]string, seen map[string]bool) error {
    seen[name] = true
    task, ok := g.tasks[name]
    if !ok {
        return fmt.Errorf("unknown task: %s", name)
    }
    for _, dep := range task.Deps {
        if seen[dep] {
            continue
        }
        if err := g.resolve(dep, resolved, seen); err != nil {
            return err
        }
    }
    *resolved = append(*resolved, name)
    return nil
}

func (g *TaskGraph) Execute(target string) error {
    var resolved []string
    seen := make(map[string]bool)
    if err := g.resolve(target, &resolved, seen); err != nil {
        return err
    }
    for _, name := range resolved {
        task := g.tasks[name]
        if task.Done {
            continue
        }
        fmt.Printf("[RUN] %s\n", name)
        start := time.Now()
        err := task.Run()
        task.elapsed = time.Since(start)
        if err != nil {
            task.Failed = true
            fmt.Printf("[FAIL] %s (%v) - %v\n", name, task.elapsed, err)
            return fmt.Errorf("task %s failed: %w", name, err)
        }
        task.Done = true
        fmt.Printf("[DONE] %s (%v)\n", name, task.elapsed)
    }
    return nil
}

func main() {
    g := NewTaskGraph()
    g.Add("lint", nil, func() error {
        fmt.Println(" Running go vet...")
        time.Sleep(100 * time.Millisecond)
        return nil
    })
    g.Add("test", []string{"lint"}, func() error {
        fmt.Println(" Running tests...")
        time.Sleep(200 * time.Millisecond)
        return nil
    })
    g.Add("build", []string{"test"}, func() error {
        fmt.Println(" Compiling binary...")
        time.Sleep(150 * time.Millisecond)
        return nil
    })
    g.Add("deploy", []string{"build"}, func() error {
        fmt.Println(" Deploying to server...")
        time.Sleep(100 * time.Millisecond)
        return nil
    })
    fmt.Println(strings.Repeat("-", 40))
    fmt.Println("Running: make deploy")
    fmt.Println(strings.Repeat("-", 40))
    if err := g.Execute("deploy"); err != nil {
        fmt.Printf("\nPipeline failed: %v\n", err)
    } else {
        fmt.Println("\nPipeline complete.")
    }
}
go run main.go
----------------------------------------
Running: make deploy
----------------------------------------
[RUN] lint
Running go vet...
[DONE] lint (100ms)
[RUN] test
Running tests...
[DONE] test (200ms)
[RUN] build
Compiling binary...
[DONE] build (150ms)
[RUN] deploy
Deploying to server...
[DONE] deploy (100ms)
Pipeline complete.
Tasks execute in dependency order. Build runs after test. Deploy runs after build.
The Bug: No Cycle Detection
Add a circular dependency:
g.Add("alpha", []string{"beta"}, func() error {
return nil
})
g.Add("beta", []string{"alpha"}, func() error {
return nil
})
g.Execute("alpha")
The resolve function checks seen[dep] and skips it with continue. This means it silently ignores the cycle and produces an incomplete ordering. Task alpha depends on beta, but beta tries to resolve alpha, sees it is already in seen, and skips it. Now beta is added to the resolved list before alpha. But beta depends on alpha, which has not run yet.
In the worst case with a slightly different implementation, this causes infinite recursion and a stack overflow.
The Fix: Track Resolution State
Use two sets: one for “currently resolving” (on the stack) and one for “fully resolved.”
func (g *TaskGraph) resolveFixed(name string, resolved *[]string, visiting map[string]bool, visited map[string]bool) error {
    if visited[name] {
        return nil
    }
    if visiting[name] {
        return fmt.Errorf("cycle detected: task %q depends on itself (directly or indirectly)", name)
    }
    visiting[name] = true
    task, ok := g.tasks[name]
    if !ok {
        return fmt.Errorf("unknown task: %s", name)
    }
    for _, dep := range task.Deps {
        if err := g.resolveFixed(dep, resolved, visiting, visited); err != nil {
            return err
        }
    }
    visiting[name] = false
    visited[name] = true
    *resolved = append(*resolved, name)
    return nil
}
Now test it:
g.Add("alpha", []string{"beta"}, func() error { return nil })
g.Add("beta", []string{"alpha"}, func() error { return nil })
var resolved []string
err := g.resolveFixed("alpha", &resolved, make(map[string]bool), make(map[string]bool))
if err != nil {
fmt.Println(err)
}
cycle detected: task "alpha" depends on itself (directly or indirectly)
The cycle is caught before any task runs.
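The visited set also handles the other common graph shape correctly: a diamond, where two tasks share a dependency. The shared task resolves once, not twice. A quick check on a fresh graph, reusing resolveFixed from above:

// deploy depends on build and migrate; both depend on checkout.
g.Add("checkout", nil, func() error { return nil })
g.Add("build", []string{"checkout"}, func() error { return nil })
g.Add("migrate", []string{"checkout"}, func() error { return nil })
g.Add("deploy", []string{"build", "migrate"}, func() error { return nil })

var order []string
if err := g.resolveFixed("deploy", &order, make(map[string]bool), make(map[string]bool)); err != nil {
    fmt.Println(err)
}
fmt.Println(order) // [checkout build migrate deploy] - checkout appears once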
Step 4: Systemd Timers (Modern Cron)
Systemd timers are the modern replacement for cron. They integrate with systemd logging, dependency management, and service monitoring.
Linux: Creating a Systemd Timer
List all active timers:
systemctl list-timers
Output shows each timer, when it last ran, and when it will run next.
A systemd timer needs two files: a .timer file and a matching .service file.
Create the service file at /etc/systemd/system/mybackup.service:
[Unit]
Description=Run daily backup
[Service]
Type=oneshot
ExecStart=/home/user/backup.sh
User=user
Create the timer file at /etc/systemd/system/mybackup.timer:
[Unit]
Description=Daily backup timer
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
Enable and start the timer:
sudo systemctl daemon-reload
sudo systemctl enable mybackup.timer
sudo systemctl start mybackup.timer
The OnCalendar format is an optional day-of-week followed by Year-Month-Day Hour:Minute:Second, where * matches any value. Some examples:
OnCalendar=*-*-* 02:00:00 # Daily at 2:00 AM
OnCalendar=Mon *-*-* 09:00:00 # Every Monday at 9:00 AM
OnCalendar=*-*-01 06:00:00 # First of every month at 6:00 AM
Another option is OnBootSec, which runs a specified time after boot:
OnBootSec=5min # 5 minutes after boot
OnBootSec=1h # 1 hour after boot
The Persistent=true option means that if the system was off when the timer should have fired, it will run as soon as the system starts.
Check logs for the service:
journalctl -u mybackup.service
Check when the timer will fire next:
systemctl status mybackup.timer
Go: Persistent Scheduler with State File
Build a scheduler that writes its state to a JSON file. This way it can survive restarts and know which jobs have already run.
package main
import (
"encoding/json"
"fmt"
"os"
"os/exec"
"path/filepath"
"time"
)
type ScheduledJob struct {
Name string `json:"name"`
Command string `json:"command"`
IntervalS int `json:"interval_seconds"`
LastRun time.Time `json:"last_run"`
RunCount int `json:"run_count"`
LastStatus string `json:"last_status"`
}
type SchedulerState struct {
Jobs []*ScheduledJob `json:"jobs"`
}
func loadState(path string) (*SchedulerState, error) {
data, err := os.ReadFile(path)
if err != nil {
if os.IsNotExist(err) {
return &SchedulerState{}, nil
}
return nil, err
}
var state SchedulerState
if err := json.Unmarshal(data, &state); err != nil {
return nil, err
}
return &state, nil
}
func saveState(path string, state *SchedulerState) error {
data, err := json.MarshalIndent(state, "", " ")
if err != nil {
return err
}
return os.WriteFile(path, data, 0644)
}
func runDue(state *SchedulerState) {
now := time.Now()
for _, job := range state.Jobs {
interval := time.Duration(job.IntervalS) * time.Second
if now.Sub(job.LastRun) < interval {
fmt.Printf("[SKIP] %s (next run in %v)\n", job.Name, interval-now.Sub(job.LastRun))
continue
}
fmt.Printf("[RUN] %s: %s\n", job.Name, job.Command)
cmd := exec.Command("sh", "-c", job.Command)
output, err := cmd.CombinedOutput()
job.LastRun = now
job.RunCount++
if err != nil {
job.LastStatus = fmt.Sprintf("FAILED: %v", err)
fmt.Printf("[FAIL] %s: %v\n", job.Name, err)
} else {
job.LastStatus = "OK"
fmt.Printf("[DONE] %s\n", job.Name)
}
if len(output) > 0 {
fmt.Printf(" Output: %s", string(output))
}
}
}
func main() {
stateFile := filepath.Join(os.TempDir(), "scheduler-state.json")
state, err := loadState(stateFile)
if err != nil {
fmt.Printf("Error loading state: %v\n", err)
return
}
// Define jobs if state is empty
if len(state.Jobs) == 0 {
state.Jobs = []*ScheduledJob{
{Name: "health-check", Command: "echo 'OK'", IntervalS: 10},
{Name: "disk-usage", Command: "df -h / | tail -1", IntervalS: 60},
{Name: "log-rotate", Command: "echo 'rotating logs'", IntervalS: 3600},
}
}
fmt.Println("Scheduler tick at", time.Now().Format("15:04:05"))
fmt.Println()
runDue(state)
if err := saveState(stateFile, state); err != nil {
fmt.Printf("Error saving state: %v\n", err)
} else {
fmt.Printf("\nState saved to %s\n", stateFile)
}
}
Run it twice:
go run main.go
Scheduler tick at 14:30:00
[RUN] health-check: echo 'OK'
[DONE] health-check
Output: OK
[RUN] disk-usage: df -h / | tail -1
[DONE] disk-usage
Output: /dev/sda1 50G 22G 26G 46% /
[RUN] log-rotate: echo 'rotating logs'
[DONE] log-rotate
Output: rotating logs
State saved to /tmp/scheduler-state.json
Run again within 10 seconds:
go run main.go
Scheduler tick at 14:30:05
[SKIP] health-check (next run in 5s)
[SKIP] disk-usage (next run in 55s)
[SKIP] log-rotate (next run in 59m55s)
State saved to /tmp/scheduler-state.json
The scheduler remembers when each job last ran. It skips jobs that are not due yet.
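As written, the program does one pass per invocation, which fits a systemd timer firing a oneshot service. To run it as a long-lived daemon instead, a sketch that wraps runDue in a ticker loop (the 5-second tick is an arbitrary choice; pick something well below your shortest job interval):

// runForever re-checks the schedule on every tick and persists state
// after each pass, so a crash loses at most one interval of bookkeeping.
// The loop runs until the process is killed.
func runForever(stateFile string, state *SchedulerState) {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        runDue(state)
        if err := saveState(stateFile, state); err != nil {
            fmt.Printf("Error saving state: %v\n", err)
        }
    }
}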
The Bug: Race Condition on State File
If two instances of the scheduler run at the same time, they both read the state file, both see that a job is due, both run it, and both write the file. The second write overwrites the first. Depending on timing, you lose run counts or get duplicate executions.
This is the same problem that happens if you run two cron jobs that both update the same file.
Simulate it:
// Instance 1 reads state: last_run = 14:00:00
// Instance 2 reads state: last_run = 14:00:00
// Instance 1 runs job, sets last_run = 14:30:00, writes file
// Instance 2 runs job (duplicate!), sets last_run = 14:30:01, writes file
// Instance 1's write is lost
The Fix: Use a Lock File
Before reading the state, acquire a lock file. If the lock file already exists, wait or exit.
func acquireLock(path string) (*os.File, error) {
    lockPath := path + ".lock"
    f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0644)
    if err != nil {
        if os.IsExist(err) {
            return nil, fmt.Errorf("another instance is running (lock file exists: %s)", lockPath)
        }
        return nil, err
    }
    // Write PID for debugging
    fmt.Fprintf(f, "%d", os.Getpid())
    return f, nil
}

func releaseLock(f *os.File, path string) {
    lockPath := path + ".lock"
    f.Close()
    os.Remove(lockPath)
}
Use it in main:
func main() {
    stateFile := filepath.Join(os.TempDir(), "scheduler-state.json")
    lock, err := acquireLock(stateFile)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer releaseLock(lock, stateFile)
    // ... rest of the scheduler
}
Now if a second instance tries to run, it gets:
another instance is running (lock file exists: /tmp/scheduler-state.json.lock)
The O_CREATE|O_EXCL flags make the open atomic. If the file already exists, the call fails. No race condition.
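One weakness of this scheme: if the process crashes before releaseLock runs, the stale lock file blocks every later run until someone deletes it by hand (the PID written into the file helps diagnose that). An alternative is an advisory flock(2) lock, which the kernel drops automatically when the process exits. A sketch using the standard syscall package (Linux and other Unix systems only):

// acquireFlock takes an exclusive, non-blocking flock on the lock file.
// The kernel releases the lock automatically if the process dies,
// so a crash cannot leave a stale lock behind.
func acquireFlock(path string) (*os.File, error) {
    f, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        return nil, err
    }
    if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
        f.Close()
        return nil, fmt.Errorf("another instance is running: %w", err)
    }
    return f, nil
}

Keep the returned *os.File open for the lifetime of the process; closing it releases the lock.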
Step 5: Task Runner with Retries and Timeouts
Jobs fail. Networks drop. Services restart. A good task runner does not give up on the first failure.
Linux: Retry Patterns in Bash
A simple retry loop in bash:
for i in 1 2 3; do
    /home/user/deploy.sh && break || echo "Attempt $i failed, retrying in 5s..." && sleep 5
done
This tries the command up to 3 times. If it succeeds (&&), it breaks out of the loop. If it fails (||), it waits 5 seconds and tries again.
For timeouts, use the timeout command:
timeout 30 /home/user/long-task.sh
This kills the process after 30 seconds if it has not finished. The exit code is 124 when the timeout triggers.
Combine them:
for i in 1 2 3; do
    timeout 30 /home/user/deploy.sh && break || echo "Attempt $i failed" && sleep 5
done
Check the exit code to distinguish between a timeout and a regular failure:
timeout 30 /home/user/deploy.sh
status=$?
if [ $status -eq 124 ]; then
    echo "Command timed out"
elif [ $status -ne 0 ]; then
    echo "Command failed with exit code $status"
fi
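The same exit-code inspection works from Go: a command that exits non-zero comes back as an *exec.ExitError, and its ExitCode method returns the value bash sees in $?. A minimal sketch:

package main

import (
    "errors"
    "fmt"
    "os/exec"
)

func main() {
    // "exit 3" fails with a specific code, standing in for a failing script.
    err := exec.Command("sh", "-c", "exit 3").Run()
    var exitErr *exec.ExitError
    if errors.As(err, &exitErr) {
        fmt.Println("command failed with exit code", exitErr.ExitCode())
    } else if err != nil {
        fmt.Println("command could not start:", err)
    }
}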
Go: Retries with Exponential Backoff
Build a task executor that retries with exponential backoff and a timeout per attempt.
package main

import (
    "context"
    "fmt"
    "math"
    "os/exec"
    "time"
)

type TaskConfig struct {
    Name       string
    Command    string
    MaxRetries int
    Timeout    time.Duration
}

type TaskResult struct {
    Name     string
    Success  bool
    Attempts int
    Output   string
    Error    string
    Duration time.Duration
}

func executeWithRetry(cfg TaskConfig) TaskResult {
    result := TaskResult{Name: cfg.Name}
    start := time.Now()
    for attempt := 0; attempt <= cfg.MaxRetries; attempt++ {
        result.Attempts = attempt + 1
        if attempt > 0 {
            backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
            fmt.Printf(" Retry %d/%d in %v...\n", attempt, cfg.MaxRetries, backoff)
            time.Sleep(backoff)
        }
        ctx, cancel := context.WithTimeout(context.Background(), cfg.Timeout)
        cmd := exec.CommandContext(ctx, "sh", "-c", cfg.Command)
        output, err := cmd.CombinedOutput()
        cancel()
        result.Output = string(output)
        if err == nil {
            result.Success = true
            result.Duration = time.Since(start)
            return result
        }
        if ctx.Err() == context.DeadlineExceeded {
            result.Error = "timeout"
            fmt.Printf(" Attempt %d: timed out after %v\n", attempt+1, cfg.Timeout)
        } else {
            result.Error = err.Error()
            fmt.Printf(" Attempt %d: failed - %v\n", attempt+1, err)
        }
    }
    result.Duration = time.Since(start)
    return result
}

func main() {
    tasks := []TaskConfig{
        {
            Name:       "quick-task",
            Command:    "echo 'done'",
            MaxRetries: 3,
            Timeout:    5 * time.Second,
        },
        {
            Name:       "flaky-task",
            Command:    "if [ $(shuf -i 1-3 -n 1) -eq 1 ]; then echo 'success'; else exit 1; fi",
            MaxRetries: 5,
            Timeout:    5 * time.Second,
        },
        {
            Name:       "slow-task",
            Command:    "sleep 10 && echo 'done'",
            MaxRetries: 2,
            Timeout:    3 * time.Second,
        },
    }
    fmt.Println("Running tasks with retries and timeouts")
    fmt.Println()
    for _, cfg := range tasks {
        fmt.Printf("[START] %s\n", cfg.Name)
        result := executeWithRetry(cfg)
        if result.Success {
            fmt.Printf("[PASS] %s (attempts: %d, duration: %v)\n", result.Name, result.Attempts, result.Duration)
        } else {
            fmt.Printf("[FAIL] %s (attempts: %d, error: %s, duration: %v)\n", result.Name, result.Attempts, result.Error, result.Duration)
        }
        fmt.Println()
    }
}
go run main.go
Running tasks with retries and timeouts
[START] quick-task
[PASS] quick-task (attempts: 1, duration: 5ms)
[START] flaky-task
Attempt 1: failed - exit status 1
Retry 1/5 in 2s...
Attempt 2: failed - exit status 1
Retry 2/5 in 4s...
[PASS] flaky-task (attempts: 3, duration: 6.01s)
[START] slow-task
Attempt 1: timed out after 3s
Retry 1/2 in 2s...
Attempt 2: timed out after 3s
Retry 2/2 in 4s...
Attempt 3: timed out after 3s
[FAIL] slow-task (attempts: 3, error: timeout, duration: 15.02s)
The Bug: Unbounded Exponential Backoff
Look at the backoff calculation:
backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
After 10 retries the single wait is 2^10 = 1024 seconds, about 17 minutes. After 15 retries it is 2^15 = 32768 seconds, over 9 hours. The backoff grows without limit.
If MaxRetries is set to 20, the waits alone add up to roughly 24 days, and the final wait by itself is over 12 days. That is not useful.
// Attempt 1: 2 seconds
// Attempt 2: 4 seconds
// Attempt 3: 8 seconds
// Attempt 4: 16 seconds
// Attempt 5: 32 seconds
// Attempt 6: 64 seconds (over 1 minute)
// Attempt 7: 128 seconds (over 2 minutes)
// Attempt 8: 256 seconds (over 4 minutes)
// Attempt 9: 512 seconds (over 8 minutes)
// Attempt 10: 1024 seconds (over 17 minutes)
The Fix: Cap the Backoff
Add a maximum backoff duration:
func calculateBackoff(attempt int, maxBackoff time.Duration) time.Duration {
    backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
    if backoff > maxBackoff {
        backoff = maxBackoff
    }
    return backoff
}
Use it in the retry loop:
if attempt > 0 {
    backoff := calculateBackoff(attempt, 60*time.Second)
    fmt.Printf(" Retry %d/%d in %v...\n", attempt, cfg.MaxRetries, backoff)
    time.Sleep(backoff)
}
Now the backoff sequence is: 2s, 4s, 8s, 16s, 32s, 60s, 60s, 60s, … It never exceeds 60 seconds. After the cap, each retry waits exactly 1 minute. Predictable and reasonable.
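One refinement worth knowing about: when many clients retry against the same service, identical backoff schedules make them all retry in lockstep. Adding random jitter spreads the load. A sketch of the "full jitter" variant, picking a random wait up to the capped value (assumes math/rand is imported; this is an addition, not part of the runner built here):

// jitteredBackoff picks a random wait in [0, cappedValue) so that many
// clients retrying together do not hammer a service at the same instant.
func jitteredBackoff(attempt int, maxBackoff time.Duration) time.Duration {
    return time.Duration(rand.Int63n(int64(calculateBackoff(attempt, maxBackoff))))
}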
Step 6: Complete Task Runner with Status Dashboard
Now combine everything from the previous steps into a single task runner: cron-like scheduling, dependency graph, retries, timeouts, and a colored terminal dashboard.
Configuration Format
Define tasks in a simple key-value format. No external YAML library needed.
package main

import (
    "bufio"
    "context"
    "fmt"
    "math"
    "os/exec"
    "strconv"
    "strings"
    "sync"
    "time"
)

// --- Configuration Parser ---

type TaskDef struct {
    Name       string
    Command    string
    Deps       []string
    MaxRetries int
    TimeoutSec int
    Schedule   string
}

func parseConfig(input string) []TaskDef {
    var tasks []TaskDef
    var current *TaskDef
    scanner := bufio.NewScanner(strings.NewReader(input))
    for scanner.Scan() {
        line := strings.TrimSpace(scanner.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
            if current != nil {
                tasks = append(tasks, *current)
            }
            name := line[1 : len(line)-1]
            current = &TaskDef{
                Name:       name,
                MaxRetries: 0,
                TimeoutSec: 30,
            }
            continue
        }
        if current == nil {
            continue
        }
        parts := strings.SplitN(line, "=", 2)
        if len(parts) != 2 {
            continue
        }
        key := strings.TrimSpace(parts[0])
        value := strings.TrimSpace(parts[1])
        switch key {
        case "command":
            current.Command = value
        case "deps":
            if value != "" {
                for _, d := range strings.Split(value, ",") {
                    current.Deps = append(current.Deps, strings.TrimSpace(d))
                }
            }
        case "retries":
            n, err := strconv.Atoi(value)
            if err == nil {
                current.MaxRetries = n
            }
        case "timeout":
            n, err := strconv.Atoi(value)
            if err == nil {
                current.TimeoutSec = n
            }
        case "schedule":
            current.Schedule = value
        }
    }
    if current != nil {
        tasks = append(tasks, *current)
    }
    return tasks
}
Task Execution Engine
// --- Task Status ---

type Status int

const (
    Pending Status = iota
    Running
    Passed
    Failed
    Skipped
)

func (s Status) String() string {
    switch s {
    case Pending:
        return "PENDING"
    case Running:
        return "RUNNING"
    case Passed:
        return "PASSED"
    case Failed:
        return "FAILED"
    case Skipped:
        return "SKIPPED"
    default:
        return "UNKNOWN"
    }
}

func (s Status) Color() string {
    switch s {
    case Pending:
        return "\033[33m" // yellow
    case Running:
        return "\033[36m" // cyan
    case Passed:
        return "\033[32m" // green
    case Failed:
        return "\033[31m" // red
    case Skipped:
        return "\033[90m" // gray
    default:
        return "\033[0m"
    }
}

const colorReset = "\033[0m"

// --- Task Result ---

type RunResult struct {
    Name     string
    Status   Status
    Attempts int
    Output   string
    Error    string
    Duration time.Duration
}

// --- Backoff ---

func cappedBackoff(attempt int, max time.Duration) time.Duration {
    b := time.Duration(math.Pow(2, float64(attempt))) * time.Second
    if b > max {
        return max
    }
    return b
}

// --- Executor ---

func executeTask(ctx context.Context, def TaskDef) RunResult {
    result := RunResult{Name: def.Name}
    start := time.Now()
    timeout := time.Duration(def.TimeoutSec) * time.Second
    for attempt := 0; attempt <= def.MaxRetries; attempt++ {
        result.Attempts = attempt + 1
        if attempt > 0 {
            backoff := cappedBackoff(attempt, 60*time.Second)
            select {
            case <-ctx.Done():
                result.Status = Skipped
                result.Error = "cancelled"
                result.Duration = time.Since(start)
                return result
            case <-time.After(backoff):
            }
        }
        cmdCtx, cancel := context.WithTimeout(ctx, timeout)
        cmd := exec.CommandContext(cmdCtx, "sh", "-c", def.Command)
        output, err := cmd.CombinedOutput()
        cancel()
        result.Output = string(output)
        if err == nil {
            result.Status = Passed
            result.Duration = time.Since(start)
            return result
        }
        if cmdCtx.Err() == context.DeadlineExceeded {
            result.Error = "timeout"
        } else {
            result.Error = err.Error()
        }
    }
    result.Status = Failed
    result.Duration = time.Since(start)
    return result
}
Dependency Resolution
// --- Dependency Graph ---

func resolveDeps(defs []TaskDef, target string) ([]string, error) {
    defMap := make(map[string]TaskDef)
    for _, d := range defs {
        defMap[d.Name] = d
    }
    var order []string
    visiting := make(map[string]bool)
    visited := make(map[string]bool)
    var visit func(string) error
    visit = func(name string) error {
        if visited[name] {
            return nil
        }
        if visiting[name] {
            return fmt.Errorf("cycle detected at task %q", name)
        }
        visiting[name] = true
        def, ok := defMap[name]
        if !ok {
            return fmt.Errorf("unknown task: %s", name)
        }
        for _, dep := range def.Deps {
            if err := visit(dep); err != nil {
                return err
            }
        }
        visiting[name] = false
        visited[name] = true
        order = append(order, name)
        return nil
    }
    if err := visit(target); err != nil {
        return nil, err
    }
    return order, nil
}
Dashboard Display
// --- Dashboard ---

func printHeader() {
    fmt.Println()
    fmt.Println(strings.Repeat("=", 70))
    fmt.Println(" TASK RUNNER")
    fmt.Println(strings.Repeat("=", 70))
    fmt.Println()
}

func printTaskLine(r RunResult) {
    color := r.Status.Color()
    status := fmt.Sprintf("[%s]", r.Status)
    dur := r.Duration.Truncate(time.Millisecond)
    attempts := ""
    if r.Attempts > 1 {
        attempts = fmt.Sprintf(" (attempts: %d)", r.Attempts)
    }
    errMsg := ""
    if r.Error != "" && r.Status == Failed {
        errMsg = fmt.Sprintf(" - %s", r.Error)
    }
    fmt.Printf(" %s%-9s%s %-25s %10v%s%s\n",
        color, status, colorReset,
        r.Name, dur, attempts, errMsg)
}

func printSummary(results []RunResult) {
    fmt.Println()
    fmt.Println(strings.Repeat("-", 70))
    total := len(results)
    passed := 0
    failed := 0
    skipped := 0
    var totalDuration time.Duration
    for _, r := range results {
        totalDuration += r.Duration
        switch r.Status {
        case Passed:
            passed++
        case Failed:
            failed++
        case Skipped:
            skipped++
        }
    }
    fmt.Printf("\n Total: %d", total)
    if passed > 0 {
        fmt.Printf(" | \033[32mPassed: %d\033[0m", passed)
    }
    if failed > 0 {
        fmt.Printf(" | \033[31mFailed: %d\033[0m", failed)
    }
    if skipped > 0 {
        fmt.Printf(" | \033[90mSkipped: %d\033[0m", skipped)
    }
    fmt.Printf(" | Duration: %v\n", totalDuration.Truncate(time.Millisecond))
    fmt.Println()
    fmt.Println(strings.Repeat("=", 70))
}
Main: Putting It All Together
func main() {
    config := `
# Task Runner Configuration

[lint]
command = echo 'Running linter...' && sleep 0.1 && echo 'No issues found'
deps =
retries = 0
timeout = 10

[test]
command = echo 'Running tests...' && sleep 0.2 && echo '14 passed, 0 failed'
deps = lint
retries = 1
timeout = 15

[build]
command = echo 'Compiling...' && sleep 0.3 && echo 'Build complete: myapp v1.0.0'
deps = test
retries = 2
timeout = 30

[migrate]
command = echo 'Running migrations...' && sleep 0.1 && echo 'Applied 3 migrations'
deps =
retries = 3
timeout = 20

[deploy]
command = echo 'Deploying to production...' && sleep 0.2 && echo 'Deploy successful'
deps = build, migrate
retries = 2
timeout = 60

[notify]
command = echo 'Sending notification...' && echo 'Team notified via webhook'
deps = deploy
retries = 1
timeout = 10
`
    defs := parseConfig(config)
    defMap := make(map[string]TaskDef)
    for _, d := range defs {
        defMap[d.Name] = d
    }

    // Resolve execution order for the final target
    target := "notify"
    order, err := resolveDeps(defs, target)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    printHeader()
    fmt.Printf(" Target: %s\n", target)
    fmt.Printf(" Execution order: %s\n", strings.Join(order, " -> "))
    fmt.Println()
    fmt.Println(strings.Repeat("-", 70))
    fmt.Println()

    ctx := context.Background()
    var results []RunResult
    var mu sync.Mutex
    completed := make(map[string]bool)

    for _, name := range order {
        def := defMap[name]

        // Check if all deps passed
        allDepsPassed := true
        for _, dep := range def.Deps {
            mu.Lock()
            if !completed[dep] {
                allDepsPassed = false
            }
            mu.Unlock()
        }
        if !allDepsPassed {
            r := RunResult{
                Name:   name,
                Status: Skipped,
                Error:  "dependency failed",
            }
            printTaskLine(r)
            results = append(results, r)
            continue
        }

        // Show running status
        fmt.Printf(" %s%-9s%s %-25s\r",
            Running.Color(), fmt.Sprintf("[%s]", Running), colorReset, name)
        result := executeTask(ctx, def)
        printTaskLine(result)

        if result.Status == Passed {
            mu.Lock()
            completed[name] = true
            mu.Unlock()
        }
        results = append(results, result)
    }

    printSummary(results)
}
Running the Complete Task Runner
Save all the code blocks above into a single main.go file and run it:
go run main.go
======================================================================
TASK RUNNER
======================================================================
Target: notify
Execution order: lint -> test -> build -> migrate -> deploy -> notify
----------------------------------------------------------------------
[PASSED] lint 102ms
[PASSED] test 205ms
[PASSED] build 308ms
[PASSED] migrate 104ms
[PASSED] deploy 207ms
[PASSED] notify 12ms
----------------------------------------------------------------------
Total: 6 | Passed: 6 | Duration: 938ms
======================================================================
All six tasks run in dependency order. Lint runs first because test depends on it. Build and migrate can both run after their dependencies complete. Deploy waits for both build and migrate. Notify runs last.
Testing a Failure
Change the build command to simulate a failure:
[build]
command = echo 'Compiling...' && exit 1
deps = test
retries = 2
timeout = 30
======================================================================
TASK RUNNER
======================================================================
Target: notify
Execution order: lint -> test -> build -> migrate -> deploy -> notify
----------------------------------------------------------------------
[PASSED] lint 101ms
[PASSED] test 204ms
[FAILED] build 6.31s (attempts: 3) - exit status 1
[PASSED] migrate 103ms
[SKIPPED] deploy 0s
[SKIPPED] notify 0s
----------------------------------------------------------------------
Total: 6 | Passed: 3 | Failed: 1 | Skipped: 2 | Duration: 6.718s
======================================================================
Build fails after 3 attempts (1 original + 2 retries). Deploy is skipped because it depends on build. Notify is skipped because it depends on deploy. But migrate still runs because it has no dependency on build.
The runner does not stop everything on first failure. It continues running tasks that do not depend on the failed one. Only downstream tasks are skipped.
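A natural next step, not implemented above, is running independent tasks concurrently: build and migrate share no dependencies, so they could overlap. A sketch of one way to do it, launching a goroutine per task that waits on a channel owned by each of its dependencies (this reuses the runner's types and executeTask, but treat it as an outline rather than a drop-in replacement):

type taskState struct {
    done chan struct{} // closed when the task finishes, pass or fail
    ok   bool          // valid to read only after done is closed
}

func executeParallel(ctx context.Context, defs []TaskDef, order []string) []RunResult {
    defMap := make(map[string]TaskDef)
    for _, d := range defs {
        defMap[d.Name] = d
    }
    // Pre-create all states so goroutines only read the map concurrently.
    states := make(map[string]*taskState)
    for _, name := range order {
        states[name] = &taskState{done: make(chan struct{})}
    }
    results := make([]RunResult, len(order))
    var wg sync.WaitGroup
    for i, name := range order {
        wg.Add(1)
        go func(i int, def TaskDef, st *taskState) {
            defer wg.Done()
            defer close(st.done)
            // Block until every dependency finishes; skip if any failed.
            for _, dep := range def.Deps {
                depSt := states[dep]
                <-depSt.done
                if !depSt.ok {
                    results[i] = RunResult{Name: def.Name, Status: Skipped, Error: "dependency failed"}
                    return
                }
            }
            results[i] = executeTask(ctx, def)
            st.ok = results[i].Status == Passed
        }(i, defMap[name], states[name])
    }
    wg.Wait()
    return results
}

Each goroutine writes only its own slot in results, and the close of a dependency's channel happens after its ok flag is set, so no extra locking is needed around those reads.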
What You Built
This article covered six layers of task automation:
- Cron for repeating schedules. You parsed cron expressions in Go with range support.
- At for one-off jobs. You built a job queue with proper goroutine cleanup using timers and contexts.
- Make for dependencies. You built a task graph with topological sort and cycle detection.
- Systemd timers for persistent scheduling. You built a state file with lock-based concurrency control.
- Retries and timeouts for resilience. You implemented exponential backoff with a cap.
- A complete task runner that combines all of the above into a single tool.
Every Go program uses only the standard library. Every Linux command runs on any modern distribution.
The task runner you built handles the same core problems as production tools: scheduling, dependency ordering, failure recovery, and status reporting. The patterns here — topological sort, exponential backoff, lock files, context cancellation — show up in build systems, CI pipelines, and orchestration tools.
References and Further Reading
- Evi Nemeth, et al. (2017). UNIX and Linux System Administration Handbook. Addison-Wesley Professional. Fifth Edition.
- The Go Team. (2024). The Go Programming Language Specification.
- Michael Kerrisk. (2010). The Linux Programming Interface. No Starch Press.
- GNU Make Manual. (2023). GNU Make. Free Software Foundation.
- systemd Documentation. (2024). systemd.timer. freedesktop.org.
Keep Reading
- CI Pipeline Basics: From Shell Scripts to a Go Build Runner — use the scheduling and dependency patterns from this article in a CI context.
- Process Management: From Linux Commands to a Go Supervisor — manage the processes your task runner starts with signals and health checks.
- Mastering Bash: The Ultimate Guide to Command Line Productivity — go deeper on the bash scripting that powers cron jobs and Makefiles.
What task automation patterns have you implemented in your infrastructure?
