/user/kayd @ devops :~$ cat linux-automation-tools.md

Task Automation: From Cron and Make to a Go Task Runner

Karandeep Singh

Summary

Master Linux task automation with cron, make, and systemd timers. Then build a complete Go task runner with scheduling, retries, and a status dashboard.

Every Linux system runs scheduled jobs. Backups, log rotation, health checks, deployments. Most of these start as cron jobs. Some grow into Makefiles. A few become tangled shell scripts that nobody wants to touch.

This article walks through Linux task automation from the ground up. You will use cron, at, make, and systemd timers. At each step, you will build the same feature in Go. By the end, you will have a working task runner that schedules jobs, handles dependencies, retries failures, and prints a status dashboard.

All Go code uses only the standard library. Every code block compiles and runs.


Step 1: Scheduling with Cron

Cron is the oldest task scheduler on Linux. It runs jobs at fixed intervals. The cron daemon reads a table of jobs (the crontab) and executes them on schedule.

Linux: Working with Crontab

List your current cron jobs:

crontab -l

If you have never set one up, you will see “no crontab for <username>”. Edit the crontab:

crontab -e

This opens a file in your default editor. Each line has six fields: five time fields and a command.

# minute  hour  day  month  weekday  command
  0       2     *    *      *        /home/user/backup.sh

The five time fields are:

  • Minute: 0-59
  • Hour: 0-23
  • Day of month: 1-31
  • Month: 1-12
  • Day of week: 0-7 (0 and 7 are both Sunday)

Common patterns:

# Every 5 minutes
*/5 * * * * /usr/local/bin/health-check.sh

# Daily at 2:00 AM
0 2 * * * /home/user/backup.sh

# Every Sunday at midnight
0 0 * * 0 /home/user/weekly-report.sh

# First day of every month at 6:00 AM
0 6 1 * * /home/user/monthly-cleanup.sh

List only active jobs (skip comments):

crontab -l | grep -v '^#'

A * means “every value.” The / means “every N.” So */5 in the minute field means every 5 minutes. The value 0 in the minute field means exactly minute zero.

Go: Parsing Cron Expressions

Now build a cron parser in Go. The goal is to take a cron expression like */5 * * * * and check if a given time matches.

Start with a struct to hold a parsed cron schedule:

package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

type CronSchedule struct {
	Minute  []int
	Hour    []int
	Day     []int
	Month   []int
	Weekday []int
}

func expandField(field string, min, max int) []int {
	var values []int

	parts := strings.Split(field, "/")
	step := 1

	if len(parts) == 2 {
		s, err := strconv.Atoi(parts[1])
		if err == nil {
			step = s
		}
	}

	base := parts[0]
	if base == "*" {
		for i := min; i <= max; i += step {
			values = append(values, i)
		}
		return values
	}

	// Single number
	num, err := strconv.Atoi(base)
	if err == nil {
		values = append(values, num)
	}

	return values
}

func parseCron(expr string) CronSchedule {
	fields := strings.Fields(expr)
	return CronSchedule{
		Minute:  expandField(fields[0], 0, 59),
		Hour:    expandField(fields[1], 0, 23),
		Day:     expandField(fields[2], 1, 31),
		Month:   expandField(fields[3], 1, 12),
		Weekday: expandField(fields[4], 0, 6),
	}
}

func contains(slice []int, val int) bool {
	for _, v := range slice {
		if v == val {
			return true
		}
	}
	return false
}

func (cs CronSchedule) Matches(t time.Time) bool {
	return contains(cs.Minute, t.Minute()) &&
		contains(cs.Hour, t.Hour()) &&
		contains(cs.Day, t.Day()) &&
		contains(cs.Month, int(t.Month())) &&
		contains(cs.Weekday, int(t.Weekday()))
}

func main() {
	schedule := parseCron("*/5 * * * *")
	now := time.Now()

	fmt.Printf("Current time: %s\n", now.Format("15:04"))
	fmt.Printf("Matches */5 * * * *: %v\n", schedule.Matches(now))

	twoAM := parseCron("0 2 * * *")
	testTime := time.Date(2025, 1, 15, 2, 0, 0, 0, time.Local)
	fmt.Printf("2:00 AM matches '0 2 * * *': %v\n", twoAM.Matches(testTime))
}

Run it:

go run main.go
Current time: 14:35
Matches */5 * * * *: true
2:00 AM matches '0 2 * * *': true

This works for */5 and for 0. But there is a bug.

The Bug: Bare * After Split

Try parsing * * * * * (every minute). The expandField function splits on /. For the field *, the split produces ["*"]. That has length 1, so step stays at 1. The base is *, so it loops from min to max with step 1. That works.

But now try 5 * * * * (minute 5 only). The split produces ["5"]. The base is "5", not "*". It parses the number 5. That works too.

Now try */5. The split produces ["*", "5"]. Base is "*", step is 5. It generates [0, 5, 10, 15, ...]. Correct.

So where is the bug? Try 1-5 (minutes 1 through 5). Cron supports ranges, but expandField does not handle them. Pass 1-5 and the base is "1-5". The strconv.Atoi call fails. The function returns an empty slice. The schedule never matches.

Test it:

func main() {
	schedule := parseCron("1-5 * * * *")
	testTime := time.Date(2025, 1, 15, 10, 3, 0, 0, time.Local)
	fmt.Printf("Minute 3 matches '1-5 * * * *': %v\n", schedule.Matches(testTime))
}
Minute 3 matches '1-5 * * * *': false

That is wrong. Minute 3 is in the range 1-5.

The Fix: Handle Ranges

Add range parsing to expandField:

func expandField(field string, min, max int) []int {
	var values []int

	parts := strings.Split(field, "/")
	step := 1

	if len(parts) == 2 {
		s, err := strconv.Atoi(parts[1])
		if err == nil {
			step = s
		}
	}

	base := parts[0]

	if base == "*" {
		for i := min; i <= max; i += step {
			values = append(values, i)
		}
		return values
	}

	// Range: "1-5"
	if strings.Contains(base, "-") {
		rangeParts := strings.Split(base, "-")
		low, err1 := strconv.Atoi(rangeParts[0])
		high, err2 := strconv.Atoi(rangeParts[1])
		if err1 == nil && err2 == nil {
			for i := low; i <= high; i += step {
				values = append(values, i)
			}
		}
		return values
	}

	// Single number
	num, err := strconv.Atoi(base)
	if err == nil {
		values = append(values, num)
	}

	return values
}

Now test again:

func main() {
	schedule := parseCron("1-5 * * * *")
	testTime := time.Date(2025, 1, 15, 10, 3, 0, 0, time.Local)
	fmt.Printf("Minute 3 matches '1-5 * * * *': %v\n", schedule.Matches(testTime))
}
Minute 3 matches '1-5 * * * *': true

Correct. The parser now handles *, single numbers, */N, and ranges. (One gap remains: real cron also accepts 7 for Sunday in the weekday field, while parseCron only expands 0 through 6.)
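Because the step parsed from the / split also applies inside the range loop, combined forms like 1-10/2 fall out for free. A quick check against the parser above:

func main() {
	schedule := parseCron("1-10/2 * * * *")
	fmt.Println(schedule.Minute)
}
[1 3 5 7 9]

Minutes 1, 3, 5, 7, and 9 match, exactly as cron would expand the expression.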


Step 2: One-Off Tasks with at and batch

Cron is for repeating jobs. For one-time tasks, Linux has at.

Linux: Scheduling One-Off Jobs

Schedule a command to run at 2:00 AM:

echo "/home/user/backup.sh" | at 2:00 AM

Schedule a job 30 minutes from now:

echo "/home/user/report.sh" | at now + 30 minutes

List pending jobs:

atq

Output looks like this:

3   Wed Feb  5 02:00:00 2025 a user
4   Wed Feb  5 14:55:00 2025 a user

Remove a job by its number:

atrm 3

The batch command is similar to at, but it waits until the system load average drops below a threshold (historically 1.5; configurable via atd) before running the job. This is useful for heavy tasks that should not compete with production workloads:

echo "/home/user/heavy-analysis.sh" | batch

Go: Building a Job Queue

Build a simple job queue that accepts commands with future execution times.

package main

import (
	"context"
	"fmt"
	"os/exec"
	"sync"
	"time"
)

type Job struct {
	ID      int
	Command string
	RunAt   time.Time
	Done    bool
	Output  string
	Error   string
}

type JobQueue struct {
	mu   sync.Mutex
	jobs []*Job
	next int
}

func NewJobQueue() *JobQueue {
	return &JobQueue{next: 1}
}

func (q *JobQueue) Add(command string, runAt time.Time) int {
	q.mu.Lock()
	defer q.mu.Unlock()

	job := &Job{
		ID:      q.next,
		Command: command,
		RunAt:   runAt,
	}
	q.jobs = append(q.jobs, job)
	q.next++
	return job.ID
}

func (q *JobQueue) List() []*Job {
	q.mu.Lock()
	defer q.mu.Unlock()

	pending := make([]*Job, 0)
	for _, j := range q.jobs {
		if !j.Done {
			pending = append(pending, j)
		}
	}
	return pending
}

func (q *JobQueue) Remove(id int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()

	for i, j := range q.jobs {
		if j.ID == id && !j.Done {
			q.jobs = append(q.jobs[:i], q.jobs[i+1:]...)
			return true
		}
	}
	return false
}

func (q *JobQueue) RunPending(ctx context.Context) {
	q.mu.Lock()
	now := time.Now()
	var ready []*Job
	for _, j := range q.jobs {
		if !j.Done && !j.RunAt.After(now) {
			ready = append(ready, j)
		}
	}
	q.mu.Unlock()

	for _, j := range ready {
		select {
		case <-ctx.Done():
			return
		default:
		}

		cmd := exec.CommandContext(ctx, "sh", "-c", j.Command)
		output, err := cmd.CombinedOutput()

		q.mu.Lock()
		j.Done = true
		j.Output = string(output)
		if err != nil {
			j.Error = err.Error()
		}
		q.mu.Unlock()

		fmt.Printf("[Job %d] Completed: %s\n", j.ID, j.Command)
		if len(j.Output) > 0 {
			fmt.Printf("  Output: %s", j.Output)
		}
		if j.Error != "" {
			fmt.Printf("  Error: %s\n", j.Error)
		}
	}
}

func main() {
	queue := NewJobQueue()

	// Schedule jobs
	queue.Add("echo 'backup started' && date", time.Now().Add(1*time.Second))
	queue.Add("echo 'report generated'", time.Now().Add(2*time.Second))
	queue.Add("echo 'this one is removed'", time.Now().Add(3*time.Second))

	// Remove job 3 (like atrm)
	queue.Remove(3)

	// List pending
	fmt.Println("Pending jobs:")
	for _, j := range queue.List() {
		fmt.Printf("  [%d] %s at %s\n", j.ID, j.Command, j.RunAt.Format("15:04:05"))
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Wait and run
	fmt.Println("\nWaiting for jobs...")
	time.Sleep(3 * time.Second)
	queue.RunPending(ctx)

	fmt.Println("\nAll done.")
}
Run it:

go run main.go
Pending jobs:
  [1] echo 'backup started' && date at 14:30:02
  [2] echo 'report generated' at 14:30:03

Waiting for jobs...
[Job 1] Completed: echo 'backup started' && date
  Output: backup started
Wed Feb  5 14:30:04 2025
[Job 2] Completed: echo 'report generated'
  Output: report generated

All done.

The Bug: Goroutine Leak with time.After

A common mistake when scheduling delayed execution is using time.After inside a goroutine without cleanup:

func (q *JobQueue) RunPendingBuggy() {
	q.mu.Lock()
	var pending []*Job
	for _, j := range q.jobs {
		if !j.Done {
			pending = append(pending, j)
		}
	}
	q.mu.Unlock()

	for _, j := range pending {
		job := j
		go func() {
			delay := time.Until(job.RunAt)
			if delay > 0 {
				<-time.After(delay)
			}
			cmd := exec.Command("sh", "-c", job.Command)
			cmd.Run()
			job.Done = true
		}()
	}
}

The problem: time.After creates a timer and a channel that cannot be cancelled. The backing timer lives until it fires, even if nothing is waiting on it anymore. Schedule 1000 jobs that all run in 1 hour and you hold 1000 goroutines and 1000 timers in memory for that entire hour, with no way to cancel any of them. And if the program exits before the timers fire, those jobs simply never run.

The Fix: Use time.NewTimer with Context

Replace time.After with time.NewTimer and a context:

func (q *JobQueue) RunPendingFixed(ctx context.Context) {
	q.mu.Lock()
	var pending []*Job
	for _, j := range q.jobs {
		if !j.Done {
			pending = append(pending, j)
		}
	}
	q.mu.Unlock()

	var wg sync.WaitGroup
	for _, j := range pending {
		job := j
		wg.Add(1)
		go func() {
			defer wg.Done()

			delay := time.Until(job.RunAt)
			if delay < 0 {
				delay = 0
			}
			timer := time.NewTimer(delay)
			defer timer.Stop()

			select {
			case <-ctx.Done():
				fmt.Printf("[Job %d] Cancelled\n", job.ID)
				return
			case <-timer.C:
			}

			cmd := exec.CommandContext(ctx, "sh", "-c", job.Command)
			output, err := cmd.CombinedOutput()

			q.mu.Lock()
			job.Done = true
			job.Output = string(output)
			if err != nil {
				job.Error = err.Error()
			}
			q.mu.Unlock()

			fmt.Printf("[Job %d] Completed: %s\n", job.ID, job.Command)
		}()
	}
	wg.Wait()
}

Now when the context is cancelled, the goroutine exits immediately and the deferred Stop releases the timer. No leaks. (Since Go 1.23, unreferenced timers are garbage-collected even without Stop, but explicit cleanup plus context cancellation remains the portable pattern.)


Step 3: Make for Task Dependencies

Cron and at run isolated commands. But real workflows have dependencies. You cannot deploy before you build. You cannot build before you test. Make solves this.

Linux: Makefiles for DevOps

Create a file called Makefile:

# Variables
APP_NAME = myapp
BUILD_DIR = ./build
GO_FILES = $(wildcard *.go)

# Default target
all: build

# Build depends on Go source files
build: $(GO_FILES)
	@echo "Building $(APP_NAME)..."
	go build -o $(BUILD_DIR)/$(APP_NAME) .
	@echo "Done."

# Test must pass before build
test:
	@echo "Running tests..."
	go test ./...

# Deploy depends on build
deploy: build
	@echo "Deploying $(APP_NAME)..."
	cp $(BUILD_DIR)/$(APP_NAME) /usr/local/bin/
	@echo "Deployed."

# Clean removes build artifacts
clean:
	@echo "Cleaning..."
	rm -rf $(BUILD_DIR)

# Lint runs before test
lint:
	@echo "Linting..."
	go vet ./...

# Full pipeline
ci: lint test build

# These targets are not files
.PHONY: all build test deploy clean lint ci

Run targets:

make build        # Builds the app
make test         # Runs tests
make deploy       # Builds first, then deploys
make ci           # Lint -> test -> build (in order)
make clean        # Remove build artifacts

Make reads the dependency graph. When you run make deploy, it sees that deploy depends on build. It runs build first. If build depends on source files that have not changed, Make skips it. This is the power of dependency-based execution.

The .PHONY declaration tells Make that these targets are not files. Without it, if a file named build exists in the directory and is newer than every prerequisite, Make reports make: 'build' is up to date. and skips the target.

The $@ variable is the target name. $< is the first dependency. $^ is all dependencies:

%.o: %.c
	gcc -c $< -o $@

This compiles any .c file into a .o file. $< is the .c file. $@ is the .o output.
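
The $^ variable is most useful in link rules that consume every prerequisite at once. A small example (app, main.o, and util.o are placeholder names):

app: main.o util.o
	gcc $^ -o $@

Here $^ expands to main.o util.o and $@ to app, so the rule links both object files into the final binary.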

Go: Building a Task Dependency Graph

Now build a task graph in Go. Each task has a name, a function to run, and a list of dependencies.

package main

import (
	"fmt"
	"strings"
	"time"
)

type Task struct {
	Name    string
	Deps    []string
	Run     func() error
	Done    bool
	Failed  bool
	elapsed time.Duration
}

type TaskGraph struct {
	tasks map[string]*Task
	order []string
}

func NewTaskGraph() *TaskGraph {
	return &TaskGraph{
		tasks: make(map[string]*Task),
	}
}

func (g *TaskGraph) Add(name string, deps []string, run func() error) {
	g.tasks[name] = &Task{
		Name: name,
		Deps: deps,
		Run:  run,
	}
}

func (g *TaskGraph) resolve(name string, resolved *[]string, seen map[string]bool) error {
	seen[name] = true

	task, ok := g.tasks[name]
	if !ok {
		return fmt.Errorf("unknown task: %s", name)
	}

	for _, dep := range task.Deps {
		if seen[dep] {
			continue
		}
		if err := g.resolve(dep, resolved, seen); err != nil {
			return err
		}
	}

	*resolved = append(*resolved, name)
	return nil
}

func (g *TaskGraph) Execute(target string) error {
	var resolved []string
	seen := make(map[string]bool)

	if err := g.resolve(target, &resolved, seen); err != nil {
		return err
	}

	for _, name := range resolved {
		task := g.tasks[name]
		if task.Done {
			continue
		}

		fmt.Printf("[RUN]  %s\n", name)
		start := time.Now()
		err := task.Run()
		task.elapsed = time.Since(start)

		if err != nil {
			task.Failed = true
			fmt.Printf("[FAIL] %s (%v) - %v\n", name, task.elapsed, err)
			return fmt.Errorf("task %s failed: %w", name, err)
		}

		task.Done = true
		fmt.Printf("[DONE] %s (%v)\n", name, task.elapsed)
	}

	return nil
}

func main() {
	g := NewTaskGraph()

	g.Add("lint", nil, func() error {
		fmt.Println("  Running go vet...")
		time.Sleep(100 * time.Millisecond)
		return nil
	})

	g.Add("test", []string{"lint"}, func() error {
		fmt.Println("  Running tests...")
		time.Sleep(200 * time.Millisecond)
		return nil
	})

	g.Add("build", []string{"test"}, func() error {
		fmt.Println("  Compiling binary...")
		time.Sleep(150 * time.Millisecond)
		return nil
	})

	g.Add("deploy", []string{"build"}, func() error {
		fmt.Println("  Deploying to server...")
		time.Sleep(100 * time.Millisecond)
		return nil
	})

	fmt.Println(strings.Repeat("-", 40))
	fmt.Println("Running: make deploy")
	fmt.Println(strings.Repeat("-", 40))

	if err := g.Execute("deploy"); err != nil {
		fmt.Printf("\nPipeline failed: %v\n", err)
	} else {
		fmt.Println("\nPipeline complete.")
	}
}
Run it:

go run main.go
----------------------------------------
Running: make deploy
----------------------------------------
[RUN]  lint
  Running go vet...
[DONE] lint (100ms)
[RUN]  test
  Running tests...
[DONE] test (200ms)
[RUN]  build
  Compiling binary...
[DONE] build (150ms)
[RUN]  deploy
  Deploying to server...
[DONE] deploy (100ms)

Pipeline complete.

Tasks execute in dependency order. Build runs after test. Deploy runs after build.
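
One Make feature this Go version does not replicate is the freshness check: Make skips a target whose file is newer than all of its prerequisites. If your tasks produce files, a timestamp comparison restores that behavior. A sketch using only the os package; needsRebuild is a hypothetical helper, not part of the graph above:

// needsRebuild reports whether target is missing or older than any source file.
func needsRebuild(target string, sources []string) (bool, error) {
	targetInfo, err := os.Stat(target)
	if os.IsNotExist(err) {
		return true, nil // no output yet: must build
	}
	if err != nil {
		return false, err
	}
	for _, src := range sources {
		srcInfo, err := os.Stat(src)
		if err != nil {
			return false, err
		}
		if srcInfo.ModTime().After(targetInfo.ModTime()) {
			return true, nil // a source changed after the last build
		}
	}
	return false, nil // target is up to date
}

Calling this before task.Run and skipping fresh targets would give the graph Make-style incremental builds.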

The Bug: No Cycle Detection

Add a circular dependency:

g.Add("alpha", []string{"beta"}, func() error {
	return nil
})
g.Add("beta", []string{"alpha"}, func() error {
	return nil
})
g.Execute("alpha")

The resolve function checks seen[dep] and skips it with continue. This means it silently ignores the cycle and produces an incomplete ordering. Task alpha depends on beta, but beta tries to resolve alpha, sees it is already in seen, and skips it. Now beta is added to the resolved list before alpha. But beta depends on alpha, which has not run yet.

In the worst case with a slightly different implementation, this causes infinite recursion and a stack overflow.

The Fix: Track Resolution State

Use two sets: one for “currently resolving” (on the stack) and one for “fully resolved.”

func (g *TaskGraph) resolveFixed(name string, resolved *[]string, visiting map[string]bool, visited map[string]bool) error {
	if visited[name] {
		return nil
	}
	if visiting[name] {
		return fmt.Errorf("cycle detected: task %q depends on itself (directly or indirectly)", name)
	}

	visiting[name] = true

	task, ok := g.tasks[name]
	if !ok {
		return fmt.Errorf("unknown task: %s", name)
	}

	for _, dep := range task.Deps {
		if err := g.resolveFixed(dep, resolved, visiting, visited); err != nil {
			return err
		}
	}

	visiting[name] = false
	visited[name] = true
	*resolved = append(*resolved, name)
	return nil
}

Now test it:

g.Add("alpha", []string{"beta"}, func() error { return nil })
g.Add("beta", []string{"alpha"}, func() error { return nil })

var resolved []string
err := g.resolveFixed("alpha", &resolved, make(map[string]bool), make(map[string]bool))
if err != nil {
	fmt.Println(err)
}
cycle detected: task "alpha" depends on itself (directly or indirectly)

The cycle is caught before any task runs.


Step 4: Systemd Timers (Modern Cron)

Systemd timers are the modern replacement for cron. They integrate with systemd logging, dependency management, and service monitoring.

Linux: Creating a Systemd Timer

List all active timers:

systemctl list-timers

Output shows each timer, when it last ran, and when it will run next.

A systemd timer needs two files: a .timer file and a matching .service file.

Create the service file at /etc/systemd/system/mybackup.service:

[Unit]
Description=Run daily backup

[Service]
Type=oneshot
ExecStart=/home/user/backup.sh
User=user

Create the timer file at /etc/systemd/system/mybackup.timer:

[Unit]
Description=Daily backup timer

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

Enable and start the timer:

sudo systemctl daemon-reload
sudo systemctl enable mybackup.timer
sudo systemctl start mybackup.timer

The OnCalendar format is [DayOfWeek] Year-Month-Day Hour:Minute:Second, where the weekday prefix is optional. Some examples:

OnCalendar=*-*-* 02:00:00          # Daily at 2:00 AM
OnCalendar=Mon *-*-* 09:00:00      # Every Monday at 9:00 AM
OnCalendar=*-*-01 06:00:00         # First of every month at 6:00 AM

Another option is OnBootSec, which runs a specified time after boot:

OnBootSec=5min       # 5 minutes after boot
OnBootSec=1h         # 1 hour after boot

The Persistent=true option means that if the system was off when the timer should have fired, it will run as soon as the system starts.

Check logs for the service:

journalctl -u mybackup.service

Check when the timer will fire next:

systemctl status mybackup.timer

Go: Persistent Scheduler with State File

Build a scheduler that writes its state to a JSON file. This way it can survive restarts and know which jobs have already run.

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

type ScheduledJob struct {
	Name       string    `json:"name"`
	Command    string    `json:"command"`
	IntervalS  int       `json:"interval_seconds"`
	LastRun    time.Time `json:"last_run"`
	RunCount   int       `json:"run_count"`
	LastStatus string    `json:"last_status"`
}

type SchedulerState struct {
	Jobs []*ScheduledJob `json:"jobs"`
}

func loadState(path string) (*SchedulerState, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if os.IsNotExist(err) {
			return &SchedulerState{}, nil
		}
		return nil, err
	}

	var state SchedulerState
	if err := json.Unmarshal(data, &state); err != nil {
		return nil, err
	}
	return &state, nil
}

func saveState(path string, state *SchedulerState) error {
	data, err := json.MarshalIndent(state, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0644)
}

func runDue(state *SchedulerState) {
	now := time.Now()
	for _, job := range state.Jobs {
		interval := time.Duration(job.IntervalS) * time.Second
		if now.Sub(job.LastRun) < interval {
			fmt.Printf("[SKIP] %s (next run in %v)\n", job.Name, interval-now.Sub(job.LastRun))
			continue
		}

		fmt.Printf("[RUN]  %s: %s\n", job.Name, job.Command)
		cmd := exec.Command("sh", "-c", job.Command)
		output, err := cmd.CombinedOutput()

		job.LastRun = now
		job.RunCount++

		if err != nil {
			job.LastStatus = fmt.Sprintf("FAILED: %v", err)
			fmt.Printf("[FAIL] %s: %v\n", job.Name, err)
		} else {
			job.LastStatus = "OK"
			fmt.Printf("[DONE] %s\n", job.Name)
		}

		if len(output) > 0 {
			fmt.Printf("  Output: %s", string(output))
		}
	}
}

func main() {
	stateFile := filepath.Join(os.TempDir(), "scheduler-state.json")

	state, err := loadState(stateFile)
	if err != nil {
		fmt.Printf("Error loading state: %v\n", err)
		return
	}

	// Define jobs if state is empty
	if len(state.Jobs) == 0 {
		state.Jobs = []*ScheduledJob{
			{Name: "health-check", Command: "echo 'OK'", IntervalS: 10},
			{Name: "disk-usage", Command: "df -h / | tail -1", IntervalS: 60},
			{Name: "log-rotate", Command: "echo 'rotating logs'", IntervalS: 3600},
		}
	}

	fmt.Println("Scheduler tick at", time.Now().Format("15:04:05"))
	fmt.Println()

	runDue(state)

	if err := saveState(stateFile, state); err != nil {
		fmt.Printf("Error saving state: %v\n", err)
	} else {
		fmt.Printf("\nState saved to %s\n", stateFile)
	}
}

Run it twice:

go run main.go
Scheduler tick at 14:30:00

[RUN]  health-check: echo 'OK'
[DONE] health-check
  Output: OK
[RUN]  disk-usage: df -h / | tail -1
[DONE] disk-usage
  Output: /dev/sda1       50G   22G   26G  46% /
[RUN]  log-rotate: echo 'rotating logs'
[DONE] log-rotate
  Output: rotating logs

State saved to /tmp/scheduler-state.json

Run again within 10 seconds:

go run main.go
Scheduler tick at 14:30:05

[SKIP] health-check (next run in 5s)
[SKIP] disk-usage (next run in 55s)
[SKIP] log-rotate (next run in 59m55s)

State saved to /tmp/scheduler-state.json

The scheduler remembers when each job last ran. It skips jobs that are not due yet.

The Bug: Race Condition on State File

If two instances of the scheduler run at the same time, they both read the state file, both see that a job is due, both run it, and both write the file. The second write overwrites the first. Depending on timing, you lose run counts or get duplicate executions.

This is the same problem that happens if you run two cron jobs that both update the same file.

Simulate it:

// Instance 1 reads state: last_run = 14:00:00
// Instance 2 reads state: last_run = 14:00:00
// Instance 1 runs job, sets last_run = 14:30:00, writes file
// Instance 2 runs job (duplicate!), sets last_run = 14:30:01, writes file
// Instance 1's write is lost

The Fix: Use a Lock File

Before reading the state, acquire a lock file. If the lock file already exists, wait or exit.

func acquireLock(path string) (*os.File, error) {
	lockPath := path + ".lock"
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0644)
	if err != nil {
		if os.IsExist(err) {
			return nil, fmt.Errorf("another instance is running (lock file exists: %s)", lockPath)
		}
		return nil, err
	}
	// Write PID for debugging
	fmt.Fprintf(f, "%d", os.Getpid())
	return f, nil
}

func releaseLock(f *os.File, path string) {
	lockPath := path + ".lock"
	f.Close()
	os.Remove(lockPath)
}

Use it in main:

func main() {
	stateFile := filepath.Join(os.TempDir(), "scheduler-state.json")

	lock, err := acquireLock(stateFile)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer releaseLock(lock, stateFile)

	// ... rest of the scheduler
}

Now if a second instance tries to run, it gets:

another instance is running (lock file exists: /tmp/scheduler-state.json.lock)

The O_CREATE|O_EXCL combination makes creation atomic: the kernel either creates the file or fails, in a single step. If the file already exists, the call fails, so two instances can never both acquire the lock. No race condition.
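
One weakness of a plain lock file: if the process crashes before releaseLock runs, the stale lock blocks every future run. A common mitigation is to use the PID that acquireLock already records and check whether that process still exists. A sketch for Linux using the syscall package; note that signal 0 only tests for existence, and PID reuse makes this a heuristic rather than a guarantee:

// staleLock reports whether the lock file's recorded PID is no longer running.
func staleLock(lockPath string) bool {
	data, err := os.ReadFile(lockPath)
	if err != nil {
		return false
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return false
	}
	// Signal 0 delivers nothing; it only checks that the process exists.
	return syscall.Kill(pid, syscall.Signal(0)) != nil
}

If staleLock returns true, remove the lock file and retry acquireLock.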


Step 5: Task Runner with Retries and Timeouts

Jobs fail. Networks drop. Services restart. A good task runner does not give up on the first failure.

Linux: Retry Patterns in Bash

A simple retry loop in bash:

for i in 1 2 3; do
    /home/user/deploy.sh && break || echo "Attempt $i failed, retrying in 5s..." && sleep 5
done

This tries the command up to 3 times. If it succeeds (&&), it breaks out of the loop. If it fails (||), it waits 5 seconds and tries again.

For timeouts, use the timeout command:

timeout 30 /home/user/long-task.sh

This kills the process after 30 seconds if it has not finished. The exit code is 124 when the timeout triggers.

Combine them:

for i in 1 2 3; do
    timeout 30 /home/user/deploy.sh && break || echo "Attempt $i failed" && sleep 5
done

Check the exit code to distinguish between a timeout and a regular failure:

timeout 30 /home/user/deploy.sh
status=$?
if [ $status -eq 124 ]; then
    echo "Command timed out"
elif [ $status -ne 0 ]; then
    echo "Command failed with exit code $status"
fi

Go: Retries with Exponential Backoff

Build a task executor that retries with exponential backoff and a timeout per attempt.

package main

import (
	"context"
	"fmt"
	"math"
	"os/exec"
	"time"
)

type TaskConfig struct {
	Name       string
	Command    string
	MaxRetries int
	Timeout    time.Duration
}

type TaskResult struct {
	Name     string
	Success  bool
	Attempts int
	Output   string
	Error    string
	Duration time.Duration
}

func executeWithRetry(cfg TaskConfig) TaskResult {
	result := TaskResult{Name: cfg.Name}
	start := time.Now()

	for attempt := 0; attempt <= cfg.MaxRetries; attempt++ {
		result.Attempts = attempt + 1

		if attempt > 0 {
			backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
			fmt.Printf("  Retry %d/%d in %v...\n", attempt, cfg.MaxRetries, backoff)
			time.Sleep(backoff)
		}

		ctx, cancel := context.WithTimeout(context.Background(), cfg.Timeout)
		cmd := exec.CommandContext(ctx, "sh", "-c", cfg.Command)
		output, err := cmd.CombinedOutput()
		cancel()

		result.Output = string(output)

		if err == nil {
			result.Success = true
			result.Duration = time.Since(start)
			return result
		}

		if ctx.Err() == context.DeadlineExceeded {
			result.Error = "timeout"
			fmt.Printf("  Attempt %d: timed out after %v\n", attempt+1, cfg.Timeout)
		} else {
			result.Error = err.Error()
			fmt.Printf("  Attempt %d: failed - %v\n", attempt+1, err)
		}
	}

	result.Duration = time.Since(start)
	return result
}

func main() {
	tasks := []TaskConfig{
		{
			Name:       "quick-task",
			Command:    "echo 'done'",
			MaxRetries: 3,
			Timeout:    5 * time.Second,
		},
		{
			Name:       "flaky-task",
			Command:    "if [ $(shuf -i 1-3 -n 1) -eq 1 ]; then echo 'success'; else exit 1; fi",
			MaxRetries: 5,
			Timeout:    5 * time.Second,
		},
		{
			Name:       "slow-task",
			Command:    "sleep 10 && echo 'done'",
			MaxRetries: 2,
			Timeout:    3 * time.Second,
		},
	}

	fmt.Println("Running tasks with retries and timeouts")
	fmt.Println()

	for _, cfg := range tasks {
		fmt.Printf("[START] %s\n", cfg.Name)
		result := executeWithRetry(cfg)

		if result.Success {
			fmt.Printf("[PASS]  %s (attempts: %d, duration: %v)\n", result.Name, result.Attempts, result.Duration)
		} else {
			fmt.Printf("[FAIL]  %s (attempts: %d, error: %s, duration: %v)\n", result.Name, result.Attempts, result.Error, result.Duration)
		}
		fmt.Println()
	}
}
Run it:

go run main.go
Running tasks with retries and timeouts

[START] quick-task
[PASS]  quick-task (attempts: 1, duration: 5ms)

[START] flaky-task
  Attempt 1: failed - exit status 1
  Retry 1/5 in 2s...
  Attempt 2: failed - exit status 1
  Retry 2/5 in 4s...
[PASS]  flaky-task (attempts: 3, duration: 6.01s)

[START] slow-task
  Attempt 1: timed out after 3s
  Retry 1/2 in 2s...
  Attempt 2: timed out after 3s
  Retry 2/2 in 4s...
  Attempt 3: timed out after 3s
[FAIL]  slow-task (attempts: 3, error: timeout, duration: 15.02s)

The Bug: Unbounded Exponential Backoff

Look at the backoff calculation:

backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second

After 10 retries the backoff is 2^10 = 1024 seconds, about 17 minutes. After 15 retries it is 2^15 = 32768 seconds, over 9 hours. The backoff grows without limit.

If MaxRetries is set to 20, the final wait alone is 2^20 seconds, over 12 days, and the cumulative wait across all retries is roughly 24 days. That is not useful.

// Attempt 1:  2 seconds
// Attempt 2:  4 seconds
// Attempt 3:  8 seconds
// Attempt 4:  16 seconds
// Attempt 5:  32 seconds
// Attempt 6:  64 seconds  (over 1 minute)
// Attempt 7:  128 seconds (over 2 minutes)
// Attempt 8:  256 seconds (over 4 minutes)
// Attempt 9:  512 seconds (over 8 minutes)
// Attempt 10: 1024 seconds (over 17 minutes)

The Fix: Cap the Backoff

Add a maximum backoff duration:

func calculateBackoff(attempt int, maxBackoff time.Duration) time.Duration {
	backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
	if backoff > maxBackoff {
		backoff = maxBackoff
	}
	return backoff
}

Use it in the retry loop:

if attempt > 0 {
	backoff := calculateBackoff(attempt, 60*time.Second)
	fmt.Printf("  Retry %d/%d in %v...\n", attempt, cfg.MaxRetries, backoff)
	time.Sleep(backoff)
}

Now the backoff sequence is: 2s, 4s, 8s, 16s, 32s, 60s, 60s, 60s, … It never exceeds 60 seconds. After the cap, each retry waits exactly 1 minute. Predictable and reasonable.
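
A quick loop confirms the sequence:

for attempt := 1; attempt <= 8; attempt++ {
	fmt.Printf("attempt %d: wait %v\n", attempt, calculateBackoff(attempt, 60*time.Second))
}
attempt 1: wait 2s
attempt 2: wait 4s
attempt 3: wait 8s
attempt 4: wait 16s
attempt 5: wait 32s
attempt 6: wait 1m0s
attempt 7: wait 1m0s
attempt 8: wait 1m0s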


Step 6: Complete Task Runner with Status Dashboard

Now combine everything from the previous steps into a single task runner: cron-like scheduling, dependency graph, retries, timeouts, and a colored terminal dashboard.

Configuration Format

Define tasks in a simple key-value format. No external YAML library needed.

package main

import (
	"bufio"
	"context"
	"fmt"
	"math"
	"os/exec"
	"strconv"
	"strings"
	"sync"
	"time"
)

// --- Configuration Parser ---

type TaskDef struct {
	Name       string
	Command    string
	Deps       []string
	MaxRetries int
	TimeoutSec int
	Schedule   string
}

func parseConfig(input string) []TaskDef {
	var tasks []TaskDef
	var current *TaskDef

	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())

		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}

		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			if current != nil {
				tasks = append(tasks, *current)
			}
			name := line[1 : len(line)-1]
			current = &TaskDef{
				Name:       name,
				MaxRetries: 0,
				TimeoutSec: 30,
			}
			continue
		}

		if current == nil {
			continue
		}

		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}

		key := strings.TrimSpace(parts[0])
		value := strings.TrimSpace(parts[1])

		switch key {
		case "command":
			current.Command = value
		case "deps":
			if value != "" {
				for _, d := range strings.Split(value, ",") {
					current.Deps = append(current.Deps, strings.TrimSpace(d))
				}
			}
		case "retries":
			n, err := strconv.Atoi(value)
			if err == nil {
				current.MaxRetries = n
			}
		case "timeout":
			n, err := strconv.Atoi(value)
			if err == nil {
				current.TimeoutSec = n
			}
		case "schedule":
			current.Schedule = value
		}
	}

	if current != nil {
		tasks = append(tasks, *current)
	}

	return tasks
}
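
A quick sanity check of the parser on a one-task config:

func main() {
	defs := parseConfig("[hello]\ncommand = echo hi\nretries = 2\n")
	fmt.Printf("%+v\n", defs[0])
}
{Name:hello Command:echo hi Deps:[] MaxRetries:2 TimeoutSec:30 Schedule:}

The unset timeout falls back to the 30-second default assigned when the section header is parsed.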

Task Execution Engine

// --- Task Status ---

type Status int

const (
	Pending Status = iota
	Running
	Passed
	Failed
	Skipped
)

func (s Status) String() string {
	switch s {
	case Pending:
		return "PENDING"
	case Running:
		return "RUNNING"
	case Passed:
		return "PASSED"
	case Failed:
		return "FAILED"
	case Skipped:
		return "SKIPPED"
	default:
		return "UNKNOWN"
	}
}

func (s Status) Color() string {
	switch s {
	case Pending:
		return "\033[33m" // yellow
	case Running:
		return "\033[36m" // cyan
	case Passed:
		return "\033[32m" // green
	case Failed:
		return "\033[31m" // red
	case Skipped:
		return "\033[90m" // gray
	default:
		return "\033[0m"
	}
}

const colorReset = "\033[0m"

// --- Task Result ---

type RunResult struct {
	Name     string
	Status   Status
	Attempts int
	Output   string
	Error    string
	Duration time.Duration
}

// --- Backoff ---

func cappedBackoff(attempt int, max time.Duration) time.Duration {
	b := time.Duration(math.Pow(2, float64(attempt))) * time.Second
	if b > max {
		return max
	}
	return b
}

// --- Executor ---

func executeTask(ctx context.Context, def TaskDef) RunResult {
	result := RunResult{Name: def.Name}
	start := time.Now()
	timeout := time.Duration(def.TimeoutSec) * time.Second

	for attempt := 0; attempt <= def.MaxRetries; attempt++ {
		result.Attempts = attempt + 1

		if attempt > 0 {
			backoff := cappedBackoff(attempt, 60*time.Second)
			select {
			case <-ctx.Done():
				result.Status = Skipped
				result.Error = "cancelled"
				result.Duration = time.Since(start)
				return result
			case <-time.After(backoff):
			}
		}

		cmdCtx, cancel := context.WithTimeout(ctx, timeout)
		cmd := exec.CommandContext(cmdCtx, "sh", "-c", def.Command)
		output, err := cmd.CombinedOutput()
		cancel()

		result.Output = string(output)

		if err == nil {
			result.Status = Passed
			result.Duration = time.Since(start)
			return result
		}

		if cmdCtx.Err() == context.DeadlineExceeded {
			result.Error = "timeout"
		} else {
			result.Error = err.Error()
		}
	}

	result.Status = Failed
	result.Duration = time.Since(start)
	return result
}

Dependency Resolution

// --- Dependency Graph ---

func resolveDeps(defs []TaskDef, target string) ([]string, error) {
	defMap := make(map[string]TaskDef)
	for _, d := range defs {
		defMap[d.Name] = d
	}

	var order []string
	visiting := make(map[string]bool)
	visited := make(map[string]bool)

	var visit func(string) error
	visit = func(name string) error {
		if visited[name] {
			return nil
		}
		if visiting[name] {
			return fmt.Errorf("cycle detected at task %q", name)
		}
		visiting[name] = true

		def, ok := defMap[name]
		if !ok {
			return fmt.Errorf("unknown task: %s", name)
		}

		for _, dep := range def.Deps {
			if err := visit(dep); err != nil {
				return err
			}
		}

		visiting[name] = false
		visited[name] = true
		order = append(order, name)
		return nil
	}

	if err := visit(target); err != nil {
		return nil, err
	}
	return order, nil
}

Dashboard Display

// --- Dashboard ---

func printHeader() {
	fmt.Println()
	fmt.Println(strings.Repeat("=", 70))
	fmt.Println("  TASK RUNNER")
	fmt.Println(strings.Repeat("=", 70))
	fmt.Println()
}

func printTaskLine(r RunResult) {
	color := r.Status.Color()
	status := fmt.Sprintf("[%s]", r.Status)

	dur := r.Duration.Truncate(time.Millisecond)
	attempts := ""
	if r.Attempts > 1 {
		attempts = fmt.Sprintf(" (attempts: %d)", r.Attempts)
	}

	errMsg := ""
	if r.Error != "" && r.Status == Failed {
		errMsg = fmt.Sprintf(" - %s", r.Error)
	}

	fmt.Printf("  %s%-9s%s %-25s %10v%s%s\n",
		color, status, colorReset,
		r.Name, dur, attempts, errMsg)
}

func printSummary(results []RunResult) {
	fmt.Println()
	fmt.Println(strings.Repeat("-", 70))

	total := len(results)
	passed := 0
	failed := 0
	skipped := 0
	var totalDuration time.Duration

	for _, r := range results {
		totalDuration += r.Duration
		switch r.Status {
		case Passed:
			passed++
		case Failed:
			failed++
		case Skipped:
			skipped++
		}
	}

	fmt.Printf("\n  Total: %d", total)
	if passed > 0 {
		fmt.Printf("  |  \033[32mPassed: %d\033[0m", passed)
	}
	if failed > 0 {
		fmt.Printf("  |  \033[31mFailed: %d\033[0m", failed)
	}
	if skipped > 0 {
		fmt.Printf("  |  \033[90mSkipped: %d\033[0m", skipped)
	}
	fmt.Printf("  |  Duration: %v\n", totalDuration.Truncate(time.Millisecond))
	fmt.Println()
	fmt.Println(strings.Repeat("=", 70))
}

Main: Putting It All Together

func main() {
	config := `
# Task Runner Configuration

[lint]
command = echo 'Running linter...' && sleep 0.1 && echo 'No issues found'
deps =
retries = 0
timeout = 10

[test]
command = echo 'Running tests...' && sleep 0.2 && echo '14 passed, 0 failed'
deps = lint
retries = 1
timeout = 15

[build]
command = echo 'Compiling...' && sleep 0.3 && echo 'Build complete: myapp v1.0.0'
deps = test
retries = 2
timeout = 30

[migrate]
command = echo 'Running migrations...' && sleep 0.1 && echo 'Applied 3 migrations'
deps =
retries = 3
timeout = 20

[deploy]
command = echo 'Deploying to production...' && sleep 0.2 && echo 'Deploy successful'
deps = build, migrate
retries = 2
timeout = 60

[notify]
command = echo 'Sending notification...' && echo 'Team notified via webhook'
deps = deploy
retries = 1
timeout = 10
`

	defs := parseConfig(config)

	defMap := make(map[string]TaskDef)
	for _, d := range defs {
		defMap[d.Name] = d
	}

	// Resolve execution order for the final target
	target := "notify"
	order, err := resolveDeps(defs, target)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}

	printHeader()
	fmt.Printf("  Target: %s\n", target)
	fmt.Printf("  Execution order: %s\n", strings.Join(order, " -> "))
	fmt.Println()
	fmt.Println(strings.Repeat("-", 70))
	fmt.Println()

	ctx := context.Background()
	var results []RunResult
	var mu sync.Mutex

	completed := make(map[string]bool)

	for _, name := range order {
		def := defMap[name]

		// Check if all deps passed
		allDepsPassed := true
		for _, dep := range def.Deps {
			mu.Lock()
			if !completed[dep] {
				allDepsPassed = false
			}
			mu.Unlock()
		}

		if !allDepsPassed {
			r := RunResult{
				Name:   name,
				Status: Skipped,
				Error:  "dependency failed",
			}
			printTaskLine(r)
			results = append(results, r)
			continue
		}

		// Show running status
		fmt.Printf("  %s%-9s%s %-25s\r",
			Running.Color(), fmt.Sprintf("[%s]", Running), colorReset, name)

		result := executeTask(ctx, def)
		printTaskLine(result)

		if result.Status == Passed {
			mu.Lock()
			completed[name] = true
			mu.Unlock()
		}

		results = append(results, result)
	}

	printSummary(results)
}

Running the Complete Task Runner

Save all the code blocks above into a single main.go file and run it:

go run main.go
======================================================================
  TASK RUNNER
======================================================================

  Target: notify
  Execution order: lint -> test -> build -> migrate -> deploy -> notify

----------------------------------------------------------------------

  [PASSED]  lint                           102ms
  [PASSED]  test                           205ms
  [PASSED]  build                          308ms
  [PASSED]  migrate                        104ms
  [PASSED]  deploy                         207ms
  [PASSED]  notify                          12ms

----------------------------------------------------------------------

  Total: 6  |  Passed: 6  |  Duration: 938ms

======================================================================

All six tasks run in dependency order. Lint runs first because test depends on it. Migrate has no dependencies, so the topological sort slots it in before deploy. Deploy waits for both build and migrate. Notify runs last.

Testing a Failure

Change the build command to simulate a failure:

[build]
command = echo 'Compiling...' && exit 1
deps = test
retries = 2
timeout = 30
Run it again:

======================================================================
  TASK RUNNER
======================================================================

  Target: notify
  Execution order: lint -> test -> build -> migrate -> deploy -> notify

----------------------------------------------------------------------

  [PASSED]  lint                           101ms
  [PASSED]  test                           204ms
  [FAILED]  build                         6.31s (attempts: 3) - exit status 1
  [PASSED]  migrate                        103ms
  [SKIPPED] deploy                            0s
  [SKIPPED] notify                            0s

----------------------------------------------------------------------

  Total: 6  |  Passed: 3  |  Failed: 1  |  Skipped: 2  |  Duration: 6.718s

======================================================================

Build fails after 3 attempts (1 original + 2 retries). Deploy is skipped because it depends on build. Notify is skipped because it depends on deploy. But migrate still runs because it has no dependency on build.

The runner does not stop everything on first failure. It continues running tasks that do not depend on the failed one. Only downstream tasks are skipped.


What You Built

This article covered six layers of task automation:

  1. Cron for repeating schedules. You parsed cron expressions in Go with range support.
  2. At for one-off jobs. You built a job queue with proper goroutine cleanup using timers and contexts.
  3. Make for dependencies. You built a task graph with topological sort and cycle detection.
  4. Systemd timers for persistent scheduling. You built a state file with lock-based concurrency control.
  5. Retries and timeouts for resilience. You implemented exponential backoff with a cap.
  6. A complete task runner that combines all of the above into a single tool.

Every Go program uses only the standard library. Every Linux command runs on any modern distribution.

The task runner you built handles the same core problems as production tools: scheduling, dependency ordering, failure recovery, and status reporting. The patterns here — topological sort, exponential backoff, lock files, context cancellation — show up in build systems, CI pipelines, and orchestration tools.

Question

What task automation patterns have you implemented in your infrastructure?
