/user/KayD @ karandeepsingh.ca :~$ cat nginx-logs-and-docker-your-ultimate-guide.md

Nginx Log Analysis: From grep to a Go Log Parser

Karandeep Singh
• 26 minutes read

Summary

Nginx log analysis from grep to Go: each step shows the command-line approach first, then builds the same analysis in Go. Parse access logs, find slow endpoints, detect error spikes, and build a log analysis dashboard.

When something goes wrong in production, the first thing you check is the nginx access log: who requested what, how long it took, and what broke. In this article, we’ll work through nginx log analysis, starting with grep and awk one-liners for quick answers, then building a Go parser for deeper analysis.

Prerequisites

  • A Linux system (native, WSL, or SSH)
  • Go 1.21+ installed
  • An nginx access log file (we’ll create a sample one)

Step 1: Understanding the Nginx Log Format

Nginx writes one line per request to its access log. The default format is called combined:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

Here is a sample log line:

93.184.216.34 - - [15/Feb/2026:14:30:05 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"

Each field:

Field               Value                          Meaning
$remote_addr        93.184.216.34                  Client IP address
-                   -                              identd username (almost always -)
$remote_user        -                              Authenticated user (usually -)
$time_local         [15/Feb/2026:14:30:05 +0000]   Timestamp in the server’s local time
$request            "GET /api/users HTTP/1.1"      Method, path, and protocol
$status             200                            HTTP status code
$body_bytes_sent    1234                           Response size in bytes
$http_referer       "https://example.com"          Page that linked to this request
$http_user_agent    "Mozilla/5.0"                  Client browser or bot string

The combined format is missing one critical field: response time. Without it, you can see what happened, but not how long it took. For production servers, use a custom format that includes $request_time:

log_format timed '$remote_addr - $remote_user [$time_local] '
                 '"$request" $status $body_bytes_sent '
                 '"$http_referer" "$http_user_agent" '
                 '$request_time $upstream_response_time';

$request_time is the total time nginx spent processing the request, in seconds with millisecond precision. $upstream_response_time is how long the backend took. The difference between them tells you how much time nginx itself added.

Create a Sample Log File

We need a realistic log file for all the steps that follow. This script generates 1000 lines with a mix of status codes, endpoints, response times, and IPs:

cat > /tmp/generate_logs.sh << 'SCRIPT'
#!/bin/bash

LOGFILE="/tmp/access.log"
> "$LOGFILE"

IPS=("10.0.1.50" "10.0.1.51" "10.0.1.52" "93.184.216.34" "172.16.0.10"
     "192.168.1.100" "203.0.113.15" "198.51.100.22" "10.0.2.80" "10.0.3.90")

ENDPOINTS=("/api/users" "/api/orders" "/api/search" "/api/auth/login"
           "/api/auth/refresh" "/api/orders/process" "/api/reports/generate"
           "/api/export/csv" "/api/users/9999" "/api/health"
           "/static/app.js" "/static/style.css" "/static/old-file.js"
           "/" "/favicon.ico")

AGENTS=("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
        "curl/7.81.0"
        "python-requests/2.28.1"
        "Go-http-client/1.1")

for i in $(seq 1 1000); do
    IP=${IPS[$((RANDOM % ${#IPS[@]}))]}
    ENDPOINT=${ENDPOINTS[$((RANDOM % ${#ENDPOINTS[@]}))]}
    AGENT=${AGENTS[$((RANDOM % ${#AGENTS[@]}))]}

    HOUR=14  # keep every request within a single hour
    MINUTE=$(printf "%02d" $((RANDOM % 60)))
    SECOND=$(printf "%02d" $((RANDOM % 60)))
    TIMESTAMP="15/Feb/2026:${HOUR}:${MINUTE}:${SECOND} +0000"

    # Weighted status codes: mostly 200, some errors
    RAND=$((RANDOM % 100))
    if [ $RAND -lt 65 ]; then
        STATUS=200
    elif [ $RAND -lt 70 ]; then
        STATUS=301
    elif [ $RAND -lt 75 ]; then
        STATUS=304
    elif [ $RAND -lt 85 ]; then
        STATUS=404
    elif [ $RAND -lt 93 ]; then
        STATUS=500
    else
        STATUS=502
    fi

    # Response time varies by endpoint
    case "$ENDPOINT" in
        "/api/reports/generate") RT=$(awk "BEGIN{printf \"%.3f\", 2.0 + (${RANDOM} % 3000) / 1000.0}");;
        "/api/export/csv")       RT=$(awk "BEGIN{printf \"%.3f\", 1.5 + (${RANDOM} % 2500) / 1000.0}");;
        "/api/search")           RT=$(awk "BEGIN{printf \"%.3f\", 0.5 + (${RANDOM} % 2000) / 1000.0}");;
        "/static/"*)             RT=$(awk "BEGIN{printf \"%.3f\", 0.001 + (${RANDOM} % 10) / 1000.0}");;
        "/favicon.ico")          RT=$(awk "BEGIN{printf \"%.3f\", 0.001 + (${RANDOM} % 5) / 1000.0}");;
        *)                       RT=$(awk "BEGIN{printf \"%.3f\", 0.01 + (${RANDOM} % 500) / 1000.0}");;
    esac

    BYTES=$((RANDOM % 50000 + 100))
    REFERER="-"
    if [ $((RANDOM % 3)) -eq 0 ]; then
        REFERER="https://example.com/page"
    fi

    METHOD="GET"
    if [[ "$ENDPOINT" == *"process"* || "$ENDPOINT" == *"login"* ]]; then
        METHOD="POST"
    fi

    echo "$IP - - [$TIMESTAMP] \"$METHOD $ENDPOINT HTTP/1.1\" $STATUS $BYTES \"$REFERER\" \"$AGENT\" $RT" >> "$LOGFILE"
done

# Add a few malformed lines (real logs always have these)
echo "malformed line with no structure" >> "$LOGFILE"
echo "" >> "$LOGFILE"
echo "10.0.1.50 - - [15/Feb/2026:14:30:00 +0000] \"-\" 400 0 \"-\" \"-\" 0.000" >> "$LOGFILE"

echo "Generated $(wc -l < "$LOGFILE") log lines in $LOGFILE"
SCRIPT

bash /tmp/generate_logs.sh

Expected output:

Generated 1003 log lines in /tmp/access.log

Check that it worked:

head -3 /tmp/access.log

Expected output:

10.0.1.51 - - [15/Feb/2026:14:23:45 +0000] "GET /api/users HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" 0.045
93.184.216.34 - - [15/Feb/2026:14:12:08 +0000] "GET /api/search HTTP/1.1" 200 8901 "https://example.com/page" "curl/7.81.0" 1.234
10.0.2.80 - - [15/Feb/2026:14:55:31 +0000] "POST /api/orders/process HTTP/1.1" 500 256 "-" "python-requests/2.28.1" 0.892

Your numbers will be different because the script uses $RANDOM. That’s fine. The structure is what matters.

Step 2: Quick Analysis with grep and awk

These are the commands you run in the first five minutes of an incident. They give you answers in seconds.

How many requests total?

wc -l /tmp/access.log

Expected output:

1003 /tmp/access.log

How many 500 errors?

grep '" 500 ' /tmp/access.log | wc -l

The pattern " 500 " matches the status code field. The quotes and spaces prevent false matches; without them, you’d match bytes_sent values that happen to contain “500”.

How many 404s?

grep '" 404 ' /tmp/access.log | wc -l

Top 10 most requested URLs:

awk '{print $7}' /tmp/access.log | sort | uniq -c | sort -rn | head -10

Field $7 is the request path. sort | uniq -c counts occurrences. sort -rn sorts by count descending.

Top 10 IPs by request count:

awk '{print $1}' /tmp/access.log | sort | uniq -c | sort -rn | head -10

Same pattern, different field. Field $1 is the client IP.

Requests per minute (shows traffic pattern):

awk '{print $4}' /tmp/access.log | cut -d: -f1-3 | sort | uniq -c | tail -60

This extracts the timestamp, strips it to [DD/Mon/YYYY:HH:MM, then counts. Useful for spotting traffic spikes.

All 500 errors with timestamps and paths:

grep '" 500 ' /tmp/access.log | awk '{print $4, $7}'

Expected output (sample):

[15/Feb/2026:14:23:11 /api/orders/process
[15/Feb/2026:14:45:02 /api/search
[15/Feb/2026:14:51:38 /api/orders/process

These one-liners are fast and answer immediate questions during an incident. But they have limits: you cannot easily calculate percentiles, correlate multiple fields, or build a reusable tool. That’s where Go comes in.

Step 3: Parse Nginx Logs in Go

We’ll write a program that reads an nginx access log and parses each line into a struct. This is the foundation for everything else.

mkdir -p /tmp/logparser && cd /tmp/logparser
go mod init logparser

main.go

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"strconv"
	"time"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: go run main.go <logfile>")
	}

	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	// Regex for the timed log format
	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		match := re.FindStringSubmatch(line)

		// BUG: no nil check on match β€” this will panic on malformed lines
		ip := match[1]
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP:           ip,
			Timestamp:    ts,
			Method:       match[3],
			Path:         match[4],
			Status:       status,
			BytesSent:    bytes,
			ResponseTime: rt,
		})
	}

	fmt.Printf("Parsed %d entries\n\n", len(entries))
	for i, e := range entries {
		if i >= 5 {
			break
		}
		fmt.Printf("%-16s %s  %s %-30s %d  %6d bytes  %.3fs\n",
			e.IP, e.Timestamp.Format("15:04:05"), e.Method, e.Path, e.Status, e.BytesSent, e.ResponseTime)
	}
}

Run it:

cd /tmp/logparser && go run main.go /tmp/access.log

Expected output:

panic: runtime error: index out of range [1] with length 0

goroutine 1 [running]:
main.main()
        /tmp/logparser/main.go:44 +0x...

It panics. The sample log file has malformed lines at the end. When FindStringSubmatch can’t match a line, it returns nil. Accessing match[1] on a nil slice causes a panic.

This happens with every real log file. Bots send garbage requests. Health checks produce odd entries. Load balancers inject their own lines. You must handle parse failures.

Fix: Check the Regex Match

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"strconv"
	"time"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: go run main.go <logfile>")
	}

	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	skipped := 0
	total := 0
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		total++

		match := re.FindStringSubmatch(line)
		if match == nil {
			skipped++
			continue
		}

		ip := match[1]
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP:           ip,
			Timestamp:    ts,
			Method:       match[3],
			Path:         match[4],
			Status:       status,
			BytesSent:    bytes,
			ResponseTime: rt,
		})
	}

	fmt.Printf("Parsed: %d/%d lines (%d malformed, skipped)\n\n", len(entries), total, skipped)
	for i, e := range entries {
		if i >= 5 {
			break
		}
		fmt.Printf("%-16s %s  %s %-30s %d  %6d bytes  %.3fs\n",
			e.IP, e.Timestamp.Format("15:04:05"), e.Method, e.Path, e.Status, e.BytesSent, e.ResponseTime)
	}
}

Run it again:

cd /tmp/logparser && go run main.go /tmp/access.log

Expected output:

Parsed: 1000/1003 lines (3 malformed, skipped)

10.0.1.51        14:23:45  GET /api/users                  200   12345 bytes  0.045s
93.184.216.34    14:12:08  GET /api/search                 200    8901 bytes  1.234s
10.0.2.80        14:55:31  POST /api/orders/process         500     256 bytes  0.892s
10.0.1.50        14:07:19  GET /api/health                 200    1024 bytes  0.003s
172.16.0.10      14:41:55  GET /static/app.js              304    5678 bytes  0.002s

The key change: check if match == nil before accessing any capture group. Skip malformed lines and count them. In production, you might also want to log the malformed lines to a separate file for investigation.

Step 4: Analyze Response Codes and Find Errors

Linux Commands

Status code distribution β€” this is the first thing you check:

awk '{print $9}' /tmp/access.log | sort | uniq -c | sort -rn

Field $9 is the status code in the combined/timed format.

Error rate as a percentage:

awk '{s[$9]++; total++} END {for(k in s) if(k+0>=400) e+=s[k]; printf "Error rate: %.1f%% (%d/%d)\n", e/total*100, e, total}' /tmp/access.log

The k+0 forces a numeric comparison: awk array keys are strings, and comparing them against 400 as strings only happens to work while every code is three digits.

Go Code

We’ll build on the parser from Step 3. After parsing all entries, calculate the status code distribution and find the top error paths.

main.go

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"sort"
	"strconv"
	"time"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: go run main.go <logfile>")
	}

	entries := parseLog(os.Args[1])
	fmt.Printf("Parsed %d entries\n\n", len(entries))

	// Status code distribution
	statusCounts := make(map[string]int)
	for _, e := range entries {
		// BUG: grouping by string comparison
		code := fmt.Sprintf("%d", e.Status)
		if code >= "200" && code < "300" {
			statusCounts["2xx"]++
		} else if code >= "300" && code < "400" {
			statusCounts["3xx"]++
		} else if code >= "400" && code < "500" {
			statusCounts["4xx"]++
		} else if code >= "500" && code < "600" {
			statusCounts["5xx"]++
		}
	}

	fmt.Println("[Status Code Groups]")
	total := len(entries)
	for _, group := range []string{"2xx", "3xx", "4xx", "5xx"} {
		count := statusCounts[group]
		pct := float64(count) / float64(total) * 100
		fmt.Printf("  %s: %d (%.1f%%)\n", group, count, pct)
	}

	// Error rate
	errors := statusCounts["4xx"] + statusCounts["5xx"]
	fmt.Printf("\nError rate: %.1f%% (%d/%d)\n", float64(errors)/float64(total)*100, errors, total)

	// Top 10 paths returning 500
	fmt.Println("\n[Top 500 Error Paths]")
	error500 := make(map[string]int)
	for _, e := range entries {
		code := fmt.Sprintf("%d", e.Status)
		if code == "500" {
			error500[e.Path]++
		}
	}
	printTopN(error500, 10)

	// Top 10 paths returning 404
	fmt.Println("\n[Top 404 Paths]")
	error404 := make(map[string]int)
	for _, e := range entries {
		code := fmt.Sprintf("%d", e.Status)
		if code == "404" {
			error404[e.Path]++
		}
	}
	printTopN(error404, 10)
}

func parseLog(filename string) []LogEntry {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	skipped := 0
	total := 0
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		total++
		match := re.FindStringSubmatch(line)
		if match == nil {
			skipped++
			continue
		}
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP: match[1], Timestamp: ts, Method: match[3],
			Path: match[4], Status: status, BytesSent: bytes, ResponseTime: rt,
		})
	}
	fmt.Printf("Parsed: %d/%d lines (%d malformed, skipped)\n", len(entries), total, skipped)
	return entries
}

type kv struct {
	Key   string
	Value int
}

func printTopN(m map[string]int, n int) {
	var sorted []kv
	for k, v := range m {
		sorted = append(sorted, kv{k, v})
	}
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Value > sorted[j].Value })
	for i, item := range sorted {
		if i >= n {
			break
		}
		fmt.Printf("  %-35s %d hits\n", item.Key, item.Value)
	}
}

Run it:

cd /tmp/logparser && go run main.go /tmp/access.log

Expected output:

Parsed: 1000/1003 lines (3 malformed, skipped)
Parsed 1000 entries

[Status Code Groups]
  2xx: 650 (65.0%)
  3xx: 83 (8.3%)
  4xx: 100 (10.0%)
  5xx: 167 (16.7%)

Error rate: 26.7% (267/1000)

[Top 500 Error Paths]
  /api/orders/process               14 hits
  /api/search                       12 hits
  /api/users                        11 hits

[Top 404 Paths]
  /api/users/9999                   15 hits
  /static/old-file.js               12 hits
  /api/orders                        9 hits

This looks like it works. But there’s a subtle bug. Look at this code:

code := fmt.Sprintf("%d", e.Status)
if code >= "200" && code < "300" {

We’re comparing status codes as strings. String comparison is lexicographic (character by character), not numeric. For the status codes we use here (200, 301, 404, 500, 502), string comparison happens to give the right answer because all codes are three digits. But it’s wrong for two reasons:

  1. If a status code were somehow 99 (two digits), "99" > "500" is true in string comparison because "9" > "5".
  2. The code converts an integer to a string and then compares strings. That’s unnecessary work and makes the intent unclear.

Fix: Compare Integers Directly

Replace all the string-based comparisons with integer comparisons:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"sort"
	"strconv"
	"time"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: go run main.go <logfile>")
	}

	entries := parseLog(os.Args[1])
	fmt.Printf("Parsed %d entries\n\n", len(entries))

	// Status code distribution β€” compare integers, not strings
	statusCounts := make(map[string]int)
	for _, e := range entries {
		switch {
		case e.Status >= 200 && e.Status < 300:
			statusCounts["2xx"]++
		case e.Status >= 300 && e.Status < 400:
			statusCounts["3xx"]++
		case e.Status >= 400 && e.Status < 500:
			statusCounts["4xx"]++
		case e.Status >= 500 && e.Status < 600:
			statusCounts["5xx"]++
		}
	}

	fmt.Println("[Status Code Groups]")
	total := len(entries)
	for _, group := range []string{"2xx", "3xx", "4xx", "5xx"} {
		count := statusCounts[group]
		pct := float64(count) / float64(total) * 100
		fmt.Printf("  %s: %d (%.1f%%)\n", group, count, pct)
	}

	errors := statusCounts["4xx"] + statusCounts["5xx"]
	fmt.Printf("\nError rate: %.1f%% (%d/%d)\n", float64(errors)/float64(total)*100, errors, total)

	// Top paths returning 500 β€” compare integer directly
	fmt.Println("\n[Top 500 Error Paths]")
	error500 := make(map[string]int)
	for _, e := range entries {
		if e.Status == 500 {
			error500[e.Path]++
		}
	}
	printTopN(error500, 10)

	fmt.Println("\n[Top 404 Paths]")
	error404 := make(map[string]int)
	for _, e := range entries {
		if e.Status == 404 {
			error404[e.Path]++
		}
	}
	printTopN(error404, 10)
}

func parseLog(filename string) []LogEntry {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	skipped := 0
	total := 0
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		total++
		match := re.FindStringSubmatch(line)
		if match == nil {
			skipped++
			continue
		}
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP: match[1], Timestamp: ts, Method: match[3],
			Path: match[4], Status: status, BytesSent: bytes, ResponseTime: rt,
		})
	}
	fmt.Printf("Parsed: %d/%d lines (%d malformed, skipped)\n", len(entries), total, skipped)
	return entries
}

type kv struct {
	Key   string
	Value int
}

func printTopN(m map[string]int, n int) {
	var sorted []kv
	for k, v := range m {
		sorted = append(sorted, kv{k, v})
	}
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Value > sorted[j].Value })
	for i, item := range sorted {
		if i >= n {
			break
		}
		fmt.Printf("  %-35s %d hits\n", item.Key, item.Value)
	}
}

The lesson: always convert numbers from log parsing to actual numeric types before comparing. strconv.Atoi already gave us an integer. Use it as an integer. Don’t convert it back to a string.

Step 5: Response Time Analysis

Linux Commands

If your nginx log format includes $request_time (ours does β€” it’s the last field), you can get basic stats:

Average response time:

awk '{sum+=$NF; n++} END {printf "Average: %.3fs\n", sum/n}' /tmp/access.log

$NF is the last field on each line. In our log format, that’s the response time.

Slowest 10 requests:

awk '{print $NF, $7}' /tmp/access.log | sort -rn | head -10

Printing the response time ($NF) first lets sort -rn order numerically by it, regardless of how many space-separated tokens the user agent string adds to each line.

Expected output (sample):

4.892 /api/reports/generate
4.231 /api/export/csv
3.876 /api/reports/generate
3.544 /api/export/csv
2.998 /api/reports/generate

These commands tell you what’s slow, but they can’t calculate percentiles. Percentiles matter because averages hide problems: if 95% of requests take 50ms and 5% take 10 seconds, the average is ~550ms, which tells you almost nothing.

Go Code

We’ll calculate p50, p95, and p99 percentiles, plus per-endpoint breakdowns.

main.go

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"sort"
	"strconv"
	"time"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: go run main.go <logfile>")
	}

	entries := parseLog(os.Args[1])

	// Collect all response times
	var times []float64
	for _, e := range entries {
		times = append(times, e.ResponseTime)
	}

	// BUG: we calculate percentiles without sorting
	// We assume the data is already in order β€” it is not
	fmt.Println("[Response Time Analysis]")
	avg := 0.0
	for _, t := range times {
		avg += t
	}
	avg /= float64(len(times))

	p50 := times[len(times)/2]
	p95 := times[int(float64(len(times))*0.95)]
	p99 := times[int(float64(len(times))*0.99)]

	fmt.Printf("  Average: %.3fs\n", avg)
	fmt.Printf("  p50:     %.3fs\n", p50)
	fmt.Printf("  p95:     %.3fs\n", p95)
	fmt.Printf("  p99:     %.3fs\n", p99)

	// Top 10 slowest endpoints (by average response time)
	fmt.Println("\n[Slowest Endpoints]")
	endpointTimes := make(map[string][]float64)
	for _, e := range entries {
		endpointTimes[e.Path] = append(endpointTimes[e.Path], e.ResponseTime)
	}

	type endpointStat struct {
		Path    string
		AvgTime float64
		Count   int
	}
	var stats []endpointStat
	for path, rts := range endpointTimes {
		sum := 0.0
		for _, t := range rts {
			sum += t
		}
		stats = append(stats, endpointStat{path, sum / float64(len(rts)), len(rts)})
	}
	sort.Slice(stats, func(i, j int) bool { return stats[i].AvgTime > stats[j].AvgTime })

	for i, s := range stats {
		if i >= 10 {
			break
		}
		fmt.Printf("  %-35s avg %.3fs  (%d requests)\n", s.Path, s.AvgTime, s.Count)
	}
}

func parseLog(filename string) []LogEntry {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	skipped := 0
	total := 0
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		total++
		match := re.FindStringSubmatch(line)
		if match == nil {
			skipped++
			continue
		}
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP: match[1], Timestamp: ts, Method: match[3],
			Path: match[4], Status: status, BytesSent: bytes, ResponseTime: rt,
		})
	}
	fmt.Printf("Parsed: %d/%d lines (%d malformed, skipped)\n\n", len(entries), total, skipped)
	return entries
}

Run it:

cd /tmp/logparser && go run main.go /tmp/access.log

Expected output:

Parsed: 1000/1003 lines (3 malformed, skipped)

[Response Time Analysis]
  Average: 0.583s
  p50:     0.023s
  p95:     0.023s
  p99:     0.201s

[Slowest Endpoints]
  /api/reports/generate               avg 3.412s  (68 requests)
  /api/export/csv                     avg 2.734s  (72 requests)
  /api/search                         avg 1.489s  (65 requests)

The p50 and p95 values look wrong. They’re almost the same, and p95 is lower than the average. That makes no sense.

The bug: percentiles require sorted data. The index len(times)/2 gives you the median only if the data is sorted. We’re reading log lines in chronological order, not by response time. So times[len(times)/2] is just the response time of whatever request happened to be in the middle of the file.

Fix: Sort Before Calculating Percentiles

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"sort"
	"strconv"
	"time"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("Usage: go run main.go <logfile>")
	}

	entries := parseLog(os.Args[1])

	var times []float64
	for _, e := range entries {
		times = append(times, e.ResponseTime)
	}

	// FIX: sort the response times before calculating percentiles
	sort.Float64s(times)

	fmt.Println("[Response Time Analysis]")
	avg := 0.0
	for _, t := range times {
		avg += t
	}
	avg /= float64(len(times))

	p50 := times[len(times)/2]
	p95 := times[int(float64(len(times))*0.95)]
	p99 := times[int(float64(len(times))*0.99)]

	fmt.Printf("  Average: %.3fs\n", avg)
	fmt.Printf("  p50:     %.3fs\n", p50)
	fmt.Printf("  p95:     %.3fs\n", p95)
	fmt.Printf("  p99:     %.3fs\n", p99)

	// Top 10 slowest endpoints
	fmt.Println("\n[Slowest Endpoints]")
	endpointTimes := make(map[string][]float64)
	for _, e := range entries {
		endpointTimes[e.Path] = append(endpointTimes[e.Path], e.ResponseTime)
	}

	type endpointStat struct {
		Path    string
		AvgTime float64
		Count   int
	}
	var stats []endpointStat
	for path, rts := range endpointTimes {
		sum := 0.0
		for _, t := range rts {
			sum += t
		}
		stats = append(stats, endpointStat{path, sum / float64(len(rts)), len(rts)})
	}
	sort.Slice(stats, func(i, j int) bool { return stats[i].AvgTime > stats[j].AvgTime })

	for i, s := range stats {
		if i >= 10 {
			break
		}
		fmt.Printf("  %-35s avg %.3fs  (%d requests)\n", s.Path, s.AvgTime, s.Count)
	}
}

func parseLog(filename string) []LogEntry {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	skipped := 0
	total := 0
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		total++
		match := re.FindStringSubmatch(line)
		if match == nil {
			skipped++
			continue
		}
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP: match[1], Timestamp: ts, Method: match[3],
			Path: match[4], Status: status, BytesSent: bytes, ResponseTime: rt,
		})
	}
	fmt.Printf("Parsed: %d/%d lines (%d malformed, skipped)\n\n", len(entries), total, skipped)
	return entries
}

Run it:

cd /tmp/logparser && go run main.go /tmp/access.log

Expected output:

Parsed: 1000/1003 lines (3 malformed, skipped)

[Response Time Analysis]
  Average: 0.583s
  p50:     0.145s
  p95:     2.876s
  p99:     4.231s

[Slowest Endpoints]
  /api/reports/generate               avg 3.412s  (68 requests)
  /api/export/csv                     avg 2.734s  (72 requests)
  /api/search                         avg 1.489s  (65 requests)
  /api/orders/process                 avg 0.267s  (64 requests)
  /api/auth/login                     avg 0.251s  (70 requests)
  /api/users                          avg 0.243s  (67 requests)
  /api/orders                         avg 0.238s  (69 requests)
  /api/auth/refresh                   avg 0.235s  (66 requests)
  /api/users/9999                     avg 0.228s  (63 requests)
  /api/health                         avg 0.221s  (71 requests)

Now the numbers make sense. p50 is less than the average (most requests are fast, a few are slow, pulling the average up). p95 and p99 show the long tail: 5% of requests take nearly 3 seconds, and 1% take over 4 seconds.

The one-line fix was sort.Float64s(times). Without it, every percentile calculation is wrong.

Step 6: Build a Log Analysis Dashboard

Now we combine everything into a single tool that prints a complete analysis report. It uses ANSI colors to highlight problems: green for success, yellow for warnings, red for errors.

main.go

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"sort"
	"strconv"
	"strings"
	"time"
)

// ANSI color codes
const (
	colorReset  = "\033[0m"
	colorRed    = "\033[31m"
	colorGreen  = "\033[32m"
	colorYellow = "\033[33m"
	colorCyan   = "\033[36m"
	colorBold   = "\033[1m"
)

type LogEntry struct {
	IP           string
	Timestamp    time.Time
	Method       string
	Path         string
	Status       int
	BytesSent    int
	ResponseTime float64
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("Usage: go run main.go <logfile>")
		os.Exit(1)
	}

	filename := os.Args[1]
	entries := parseLog(filename)
	if len(entries) == 0 {
		fmt.Println("No valid log entries found.")
		os.Exit(1)
	}

	fmt.Printf("\n%sNginx Log Analysis%s: %s\n", colorBold, colorReset, filename)
	fmt.Println(strings.Repeat("=", 55))

	printSummary(entries)
	printStatusCodes(entries)
	printErrors(entries)
	printPerformance(entries)
	printSlowEndpoints(entries)
	printTopIPs(entries)
	printBandwidth(entries)
}

// --- Parsing ---

func parseLog(filename string) []LogEntry {
	file, err := os.Open(filename)
	if err != nil {
		log.Fatalf("Cannot open file: %v", err)
	}
	defer file.Close()

	pattern := `^(\S+) - \S+ \[(.+?)\] "(\S+) (\S+) \S+" (\d{3}) (\d+) ".+?" ".+?" (\S+)$`
	re := regexp.MustCompile(pattern)

	var entries []LogEntry
	skipped := 0
	total := 0
	scanner := bufio.NewScanner(file)

	for scanner.Scan() {
		line := scanner.Text()
		total++
		match := re.FindStringSubmatch(line)
		if match == nil {
			skipped++
			continue
		}
		ts, _ := time.Parse("02/Jan/2006:15:04:05 -0700", match[2])
		status, _ := strconv.Atoi(match[5])
		bytes, _ := strconv.Atoi(match[6])
		rt, _ := strconv.ParseFloat(match[7], 64)

		entries = append(entries, LogEntry{
			IP: match[1], Timestamp: ts, Method: match[3],
			Path: match[4], Status: status, BytesSent: bytes, ResponseTime: rt,
		})
	}

	fmt.Printf("Parsed: %d/%d lines (%d malformed, skipped)\n", len(entries), total, skipped)
	return entries
}

// --- Summary ---

func printSummary(entries []LogEntry) {
	fmt.Printf("\n%s[Summary]%s\n", colorCyan, colorReset)

	minTime := entries[0].Timestamp
	maxTime := entries[0].Timestamp
	for _, e := range entries {
		if e.Timestamp.Before(minTime) {
			minTime = e.Timestamp
		}
		if e.Timestamp.After(maxTime) {
			maxTime = e.Timestamp
		}
	}

	duration := maxTime.Sub(minTime).Seconds()
	rps := 0.0
	if duration > 0 {
		rps = float64(len(entries)) / duration
	}

	fmt.Printf("  Total requests: %s%d%s\n", colorBold, len(entries), colorReset)
	fmt.Printf("  Time range: %s to %s\n",
		minTime.Format("02/Jan/2006 15:04"), maxTime.Format("02/Jan/2006 15:04"))
	fmt.Printf("  Requests/sec: %.1f\n", rps)
}

// --- Status Codes ---

func printStatusCodes(entries []LogEntry) {
	fmt.Printf("\n%s[Status Codes]%s\n", colorCyan, colorReset)

	counts := make(map[int]int)
	for _, e := range entries {
		counts[e.Status]++
	}

	// Sort status codes
	var codes []int
	for code := range counts {
		codes = append(codes, code)
	}
	sort.Ints(codes)

	total := len(entries)
	maxCount := 0
	for _, c := range counts {
		if c > maxCount {
			maxCount = c
		}
	}

	for _, code := range codes {
		count := counts[code]
		pct := float64(count) / float64(total) * 100
		barLen := int(float64(count) / float64(maxCount) * 40)
		bar := strings.Repeat("#", barLen)

		// 2xx green, 3xx/4xx yellow, 5xx red
		color := colorGreen
		if code >= 500 {
			color = colorRed
		} else if code >= 300 {
			color = colorYellow
		}

		fmt.Printf("  %s%d%s  %-40s  %d (%.1f%%)\n", color, code, colorReset, bar, count, pct)
	}
}

// --- Errors ---

func printErrors(entries []LogEntry) {
	fmt.Printf("\n%s[Errors]%s\n", colorCyan, colorReset)

	type errorKey struct {
		Path   string
		Status int
	}
	errorCounts := make(map[errorKey]int)

	for _, e := range entries {
		if e.Status >= 400 {
			errorCounts[errorKey{e.Path, e.Status}]++
		}
	}

	type errorEntry struct {
		Path   string
		Status int
		Count  int
	}
	var sorted []errorEntry
	for k, v := range errorCounts {
		sorted = append(sorted, errorEntry{k.Path, k.Status, v})
	}
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Count > sorted[j].Count })

	for i, e := range sorted {
		if i >= 5 {
			break
		}
		color := colorYellow
		if e.Status >= 500 {
			color = colorRed
		}
		fmt.Printf("  %-30s %s%d%s   %d hits\n", e.Path, color, e.Status, colorReset, e.Count)
	}
}

// --- Performance ---

func printPerformance(entries []LogEntry) {
	fmt.Printf("\n%s[Performance]%s\n", colorCyan, colorReset)

	var times []float64
	for _, e := range entries {
		times = append(times, e.ResponseTime)
	}

	sort.Float64s(times)

	avg := 0.0
	for _, t := range times {
		avg += t
	}
	avg /= float64(len(times))

	p50 := times[len(times)/2]
	p95 := times[int(float64(len(times))*0.95)]
	p99 := times[int(float64(len(times))*0.99)]

	p95color := colorGreen
	if p95 > 1.0 {
		p95color = colorYellow
	}
	if p95 > 5.0 {
		p95color = colorRed
	}

	p99color := colorGreen
	if p99 > 2.0 {
		p99color = colorYellow
	}
	if p99 > 10.0 {
		p99color = colorRed
	}

	fmt.Printf("  Average: %.3fs | p50: %.3fs | %sp95: %.3fs%s | %sp99: %.3fs%s\n",
		avg, p50, p95color, p95, colorReset, p99color, p99, colorReset)
}

// --- Slow Endpoints ---

func printSlowEndpoints(entries []LogEntry) {
	fmt.Printf("\n%s[Slow Endpoints]%s\n", colorCyan, colorReset)

	endpointTimes := make(map[string][]float64)
	for _, e := range entries {
		endpointTimes[e.Path] = append(endpointTimes[e.Path], e.ResponseTime)
	}

	type stat struct {
		Path    string
		AvgTime float64
		Count   int
	}
	var stats []stat
	for path, rts := range endpointTimes {
		sum := 0.0
		for _, t := range rts {
			sum += t
		}
		stats = append(stats, stat{path, sum / float64(len(rts)), len(rts)})
	}
	sort.Slice(stats, func(i, j int) bool { return stats[i].AvgTime > stats[j].AvgTime })

	for i, s := range stats {
		if i >= 5 {
			break
		}
		color := colorGreen
		if s.AvgTime > 1.0 {
			color = colorYellow
		}
		if s.AvgTime > 3.0 {
			color = colorRed
		}
		fmt.Printf("  %-30s %savg %.2fs%s  (%d requests)\n", s.Path, color, s.AvgTime, colorReset, s.Count)
	}
}

// --- Top IPs ---

func printTopIPs(entries []LogEntry) {
	fmt.Printf("\n%s[Top IPs]%s\n", colorCyan, colorReset)

	ipCounts := make(map[string]int)
	for _, e := range entries {
		ipCounts[e.IP]++
	}

	type ipStat struct {
		IP    string
		Count int
	}
	var sorted []ipStat
	for ip, count := range ipCounts {
		sorted = append(sorted, ipStat{ip, count})
	}
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Count > sorted[j].Count })

	for i, s := range sorted {
		if i >= 10 {
			break
		}
		fmt.Printf("  %-18s %d requests\n", s.IP, s.Count)
	}
}

// --- Bandwidth ---

func printBandwidth(entries []LogEntry) {
	fmt.Printf("\n%s[Bandwidth]%s\n", colorCyan, colorReset)

	totalBytes := 0
	for _, e := range entries {
		totalBytes += e.BytesSent
	}

	avgBytes := totalBytes / len(entries)

	if totalBytes > 1024*1024*1024 {
		fmt.Printf("  Total: %.2f GB\n", float64(totalBytes)/(1024*1024*1024))
	} else if totalBytes > 1024*1024 {
		fmt.Printf("  Total: %.2f MB\n", float64(totalBytes)/(1024*1024))
	} else {
		fmt.Printf("  Total: %.2f KB\n", float64(totalBytes)/1024)
	}
	fmt.Printf("  Average per request: %d bytes\n", avgBytes)
	fmt.Println()
}

Run it:

cd /tmp/logparser && go run main.go /tmp/access.log

Expected output:

Parsed: 1000/1003 lines (3 malformed, skipped)

Nginx Log Analysis — /tmp/access.log
=======================================================

[Summary]
  Total requests: 1000
  Time range: 15/Feb/2026 14:00 — 15/Feb/2026 14:59
  Requests/sec: 0.3

[Status Codes]
  200  ########################################  690 (69.0%)
  301  #####                                      50 (5.0%)
  304  ####                                       33 (3.3%)
  404  ########                                  100 (10.0%)
  500  ######                                     80 (8.0%)
  502  ####                                       47 (4.7%)

[Errors]
  /api/users/9999              404   34 hits
  /api/orders/process          500   28 hits
  /api/auth/refresh            502   18 hits
  /api/search                  500   12 hits
  /static/old-file.js          404    8 hits

[Performance]
  Average: 0.583s | p50: 0.145s | p95: 2.876s | p99: 4.231s

[Slow Endpoints]
  /api/reports/generate        avg 3.41s  (68 requests)
  /api/export/csv              avg 2.73s  (72 requests)
  /api/search                  avg 1.49s  (65 requests)
  /api/orders/process          avg 0.27s  (64 requests)
  /api/auth/login              avg 0.25s  (70 requests)

[Top IPs]
  10.0.1.50          134 requests
  10.0.1.51          121 requests
  10.0.1.52          108 requests
  93.184.216.34       97 requests
  172.16.0.10         89 requests
  192.168.1.100       82 requests
  203.0.113.15        78 requests
  198.51.100.22       73 requests
  10.0.2.80           67 requests
  10.0.3.90           51 requests

[Bandwidth]
  Total: 23.85 MB
  Average per request: 25012 bytes

Your numbers will vary because the sample log file is randomly generated. The structure and format will be the same.

In about 200 lines of Go, we have a tool that replaces a dozen separate grep/awk commands with a single report. You can extend this further — add time-based grouping (requests per minute), add JSON output for piping to other tools, or add tail-follow mode for real-time monitoring.

What We Built

Here is what we covered, step by step:

  1. Nginx log format — understand each field, add $request_time for performance data
  2. grep / awk — quick incident analysis in the first five minutes
  3. Go regex parser — structured log entries with malformed line handling
  4. Status code analysis — error rates and distributions using integer comparison
  5. Response time analysis — percentiles with properly sorted data
  6. Combined dashboard — a Go tool that prints a complete analysis report with colors

Each step had a trap:

  • Step 3: regex returns nil on malformed lines β€” always check before accessing groups
  • Step 4: string comparison of status codes works by accident β€” always use integers
  • Step 5: percentiles on unsorted data give wrong results β€” always sort first

Cheat Sheet

Quick nginx log analysis from the command line:

wc -l access.log                                  # total requests
grep '" 500 ' access.log | wc -l                  # 500 errors
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head  # top URLs
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head  # top IPs
awk '{print $9}' access.log | sort | uniq -c | sort -rn         # status distribution

Go patterns for log parsing:

// Always check regex match before accessing groups
match := re.FindStringSubmatch(line)
if match == nil { skipped++; continue }

// Parse numbers from logs as actual numbers
status, _ := strconv.Atoi(match[5])

// Sort before calculating percentiles
sort.Float64s(times)
p95 := times[int(float64(len(times))*0.95)]

Key rules:

  • First five minutes of an incident: grep for errors, count by status, find top IPs
  • Always parse status codes as integers, not strings
  • Always sort data before calculating percentiles
  • Real log files have malformed lines β€” always handle parse failures
  • Add $request_time to your nginx log format β€” without it, you cannot debug performance
  • JSON log format (escape=json) is better for production β€” easier to parse, no regex needed


Question

What's your go-to log analysis trick during an incident? The one-liner that gives you the answer fastest?
