/user/kayd @ devops :~$ cat how-to-scale-database-in-microservices.md

Database Scaling: From 100K to 5M Users Database Scaling: From 100K to 5M Users

Karandeep Singh

Jan 9, 2024 • 13 minutes

Summary

A worked example of scaling a database from a single Postgres instance to a sharded architecture. Start simple, hit realistic bottlenecks (connection limits, replica lag, query timeouts), and fix each one. The numbers are illustrative; the progression is the real lesson.

Scaling a database from a hundred thousand users to a few million surfaces a predictable sequence of problems. Start with a single Postgres instance and grow, and you hit the same walls in the same order. This article walks a hypothetical app through that path — every bottleneck and the fix for each one.

To be clear up front: the app and its metrics in this walkthrough are illustrative, chosen to be realistic, not measurements from a system I ran. What is real is the failure sequence itself. I run Postgres-compatible databases (Aurora) in production, and the two problems this progression starts with — connection limits and pooling — are the ones I actually deal with. The progression: single database → connection pool → read replica → caching → sharding. Each step shows the problem that forces a change, the signals that point to it, and the fix that buys the next stretch of growth.

Computer motherboard close-up representing scalable database architecture

Starting Point: Single Postgres Database

At the starting point, the architecture is simple:

┌──────────────┐
│   API Server │ (10 instances, auto-scaled)
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Postgres DB │ (1 instance, db.m5.large)
│  - 2 vCPU    │
│  - 8GB RAM   │
│  - 100GB SSD │
└──────────────┘

Metrics at 100K users:

Average response time: 180ms
95th percentile: 450ms
Database CPU: 35%
Connection count: 150 (max 200)
Queries per second: ~800

This works fine — until a successful marketing campaign drives a surge in traffic.

Step 1: The First Bottleneck (Connection Limits)

Two weeks after the campaign, API requests started failing:

Error: could not connect to server: remaining connection slots are reserved

Check the database:

SELECT count(*) FROM pg_stat_activity WHERE state = 'active';

Result: 198 connections (max is 200).

The API auto-scaled from 10 to 25 instances. Each instance opened 20 connections to the database on startup. 25 × 20 = 500 connections. But Postgres on db.m5.large only allows 200.

Metrics at this point:

Users: 150K
API instances: 25
Database connections: 198/200 (99%)
Query failures: 12% of requests

Why it happens: The API servers scaled but the database did not. More servers = more connections — until it hits the wall.

Step 2: First Fix — Connection Pooling

The simple fix: add PgBouncer, a connection pooler that sits between the API and the database. It maintains a small pool of database connections and reuses them across many API connections.

┌──────────────┐
│   API Server │ (25 instances)
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  PgBouncer   │ (connection pooler)
│  max_client  │ = 5000
│  pool_size   │ = 100
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Postgres DB │
└──────────────┘

PgBouncer configuration:

[databases]
myapp = host=postgres.internal port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100

pool_mode = transaction means PgBouncer returns the connection to the pool after each transaction, not after the client disconnects. This is key. One database connection can serve many API requests.

This step I can vouch for personally: I run pgbouncer in production, baked into the machine image so every instance comes up with pooling already in place. Connection limits are one of the first database walls you hit, and a pooler is the cheapest fix you will ever deploy. On managed Postgres, also check the parameter group — max_connections and friends live there, and tuning them is part of the same job.

Deploy PgBouncer. Update the API connection string from postgres://postgres.internal:5432/myapp to postgres://pgbouncer.internal:6432/myapp.

Results after deploying PgBouncer:

Database connections: 100 (from 198)
API instances: 25 (unchanged)
Connection errors: 0%
Response time: 175ms (slightly better)

This fixes the connection problem. But usage keeps growing.

Step 3: The Second Bottleneck (Database CPU)

Two months later, response times started degrading:

Metrics:

Users: 300K
Average response time: 850ms (was 180ms)
95th percentile: 2.3s
Database CPU: 92%
Queries per second: ~3,200

Check the slow query log:

SELECT * FROM users WHERE email = $1;  -- 450ms
SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20;  -- 680ms

These queries used to take 5ms. Now they’re 100x slower. Why?

Check CPU usage:

htop

Both CPUs pegged at 100%. The database is CPU-bound. Every query waits for CPU time.

Why it happened: Read queries scale with users. More users = more reads. The db.m5.large (2 vCPU) can’t handle 3,200 queries/sec.

Step 4: Second Fix — Read Replica

Vertical scaling (bigger instance) is the easy fix. Moving to db.m5.xlarge (4 vCPU, 16GB RAM) helps for a few weeks. Then CPU hits 90% again. Vertical scaling has limits.

The real fix: horizontal scaling with a read replica. Most queries are reads (around 95% of traffic in a read-heavy app). Reads can be served from a replica. Writes still go to the primary.

                     ┌──────────────┐
                ┌───▶│ Read Replica │◀─── Read queries (95%)
                │    └──────────────┘
┌──────────────┐│
│  PgBouncer   ├┤
└──────────────┘│
                │    ┌──────────────┐
                └───▶│  Primary DB  │◀─── Write queries (5%)
                     └──────┬───────┘
                            │
                            │ (replication)
                            ▼
                     ┌──────────────┐
                     │ Read Replica │
                     └──────────────┘

Create the read replica (AWS RDS):

aws rds create-db-instance-read-replica \
  --db-instance-identifier myapp-read-replica-1 \
  --source-db-instance-identifier myapp-primary \
  --db-instance-class db.m5.large \
  --publicly-accessible false

On Aurora, the managed version of this is even simpler: the cluster gives you a reader endpoint that load-balances across replicas, so the application only needs to know two connection strings — writer and reader.

Update the API to split reads and writes. A simple wrapper:

package db

import (
	"database/sql"
)

type DB struct {
	primary  *sql.DB
	replicas []*sql.DB
	current  int
}

func (d *DB) Write() *sql.DB {
	return d.primary
}

func (d *DB) Read() *sql.DB {
	// Round-robin across replicas
	replica := d.replicas[d.current%len(d.replicas)]
	d.current++
	return replica
}

Usage in API:

// Write
_, err := db.Write().Exec("INSERT INTO users ...")

// Read
rows, err := db.Read().Query("SELECT * FROM users WHERE ...")

Results after adding read replica:

Primary CPU: 18% (was 92%)
Replica CPU: 45%
Average response time: 210ms (was 850ms)
95th percentile: 480ms (was 2.3s)

The primary now only handles writes (5% of traffic). The replica handles all reads. Adding a second replica splits read traffic 50/50.

Step 5: The Third Bottleneck (Replica Lag)

With read replicas deployed, a new class of bug report appears: “I just changed my password but the app still shows the old one.”

Check replication lag:

-- On replica
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;

Result: 8 seconds.

The replica is 8 seconds behind the primary. A user changes their password (write goes to primary), then immediately requests their profile (read goes to replica). The replica doesn’t have the new password yet. Stale read.

Why it happened: High write volume + network latency. Postgres replicates asynchronously by default. The replica applies changes as fast as it can, but it falls behind during peak traffic.

Step 6: Third Fix — Read-After-Write Consistency

You can’t force the replica to sync faster (async replication is the only option for performance). Instead, fix it at the application level: after a write, read from the primary for the next N seconds.

package db

import (
	"context"
	"database/sql"
	"sync"
	"time"
)

type DB struct {
	primary  *sql.DB
	replicas []*sql.DB
	current  int

	// Track recent writes per user
	recentWrites map[string]time.Time
	mu           sync.RWMutex
}

const stalenessTolerance = 10 * time.Second

func (d *DB) Write(ctx context.Context, userID string) *sql.DB {
	d.mu.Lock()
	d.recentWrites[userID] = time.Now()
	d.mu.Unlock()
	return d.primary
}

func (d *DB) Read(ctx context.Context, userID string) *sql.DB {
	d.mu.RLock()
	writeTime, exists := d.recentWrites[userID]
	d.mu.RUnlock()

	if exists && time.Since(writeTime) < stalenessTolerance {
		// Read from primary if user wrote recently
		return d.primary
	}

	// Otherwise read from replica
	replica := d.replicas[d.current%len(d.replicas)]
	d.current++
	return replica
}

After a user writes, their reads go to the primary for 10 seconds. After 10 seconds, they go back to the replica. This guarantees consistency where it matters.

Results:

Stale read reports: 0 (was 5-10 per day)
Primary CPU increased slightly: 22% (from 18%)
Replica CPU: 40% each

The trade-off: primary handles slightly more read traffic (only recent-write users). But consistency is guaranteed.

Step 7: The Fourth Bottleneck (Query Performance)

At 800K users, queries started slowing down again despite CPU being fine:

SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20;

Execution time: 2.1 seconds.

Check the query plan:

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20;

Limit  (cost=45231.23..45231.28 rows=20 width=180) (actual time=2087.231..2087.245 rows=20 loops=1)
  ->  Sort  (cost=45231.23..45356.89 rows=50265 width=180) (actual time=2087.229..2087.236 rows=20 loops=1)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 29kB
        ->  Seq Scan on orders  (cost=0.00..43980.12 rows=50265 width=180) (actual time=0.034..1843.421 rows=48392 loops=1)
              Filter: (user_id = $1)
              Rows Removed by Filter: 4738261
Planning Time: 0.215 ms
Execution Time: 2087.291 ms

Seq Scan = sequential scan. Postgres read 4.7 million rows to find 48K matching rows. No index on user_id.

Add the index:

CREATE INDEX CONCURRENTLY idx_orders_user_id_created_at ON orders (user_id, created_at DESC);

CONCURRENTLY means it builds the index without locking the table. Takes longer but doesn’t block writes.

Check the query plan again:

Limit  (cost=0.43..45.67 rows=20 width=180) (actual time=0.123..0.234 rows=20 loops=1)
  ->  Index Scan using idx_orders_user_id_created_at on orders  (cost=0.43..113456.78 rows=50265 width=180) (actual time=0.121..0.226 rows=20 loops=1)
        Index Cond: (user_id = $1)
Planning Time: 0.089 ms
Execution Time: 0.267 ms

Index Scan now. Execution time: 0.27ms (was 2087ms). 7,700x faster.

Lesson: Indexes matter more than hardware. Fix your queries before throwing bigger servers at the problem.

Step 8: The Fifth Bottleneck (Data Size and Cost)

At 2M users, a cost problem appears. Database storage has grown to 800GB. AWS RDS charges $0.115/GB/month for SSD. That’s $92/month for storage alone. Plus the primary and 2 replicas = 3 × 800GB = 2.4TB = $276/month just for storage.

And the backups. RDS gives you automated backup storage equal to your DB size for free; only backup storage beyond that is billed (around $0.095/GB-month). With a 7-day retention window the extra storage stays modest, but it grows with your data.

Total database cost: ~$1,800/month and growing.

Check what’s using space:

SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

 schemaname |    tablename    |  size
------------+-----------------+---------
 public     | events          | 520 GB
 public     | logs            | 180 GB
 public     | orders          | 65 GB
 public     | users           | 35 GB

The events and logs tables are analytics data. They’re queried for reports, but they don’t need to be in the primary database.

Step 9: Fifth Fix — Offload Analytics to Data Warehouse

Move events and logs to a separate data warehouse. AWS Redshift (a columnar database optimized for analytics) is a good fit.

Set up a daily sync:

# Export from Postgres
pg_dump --table events --table logs --data-only myapp > /tmp/analytics.sql

# Import to Redshift
psql -h analytics.redshift.amazonaws.com -d analytics -f /tmp/analytics.sql

This can be automated with AWS Data Pipeline.

Delete old data from Postgres:

DELETE FROM events WHERE created_at < NOW() - INTERVAL '90 days';
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days';

Results:

Database size: 300GB (from 800GB)
Monthly cost: $800 (from $1,800)
Query performance: unchanged (analytics queries now go to Redshift)

Analytics queries got faster too because Redshift is optimized for them. Win-win.

Step 10: The Final Bottleneck (Write Throughput)

At 4M users, writes started queueing:

Metrics:

Users: 4M
Write queries/sec: 1,200
Primary CPU: 85%
Disk I/O: 8,000 IOPS (max 10,000)
Write latency: 950ms

The primary can’t keep up. Queries are optimized. Reads are offloaded. But writes have to go to one place: the primary.

Options:

Vertical scaling: Bigger instance (more CPU, more IOPS)
Sharding: Split data across multiple databases

Vertical scaling buys some time with db.m5.4xlarge (16 vCPU, 64GB RAM, 20,000 IOPS). Then the same wall returns. Time to shard.

Step 11: Final Fix — Sharding

Sharding means splitting data across multiple databases. Each database (shard) holds a subset of users.

API determines user's shard:
  shard_id = user_id % num_shards

┌─────────────┐
│  API Server │
└──────┬──────┘
       │
       ├───▶ Shard 0 (user_id % 4 == 0)
       ├───▶ Shard 1 (user_id % 4 == 1)
       ├───▶ Shard 2 (user_id % 4 == 2)
       └───▶ Shard 3 (user_id % 4 == 3)

Start with 4 shards. Each shard is a full Postgres primary + 2 read replicas.

Shard routing logic:

package db

import (
	"database/sql"
	"fmt"
)

type ShardedDB struct {
	shards []*DB
}

func (s *ShardedDB) GetShard(userID int64) *DB {
	shardID := userID % int64(len(s.shards))
	return s.shards[shardID]
}

func (s *ShardedDB) Write(userID int64) *sql.DB {
	return s.GetShard(userID).Write()
}

func (s *ShardedDB) Read(userID int64) *sql.DB {
	return s.GetShard(userID).Read()
}

Usage:

shard := db.GetShard(userID)
_, err := shard.Write().Exec("INSERT INTO orders ...")

Migration process:

Create 4 empty databases (shards)
Write a migration script that reads from the old database and writes to the correct shard
Run the migration during a low-traffic window (several hours at a few million users)
Update the API to use sharded routing
Monitor for 1 week, then decommission the old database

Results after sharding:

Primary CPU per shard: 22% (was 85% on single database)
Write latency: 180ms (was 950ms)
Write throughput: 4,800 queries/sec total (was 1,200)
Can scale to 16M users before needing more shards

The Full Progression

Across the journey, the database architecture evolves through several major changes:

Single database → Connection errors at 150K users
Added PgBouncer → Fixed connections, hit CPU limits at 300K users
Added read replicas → Fixed CPU, encountered replica lag
Read-after-write consistency → Fixed stale reads, hit query performance issues
Added indexes → Fixed queries, hit storage costs at 2M users
Offloaded analytics → Reduced costs, hit write throughput at 4M users
Sharding → Now supports 5M users, can scale to 16M+

Each change buys the next stretch of growth. In this example, total cost roughly doubles from the read-replica stage to the sharded stage — while capacity grows far more than that. The pattern to notice: cost grows in steps, and each step should be forced by a measured bottleneck, not by anticipation.

Key Metrics Summary

Example figures for the hypothetical app, to show the shape of the progression:

Stage	Users	Architecture	Avg Response	P95 Response	DB Cost/Month
Start	100K	Single DB	180ms	450ms	$400
Connection Pool	150K	Single DB + PgBouncer	175ms	440ms	$450
Read Replica	500K	Primary + 2 Replicas	210ms	480ms	$1,200
Analytics Offload	2M	Primary + 2 Replicas + Redshift	195ms	420ms	$1,100
Sharding	5M	4 Shards (12 DBs total) + Redshift	185ms	395ms	$4,200

Response times actually improve across the progression because each bottleneck is fixed at the stage it appears.

Lessons Learned

1. Don’t shard too early. Sharding could begin at 300K users, but it isn’t needed that early. Vertical scaling + replicas + indexes carry the system to 4M users before sharding becomes necessary.

2. Measure before fixing. Every change was driven by metrics. CPU high? Add replicas. Writes slow? Add indexes. Costs high? Offload cold data.

3. Application-level fixes are cheaper than infrastructure. Connection pooling is nearly free and delays the next infrastructure step. Read-after-write consistency is a small application change that fixes stale reads. Sharding is a major project with a permanent infrastructure bill — do it last.

4. Indexes matter more than hardware. That 7,700x speedup from one index can delay the need for sharding by months.

5. Know your bottleneck. CPU? Add replicas. I/O? Bigger disks. Connections? Add pooling. Writes? Shard. Don’t guess. Profile.

Keep Reading

Task Automation: From Cron to a Go Task Runner — automate database maintenance tasks (backups, migrations, monitoring)
Nginx Log Analysis: From grep to a Go Log Parser — apply the same scaling patterns to log processing
Config Templating: From envsubst to Go — manage database connection strings across environments

More from system-design

No related category content found.

Docker Compose, Bake & ECR: Build and Ship Apps

Build a multi-container app with Docker Compose, then build images with Docker Bake and push them to …

Fix Kubernetes CrashLoopBackOff: Causes and Solutions

Kubernetes CrashLoopBackOff explained: a workflow to diagnose it and fix the six most common causes, …

Kubernetes Fundamentals: Pods, Deployments, Services

Learn Kubernetes fundamentals hands-on: deploy your first pod, understand Deployments and …