Build a multi-container app with Docker Compose, then build images with Docker Bake and push them to …
Database Scaling: From 100K to 5M Users Database Scaling: From 100K to 5M Users

Summary
Scaling a database from a hundred thousand users to a few million surfaces a predictable sequence of problems. Start with a single Postgres instance serving 100,000 users and grow toward 5 million on a sharded, multi-region architecture, and you hit the same walls in the same order. This walks through every scaling decision along that path, every bottleneck, and the fix for each one.
No theory. No “you might want to consider.” Just the actual progression: single database → connection pool → read replica → caching → sharding. Each step shows the problem that forces a change, the metrics that prove it, and the fix that buys the next 6 months of growth.

Starting Point: Single Postgres Database
At the starting point, the architecture is simple:
┌──────────────┐
│ API Server │ (10 instances, auto-scaled)
└──────┬───────┘
│
▼
┌──────────────┐
│ Postgres DB │ (1 instance, db.m5.large)
│ - 2 vCPU │
│ - 8GB RAM │
│ - 100GB SSD │
└──────────────┘
Metrics at 100K users:
- Average response time: 180ms
- 95th percentile: 450ms
- Database CPU: 35%
- Connection count: 150 (max 200)
- Queries per second: ~800
This works fine — until a successful marketing campaign drives a surge in traffic.
Expand your knowledge with Terraform From Scratch: Provision AWS Infrastructure
Step 1: The First Bottleneck (Connection Limits)
Two weeks after the campaign, API requests started failing:
Error: could not connect to server: remaining connection slots are reserved
Check the database:
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
Result: 198 connections (max is 200).
The API auto-scaled from 10 to 25 instances. Each instance opened 20 connections to the database on startup. 25 × 20 = 500 connections. But Postgres on db.m5.large only allows 200.
Metrics at this point:
- Users: 150K
- API instances: 25
- Database connections: 198/200 (99%)
- Query failures: 12% of requests
Why it happens: The API servers scaled but the database did not. More servers = more connections — until it hits the wall.
Deepen your understanding in Build and Deploy a Go Lambda Function
Step 2: First Fix — Connection Pooling
The simple fix: add PgBouncer, a connection pooler that sits between the API and the database. It maintains a small pool of database connections and reuses them across many API connections.
┌──────────────┐
│ API Server │ (25 instances)
└──────┬───────┘
│
▼
┌──────────────┐
│ PgBouncer │ (connection pooler)
│ max_client │ = 5000
│ pool_size │ = 100
└──────┬───────┘
│
▼
┌──────────────┐
│ Postgres DB │
└──────────────┘
PgBouncer configuration:
[databases]
myapp = host=postgres.internal port=5432 dbname=myapp
[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100
pool_mode = transaction means PgBouncer returns the connection to the pool after each transaction, not after the client disconnects. This is key. One database connection can serve thousands of API requests per second.
Deploy PgBouncer. Update the API connection string from postgres://postgres.internal:5432/myapp to postgres://pgbouncer.internal:6432/myapp.
Results after deploying PgBouncer:
- Database connections: 100 (from 198)
- API instances: 25 (unchanged)
- Connection errors: 0%
- Response time: 175ms (slightly better)
This fixes the connection problem. But usage keeps growing.
Explore this further in Build and Deploy a Go Lambda Function
Step 3: The Second Bottleneck (Database CPU)
Two months later, response times started degrading:
Metrics:
- Users: 300K
- Average response time: 850ms (was 180ms)
- 95th percentile: 2.3s
- Database CPU: 92%
- Queries per second: ~3,200
Check the slow query log:
SELECT * FROM users WHERE email = $1; -- 450ms
SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20; -- 680ms
These queries used to take 5ms. Now they’re 100x slower. Why?
Check CPU usage:
htop
Both CPUs pegged at 100%. The database is CPU-bound. Every query waits for CPU time.
Why it happened: Read queries scale with users. More users = more reads. The db.m5.large (2 vCPU) can’t handle 3,200 queries/sec.
Discover related concepts in CPU Monitoring: From Linux Commands to a Go Dashboard
Step 4: Second Fix — Read Replica
Vertical scaling (bigger instance) is the easy fix. Moving to db.m5.xlarge (4 vCPU, 16GB RAM) helps for a few weeks. Then CPU hits 90% again. Vertical scaling has limits.
The real fix: horizontal scaling with a read replica. Most queries are reads (around 95% of traffic in a read-heavy app). Reads can be served from a replica. Writes still go to the primary.
┌──────────────┐
┌───▶│ Read Replica │◀─── Read queries (95%)
│ └──────────────┘
┌──────────────┐│
│ PgBouncer ├┤
└──────────────┘│
│ ┌──────────────┐
└───▶│ Primary DB │◀─── Write queries (5%)
└──────┬───────┘
│
│ (replication)
▼
┌──────────────┐
│ Read Replica │
└──────────────┘
Create the read replica (AWS RDS):
aws rds create-db-instance-read-replica \
--db-instance-identifier myapp-read-replica-1 \
--source-db-instance-identifier myapp-primary \
--db-instance-class db.m5.large \
--publicly-accessible false
Update the API to split reads and writes. A simple wrapper:
package db
import (
"database/sql"
)
type DB struct {
primary *sql.DB
replicas []*sql.DB
current int
}
func (d *DB) Write() *sql.DB {
return d.primary
}
func (d *DB) Read() *sql.DB {
// Round-robin across replicas
replica := d.replicas[d.current%len(d.replicas)]
d.current++
return replica
}
Usage in API:
// Write
_, err := db.Write().Exec("INSERT INTO users ...")
// Read
rows, err := db.Read().Query("SELECT * FROM users WHERE ...")
Results after adding read replica:
- Primary CPU: 18% (was 92%)
- Replica CPU: 45%
- Average response time: 210ms (was 850ms)
- 95th percentile: 480ms (was 2.3s)
The primary now only handles writes (5% of traffic). The replica handles all reads. Adding a second replica splits read traffic 50/50.
Uncover more details in CPU Monitoring: From Linux Commands to a Go Dashboard
Step 5: The Third Bottleneck (Replica Lag)
With read replicas deployed, a new class of bug report appears: “I just changed my password but the app still shows the old one.”
Check replication lag:
-- On replica
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
Result: 8 seconds.
The replica is 8 seconds behind the primary. A user changes their password (write goes to primary), then immediately requests their profile (read goes to replica). The replica doesn’t have the new password yet. Stale read.
Why it happened: High write volume + network latency. Postgres replicates asynchronously by default. The replica applies changes as fast as it can, but it falls behind during peak traffic.
Journey deeper into this topic with CPU Monitoring: From Linux Commands to a Go Dashboard
Step 6: Third Fix — Read-After-Write Consistency
You can’t force the replica to sync faster (async replication is the only option for performance). Instead, fix it at the application level: after a write, read from the primary for the next N seconds.
package db
import (
"context"
"database/sql"
"sync"
"time"
)
type DB struct {
primary *sql.DB
replicas []*sql.DB
current int
// Track recent writes per user
recentWrites map[string]time.Time
mu sync.RWMutex
}
const stalenessTolerance = 10 * time.Second
func (d *DB) Write(ctx context.Context, userID string) *sql.DB {
d.mu.Lock()
d.recentWrites[userID] = time.Now()
d.mu.Unlock()
return d.primary
}
func (d *DB) Read(ctx context.Context, userID string) *sql.DB {
d.mu.RLock()
writeTime, exists := d.recentWrites[userID]
d.mu.RUnlock()
if exists && time.Since(writeTime) < stalenessTolerance {
// Read from primary if user wrote recently
return d.primary
}
// Otherwise read from replica
replica := d.replicas[d.current%len(d.replicas)]
d.current++
return replica
}
After a user writes, their reads go to the primary for 10 seconds. After 10 seconds, they go back to the replica. This guarantees consistency where it matters.
Results:
- Stale read reports: 0 (was 5-10 per day)
- Primary CPU increased slightly: 22% (from 18%)
- Replica CPU: 40% each
The trade-off: primary handles slightly more read traffic (only recent-write users). But consistency is guaranteed.
Enrich your learning with AWS CLI Automation: From Bash Scripts to Go
Step 7: The Fourth Bottleneck (Query Performance)
At 800K users, queries started slowing down again despite CPU being fine:
SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20;
Execution time: 2.1 seconds.
Check the query plan:
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20;
Limit (cost=45231.23..45231.28 rows=20 width=180) (actual time=2087.231..2087.245 rows=20 loops=1)
-> Sort (cost=45231.23..45356.89 rows=50265 width=180) (actual time=2087.229..2087.236 rows=20 loops=1)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 29kB
-> Seq Scan on orders (cost=0.00..43980.12 rows=50265 width=180) (actual time=0.034..1843.421 rows=48392 loops=1)
Filter: (user_id = $1)
Rows Removed by Filter: 4738261
Planning Time: 0.215 ms
Execution Time: 2087.291 ms
Seq Scan = sequential scan. Postgres read 4.7 million rows to find 48K matching rows. No index on user_id.
Add the index:
CREATE INDEX CONCURRENTLY idx_orders_user_id_created_at ON orders (user_id, created_at DESC);
CONCURRENTLY means it builds the index without locking the table. Takes longer but doesn’t block writes.
Check the query plan again:
Limit (cost=0.43..45.67 rows=20 width=180) (actual time=0.123..0.234 rows=20 loops=1)
-> Index Scan using idx_orders_user_id_created_at on orders (cost=0.43..113456.78 rows=50265 width=180) (actual time=0.121..0.226 rows=20 loops=1)
Index Cond: (user_id = $1)
Planning Time: 0.089 ms
Execution Time: 0.267 ms
Index Scan now. Execution time: 0.27ms (was 2087ms). 7,700x faster.
Lesson: Indexes matter more than hardware. Fix your queries before throwing bigger servers at the problem.
Gain comprehensive insights from CPU Monitoring: From Linux Commands to a Go Dashboard
Step 8: The Fifth Bottleneck (Data Size and Cost)
At 2M users, a cost problem appears. Database storage has grown to 800GB. AWS RDS charges $0.115/GB/month for SSD. That’s $92/month for storage alone. Plus the primary and 2 replicas = 3 × 800GB = 2.4TB = $276/month just for storage.
And the backups. RDS gives you automated backup storage equal to your DB size for free; only backup storage beyond that is billed (around $0.095/GB-month). With a 7-day retention window the extra storage stays modest, but it grows with your data.
Total database cost: ~$1,800/month and growing.
Check what’s using space:
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
schemaname | tablename | size
------------+-----------------+---------
public | events | 520 GB
public | logs | 180 GB
public | orders | 65 GB
public | users | 35 GB
The events and logs tables are analytics data. They’re queried for reports, but they don’t need to be in the primary database.
Master this concept through AWS CLI Automation: From Bash Scripts to Go
Step 9: Fifth Fix — Offload Analytics to Data Warehouse
Move events and logs to a separate data warehouse. AWS Redshift (a columnar database optimized for analytics) is a good fit.
Set up a daily sync:
# Export from Postgres
pg_dump --table events --table logs --data-only myapp > /tmp/analytics.sql
# Import to Redshift
psql -h analytics.redshift.amazonaws.com -d analytics -f /tmp/analytics.sql
This can be automated with AWS Data Pipeline.
Delete old data from Postgres:
DELETE FROM events WHERE created_at < NOW() - INTERVAL '90 days';
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days';
Results:
- Database size: 300GB (from 800GB)
- Monthly cost: $800 (from $1,800)
- Query performance: unchanged (analytics queries now go to Redshift)
Analytics queries got faster too because Redshift is optimized for them. Win-win.
Delve into specifics at Building a URL Shortener: From Linux Networking to Go
Step 10: The Final Bottleneck (Write Throughput)
At 4M users, writes started queueing:
Metrics:
- Users: 4M
- Write queries/sec: 1,200
- Primary CPU: 85%
- Disk I/O: 8,000 IOPS (max 10,000)
- Write latency: 950ms
The primary can’t keep up. Queries are optimized. Reads are offloaded. But writes have to go to one place: the primary.
Options:
- Vertical scaling: Bigger instance (more CPU, more IOPS)
- Sharding: Split data across multiple databases
Vertical scaling buys ~2 more months with db.m5.4xlarge (16 vCPU, 64GB RAM, 20,000 IOPS). Then the same wall returns. Time to shard.
Deepen your understanding in Build and Deploy a Go Lambda Function
Step 11: Final Fix — Sharding
Sharding means splitting data across multiple databases. Each database (shard) holds a subset of users.
API determines user's shard:
shard_id = user_id % num_shards
┌─────────────┐
│ API Server │
└──────┬──────┘
│
├───▶ Shard 0 (user_id % 4 == 0)
├───▶ Shard 1 (user_id % 4 == 1)
├───▶ Shard 2 (user_id % 4 == 2)
└───▶ Shard 3 (user_id % 4 == 3)
Start with 4 shards. Each shard is a full Postgres primary + 2 read replicas.
Shard routing logic:
package db
import (
"database/sql"
"fmt"
)
type ShardedDB struct {
shards []*DB
}
func (s *ShardedDB) GetShard(userID int64) *DB {
shardID := userID % int64(len(s.shards))
return s.shards[shardID]
}
func (s *ShardedDB) Write(userID int64) *sql.DB {
return s.GetShard(userID).Write()
}
func (s *ShardedDB) Read(userID int64) *sql.DB {
return s.GetShard(userID).Read()
}
Usage:
shard := db.GetShard(userID)
_, err := shard.Write().Exec("INSERT INTO orders ...")
Migration process:
- Create 4 empty databases (shards)
- Write a migration script that reads from the old database and writes to the correct shard
- Run the migration during a low-traffic window (several hours at a few million users)
- Update the API to use sharded routing
- Monitor for 1 week, then decommission the old database
Results after sharding:
Deepen your understanding in Build and Deploy a Go Lambda Function
- Primary CPU per shard: 22% (was 85% on single database)
- Write latency: 180ms (was 950ms)
- Write throughput: 4,800 queries/sec total (was 1,200)
- Can scale to 16M users before needing more shards
The Full Progression
Across the journey, the database architecture evolves through several major changes:
- Single database → Connection errors at 150K users
- Added PgBouncer → Fixed connections, hit CPU limits at 300K users
- Added read replicas → Fixed CPU, encountered replica lag
- Read-after-write consistency → Fixed stale reads, hit query performance issues
- Added indexes → Fixed queries, hit storage costs at 2M users
- Offloaded analytics → Reduced costs, hit write throughput at 4M users
- Sharding → Now supports 5M users, can scale to 16M+
Each change buys roughly 2-6 months of growth. Total cost rises from about $1,800/month (single database) to about $4,200/month (4 shards + Redshift) — while serving on the order of 50x more users.
Deepen your understanding in Build and Deploy a Go Lambda Function
Key Metrics Summary
| Stage | Users | Architecture | Avg Response | P95 Response | DB Cost/Month |
|---|---|---|---|---|---|
| Start | 100K | Single DB | 180ms | 450ms | $400 |
| Connection Pool | 150K | Single DB + PgBouncer | 175ms | 440ms | $450 |
| Read Replica | 500K | Primary + 2 Replicas | 210ms | 480ms | $1,200 |
| Analytics Offload | 2M | Primary + 2 Replicas + Redshift | 195ms | 420ms | $1,100 |
| Sharding | 5M | 4 Shards (12 DBs total) + Redshift | 185ms | 395ms | $4,200 |
Response times actually improve across the progression because each bottleneck is fixed at the stage it appears.
Deepen your understanding in Build and Deploy a Go Lambda Function
Lessons Learned
1. Don’t shard too early. Sharding could begin at 300K users, but it isn’t needed that early. Vertical scaling + replicas + indexes carry the system to 4M users before sharding becomes necessary.
2. Measure before fixing. Every change was driven by metrics. CPU high? Add replicas. Writes slow? Add indexes. Costs high? Offload cold data.
3. Application-level fixes are cheaper than infrastructure. Connection pooling costs $0 and buys ~6 months. Read-after-write consistency costs ~2 days of dev time and fixes stale reads. Sharding costs ~4 weeks of dev time and ~$3,000/month more infrastructure.
4. Indexes matter more than hardware. That 7,700x speedup from one index can delay the need for sharding by months.
5. Know your bottleneck. CPU? Add replicas. I/O? Bigger disks. Connections? Add pooling. Writes? Shard. Don’t guess. Profile.
Deepen your understanding in Build and Deploy a Go Lambda Function
Keep Reading
- Task Automation: From Cron to a Go Task Runner — automate database maintenance tasks (backups, migrations, monitoring)
- Nginx Log Analysis: From grep to a Go Log Parser — apply the same scaling patterns to log processing
- Config Templating: From envsubst to Go — manage database connection strings across environments
Similar Articles
Related Content
More from system-design
No related category content found.
You Might Also Like
Kubernetes CrashLoopBackOff explained: a workflow to diagnose it and fix the six most common causes, …
Learn Kubernetes fundamentals hands-on: deploy your first pod, understand Deployments and …
Contents
- Starting Point: Single Postgres Database
- Step 1: The First Bottleneck (Connection Limits)
- Step 2: First Fix — Connection Pooling
- Step 3: The Second Bottleneck (Database CPU)
- Step 4: Second Fix — Read Replica
- Step 5: The Third Bottleneck (Replica Lag)
- Step 6: Third Fix — Read-After-Write Consistency
- Step 7: The Fourth Bottleneck (Query Performance)
- Step 8: The Fifth Bottleneck (Data Size and Cost)
- Step 9: Fifth Fix — Offload Analytics to Data Warehouse
- Step 10: The Final Bottleneck (Write Throughput)
- Step 11: Final Fix — Sharding
- The Full Progression
- Key Metrics Summary
- Lessons Learned
- Keep Reading

