Boto3 + AWS Lambda: A Production Serverless Pipeline

The Surprisingly High Lambda Bill That Taught Me Boto3
I once built a serverless analytics pipeline. The requirements seemed straightforward:
- Process user activity events from an SQS queue
- Enrich events with user data from DynamoDB
- Store processed events in S3 for analysis
- Handle a high volume of events daily, with significant peak traffic
First month’s AWS bill: much higher than expected.
The problem wasn’t Lambda itself. The problem was how I used Boto3 in Lambda. This article documents the optimization journey that reduced costs significantly while improving reliability.
The Naive First Implementation (That Cost Way Too Much)
Here’s my initial Lambda function - textbook example but terrible for production:
import boto3
import json

# DON'T DO THIS - creates clients on every invocation
def lambda_handler(event, context):
    # Clients are rebuilt inside the handler on every invocation, warm or cold
    s3 = boto3.client('s3')
    dynamodb = boto3.client('dynamodb')
    sqs = boto3.client('sqs')  # never used, but still constructed each time

    for record in event['Records']:
        # Parse event
        event_data = json.loads(record['body'])

        # Enrich with user data - one blocking round trip per event (slow!)
        user_response = dynamodb.get_item(
            TableName='users',
            Key={'user_id': {'S': event_data['user_id']}}
        )

        # Process data
        processed = {
            'event': event_data,
            'user': user_response.get('Item', {})
        }

        # Write to S3 - one file per event (expensive!)
        s3.put_object(
            Bucket='analytics-raw',
            Key=f"events/{event_data['event_id']}.json",
            Body=json.dumps(processed)
        )

    return {'statusCode': 200}
What went wrong:
- Client initialization inside handler: every invocation, warm or cold, paid the cost of building three Boto3 clients
- One S3 PUT per event: every event becomes a PUT request, ballooning PUT costs
- Synchronous DynamoDB calls: one blocking round trip per event, which dragged up average execution time
- No batch processing: each Lambda invocation handled a single event
- No error handling: failed events were lost forever
- Memory over-provisioned: 1024MB when 256MB was sufficient
Where the costs piled up:
- Lambda execution dominated the bill (long durations × huge invocation count)
- S3 PUT requests added up fast (one PUT per event)
- DynamoDB reads were costly (one read per event, with consistent reads)
- Data transfer added a smaller but real chunk
The Optimized Implementation (Major Cost Reduction)
After several weeks of optimization, here’s the production version:
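import boto3
import json
import os
import uuid
from typing import Dict, List

# Initialize clients OUTSIDE handler (reused across warm invocations)
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
users_table = dynamodb.Table('users')

# Environment variables
BUCKET_NAME = os.environ['ANALYTICS_BUCKET']
# Max events per S3 object; a mid-invocation flush only happens when the
# SQS trigger batch size is larger than this
BATCH_SIZE = 100

# In-memory cache (persists across warm invocations)
user_cache = {}


def lambda_handler(event, context):
    """
    Process SQS events in batches
    Memory: 256MB (reduced from 1024MB)
    Timeout: 60s
    Batch size: 10 messages (configured in SQS trigger)
    Requires ReportBatchItemFailures on the event source mapping
    """
    events_buffer = []
    buffered_ids = []  # message IDs of the events currently in the buffer
    failed_items = []
    for record in event['Records']:
        try:
            event_data = json.loads(record['body'])
            # Cached lookup: DynamoDB is only hit on a cache miss
            user_data = get_user_cached(event_data['user_id'])
        except Exception as e:
            # Report the failure so SQS retries the message; after
            # maxReceiveCount attempts it moves to the DLQ
            print(f"Failed to process {record['messageId']}: {e}")
            failed_items.append({'itemIdentifier': record['messageId']})
            continue
        events_buffer.append({
            'event': event_data,
            'user': user_data,
            'request_id': context.aws_request_id
        })
        buffered_ids.append(record['messageId'])
        # Flush buffer when full
        if len(events_buffer) >= BATCH_SIZE:
            failed_items += flush(events_buffer, buffered_ids, context.aws_request_id)
            events_buffer, buffered_ids = [], []
    # Flush remaining events
    if events_buffer:
        failed_items += flush(events_buffer, buffered_ids, context.aws_request_id)
    # Return partial batch failures
    return {
        'batchItemFailures': failed_items
    }


def flush(events: List[Dict], message_ids: List[str], request_id: str) -> List[Dict]:
    """Write buffered events; if the write fails, report every buffered message for retry"""
    try:
        write_batch_to_s3(events, request_id)
        return []
    except Exception as e:
        print(f"S3 write failed for {len(events)} events: {e}")
        return [{'itemIdentifier': message_id} for message_id in message_ids]


def get_user_cached(user_id: str) -> Dict:
    """Get user with Lambda execution context caching"""
    if user_id in user_cache:
        return user_cache[user_id]
    # Single-item read with eventual consistency (acceptable for this data)
    response = users_table.get_item(
        Key={'user_id': user_id},
        ConsistentRead=False  # the default; half the read cost of a strongly consistent read
    )
    user = response.get('Item', {})
    user_cache[user_id] = user  # Cache for warm invocations
    return user


def write_batch_to_s3(events: List[Dict], request_id: str):
    """Write a batch of events as a single S3 object instead of one PUT per event"""
    timestamp = events[0]['event']['timestamp']
    date = timestamp[:10]  # YYYY-MM-DD, assumes an ISO 8601 timestamp string
    s3.put_object(
        Bucket=BUCKET_NAME,
        # The uuid suffix keeps several flushes in one invocation from overwriting each other
        Key=f"events/date={date}/{request_id}-{uuid.uuid4().hex}.json",
        # Newline-delimited JSON; default=str serializes the Decimal values
        # that the DynamoDB resource API returns for numbers
        Body='\n'.join(json.dumps(e, default=str) for e in events),
        ContentType='application/json'
    )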
Key optimizations:
- Client initialization outside handler: clients are built once per execution environment and reused on every warm invocation
- Batch S3 writes: Many events per PUT (a large reduction in PUT requests)
- In-memory caching: Strong cache hit rate on users
- Eventual consistency for DynamoDB: meaningful cost reduction
- Reduced memory: 256MB (sufficient for workload)
- Partial batch failure handling: Failed events automatically retry
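The handler above still reads users one at a time whenever the cache misses. If those misses turn out to dominate, one possible next step is DynamoDB's BatchGetItem, which accepts up to 100 keys per request. The following is a minimal sketch, not the code from this pipeline: `batch_get_users` and `chunked` are hypothetical helpers, the `user_id` key name is taken from the table used above, and `dynamodb` is assumed to be a `boto3.resource('dynamodb')` object passed in by the caller.

```python
from typing import Dict, Iterable, Iterator, List

MAX_KEYS_PER_BATCH = 100  # BatchGetItem accepts at most 100 keys per request


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_get_users(dynamodb, table_name: str, user_ids: Iterable[str]) -> Dict[str, dict]:
    """Fetch many users with BatchGetItem, keyed by user_id.

    `dynamodb` is assumed to be a boto3 DynamoDB resource, so keys and
    items are plain Python values rather than typed attribute maps.
    """
    unique_ids = sorted(set(user_ids))  # duplicate keys in one request are rejected
    users: Dict[str, dict] = {}
    for chunk in chunked(unique_ids, MAX_KEYS_PER_BATCH):
        request = {table_name: {'Keys': [{'user_id': uid} for uid in chunk]}}
        while request:
            response = dynamodb.batch_get_item(RequestItems=request)
            for item in response.get('Responses', {}).get(table_name, []):
                users[item['user_id']] = item
            # Throttled keys come back as UnprocessedKeys and must be requested again
            request = response.get('UnprocessedKeys') or None
    return users
```

Users that do not exist are simply absent from the returned dict. The retry loop here resubmits unprocessed keys immediately; a real deployment would probably want a delay and an upper bound on the number of rounds.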
Where costs dropped:
- Lambda execution dropped sharply once average duration came down
- S3 PUT requests fell dramatically thanks to batching
- DynamoDB reads got cheaper with eventual consistency and caching
- A small DLQ storage line item appeared, but the overall bill saw a large reduction
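One caveat on the caching that helped cut DynamoDB reads: a plain module-level dict never expires entries and grows for as long as the execution environment stays warm. A bounded cache with a time-to-live is one way to limit both staleness and memory. The sketch below is an illustration under assumed values; `TTLCache`, the 300-second TTL, and the 10,000-item cap are hypothetical and not taken from the original pipeline.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Small time- and size-bounded cache intended for module scope in a Lambda function."""

    def __init__(self, ttl_seconds: float = 300.0, max_items: int = 10_000,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._max_items = max_items
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale entry: force a fresh lookup
            return None
        return value

    def put(self, key: str, value: dict) -> None:
        if key not in self._entries and len(self._entries) >= self._max_items:
            # Evict the oldest inserted entry (dicts preserve insertion order)
            self._entries.pop(next(iter(self._entries)))
        self._entries[key] = (self._clock() + self._ttl, value)
```

In a handler, this would replace the dict lookup: call `get(user_id)` first, and on `None` read from DynamoDB and `put` the result.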
Production Lessons from Running This at Scale
Lesson 1: Cold Starts Matter
Initial cold starts were painfully slow with client initialization inside the handler.
Optimizations that worked:
- Initialize Boto3 clients outside handler: noticeably faster cold starts
- Use Lambda layers for dependencies: further cold start improvement
- Minimize deployment package: another nudge faster
- Provisioned concurrency for critical paths: eliminated cold starts entirely
End result: substantially faster cold starts.
Lesson 2: Concurrent Execution Limits Will Hit You
We hit AWS account concurrency limits during a traffic spike. Our queue backed up significantly.
The fix:
- Requested a concurrency limit increase
- Implemented exponential backoff in producers
- Added CloudWatch alarms for queue depth thresholds
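For the producer-side backoff, a common shape is exponential backoff with "full jitter", where each retry waits a random time up to an exponentially growing ceiling. This is a rough sketch rather than our production code: `send_with_backoff` and `backoff_delay` are hypothetical names, the base and cap values are placeholders, and `send` stands for whatever call publishes the message (for example a wrapped `sqs.send_message`).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def backoff_delay(attempt: int, base: float = 0.2, cap: float = 10.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def send_with_backoff(send: Callable[[], T], max_attempts: int = 5,
                      base: float = 0.2, cap: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() until it succeeds, sleeping with jittered backoff between failures."""
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:  # in real code, narrow this to throttling or transient errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_delay(attempt, base, cap))
    raise ValueError("max_attempts must be at least 1")
```

The jitter matters as much as the exponent: without it, producers that failed together retry together and hit the same limit again.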
Lesson 3: DLQ Configuration is Not Optional
Early on, before we had a DLQ in place, we lost events to unhandled errors.
Proper error handling:
- Configure SQS DLQ with multi-day retention
- Set a sensible maxReceiveCount (retry failed messages a few times)
- Monitor DLQ depth daily
- Weekly review of DLQ messages to identify systematic issues
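The first two items translate into SQS queue attributes: a retention period on the DLQ itself, and a redrive policy on the source queue. Here is a small sketch of building those attribute maps. The function names, the default of 3 receives, and the ARN in the usage note are illustrative assumptions, not values from our setup.

```python
import json
from typing import Dict

SECONDS_PER_DAY = 24 * 60 * 60


def dlq_attributes(retention_days: int = 14) -> Dict[str, str]:
    """Attributes for the dead-letter queue itself."""
    if not 1 <= retention_days <= 14:  # SQS keeps messages for at most 14 days
        raise ValueError("retention_days must be between 1 and 14")
    return {'MessageRetentionPeriod': str(retention_days * SECONDS_PER_DAY)}


def source_queue_attributes(dlq_arn: str, max_receive_count: int = 3) -> Dict[str, str]:
    """Attributes for the source queue.

    After max_receive_count unsuccessful receives, SQS moves the message
    to the queue identified by dlq_arn.
    """
    redrive_policy = {
        'deadLetterTargetArn': dlq_arn,
        'maxReceiveCount': str(max_receive_count),
    }
    # SQS expects the policy as a JSON string, not a nested dict
    return {'RedrivePolicy': json.dumps(redrive_policy)}
```

Either map can be passed as `Attributes` to `sqs.set_queue_attributes(QueueUrl=..., Attributes=...)`, for example `source_queue_attributes('arn:aws:sqs:us-east-1:123456789012:events-dlq')` with a placeholder ARN.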
Lesson 4: Memory vs Duration is a Trade-off
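Lambda bills compute as allocated memory multiplied by duration, so more memory only pays off if the function gets proportionally faster. After testing several memory configurations, doubling memory from 128MB to 256MB roughly halved duration at similar cost, so 256MB ended up being the sweet spot for this workload. Going higher cost more without a proportional speedup.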
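The arithmetic behind that trade-off is easy to check. The sketch below computes the compute portion of a single invocation; the durations and the unit price are made-up illustration values, not measurements from this pipeline, and the per-request fee is left out.

```python
def invocation_cost(memory_mb: int, duration_ms: float, price_per_gb_second: float) -> float:
    """Compute cost of one invocation: allocated GB x billed seconds x unit price."""
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * price_per_gb_second


# Hypothetical unit price; look up the real one for your region and architecture
PRICE = 0.00002

# Hypothetical (memory MB, average duration ms) pairs from a tuning run
candidates = [(128, 800), (256, 400), (512, 350)]
for memory_mb, duration_ms in candidates:
    cost = invocation_cost(memory_mb, duration_ms, PRICE)
    print(f"{memory_mb}MB x {duration_ms}ms -> {cost:.10f} per invocation")
```

With these example numbers, 128MB at 800ms and 256MB at 400ms cost the same, so the faster configuration is a free win, while 512MB at 350ms costs more for only a small speedup.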
Lesson 5: Boto3 Retries Need Configuration
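Boto3's default retry and timeout settings caused timeout issues during AWS service hiccups: with default connect and read timeouts of 60 seconds each, a single stalled call can outlast the function's own timeout.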
Custom retry configuration:
import boto3
from botocore.config import Config

retry_config = Config(
    retries={
        'max_attempts': 3,   # up to 3 retries after the initial request
        'mode': 'adaptive'   # exponential backoff plus client-side rate limiting
    },
    connect_timeout=5,  # seconds
    read_timeout=10     # seconds
)
s3 = boto3.client('s3', config=retry_config)
This dramatically reduced the rate of timeout errors across invocations.
Cost Optimization Checklist
From expensive mistakes:
- Initialize Boto3 clients outside handler function
- Batch operations (S3 PUTs, DynamoDB batch operations)
- Use eventual consistency for DynamoDB when possible
- Right-size memory allocation (test different configs)
- Implement caching for frequently accessed data
- Configure proper timeouts to avoid runaway executions
- Use committed-use pricing for predictable workloads, such as Compute Savings Plans for Lambda or reserved capacity for provisioned DynamoDB tables (notable discount)
- Enable compression for S3 objects
- Clean up old DLQ messages
- Monitor costs daily in first month
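As one concrete example from the checklist, compressing batched events before upload is a small change. This is a sketch under assumptions: `encode_events` and `decode_events` are hypothetical helpers, and the events are assumed to be JSON-serializable dicts written as newline-delimited JSON, as in the batch writer earlier.

```python
import gzip
import json
from typing import Dict, List


def encode_events(events: List[Dict]) -> bytes:
    """Serialize events as newline-delimited JSON and gzip the result."""
    ndjson = '\n'.join(json.dumps(event, default=str) for event in events)
    return gzip.compress(ndjson.encode('utf-8'))


def decode_events(payload: bytes) -> List[Dict]:
    """Inverse of encode_events, useful for spot-checking stored objects."""
    text = gzip.decompress(payload).decode('utf-8')
    return [json.loads(line) for line in text.splitlines() if line]
```

The compressed bytes go straight into `s3.put_object` as `Body`, ideally with `ContentEncoding='gzip'` and a `.json.gz` key suffix so downstream readers know what they are getting. Repetitive event payloads tend to compress well, but the actual ratio depends on your data.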
When NOT to Use Lambda + Boto3
After building many serverless pipelines, Lambda + Boto3 is NOT appropriate for:
- Long-running tasks (>15 minutes) - Lambda's maximum timeout is 15 minutes; use Fargate or EC2
- Large file processing (>10GB) - Lambda's memory and ephemeral /tmp storage each top out at about 10GB
- Consistent sub-10ms latency requirements - Cold starts are unpredictable
- High-frequency, steady-state workloads - EC2 is cheaper
- Complex dependencies - Zip deployment packages are capped at 250MB unzipped; larger dependency sets need a container image
Lambda + Boto3 excels at:
- Event-driven architectures
- Intermittent workloads
- Rapid scaling requirements (idle to many concurrent invocations in seconds)
- Variable traffic patterns
What's been your biggest challenge with serverless data pipelines? Cold starts? Cost optimization? Error handling?

