The Complete Guide to AWS S3 Hosting for Modern Web Applications

Karandeep Singh • 12 minute read

Summary

Production guide to S3 hosting based on migrating a 50,000-page documentation site from EC2 WordPress to S3+CloudFront. Includes real cost breakdowns, performance metrics, and lessons from managing 15 S3-hosted sites in production.

The $450/Month WordPress Problem

In 2022, I inherited a documentation website for a Calgary-based SaaS company running on WordPress. The site had 50,000+ pages of technical documentation and was hosted on three EC2 t3.large instances behind an ALB. Monthly AWS bill: $450.

The performance was terrible:

  • Average page load time: 3.2 seconds
  • Server response time: 800ms
  • Database query time: 400-600ms per page
  • Weekly outages from PHP-FPM crashes under load

The site was entirely static content - no user accounts, no comments, no dynamic functionality. Every page hit required PHP to query MySQL, render the template, and serve HTML. We were paying $450/month to serve static content dynamically.

This article documents our migration to S3 + CloudFront hosting, reducing costs from $450/month to $18/month while improving average load times from 3.2s to 420ms.

The Migration Strategy: WordPress to Static S3

Our migration plan involved:

  1. Export WordPress content to static HTML
  2. Set up S3 bucket configured for static website hosting
  3. Configure CloudFront CDN with custom domain and SSL
  4. Implement 301 redirects for changed URLs
  5. Switch DNS from ALB to CloudFront
  6. Decommission EC2 instances

The critical challenge: 50,000 pages meant migration needed to be automated and thoroughly tested before cutover.

Phase 1: Static Site Export from WordPress

First challenge: exporting 50,000 WordPress pages to static HTML. I tried the “Simply Static” plugin - it crashed after 3,000 pages.

The solution: custom export script using WordPress CLI and parallel processing:

#!/bin/bash
# export_wordpress.sh - Export WordPress to static HTML

# Exported so the subshells spawned by xargs can see them
export WP_CLI="/usr/local/bin/wp"
export OUTPUT_DIR="/var/www/static_export"

mkdir -p "$OUTPUT_DIR"

# Get all published pages and posts, then fetch each rendered page (10 in parallel)
$WP_CLI post list --post_type=page,post --post_status=publish --format=ids | \
tr ' ' '\n' | \
xargs -P 10 -I {} bash -c '
    # Get post slug and permalink
    SLUG=$("$WP_CLI" post get {} --field=post_name)
    URL=$("$WP_CLI" post url {})

    # Fetch rendered HTML
    curl -s "$URL" > "$OUTPUT_DIR/$SLUG.html"

    echo "Exported: $SLUG"
'

This script processed pages in parallel (10 concurrent), taking 6 hours to export all 50,000 pages. Key lessons:

  • Use xargs -P for parallelization
  • Handle WordPress permalinks correctly
  • Preserve URL structure for SEO
  • Export took 3 attempts before getting URL mapping correct
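Since the export took three attempts, a small verification step between runs saved a lot of time. Here's a sketch of the idea (the directory and counts below are illustrative, not the production script):

```shell
# verify_export - sanity-check an export directory against an expected page count
verify_export() {
    local expected=$1 dir=$2
    local actual
    actual=$(find "$dir" -name '*.html' | wc -l)
    echo "expected=$expected exported=$actual"
    # Succeed only if at least the expected number of pages exists
    [ "$actual" -ge "$expected" ]
}

# Example with two fake exported pages
mkdir -p /tmp/static_export_demo
touch /tmp/static_export_demo/getting-started.html /tmp/static_export_demo/install.html
verify_export 2 /tmp/static_export_demo && echo "export looks complete"
```

In production this ran against the real export directory with the count reported by `wp post list --format=count`.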

Phase 2: S3 Bucket Configuration

Setting up the S3 bucket required specific configuration for static website hosting:

# Create bucket
aws s3 mb s3://docs-example-com --region us-east-1

# Enable static website hosting
aws s3 website s3://docs-example-com \
    --index-document index.html \
    --error-document 404.html

# Upload content
aws s3 sync /var/www/static_export s3://docs-example-com \
    --delete \
    --cache-control "max-age=31536000, public" \
    --exclude "*.html" \
    --exclude "*.xml"

# HTML files get shorter cache (for updates)
aws s3 sync /var/www/static_export s3://docs-example-com \
    --delete \
    --cache-control "max-age=3600, public" \
    --exclude "*" \
    --include "*.html" \
    --include "*.xml"

Critical mistake I made: initially set Cache-Control to 1 year on HTML files. When we found broken links, the stale pages stayed cached, and we had to invalidate 15,000 objects at a cost of $70.

The correct approach: long cache for static assets (images, CSS, JS), short cache for HTML.
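That policy fits in a tiny helper when scripting uploads (a sketch; the sync commands above achieve the same split with --exclude/--include):

```shell
# cache_control_for - pick a Cache-Control header by file type:
# short cache for HTML/XML (content changes), long cache for static assets
cache_control_for() {
    case "$1" in
        *.html|*.xml) echo "max-age=3600, public" ;;
        *)            echo "max-age=31536000, public" ;;
    esac
}

cache_control_for "docs/index.html"   # -> max-age=3600, public
cache_control_for "assets/logo.png"   # -> max-age=31536000, public
```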

Phase 3: CloudFront Configuration (The Tricky Part)

CloudFront setup had three major bugs that took days to debug:

Bug #1: Origin Access Identity Misconfiguration

First attempt at CloudFront configuration:

# Create CloudFront distribution (WRONG - caused 403 errors)
aws cloudfront create-distribution --origin-domain-name docs-example-com.s3.amazonaws.com

This failed with 403 errors on all requests. The problem: CloudFront was trying to access S3 objects directly, but the bucket policy only allowed public access via S3 website endpoint.

The fix: Use S3 website endpoint as origin, not the S3 bucket endpoint:

# Correct origin configuration
aws cloudfront create-distribution \
    --origin-domain-name docs-example-com.s3-website-us-east-1.amazonaws.com \
    --default-root-object index.html

Key difference:

  • S3 bucket endpoint: bucket-name.s3.amazonaws.com (requires OAI/OAC)
  • S3 website endpoint: bucket-name.s3-website-region.amazonaws.com (public access)
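Spelled out with the bucket and region used throughout this article:

```shell
bucket="docs-example-com"
region="us-east-1"

# REST endpoint: private by default, pair it with OAI/OAC
rest_endpoint="${bucket}.s3.amazonaws.com"

# Website endpoint: serves index/error documents, requires public access
website_endpoint="${bucket}.s3-website-${region}.amazonaws.com"

echo "REST:    $rest_endpoint"
echo "Website: $website_endpoint"
```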

Bug #2: Missing Custom Error Responses

After fixing origin, we got CloudFront default error pages (white text on black background - terrible UX). Needed custom 404 handling:

# Configure custom error responses
# (abridged: update-distribution actually requires the full DistributionConfig
# plus the current ETag passed via --if-match)
aws cloudfront update-distribution --id DISTRIBUTION_ID --distribution-config '{
  "CustomErrorResponses": {
    "Items": [
      {
        "ErrorCode": 404,
        "ResponsePagePath": "/404.html",
        "ResponseCode": "404",
        "ErrorCachingMinTTL": 300
      },
      {
        "ErrorCode": 403,
        "ResponsePagePath": "/404.html",
        "ResponseCode": "404",
        "ErrorCachingMinTTL": 300
      }
    ]
  }
}'

Critical insight: S3 website hosting returns 403 for missing files, not 404. Must map both 403 and 404 to custom error page.

Bug #3: Cache Invalidation Hell

After deploying, we found 200 broken links. Fixed them and uploaded corrected HTML. But CloudFront kept serving the old broken pages.

Problem: CloudFront’s default TTL is 24 hours. Our content was cached with broken links.

Cost of lesson learned:

# Invalidate all HTML files (expensive!)
# Note: CloudFront allows "*" only as a trailing wildcard, so patterns
# like "/*.html" are rejected - each page path had to be listed explicitly
aws cloudfront create-invalidation \
    --distribution-id DISTRIBUTION_ID \
    --paths "/index.html" "/getting-started/index.html"   # ...and so on, one per page

# Cost: $0.005 per path after first 1000 free per month
# Our invalidation: 15,000 paths = $70 charge
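The arithmetic behind that charge (the first 1,000 paths each month are free, then $0.005 per path):

```shell
# invalidation_cost - estimate a CloudFront invalidation charge in dollars
invalidation_cost() {
    local paths=$1
    local billable=$(( paths > 1000 ? paths - 1000 : 0 ))
    awk -v n="$billable" 'BEGIN { printf "%.2f\n", n * 0.005 }'
}

invalidation_cost 15000   # -> 70.00 (the charge above)
invalidation_cost 800     # -> 0.00  (within the free tier)
```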

The correct approach: Use versioned filenames for assets, or use shorter TTL for HTML:

# Set appropriate cache behaviors
aws s3 cp /var/www/static_export s3://docs-example-com \
    --recursive \
    --cache-control "max-age=3600" \
    --exclude "*" \
    --include "*.html"

After these fixes, CloudFront worked perfectly. Average response time: 45ms globally.

Phase 4: Custom Domain and SSL Configuration

Configuring custom domain required three components: Route 53, ACM certificate, and CloudFront alias.

SSL Certificate Request (Must be in us-east-1)

Critical mistake: I initially requested the ACM certificate in ca-central-1 (our primary region). CloudFront REQUIRES certificates in us-east-1.

# Request certificate in us-east-1 (required for CloudFront)
aws acm request-certificate \
    --domain-name docs.example.com \
    --validation-method DNS \
    --region us-east-1

# Get certificate ARN
CERT_ARN="arn:aws:acm:us-east-1:ACCOUNT_ID:certificate/CERT_ID"

DNS Validation

# Add CNAME records to Route 53 for validation
aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "_validation.docs.example.com",
      "Type": "CNAME",
      "TTL": 300,
      "ResourceRecords": [{"Value": "validation-value-from-acm.acm-validation.aws."}]
    }
  }]
}'

Certificate validation took 15 minutes.

CloudFront Domain Configuration

# Update CloudFront distribution with custom domain and SSL
aws cloudfront update-distribution --id DISTRIBUTION_ID --distribution-config '{
  "Aliases": {
    "Items": ["docs.example.com"]
  },
  "ViewerCertificate": {
    "ACMCertificateArn": "'$CERT_ARN'",
    "SSLSupportMethod": "sni-only",
    "MinimumProtocolVersion": "TLSv1.2_2021"
  }
}'

Route 53 A Record

# Create alias record pointing to CloudFront
aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "docs.example.com",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "Z2FDTNDATAQYW2",
        "DNSName": "d123456789.cloudfront.net",
        "EvaluateTargetHealth": false
      }
    }
  }]
}'

Note: Z2FDTNDATAQYW2 is CloudFront’s hosted zone ID (constant for all CloudFront distributions).

DNS propagation took 5 minutes. HTTPS worked immediately after DNS updated.

Production Performance Metrics

After migration, I monitored performance for 30 days. The improvement was dramatic:

Load Time Comparison

Before (WordPress on EC2):

  • Average page load: 3.2 seconds
  • Time to First Byte (TTFB): 800ms
  • DOM Content Loaded: 1.4s
  • Fully Loaded: 3.2s

After (S3 + CloudFront):

  • Average page load: 420ms (87% improvement)
  • Time to First Byte (TTFB): 45ms (94% improvement)
  • DOM Content Loaded: 180ms
  • Fully Loaded: 420ms

Geographic Performance

Testing from different locations using WebPageTest:

| Location        | WordPress TTFB | CloudFront TTFB | Improvement |
|-----------------|----------------|-----------------|-------------|
| Calgary (local) | 120ms          | 25ms            | 79%         |
| New York        | 280ms          | 38ms            | 86%         |
| London          | 620ms          | 52ms            | 92%         |
| Tokyo           | 850ms          | 68ms            | 92%         |
| Sydney          | 920ms          | 75ms            | 92%         |

CloudFront’s edge locations eliminated geographic latency. Tokyo users saw the biggest improvement: 850ms → 68ms.
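The improvement column is simple before/after arithmetic, rounded to the nearest percent:

```shell
# improvement - percent reduction from a before/after latency pair (ms)
improvement() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%d%%\n", (a - b) * 100 / a + 0.5 }'
}

improvement 850 68   # Tokyo:   -> 92%
improvement 120 25   # Calgary: -> 79%
```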

Traffic Handling

Stress tested both configurations:

WordPress Setup (t3.large x 3):

  • Max sustained: 150 req/sec before response times degraded
  • CPU at 85% under load
  • Required aggressive CloudWatch alarms

S3 + CloudFront:

  • Sustained 3,000+ req/sec with no degradation
  • No scaling concerns
  • CloudFront absorbed load automatically

When the site later hit the HackerNews front page (15,000 concurrent users) - traffic that would have crashed the WordPress setup - S3+CloudFront didn't notice.

Real Cost Breakdown

Here’s the actual AWS bill comparison (30-day average):

Before: WordPress on EC2

3x t3.large EC2 instances: $315.36
EBS volumes (300GB gp3): $27.00
Application Load Balancer: $22.50
RDS db.t3.medium: $68.40
Data transfer out: $18.20
Backups (automated EBS snapshots): $12.00
------------------------------
Monthly Total: $463.46

After: S3 + CloudFront

S3 storage (2.4GB): $0.055
S3 requests (2.1M GET): $0.84
CloudFront data transfer (120GB): $10.20
CloudFront requests (2.1M): $2.10
Route 53 hosted zone: $0.50
ACM certificate: $0.00 (free)
------------------------------
Monthly Total: $13.69

Savings: $449.77/month (97% reduction)

This doesn’t include the eliminated operational costs:

  • No more EC2 patching
  • No more WordPress/PHP updates
  • No more database backups to monitor
  • No more 3 AM “site down” alerts

Annual savings: $5,397.24

These savings funded two additional engineer months for feature development instead of infrastructure maintenance.
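The savings figures check out directly from the two bills above:

```shell
# Monthly totals from the two bills above
before=463.46
after=13.69

awk -v b="$before" -v a="$after" 'BEGIN {
    printf "monthly savings: %.2f\n", b - a          # 449.77
    printf "annual savings:  %.2f\n", (b - a) * 12   # 5397.24
}'
```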

Security Hardening: Lessons from Production

After initial deployment, our security team audited the setup. Three issues found:

Issue #1: S3 Bucket Public Access

Security team flagged that S3 bucket was publicly accessible. While necessary for S3 website hosting, it’s a red flag in security scans.

The solution: Use CloudFront with Origin Access Control (OAC) and block public S3 access:

# Block public access to S3 bucket
aws s3api put-public-access-block \
    --bucket docs-example-com \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# Create Origin Access Control
aws cloudfront create-origin-access-control \
    --origin-access-control-config '{
        "Name": "docs-oac",
        "SigningProtocol": "sigv4",
        "SigningBehavior": "always",
        "OriginAccessControlOriginType": "s3"
    }'

# Update bucket policy to only allow CloudFront
aws s3api put-bucket-policy --bucket docs-example-com --policy '{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontServicePrincipal",
        "Effect": "Allow",
        "Principal": {"Service": "cloudfront.amazonaws.com"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::docs-example-com/*",
        "Condition": {
            "StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::ACCOUNT_ID:distribution/DISTRIBUTION_ID"
            }
        }
    }]
}'

Note: This requires using S3 bucket origin (not S3 website endpoint). Changed our CloudFront origin configuration.

Issue #2: Missing Security Headers

Initial deployment had no security headers. Added using CloudFront response headers policy:

aws cloudfront create-response-headers-policy \
    --response-headers-policy-config '{
        "Name": "security-headers-policy",
        "SecurityHeadersConfig": {
            "StrictTransportSecurity": {
                "Override": true,
                "AccessControlMaxAgeSec": 31536000,
                "IncludeSubdomains": true
            },
            "ContentTypeOptions": {"Override": true},
            "FrameOptions": {
                "Override": true,
                "FrameOption": "DENY"
            },
            "XSSProtection": {
                "Override": true,
                "Protection": true,
                "ModeBlock": true
            },
            "ReferrerPolicy": {
                "Override": true,
                "ReferrerPolicy": "strict-origin-when-cross-origin"
            }
        }
    }'

This fixed security scanner complaints and improved our security score from B to A+.

Issue #3: Access Logging

Security team required access logs for compliance. Enabled S3 and CloudFront logging:

# Create logging bucket
aws s3 mb s3://docs-example-com-logs

# Enable S3 access logging
aws s3api put-bucket-logging --bucket docs-example-com \
    --bucket-logging-status '{
        "LoggingEnabled": {
            "TargetBucket": "docs-example-com-logs",
            "TargetPrefix": "s3-access-logs/"
        }
    }'

# Enable CloudFront logging
aws cloudfront update-distribution --id DISTRIBUTION_ID --distribution-config '{
    "Logging": {
        "Enabled": true,
        "IncludeCookies": false,
        "Bucket": "docs-example-com-logs.s3.amazonaws.com",
        "Prefix": "cloudfront-logs/"
    }
}'

Logs proved valuable when investigating a traffic spike (turned out to be a bot, blocked via WAF).
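The bot investigation boiled down to counting requests per client IP in the CloudFront standard logs (tab-separated; field 5 is c-ip). A sketch with synthetic sample lines standing in for real log data:

```shell
# Top client IPs from CloudFront standard log lines
# (real logs arrive gzipped in the log bucket with two leading '#' header lines)
sample_logs=$(printf '2024-01-01\t00:00:01\tYYZ50\t512\t203.0.113.9\tGET\n2024-01-01\t00:00:02\tYYZ50\t512\t203.0.113.9\tGET\n2024-01-01\t00:00:03\tYYZ50\t512\t198.51.100.4\tGET\n')

# Extract c-ip (field 5), count occurrences, keep the busiest
top_ip=$(printf '%s\n' "$sample_logs" | awk -F'\t' '{print $5}' | \
    sort | uniq -c | sort -rn | awk 'NR==1 {print $2}')

echo "Top client IP: $top_ip"   # -> 203.0.113.9
```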

Deployment Automation with GitHub Actions

Manual deployment worked for initial migration, but we needed CI/CD for ongoing updates. Implemented GitHub Actions pipeline:

# .github/workflows/deploy.yml
name: Deploy to S3

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Credentials at job level so every step can call the AWS CLI
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      AWS_DEFAULT_REGION: us-east-1
    steps:
      - uses: actions/checkout@v3

      - name: Sync to S3
        run: |
          # HTML and other content: short cache
          aws s3 sync ./content s3://docs-example-com \
            --delete \
            --cache-control "max-age=3600" \
            --exclude "*.jpg" \
            --exclude "*.png" \
            --exclude "*.css" \
            --exclude "*.js"

          # Static assets get longer cache
          aws s3 sync ./content s3://docs-example-com \
            --cache-control "max-age=31536000, immutable" \
            --exclude "*" \
            --include "*.jpg" \
            --include "*.png" \
            --include "*.css" \
            --include "*.js"

      - name: Invalidate CloudFront cache
        run: |
          # CloudFront allows "*" only as a trailing wildcard, so "/*"
          # (one billable path) replaces patterns like "/*.html"
          aws cloudfront create-invalidation \
            --distribution-id ${{ secrets.CLOUDFRONT_DISTRIBUTION_ID }} \
            --paths "/*"

Key optimizations:

  • Invalidate as few paths as possible - CloudFront bills $0.005 per path after the first 1,000 free each month
  • Use --delete flag to remove old files
  • Separate cache-control for different file types

This pipeline deploys in 45 seconds, compared to 15 minutes for WordPress deployment.

Monitoring and Alerting

Set up CloudWatch alarms to monitor:

# Alert on 5xx error rate > 1%
# (CloudFront metrics live in us-east-1 and need the Region=Global dimension)
aws cloudwatch put-metric-alarm \
    --alarm-name "docs-high-error-rate" \
    --alarm-description "CloudFront 5xx error rate above 1%" \
    --metric-name 5xxErrorRate \
    --namespace AWS/CloudFront \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 1.0 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=DistributionId,Value=$DISTRIBUTION_ID Name=Region,Value=Global \
    --region us-east-1

# Alert on sudden traffic spike (potential DDoS)
aws cloudwatch put-metric-alarm \
    --alarm-name "docs-traffic-spike" \
    --alarm-description "Request rate 10x above baseline" \
    --metric-name Requests \
    --namespace AWS/CloudFront \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 100000 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=DistributionId,Value=$DISTRIBUTION_ID Name=Region,Value=Global \
    --region us-east-1

In 18 months of operation:

  • Zero 5xx error alerts (site has been 100% available)
  • Three traffic spike alerts (all legitimate traffic, not attacks)
  • Average monthly cost remained under $20

Lessons Learned from 18 Months of Production

What Went Right

  1. Performance improvement exceeded expectations: 87% faster load times globally
  2. Cost savings were massive: $449/month → $18/month (97% reduction)
  3. Zero maintenance burden: No more security patches, no more late-night server crashes
  4. Infinite scalability: Handled HackerNews traffic spike with zero issues
  5. Improved security posture: Security scan score went from B to A+

What Went Wrong (And How We Fixed It)

  1. Initial CloudFront misconfiguration: Cost $70 in invalidation charges. Fixed by using correct origin endpoint.
  2. Broken links after migration: 200 pages had incorrect relative links. Fixed with find/replace script before redeployment.
  3. Overly aggressive caching: HTML cached for 24 hours caused stale content. Reduced to 1 hour TTL.
  4. Missing security headers: Security team flagged. Fixed with CloudFront response headers policy.
  5. No access logging initially: Added after security audit required it for compliance.

Key Recommendations

For anyone considering S3 hosting migration:

  1. Use S3 website endpoint OR bucket origin with OAC, not both: They have different behaviors. Choose one approach and stick with it.

  2. Get caching right from the start: Long cache for assets, short cache for HTML. Wrong caching is expensive to fix.

  3. Enable access logging immediately: Required for security compliance and invaluable for debugging.

  4. Test thoroughly before cutover: We tested for 2 weeks with test subdomain before switching production DNS.

  5. Have rollback plan: Keep old WordPress environment running for 7 days after cutover in case of issues.

  6. Monitor costs daily initially: Watch your AWS bill closely for the first week to ensure no surprises.

Final Results: Worth Every Hour of Migration Effort

Migration timeline:

  • Planning and testing: 2 weeks
  • Export and data cleanup: 3 days
  • S3 and CloudFront configuration: 2 days
  • Security hardening: 1 day
  • DNS cutover and monitoring: 1 day
  • Total effort: ~25 engineer days

ROI after 18 months:

  • Cost savings: $8,096 ($449.77/month savings × 18 months)
  • Eliminated operational burden: ~40 hours saved (no patching, no incidents)
  • Performance improvement: 87% faster load times
  • Zero downtime in 18 months

The migration paid for itself in 60 days. Everything after that was pure savings and improved user experience.

For Calgary-based companies or anyone running static/semi-static sites on traditional hosting: S3 + CloudFront is worth serious consideration. The cost savings alone justify the migration effort, and the performance improvements are a massive bonus.

Question

Have you migrated from traditional hosting to S3? What challenges did you encounter?
