
The Complete Guide to AWS S3 Hosting for Modern Web Applications

Karandeep Singh
• 12 minutes

Summary

Production guide to S3 hosting based on migrating a large documentation site from EC2 WordPress to S3+CloudFront. Includes cost breakdowns, performance lessons, and field notes from managing S3-hosted sites in production.


The Expensive WordPress Problem

I once inherited a documentation website running on WordPress. The site had a large catalogue of technical documentation pages and was hosted on multiple EC2 instances behind an ALB. The monthly AWS bill was substantial.

The performance was terrible:

  • Slow average page loads
  • High server response times
  • Heavy database query overhead on every page
  • Recurring outages from PHP-FPM crashes under load

The site was entirely static content - no user accounts, no comments, no dynamic functionality. Every page hit required PHP to query MySQL, render the template, and serve HTML. We were paying a premium to serve static content dynamically.

This article documents our migration to S3 + CloudFront hosting, which dramatically reduced our monthly costs while delivering much faster average load times.

The Migration Strategy: WordPress to Static S3

Our migration plan involved:

  1. Export WordPress content to static HTML
  2. Set up S3 bucket configured for static website hosting
  3. Configure CloudFront CDN with custom domain and SSL
  4. Implement 301 redirects for changed URLs
  5. Switch DNS from ALB to CloudFront
  6. Decommission EC2 instances

The critical challenge: the page count was large enough that migration needed to be automated and thoroughly tested before cutover.

Phase 1: Static Site Export from WordPress

First challenge: exporting a large WordPress site to static HTML. I tried the “Simply Static” plugin - it crashed partway through the export.

The solution: custom export script using WordPress CLI and parallel processing:

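#!/bin/bash
# export_wordpress.sh - Export WordPress to static HTML
set -euo pipefail

# Exported so the bash -c workers started by xargs can read them
export WP_CLI="/usr/local/bin/wp"
export OUTPUT_DIR="/var/www/static_export"
BASE_URL="https://docs.example.com"

mkdir -p "$OUTPUT_DIR"

# Get all published pages and posts, then fetch each one (10 in parallel)
"$WP_CLI" post list --post_type=page,post --post_status=publish --format=ids | \
tr ' ' '\n' | \
xargs -P 10 -I {} bash -c '
    id="$1"

    # Get post slug and URL
    SLUG=$("$WP_CLI" post get "$id" --field=post_name)
    URL=$("$WP_CLI" post url "$id")

    # Fetch rendered HTML; -f makes curl fail on HTTP errors
    # instead of saving the error page as content
    if curl -sf -o "$OUTPUT_DIR/$SLUG.html" "$URL"; then
        echo "Exported: $SLUG"
    else
        echo "FAILED: $SLUG ($URL)" >&2
    fi
' _ {}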

This script processed pages in parallel (10 concurrent), and the full export took several hours. Key lessons:

  • Use xargs -P for parallelization
  • Handle WordPress permalinks correctly
  • Preserve URL structure for SEO
  • Export took multiple attempts before getting URL mapping correct
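
One way to keep the URL structure intact is to derive each output path from the permalink instead of from the slug alone. The sketch below is an illustration under stated assumptions, not the script used in the migration: `url_to_path` is a hypothetical helper, and it assumes pretty permalinks that all live under one base URL.

```shell
#!/bin/bash
# Hypothetical helper: map a permalink to a file path that mirrors the URL,
# so /guides/setup/ is written to guides/setup/index.html.
url_to_path() {
    local base="${1%/}" url="$2" path last
    path="${url#"$base"}"      # strip scheme and host
    path="${path%%\?*}"        # drop any query string
    path="${path%%\#*}"        # drop any fragment
    path="${path#/}"           # drop the leading slash
    last="${path##*/}"         # final path segment (empty if the URL ends in /)
    if [ -z "$path" ]; then
        printf 'index.html\n'
    elif [ -z "$last" ]; then
        printf '%sindex.html\n' "$path"
    elif [[ "$last" == *.* ]]; then
        printf '%s\n' "$path"              # already looks like a file name
    else
        printf '%s/index.html\n' "$path"
    fi
}
```

Inside an export loop this could be used as `dest="$OUTPUT_DIR/$(url_to_path "$BASE_URL" "$URL")"`, followed by `mkdir -p "$(dirname "$dest")"` before writing the file.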

Phase 2: S3 Bucket Configuration

Setting up the S3 bucket required specific configuration for static website hosting:

# Create bucket
aws s3 mb s3://docs-example-com --region us-east-1

# Enable static website hosting
aws s3 website s3://docs-example-com \
    --index-document index.html \
    --error-document 404.html

# Upload content
aws s3 sync /var/www/static_export s3://docs-example-com \
    --delete \
    --cache-control "max-age=31536000, public" \
    --exclude "*.html" \
    --exclude "*.xml"

# HTML files get shorter cache (for updates)
aws s3 sync /var/www/static_export s3://docs-example-com \
    --delete \
    --cache-control "max-age=3600, public" \
    --exclude "*" \
    --include "*.html" \
    --include "*.xml"

Critical mistake I made: I initially set Cache-Control to 1 year on HTML files. When we found broken links, CloudFront and browsers kept serving the stale pages, and the wide-scale invalidation needed to flush them racked up real charges.

The correct approach: long cache for static assets (images, CSS, JS), short cache for HTML.
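
The split can also be expressed as a per-file decision. This is a minimal sketch, assuming the same two header values used in the sync commands above; `cache_control_for` is a hypothetical helper name, not part of the AWS CLI.

```shell
#!/bin/bash
# Hypothetical helper: choose a Cache-Control value from the file extension.
# HTML and XML (and extensionless pages) get one hour; everything else one year.
cache_control_for() {
    local name="${1##*/}" ext
    ext="${name##*.}"
    # No dot in the file name: treat it as a page and keep the cache short
    if [ "$ext" = "$name" ]; then
        ext=""
    fi
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        html|xml|"") printf 'max-age=3600, public\n' ;;
        *)           printf 'max-age=31536000, public\n' ;;
    esac
}
```

A single upload could then look like `aws s3 cp "$f" "s3://docs-example-com/$key" --cache-control "$(cache_control_for "$f")"`, where `$f` and `$key` are placeholders for the local file and its object key.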

Phase 3: CloudFront Configuration (The Tricky Part)

CloudFront setup had three major bugs that took days to debug:

Bug #1: Origin Access Identity Misconfiguration

First attempt at CloudFront configuration:

# Create CloudFront distribution (WRONG - caused 403 errors)
aws cloudfront create-distribution --origin-domain-name docs-example-com.s3.amazonaws.com

This failed with 403 errors on all requests. The problem: CloudFront was trying to access S3 objects directly, but the bucket policy only allowed public access via S3 website endpoint.

The fix: Use S3 website endpoint as origin, not the S3 bucket endpoint:

# Correct origin configuration
aws cloudfront create-distribution \
    --origin-domain-name docs-example-com.s3-website-us-east-1.amazonaws.com \
    --default-root-object index.html

Key difference:

  • S3 bucket endpoint: bucket-name.s3.amazonaws.com (requires OAI/OAC)
  • S3 website endpoint: bucket-name.s3-website-region.amazonaws.com (public access)
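
Because the two endpoint styles are easy to mix up, a small guard in a deploy script can classify the origin domain before it is used. This is a sketch only: `origin_kind` is a hypothetical helper, and the patterns assume the standard `amazonaws.com` hostnames (both the dash and dot forms of the website endpoint).

```shell
#!/bin/bash
# Hypothetical helper: classify an origin domain name.
# Prints "website" for S3 website endpoints, "rest" for S3 bucket (REST)
# endpoints, and "other" for anything else.
origin_kind() {
    case "$1" in
        # Website patterns are checked first so they are not caught
        # by the broader REST patterns below.
        *.s3-website-*.amazonaws.com|*.s3-website.*.amazonaws.com)
            printf 'website\n' ;;
        *.s3.amazonaws.com|*.s3.*.amazonaws.com|*.s3-*.amazonaws.com)
            printf 'rest\n' ;;
        *)
            printf 'other\n' ;;
    esac
}
```

For example, `origin_kind docs-example-com.s3.amazonaws.com` prints `rest`, which signals that the origin needs OAI/OAC rather than public access.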

Bug #2: Missing Custom Error Responses

After fixing origin, we got CloudFront default error pages (white text on black background - terrible UX). Needed custom 404 handling:

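# Configure custom error responses.
# update-distribution replaces the whole config and needs the current ETag,
# so fetch the config, patch it with jq (assumed installed), and send it back.
aws cloudfront get-distribution-config --id DISTRIBUTION_ID > dist.json
ETAG=$(jq -r '.ETag' dist.json)

jq '.DistributionConfig | .CustomErrorResponses = {
  "Quantity": 2,
  "Items": [
    {
      "ErrorCode": 404,
      "ResponsePagePath": "/404.html",
      "ResponseCode": "404",
      "ErrorCachingMinTTL": 300
    },
    {
      "ErrorCode": 403,
      "ResponsePagePath": "/404.html",
      "ResponseCode": "404",
      "ErrorCachingMinTTL": 300
    }
  ]
}' dist.json > dist-config.json

aws cloudfront update-distribution --id DISTRIBUTION_ID \
    --if-match "$ETAG" \
    --distribution-config file://dist-config.json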

Critical insight: S3 returns 403, not 404, for a missing object when the requester lacks s3:ListBucket permission on the bucket, which is the normal case for a site bucket. Map both 403 and 404 to the custom error page.

Bug #3: Cache Invalidation Hell

After deploying, we found a number of broken links. Fixed them and uploaded corrected HTML. But CloudFront kept serving the old broken pages.

Problem: CloudFront’s default TTL is 24 hours, which applies whenever the origin response carries no Cache-Control or Expires header. Our content was cached with broken links.

Cost of lesson learned:

# Invalidate all HTML files (expensive!)
aws cloudfront create-invalidation \
    --distribution-id DISTRIBUTION_ID \
    --paths "/*.html" "/*/index.html"

# Cost: $0.005 per path after first 1000 free per month
# A wildcard path such as /*.html counts as a single path; submitting many
# individual HTML paths is what racks up real charges
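
The pricing rule in the comment above is simple enough to estimate before submitting a batch. A rough sketch, assuming the quoted $0.005 per path and that none of the month's 1000 free paths have been used yet; `invalidation_cost` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: estimate the charge (in USD) for submitting N
# invalidation paths in a month, using the rate quoted in the article.
# Assumes the full free allowance of 1000 paths is still available.
invalidation_cost() {
    local paths="$1" free=1000
    awk -v n="$paths" -v free="$free" 'BEGIN {
        billable = (n > free) ? n - free : 0
        printf "%.2f\n", billable * 0.005
    }'
}
```

For example, `invalidation_cost 5000` prints `20.00`: 4000 billable paths at $0.005 each.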

The correct approach: Use versioned filenames for assets, or use shorter TTL for HTML:

# Set appropriate cache behaviors
aws s3 cp /var/www/static_export s3://docs-example-com \
    --recursive \
    --cache-control "max-age=3600" \
    --exclude "*" \
    --include "*.html"
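
For the versioned-filename option, one common approach is to embed a short content hash in each asset name so that a changed file gets a new URL and never needs invalidating. The sketch below is an assumption about how that could look, not the build step used on this site: `versioned_name` is a hypothetical helper, it relies on GNU `sha256sum`, and it expects file names that have an extension.

```shell
#!/bin/bash
# Hypothetical helper: print a content-hashed name for an asset,
# e.g. app.css -> app.<first 8 hex chars of its SHA-256>.css
versioned_name() {
    local file="$1" name ext stem hash
    name="${file##*/}"     # file name without directories
    ext="${name##*.}"      # extension after the last dot
    stem="${name%.*}"      # everything before the last dot
    hash=$(sha256sum "$file" | cut -c1-8)
    printf '%s.%s.%s\n' "$stem" "$hash" "$ext"
}
```

The HTML then has to reference the hashed name, which is why this is normally done by the site's build tooling rather than by hand.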

After these fixes, CloudFront worked smoothly, with very low average response times globally.

Phase 4: Custom Domain and SSL Configuration

Configuring custom domain required three components: Route 53, ACM certificate, and CloudFront alias.

SSL Certificate Request (Must be in us-east-1)

Critical mistake: I initially requested the ACM certificate in ca-central-1 (our primary region). CloudFront REQUIRES certificates in us-east-1.

# Request certificate in us-east-1 (required for CloudFront)
aws acm request-certificate \
    --domain-name docs.example.com \
    --validation-method DNS \
    --region us-east-1

# Get certificate ARN
CERT_ARN="arn:aws:acm:us-east-1:ACCOUNT_ID:certificate/CERT_ID"
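
Since the region is encoded in the ARN, the mistake above can be caught before the certificate is ever attached. A minimal sketch, assuming the standard colon-separated ARN layout; `cert_region` and `cert_usable_by_cloudfront` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers: read the region out of an ACM certificate ARN
# (arn:partition:service:region:account:resource) and check it.
cert_region() {
    local arn="$1" region
    IFS=: read -r _ _ _ region _ <<< "$arn"
    printf '%s\n' "$region"
}

# Succeeds only when the certificate lives in us-east-1,
# the region CloudFront requires for ACM certificates.
cert_usable_by_cloudfront() {
    [ "$(cert_region "$1")" = "us-east-1" ]
}
```

A deploy script could run `cert_usable_by_cloudfront "$CERT_ARN" || exit 1` before touching the distribution.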

DNS Validation

# Add CNAME records to Route 53 for validation
aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "_validation.docs.example.com",
      "Type": "CNAME",
      "TTL": 300,
      "ResourceRecords": [{"Value": "validation-value-from-acm.acm-validation.aws."}]
    }
  }]
}'

Certificate validation completed quickly once the DNS records propagated.

CloudFront Domain Configuration

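# Update CloudFront distribution with custom domain and SSL.
# update-distribution needs the full config plus the current ETag, so fetch it,
# patch it with jq (assumed installed), and send it back.
aws cloudfront get-distribution-config --id DISTRIBUTION_ID > dist.json
ETAG=$(jq -r '.ETag' dist.json)

jq --arg cert "$CERT_ARN" '.DistributionConfig
  | .Aliases = {"Quantity": 1, "Items": ["docs.example.com"]}
  | .ViewerCertificate = {
      "ACMCertificateArn": $cert,
      "SSLSupportMethod": "sni-only",
      "MinimumProtocolVersion": "TLSv1.2_2021"
    }' dist.json > dist-config.json

aws cloudfront update-distribution --id DISTRIBUTION_ID \
    --if-match "$ETAG" \
    --distribution-config file://dist-config.json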

Route 53 A Record

# Create alias record pointing to CloudFront
aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "docs.example.com",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "Z2FDTNDATAQYW2",
        "DNSName": "d123456789.cloudfront.net",
        "EvaluateTargetHealth": false
      }
    }
  }]
}'

Note: Z2FDTNDATAQYW2 is CloudFront’s hosted zone ID (constant for all CloudFront distributions).

DNS propagation finished quickly. HTTPS worked immediately after DNS updated.

Production Performance Metrics

After migration, I monitored performance over the following weeks. The improvement was dramatic:

Load Time Comparison

Before (WordPress on EC2):

  • Slow average page loads
  • High Time to First Byte (TTFB)
  • Sluggish DOM Content Loaded times
  • Long fully-loaded times

After (S3 + CloudFront):

  • Much faster average page loads
  • Very low TTFB
  • Snappy DOM Content Loaded
  • Quick fully-loaded times

Geographic Performance

Testing from different locations using WebPageTest showed dramatic improvements everywhere. CloudFront’s edge locations eliminated most of the geographic latency, so the biggest gains were for users physically far from the original origin region.

Traffic Handling

Stress tested both configurations:

WordPress Setup:

  • Throughput plateaued at modest sustained request rates before response times degraded
  • CPU pegged under load
  • Required aggressive CloudWatch alarms

S3 + CloudFront:

  • Sustained much higher request rates with no degradation
  • No scaling concerns
  • CloudFront absorbed load automatically

During an unexpected traffic surge from a popular link, the WordPress setup would have crashed. S3+CloudFront didn’t notice.

Real Cost Breakdown

Where the money used to go (and where it now goes):

Before: WordPress on EC2

The bulk of the bill was driven by:

  • Multiple long-running EC2 instances
  • Provisioned EBS volumes
  • An Application Load Balancer
  • A managed RDS database
  • Data transfer out
  • Automated EBS snapshot backups

This added up to a substantial monthly spend just to keep static content online.

After: S3 + CloudFront

The replacement bill collapsed to a handful of small line items:

  • S3 storage for the site contents
  • S3 GET requests
  • CloudFront data transfer
  • CloudFront requests
  • Route 53 hosted zone
  • ACM certificate (free)

Total monthly cost dropped to a fraction of the original.

This doesn’t include the eliminated operational costs:

  • No more EC2 patching
  • No more WordPress/PHP updates
  • No more database backups to monitor
  • No more late-night “site down” alerts

The annualized savings were significant, and they funded engineer time for feature development instead of infrastructure maintenance.

Security Hardening: Lessons from Production

After initial deployment, our security team audited the setup, drawing on the kind of controls covered in this AWS security features guide. Three issues found:

Issue #1: S3 Bucket Public Access

Security team flagged that S3 bucket was publicly accessible. While necessary for S3 website hosting, it’s a red flag in security scans.

The solution: Use CloudFront with Origin Access Control (OAC) and block public S3 access:

# Block public access to S3 bucket
aws s3api put-public-access-block \
    --bucket docs-example-com \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

# Create Origin Access Control
aws cloudfront create-origin-access-control \
    --origin-access-control-config '{
        "Name": "docs-oac",
        "SigningProtocol": "sigv4",
        "SigningBehavior": "always",
        "OriginAccessControlOriginType": "s3"
    }'

# Update bucket policy to only allow CloudFront
aws s3api put-bucket-policy --bucket docs-example-com --policy '{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCloudFrontServicePrincipal",
        "Effect": "Allow",
        "Principal": {"Service": "cloudfront.amazonaws.com"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::docs-example-com/*",
        "Condition": {
            "StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::ACCOUNT_ID:distribution/DISTRIBUTION_ID"
            }
        }
    }]
}'

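Note: This requires using the S3 bucket (REST) endpoint as the origin, not the S3 website endpoint, so we changed our CloudFront origin configuration. The OAC also has to be attached to that origin in the distribution config. Unlike the website endpoint, the REST endpoint does not resolve index documents for subdirectory URLs, so check that links still resolve after the switch.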

Issue #2: Missing Security Headers

Initial deployment had no security headers. Added using CloudFront response headers policy:

aws cloudfront create-response-headers-policy \
    --response-headers-policy-config '{
        "Name": "security-headers-policy",
        "SecurityHeadersConfig": {
            "StrictTransportSecurity": {
                "Override": true,
                "AccessControlMaxAgeSec": 31536000,
                "IncludeSubdomains": true
            },
            "ContentTypeOptions": {"Override": true},
            "FrameOptions": {
                "Override": true,
                "FrameOption": "DENY"
            },
            "XSSProtection": {
                "Override": true,
                "Protection": true,
                "ModeBlock": true
            },
            "ReferrerPolicy": {
                "Override": true,
                "ReferrerPolicy": "strict-origin-when-cross-origin"
            }
        }
    }'

This fixed security scanner complaints and meaningfully improved our security scan score.
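
A quick way to confirm the policy is actually attached and taking effect is to inspect the response headers. The sketch below assumes the five headers configured above; `missing_security_headers` is a hypothetical helper that reads raw response headers on stdin.

```shell
#!/bin/bash
# Hypothetical helper: read HTTP response headers on stdin, print the name of
# each expected security header that is absent, and return non-zero if any
# are missing. Header names are compared case-insensitively.
missing_security_headers() {
    local headers required h missing=0
    headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
    required="strict-transport-security x-content-type-options x-frame-options x-xss-protection referrer-policy"
    for h in $required; do
        if ! printf '%s\n' "$headers" | grep -q "^$h:"; then
            printf '%s\n' "$h"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use would be `curl -sI https://docs.example.com/ | missing_security_headers`, which prints nothing and succeeds when all five headers are present.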

Issue #3: Access Logging

Security team required access logs for compliance. Enabled S3 and CloudFront logging:

# Create logging bucket
aws s3 mb s3://docs-example-com-logs

# Enable S3 access logging
aws s3api put-bucket-logging --bucket docs-example-com \
    --bucket-logging-status '{
        "LoggingEnabled": {
            "TargetBucket": "docs-example-com-logs",
            "TargetPrefix": "s3-access-logs/"
        }
    }'

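# Enable CloudFront logging.
# update-distribution needs the full config plus the current ETag, so fetch it,
# patch it with jq (assumed installed), and send it back.
aws cloudfront get-distribution-config --id DISTRIBUTION_ID > dist.json
ETAG=$(jq -r '.ETag' dist.json)

jq '.DistributionConfig | .Logging = {
    "Enabled": true,
    "IncludeCookies": false,
    "Bucket": "docs-example-com-logs.s3.amazonaws.com",
    "Prefix": "cloudfront-logs/"
}' dist.json > dist-config.json

aws cloudfront update-distribution --id DISTRIBUTION_ID \
    --if-match "$ETAG" \
    --distribution-config file://dist-config.json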

Logs proved valuable when investigating a traffic spike (turned out to be a bot, blocked via WAF).
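
For that kind of investigation, counting requests per client IP is usually the first step. A minimal sketch, assuming CloudFront standard log files (tab-separated, `#`-prefixed header lines, client IP in the fifth field); `top_client_ips` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read CloudFront standard log lines on stdin and print
# the busiest client IPs as "count<TAB>ip", highest count first.
# Usage: top_client_ips [LIMIT]   (LIMIT defaults to 10)
top_client_ips() {
    local limit="${1:-10}"
    awk -F'\t' '
        !/^#/ && NF >= 5 { count[$5]++ }     # skip #Version / #Fields headers
        END { for (ip in count) printf "%d\t%s\n", count[ip], ip }
    ' |
        sort -k1,1nr -k2,2 |
        head -n "$limit"
}
```

The log files are delivered gzip-compressed, so after downloading them a typical invocation is `gunzip -c cloudfront-logs/*.gz | top_client_ips 5`.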

Deployment Automation with GitHub Actions

Manual deployment worked for initial migration, but we needed CI/CD for ongoing updates. If you prefer running the same pipeline inside AWS, a CodeBuild buildspec walkthrough covers the equivalent build phases. Implemented GitHub Actions pipeline:

# .github/workflows/deploy.yml
name: Deploy to S3

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Job-level env so both the sync and the invalidation steps get credentials
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      AWS_DEFAULT_REGION: us-east-1
    steps:
      - uses: actions/checkout@v3

      - name: Sync to S3
        run: |
          aws s3 sync ./content s3://docs-example-com \
            --delete \
            --cache-control "max-age=3600" \
            --exclude "*.jpg" \
            --exclude "*.png" \
            --exclude "*.css" \
            --exclude "*.js"

          # Static assets get longer cache
          aws s3 sync ./content s3://docs-example-com \
            --cache-control "max-age=31536000, immutable" \
            --exclude "*" \
            --include "*.jpg" \
            --include "*.png" \
            --include "*.css" \
            --include "*.js"

      - name: Invalidate CloudFront cache
        run: |
          aws cloudfront create-invalidation \
            --distribution-id ${{ secrets.CLOUDFRONT_DISTRIBUTION_ID }} \
            --paths "/*.html" "/sitemap.xml" "/robots.txt"

Key optimizations:

  • Only invalidate HTML/sitemap/robots.txt (not images/CSS) to minimize costs
  • Use --delete flag to remove old files
  • Separate cache-control for different file types

This pipeline deploys very quickly, compared to the much slower WordPress deployment process it replaced.

Monitoring and Alerting

Set up CloudWatch alarms to monitor:

# Alert on 5xx error rate > 1%
aws cloudwatch put-metric-alarm \
    --alarm-name "docs-high-error-rate" \
    --alarm-description "CloudFront 5xx error rate above 1%" \
    --metric-name 5xxErrorRate \
    --namespace AWS/CloudFront \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 1.0 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=DistributionId,Value=$DISTRIBUTION_ID Name=Region,Value=Global \
    --region us-east-1

# Alert on sudden traffic spike (potential DDoS)
aws cloudwatch put-metric-alarm \
    --alarm-name "docs-traffic-spike" \
    --alarm-description "Request rate 10x above baseline" \
    --metric-name Requests \
    --namespace AWS/CloudFront \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 100000 \
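    --comparison-operator GreaterThanThreshold \
    --dimensions Name=DistributionId,Value=$DISTRIBUTION_ID Name=Region,Value=Global \
    --region us-east-1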

After running this in production for an extended period:

  • Effectively no 5xx error alerts (the site has been highly available)
  • A handful of traffic spike alerts (all legitimate traffic, not attacks)
  • Monthly cost stayed comfortably low

Lessons Learned from Running This in Production

What Went Right

  1. Performance improvement exceeded expectations: Much faster load times globally
  2. Cost savings were significant: A large reduction in monthly hosting spend
  3. Zero maintenance burden: No more security patches, no more late-night server crashes
  4. Effectively unlimited scalability: Handled traffic spikes with zero issues
  5. Improved security posture: Security scan score improved meaningfully

What Went Wrong (And How We Fixed It)

  1. Initial CloudFront misconfiguration: Cost real money in invalidation charges. Fixed by using correct origin endpoint.
  2. Broken links after migration: A batch of pages had incorrect relative links. Fixed with find/replace script before redeployment.
  3. Overly aggressive caching: HTML cached for a day caused stale content. Reduced TTL substantially.
  4. Missing security headers: Security team flagged. Fixed with CloudFront response headers policy.
  5. No access logging initially: Added after security audit required it for compliance.
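
The broken-link problem in particular is cheap to catch before deployment. The sketch below is not the find/replace script mentioned above; it is a hypothetical pre-deploy check, `find_broken_links`, that only covers root-relative links (`href="/..."`) inside an exported directory and assumes directory URLs are served from `index.html`.

```shell
#!/bin/bash
# Hypothetical helper: scan every .html file under ROOT for root-relative
# links and report those with no matching file on disk.
# Prints "page -> link" per broken link and returns non-zero if any are found.
find_broken_links() {
    local root="${1%/}" file link target status=0
    while IFS= read -r file; do
        while IFS= read -r link; do
            link="${link%%\#*}"            # ignore fragments
            link="${link%%\?*}"            # ignore query strings
            [ -z "$link" ] && continue
            target="$root$link"
            case "$target" in
                */) target="${target}index.html" ;;
            esac
            if [ ! -f "$target" ] && [ ! -f "$target/index.html" ]; then
                printf '%s -> %s\n' "${file#"$root"/}" "$link"
                status=1
            fi
        done < <(grep -oE 'href="/[^"/][^"]*"|href="/"' "$file" |
                 sed -E 's/^href="//; s/"$//')
    done < <(find "$root" -type f -name '*.html' | sort)
    return "$status"
}
```

Run as `find_broken_links /var/www/static_export` before `aws s3 sync`, it gives a list to fix while the content is still local and nothing has been cached.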

Key Recommendations

For anyone considering S3 hosting migration:

  1. Use S3 website endpoint OR bucket origin with OAC, not both: They have different behaviors. Choose one approach and stick with it.

  2. Get caching right from the start: Long cache for assets, short cache for HTML. Wrong caching is expensive to fix.

  3. Enable access logging immediately: Required for security compliance and invaluable for debugging.

  4. Test thoroughly before cutover: We ran the new stack on a test subdomain for a meaningful soak period before switching production DNS.

  5. Have rollback plan: Keep the old WordPress environment running for a buffer window after cutover in case of issues.

  6. Monitor costs daily initially: Watch your AWS bill closely for the first while to ensure no surprises.


Final Results: Worth Every Hour of Migration Effort

Migration timeline (high level):

  • Planning and testing
  • Export and data cleanup
  • S3 and CloudFront configuration
  • Security hardening
  • DNS cutover and monitoring

ROI after running this in production:

  • Significant cost savings versus the legacy stack
  • Eliminated operational burden (no patching, no incidents)
  • Much faster load times
  • Strong availability since cutover

The migration paid for itself quickly. Everything after that was pure savings and improved user experience.

For anyone running static/semi-static sites on traditional hosting: S3 + CloudFront is worth serious consideration. The cost savings alone justify the migration effort, and the performance improvements are a massive bonus. If your source content lives in a static site generator, this Hugo deploy via CodeBuild guide walks through wiring the same S3+CloudFront target into an automated build.


Question

Have you migrated from traditional hosting to S3? What challenges did you encounter?
