Deploying Metabase on AWS - A Production-Ready Guide

Deploying Metabase on AWS in production means running the application container on a managed compute service (ECS Fargate or EC2), backed by an RDS PostgreSQL instance as the application database, behind an Application Load Balancer with HTTPS termination, with secrets managed through AWS Secrets Manager. This guide covers the complete architecture, Terraform configuration, and the decisions that determine whether a deployment is stable under real load.

---

Reference Architecture

```
Internet
    │
    ▼
Route 53 (DNS)
    │
    ▼
Application Load Balancer (HTTPS :443)
    │   TLS termination, health checks
    ▼
ECS Fargate (Metabase container)
    │   Private subnet, no public IP
    ├──▶ RDS PostgreSQL (application DB)
    │       Private subnet, port 5432
    └──▶ Your data sources (RDS, Redshift, etc.)
            Private subnet or VPC peering

Secrets Manager
    └──▶ ECS task (DB password, API keys, embedding secret)
```

Key Design Decisions in This Architecture

ECS Fargate over EC2: Fargate eliminates the need to manage EC2 instances, OS patching, and container scheduling. You define the task and Fargate runs it. For a single-instance Metabase deployment, Fargate is the lowest-overhead option.

RDS PostgreSQL for the application database: Metabase's default embedded H2 database is not suitable for production, so the deployment needs an external application database, and PostgreSQL is the recommended choice. RDS gives you automated backups, Multi-AZ failover, and managed upgrades without running your own PostgreSQL server.

Private subnets for all compute: The ECS task and RDS instances have no public IP addresses. All inbound traffic enters through the ALB. This reduces the attack surface significantly.

Secrets Manager for credentials: Database passwords and the embedding secret key are stored in Secrets Manager and injected as environment variables at container startup. The ECS task definition holds only a reference to the secret, so the values never appear in the task definition or in application code.

---

Prerequisites

  • AWS account with appropriate IAM permissions
  • Terraform >= 1.5 (or use the AWS Console / CloudFormation; the concepts are identical)
  • A registered domain (for the ALB HTTPS certificate)
  • An ACM certificate for your domain, issued in the same region as the ALB

---

Terraform Configuration
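The snippets in this guide reference `var.aws_region`, `var.domain_name`, and `var.acm_certificate_arn` and use the `aws` and `random` providers, but none of these are declared in the guide. The following is a minimal sketch of those declarations. The file names, the provider version constraints, and the variable descriptions are assumptions; the AWS provider is constrained to 5.x here because the `domain = "vpc"` argument on `aws_eip` used below belongs to that major version.

```hcl
# versions.tf (hypothetical file name)
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumption: any 5.x release
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0" # assumption
    }
  }
}

provider "aws" {
  region = var.aws_region
}

# variables.tf (hypothetical file name)
variable "aws_region" {
  type        = string
  description = "AWS region to deploy into"
}

variable "domain_name" {
  type        = string
  description = "Fully qualified domain name Metabase is served from"
}

variable "acm_certificate_arn" {
  type        = string
  description = "ARN of the ACM certificate attached to the HTTPS listener"
}
```

Values can then be supplied through a `terraform.tfvars` file or `-var` flags.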

VPC and Networking

```hcl
# networking.tf

# Availability zones used to spread the subnets
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "metabase" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = { Name = "metabase-vpc" }
}

# Public subnets: 10.0.0.0/24 and 10.0.1.0/24 (ALB and NAT gateway)
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.metabase.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true
  tags                    = { Name = "metabase-public-${count.index}" }
}

# Private subnets: 10.0.10.0/24 and 10.0.11.0/24 (ECS task and RDS)
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.metabase.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = { Name = "metabase-private-${count.index}" }
}

resource "aws_internet_gateway" "metabase" {
  vpc_id = aws_vpc.metabase.id
}

resource "aws_nat_gateway" "metabase" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.metabase]
}

resource "aws_eip" "nat" {
  domain = "vpc"
}
```
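The networking resources above create the internet gateway and the NAT gateway but do not show any route tables, and without routes the subnets cannot reach either gateway. The following is a sketch of the routing that the reference architecture implies: public subnets route to the internet gateway, private subnets route outbound traffic through the NAT gateway. The resource names and tags are assumptions.

```hcl
# Public subnets send internet-bound traffic to the internet gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.metabase.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.metabase.id
  }

  tags = { Name = "metabase-public" }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private subnets send outbound traffic (image pulls, Secrets Manager,
# CloudWatch Logs) through the NAT gateway
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.metabase.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.metabase.id
  }

  tags = { Name = "metabase-private" }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```

A single NAT gateway keeps cost down but makes outbound traffic depend on one availability zone; one NAT gateway per zone is the usual alternative when that matters.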

Security Groups

```hcl
# security_groups.tf

# ALB: accepts HTTPS from the internet
resource "aws_security_group" "alb" {
  name   = "metabase-alb"
  vpc_id = aws_vpc.metabase.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Port 80 is open only so the ALB can redirect HTTP to HTTPS
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS task: accepts traffic only from the ALB
resource "aws_security_group" "ecs" {
  name   = "metabase-ecs"
  vpc_id = aws_vpc.metabase.id

  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS: accepts connections only from ECS tasks
resource "aws_security_group" "rds" {
  name   = "metabase-rds"
  vpc_id = aws_vpc.metabase.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs.id]
  }
}
```

RDS PostgreSQL (Application Database)

```hcl
# rds.tf

resource "aws_db_subnet_group" "metabase" {
  name       = "metabase"
  subnet_ids = aws_subnet.private[*].id
}

resource "aws_db_instance" "metabase" {
  identifier        = "metabase-app-db"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.small" # increase for larger deployments
  allocated_storage = 20
  storage_type      = "gp3"

  db_name  = "metabase"
  username = "metabase_app"
  password = random_password.db_password.result

  db_subnet_group_name   = aws_db_subnet_group.metabase.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7 # 7 days of automated backups
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  deletion_protection       = true # prevents accidental deletion
  skip_final_snapshot       = false
  final_snapshot_identifier = "metabase-final-snapshot"

  # Enable enhanced monitoring. The rds_monitoring IAM role is not part of
  # this file and has to exist before this resource can be applied.
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  tags = { Name = "metabase-app-db" }
}

resource "random_password" "db_password" {
  length  = 32
  special = false # avoid characters that break connection strings
}
```
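The RDS instance references `aws_iam_role.rds_monitoring` for enhanced monitoring, but the guide does not define that role. A minimal sketch of what it could look like follows, assuming the AWS-managed `AmazonRDSEnhancedMonitoringRole` policy is sufficient; the role name `metabase-rds-monitoring` is an assumption.

```hcl
# IAM role that RDS enhanced monitoring assumes to publish OS metrics
resource "aws_iam_role" "rds_monitoring" {
  name = "metabase-rds-monitoring" # assumed name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

# AWS-managed policy granting the CloudWatch Logs permissions
# that enhanced monitoring needs
resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role       = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
```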

Secrets Manager

```hcl
# secrets.tf

resource "aws_secretsmanager_secret" "metabase" {
  name = "metabase/config"
}

# One JSON secret holding both values; ECS reads each key separately
resource "aws_secretsmanager_secret_version" "metabase" {
  secret_id = aws_secretsmanager_secret.metabase.id
  secret_string = jsonencode({
    db_password      = random_password.db_password.result
    embedding_secret = random_password.embedding_secret.result
  })
}

resource "random_password" "embedding_secret" {
  length  = 64
  special = false
}
```

ECS Fargate

```hcl
# ecs.tf

resource "aws_ecs_cluster" "metabase" {
  name = "metabase"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_task_definition" "metabase" {
  family                   = "metabase"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "1024" # 1 vCPU
  memory                   = "3072" # 3 GB; JVM needs headroom above Xmx

  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name  = "metabase"
      image = "metabase/metabase:v0.50.0" # pin to specific version

      portMappings = [
        { containerPort = 3000, protocol = "tcp" }
      ]

      environment = [
        { name = "MB_DB_TYPE", value = "postgres" },
        { name = "MB_DB_HOST", value = aws_db_instance.metabase.address },
        { name = "MB_DB_PORT", value = "5432" },
        { name = "MB_DB_DBNAME", value = "metabase" },
        { name = "MB_DB_USER", value = "metabase_app" },
        { name = "MB_SITE_URL", value = "https://${var.domain_name}" },
        { name = "JAVA_OPTS", value = "-Xmx2g -Xms512m" },
        { name = "MB_ANON_TRACKING_ENABLED", value = "false" }
      ]

      # "<secret ARN>:<JSON key>::" selects one key from the JSON secret
      secrets = [
        {
          name      = "MB_DB_PASS"
          valueFrom = "${aws_secretsmanager_secret.metabase.arn}:db_password::"
        },
        {
          name      = "MB_EMBEDDING_SECRET_KEY"
          valueFrom = "${aws_secretsmanager_secret.metabase.arn}:embedding_secret::"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/metabase"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "metabase"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
        interval    = 30
        timeout     = 10
        retries     = 5
        startPeriod = 120 # Metabase takes ~90s on first start
      }
    }
  ])
}

resource "aws_ecs_service" "metabase" {
  name            = "metabase"
  cluster         = aws_ecs_cluster.metabase.id
  task_definition = aws_ecs_task_definition.metabase.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.metabase.arn
    container_name   = "metabase"
    container_port   = 3000
  }

  # Ignore failing ALB health checks while Metabase is still starting
  health_check_grace_period_seconds = 120

  # Ensure new task is healthy before stopping the old one
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200

  depends_on = [aws_lb_listener.https]
}
```
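The container's `logConfiguration` writes to the `/ecs/metabase` log group, and the `awslogs` driver does not create that group by default. A minimal sketch of the log group follows; the 30-day retention is an assumption, so adjust it to your own logging policy.

```hcl
# Log group the awslogs driver writes to; the name must match the
# "awslogs-group" option in the container definition
resource "aws_cloudwatch_log_group" "metabase" {
  name              = "/ecs/metabase"
  retention_in_days = 30 # assumption: pick a retention that fits your needs
}
```

Without a retention setting, CloudWatch keeps log events indefinitely, which slowly increases the logging cost.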

Application Load Balancer

```hcl
# alb.tf

resource "aws_lb" "metabase" {
  name               = "metabase"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = true
}

resource "aws_lb_target_group" "metabase" {
  name        = "metabase"
  port        = 3000
  protocol    = "HTTP"
  vpc_id      = aws_vpc.metabase.id
  target_type = "ip" # required for Fargate

  health_check {
    path                = "/api/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 10
    interval            = 30
    matcher             = "200"
  }
}

# Redirect HTTP to HTTPS
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.metabase.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.metabase.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.metabase.arn
  }
}
```
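The reference architecture puts Route 53 in front of the load balancer, but the guide does not include the DNS record. One way to sketch it is an alias record pointing the domain at the ALB. This assumes the hosted zone already exists; `var.route53_zone_id` is a hypothetical variable introduced only for this example.

```hcl
# Hypothetical variable: ID of an existing Route 53 hosted zone for the domain
variable "route53_zone_id" {
  type        = string
  description = "ID of the Route 53 hosted zone that contains var.domain_name"
}

# Alias record pointing the Metabase domain at the ALB
resource "aws_route53_record" "metabase" {
  zone_id = var.route53_zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.metabase.dns_name
    zone_id                = aws_lb.metabase.zone_id
    evaluate_target_health = true
  }
}
```

An alias record is used rather than a CNAME so the record also works when the domain is the zone apex.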

---

IAM Roles

```hcl
# iam.tf

# ECS execution role: allows ECS to pull images and inject secrets
resource "aws_iam_role" "ecs_execution" {
  name = "metabase-ecs-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_policy" "secrets_access" {
  name = "metabase-secrets-access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.metabase.arn]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "secrets_access" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = aws_iam_policy.secrets_access.arn
}

# ECS task role: assumed by the Metabase container itself and referenced as
# task_role_arn in ecs.tf. No policies are attached; add them only if
# Metabase needs to call AWS APIs directly.
resource "aws_iam_role" "ecs_task" {
  name = "metabase-ecs-task"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}
```

---

Upgrading Metabase on ECS

To upgrade Metabase, update the image tag in the task definition and apply the change:

```bash
# Edit the image tag in ecs.tf, then apply. Terraform registers a new task
# definition revision and points the service at it, which starts a deployment.
# (Applying with -target on the task definition alone would leave the service
# on the old revision.)
terraform apply

# Optional: wait until the new task is running and the deployment has settled
aws ecs wait services-stable \
  --cluster metabase \
  --services metabase

# Only needed to restart tasks without changing the task definition
aws ecs update-service \
  --cluster metabase \
  --service metabase \
  --force-new-deployment
```

ECS performs a rolling deployment: the new task starts and passes health checks before the old task is stopped. With deployment_minimum_healthy_percent = 100 and deployment_maximum_percent = 200, the old task keeps serving traffic until the new one is healthy, so upgrades cause no downtime.

---

Monitoring and Alerting

CloudWatch Alarms

```hcl
# cloudwatch.tf
# Both alarms notify aws_sns_topic.alerts, which is not part of this file.

# Alert if Metabase health check fails
resource "aws_cloudwatch_metric_alarm" "unhealthy_hosts" {
  alarm_name          = "metabase-unhealthy-hosts"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "UnHealthyHostCount"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Average"
  threshold           = 0
  alarm_description   = "Metabase has unhealthy hosts"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.metabase.arn_suffix
    TargetGroup  = aws_lb_target_group.metabase.arn_suffix
  }
}

# Alert on high CPU (JVM under stress)
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "metabase-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.metabase.name
    ServiceName = aws_ecs_service.metabase.name
  }
}
```
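Both alarms send notifications to `aws_sns_topic.alerts`, which the guide does not define. A minimal sketch of the topic with an email subscription follows; the topic name and `var.alert_email` are hypothetical and introduced only for this example.

```hcl
# Hypothetical variable: address that receives alarm notifications
variable "alert_email" {
  type        = string
  description = "Email address subscribed to Metabase alarms"
}

# Topic referenced by the alarm_actions of both alarms
resource "aws_sns_topic" "alerts" {
  name = "metabase-alerts" # assumed name
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}
```

Email subscriptions stay pending until the recipient confirms them through the link SNS sends, so no alerts are delivered before that step.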

---

Cost Estimates

For a small-to-medium deployment (fewer than 50 concurrent users):

| Component | Configuration | Approx. Monthly Cost |
|---|---|---|
| ECS Fargate | 1 vCPU, 3 GB, always-on | ~$35 |
| RDS PostgreSQL | db.t3.small, 20 GB gp3 | ~$30 |
| ALB | Low traffic | ~$20 |
| NAT Gateway | Low traffic | ~$35 |
| Secrets Manager | 1 secret | ~$0.50 |
| CloudWatch | Basic logging | ~$5 |
| Total | | ~$125/month |

For larger deployments, scale up the Fargate CPU/memory and RDS instance class. Metabase does not require horizontal scaling for most deployments: a single task with adequate memory handles hundreds of concurrent users.

---

Summary

A production Metabase deployment on AWS uses ECS Fargate for the application container, RDS PostgreSQL for the application database, an Application Load Balancer for HTTPS termination, and Secrets Manager for credential injection. All compute runs in private subnets with no public IP addresses. The full architecture can be provisioned with Terraform in under 30 minutes. Key configuration decisions are: pin to a specific Metabase image version, set deployment_minimum_healthy_percent = 100 for zero-downtime upgrades, allocate at least 3 GB of memory for the Fargate task, and enable automated RDS backups with a 7-day retention window.