Deploying Metabase on AWS: A Production-Ready Guide
Deploying Metabase on AWS in production means running the application container on a managed compute service (ECS Fargate or EC2), backed by an RDS PostgreSQL instance as the application database, behind an Application Load Balancer with HTTPS termination, with secrets managed through AWS Secrets Manager. This guide covers the complete architecture, Terraform configuration, and the decisions that determine whether a deployment is stable under real load.
---
Reference Architecture
```
Internet
    │
    ▼
Route 53 (DNS)
    │
    ▼
Application Load Balancer (HTTPS :443)
    │   TLS termination, health checks
    ▼
ECS Fargate (Metabase container)
    │   Private subnet, no public IP
    ├──▶ RDS PostgreSQL (application DB)
    │       Private subnet, port 5432
    └──▶ Your data sources (RDS, Redshift, etc.)
            Private subnet or VPC peering

Secrets Manager ──▶ ECS task (DB password, API keys, embedding secret)
```
Key Design Decisions in This Architecture
ECS Fargate over EC2: Fargate eliminates the need to manage EC2 instances, OS patching, and container scheduling. You define the task and Fargate runs it. For a single-instance Metabase deployment, Fargate is the lowest-overhead option.
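RDS PostgreSQL for the application database: Metabase's default embedded H2 database is not suitable for production, so the application data (users, dashboards, saved questions) should live in an external database such as PostgreSQL. RDS gives you automated backups, optional Multi-AZ failover, and managed upgrades without running your own PostgreSQL server.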
Private subnets for all compute: The ECS task and RDS instances have no public IP addresses. All inbound traffic enters through the ALB. This reduces the attack surface significantly.
Secrets Manager for credentials: Database passwords and the embedding secret key are stored in Secrets Manager and injected as environment variables at container startup. The ECS task definition holds only a reference to the secret, never the value itself, and nothing is hard-coded in application code.
---
Prerequisites
- AWS account with appropriate IAM permissions
---
Terraform Configuration
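The snippets in this section reference three input variables (`var.aws_region`, `var.domain_name`, `var.acm_certificate_arn`) and use both the AWS and `random` providers, none of which are declared in the files shown. A minimal sketch of those declarations might look like the following. The file name, the version constraints, and the variable descriptions are assumptions rather than part of the original configuration, so adjust them to your environment.

```hcl
# versions.tf (hypothetical file name)

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint; use the major version you have tested
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0" # assumed constraint
    }
  }
}

provider "aws" {
  region = var.aws_region
}

variable "aws_region" {
  type        = string
  description = "AWS region to deploy into, for example us-east-1"
}

variable "domain_name" {
  type        = string
  description = "Public hostname for Metabase, used to build MB_SITE_URL"
}

variable "acm_certificate_arn" {
  type        = string
  description = "ARN of an ACM certificate covering domain_name, in the same region as the ALB"
}
```

Values can then be supplied through a `terraform.tfvars` file or `-var` flags at apply time.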
VPC and Networking
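```hcl
# networking.tf

# Availability Zones in the current region (referenced by the subnets below)
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "metabase" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = { Name = "metabase-vpc" }
}

resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.metabase.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true
  tags                    = { Name = "metabase-public-${count.index}" }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.metabase.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = { Name = "metabase-private-${count.index}" }
}

resource "aws_internet_gateway" "metabase" {
  vpc_id = aws_vpc.metabase.id
}

resource "aws_nat_gateway" "metabase" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.metabase]
}

resource "aws_eip" "nat" {
  domain = "vpc"
}

# Public subnets send internet-bound traffic to the internet gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.metabase.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.metabase.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private subnets reach the internet (image pulls, AWS APIs) through the NAT gateway
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.metabase.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.metabase.id
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```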
Security Groups
```hcl
# security_groups.tf

# ALB: accepts HTTPS from the internet
resource "aws_security_group" "alb" {
  name   = "metabase-alb"
  vpc_id = aws_vpc.metabase.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS task: accepts traffic only from the ALB
resource "aws_security_group" "ecs" {
  name   = "metabase-ecs"
  vpc_id = aws_vpc.metabase.id

  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS: accepts connections only from ECS tasks
resource "aws_security_group" "rds" {
  name   = "metabase-rds"
  vpc_id = aws_vpc.metabase.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs.id]
  }
}
```
RDS PostgreSQL (Application Database)
```hcl
# rds.tf

resource "aws_db_subnet_group" "metabase" {
  name       = "metabase"
  subnet_ids = aws_subnet.private[*].id
}

resource "aws_db_instance" "metabase" {
  identifier        = "metabase-app-db"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.small" # increase for larger deployments
  allocated_storage = 20
  storage_type      = "gp3"

  db_name  = "metabase"
  username = "metabase_app"
  password = random_password.db_password.result

  db_subnet_group_name   = aws_db_subnet_group.metabase.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7 # 7 days of automated backups
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  deletion_protection       = true # prevents accidental deletion
  skip_final_snapshot       = false
  final_snapshot_identifier = "metabase-final-snapshot"

  # Enable enhanced monitoring
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  tags = { Name = "metabase-app-db" }
}

resource "random_password" "db_password" {
  length  = 32
  special = false # avoid characters that break connection strings
}
```
Secrets Manager
```hcl
# secrets.tf

resource "aws_secretsmanager_secret" "metabase" {
  name = "metabase/config"
}

resource "aws_secretsmanager_secret_version" "metabase" {
  secret_id = aws_secretsmanager_secret.metabase.id
  secret_string = jsonencode({
    db_password      = random_password.db_password.result
    embedding_secret = random_password.embedding_secret.result
  })
}

resource "random_password" "embedding_secret" {
  length  = 64
  special = false
}
```
ECS Fargate
```hcl
# ecs.tf

resource "aws_ecs_cluster" "metabase" {
  name = "metabase"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Log group for the awslogs driver; it must exist before the task starts
resource "aws_cloudwatch_log_group" "metabase" {
  name = "/ecs/metabase"
}

resource "aws_ecs_task_definition" "metabase" {
  family                   = "metabase"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "1024" # 1 vCPU
  memory                   = "3072" # 3 GB; the JVM needs headroom above Xmx

  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name  = "metabase"
      image = "metabase/metabase:v0.50.0" # pin to a specific version

      portMappings = [
        { containerPort = 3000, protocol = "tcp" }
      ]

      environment = [
        { name = "MB_DB_TYPE", value = "postgres" },
        { name = "MB_DB_HOST", value = aws_db_instance.metabase.address },
        { name = "MB_DB_PORT", value = "5432" },
        { name = "MB_DB_DBNAME", value = "metabase" },
        { name = "MB_DB_USER", value = "metabase_app" },
        { name = "MB_SITE_URL", value = "https://${var.domain_name}" },
        { name = "JAVA_OPTS", value = "-Xmx2g -Xms512m" },
        { name = "MB_ANON_TRACKING_ENABLED", value = "false" }
      ]

      # Injected from Secrets Manager at container startup
      secrets = [
        {
          name      = "MB_DB_PASS"
          valueFrom = "${aws_secretsmanager_secret.metabase.arn}:db_password::"
        },
        {
          name      = "MB_EMBEDDING_SECRET_KEY"
          valueFrom = "${aws_secretsmanager_secret.metabase.arn}:embedding_secret::"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.metabase.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "metabase"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
        interval    = 30
        timeout     = 10
        retries     = 5
        startPeriod = 120 # Metabase takes ~90s on first start
      }
    }
  ])
}

resource "aws_ecs_service" "metabase" {
  name            = "metabase"
  cluster         = aws_ecs_cluster.metabase.id
  task_definition = aws_ecs_task_definition.metabase.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.metabase.arn
    container_name   = "metabase"
    container_port   = 3000
  }

  # Ensure the new task is healthy before stopping the old one
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200

  depends_on = [aws_lb_listener.https]
}
```
Application Load Balancer
```hcl
# alb.tf

resource "aws_lb" "metabase" {
  name               = "metabase"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = true
}

resource "aws_lb_target_group" "metabase" {
  name        = "metabase"
  port        = 3000
  protocol    = "HTTP"
  vpc_id      = aws_vpc.metabase.id
  target_type = "ip" # required for Fargate

  health_check {
    path                = "/api/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 10
    interval            = 30
    matcher             = "200"
  }
}

# Redirect HTTP to HTTPS
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.metabase.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.metabase.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.metabase.arn
  }
}
```
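The reference architecture places Route 53 in front of the load balancer, but the configuration shown does not include a DNS record. One way to sketch it is an alias record pointing the Metabase hostname at the ALB. This assumes the hosted zone for the domain already exists; `var.route53_zone_id` and the file name are hypothetical and not part of the original configuration.

```hcl
# dns.tf (hypothetical file name)

variable "route53_zone_id" {
  type        = string
  description = "ID of the existing Route 53 hosted zone for domain_name (assumed to exist)"
}

# Alias record: resolves the Metabase hostname to the ALB
resource "aws_route53_record" "metabase" {
  zone_id = var.route53_zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_lb.metabase.dns_name
    zone_id                = aws_lb.metabase.zone_id
    evaluate_target_health = true
  }
}
```

An alias record is used rather than a CNAME so the record can also sit at the zone apex if `domain_name` is a bare domain.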
---
IAM Roles
```hcl
# iam.tf

# ECS execution role: allows ECS to pull images and inject secrets
resource "aws_iam_role" "ecs_execution" {
  name = "metabase-ecs-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_policy" "secrets_access" {
  name = "metabase-secrets-access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.metabase.arn]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "secrets_access" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = aws_iam_policy.secrets_access.arn
}
```
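Two roles referenced elsewhere in the configuration, `aws_iam_role.ecs_task` (the task role in `ecs.tf`) and `aws_iam_role.rds_monitoring` (enhanced monitoring in `rds.tf`), do not appear in the IAM file shown here. A minimal sketch of both follows. It assumes the Metabase container itself needs no AWS API permissions, so the task role carries no policies, and that the AWS-managed enhanced monitoring policy is acceptable. The role names are hypothetical.

```hcl
# ECS task role: the identity the Metabase container runs as.
# Assumption: Metabase makes no AWS API calls, so no policies are attached.
resource "aws_iam_role" "ecs_task" {
  name = "metabase-ecs-task" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

# RDS enhanced monitoring role: lets RDS publish OS-level metrics to CloudWatch
resource "aws_iam_role" "rds_monitoring" {
  name = "metabase-rds-monitoring" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role       = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
```

If Metabase later needs to reach AWS services directly (for example, querying Athena), permissions for that would belong on the task role, not the execution role.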
---
Upgrading Metabase on ECS
To upgrade Metabase, update the image tag in the task definition and apply the change. Registering a new task definition revision is not enough on its own: the service must also be pointed at it. A full `terraform apply` does both, because `aws_ecs_service.metabase` references the task definition ARN. By contrast, `aws ecs update-service --force-new-deployment` redeploys whatever task definition the service already uses, so it does not pick up a new revision by itself.
```bash
# Edit the image tag in ecs.tf, then apply.
# This registers a new task definition revision and updates the service to use it.
terraform apply

# Wait until the rolling deployment has finished
aws ecs wait services-stable \
  --cluster metabase \
  --services metabase
```
ECS performs a rolling deployment: the new task starts and passes health checks before the old task is stopped. With `desired_count = 1`, the settings `deployment_minimum_healthy_percent = 100` and `deployment_maximum_percent = 200` keep the old task serving traffic until the new one is healthy, so upgrades should not cause downtime.
---
Monitoring and Alerting
CloudWatch Alarms
```hcl
# cloudwatch.tf

# Alert if Metabase health check fails
resource "aws_cloudwatch_metric_alarm" "unhealthy_hosts" {
  alarm_name          = "metabase-unhealthy-hosts"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "UnHealthyHostCount"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Average"
  threshold           = 0
  alarm_description   = "Metabase has unhealthy hosts"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.metabase.arn_suffix
    TargetGroup  = aws_lb_target_group.metabase.arn_suffix
  }
}

# Alert on high CPU (JVM under stress)
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "metabase-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.metabase.name
    ServiceName = aws_ecs_service.metabase.name
  }
}
```
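Both alarms send notifications to `aws_sns_topic.alerts`, which is not defined in the configuration shown. A minimal sketch of that topic with an email subscription could look like this. The topic name, the `alert_email` variable, and the choice of email as the delivery protocol are assumptions for illustration.

```hcl
# sns.tf (hypothetical file name)

variable "alert_email" {
  type        = string
  description = "Address that receives Metabase alarm notifications (assumption)"
}

resource "aws_sns_topic" "alerts" {
  name = "metabase-alerts" # hypothetical name
}

# Email subscriptions stay pending until the recipient confirms
# via the link AWS sends to the address.
resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}
```

Teams that route alerts through a chat or paging tool would typically swap the email subscription for an HTTPS or Lambda endpoint.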
---
Cost Estimates
For a small-to-medium deployment (< 50 concurrent users):
| Component | Configuration | Approx. Monthly Cost |
|---|---|---|
| ECS Fargate | 1 vCPU, 3GB, always-on | ~$35 |
| RDS PostgreSQL | db.t3.small, 20GB gp3 | ~$30 |
| ALB | Low traffic | ~$20 |
| NAT Gateway | Low traffic | ~$35 |
| Secrets Manager | 1 secret | ~$0.50 |
| CloudWatch | Basic logging | ~$5 |
| Total | | ~$125/month |
---
Summary
A production Metabase deployment on AWS uses ECS Fargate for the application container, RDS PostgreSQL for the application database, an Application Load Balancer for HTTPS termination, and Secrets Manager for credential injection. All compute runs in private subnets with no public IP addresses. The full architecture can be provisioned with Terraform in under 30 minutes. Key configuration decisions are: pin to a specific Metabase image version, set deployment_minimum_healthy_percent = 100 for zero-downtime upgrades, allocate at least 3GB of memory for the Fargate task, and enable automated RDS backups with a 7-day retention window.