Running Metabase on Kubernetes
Running Metabase on Kubernetes means deploying it as a Deployment with a single replica, backed by an external PostgreSQL database, exposed through a Service and Ingress, with configuration managed via ConfigMaps and Secrets. Metabase is a stateful application (its state lives in the application database), so Kubernetes orchestration focuses on reliability, zero-downtime upgrades, and clean secret management rather than horizontal autoscaling.
This guide covers a production Kubernetes manifest set, Helm chart usage, common configuration mistakes, and the specific Metabase behaviors that affect how you configure liveness and readiness probes.
---
Architecture Overview
Ingress (HTTPS)
        │
        ▼
Service (ClusterIP :3000)
        │
        ▼
Deployment (1 replica)
┌─────────────────────┐
│ metabase container  │
│ image: metabase/    │
│ metabase:v0.50.0    │
└─────────────────────┘
           │
     ┌─────┴──────────────┐
     ▼                    ▼
External PostgreSQL    Your Data Sources
(RDS, Cloud SQL,       (via network policy
or in-cluster PG)      or VPN)
Why Single Replica
Metabase is not designed for active-active horizontal scaling. Multiple replicas sharing the same application database will cause conflicts: both instances attempt to run scheduled jobs, cache warming, and schema syncs simultaneously. Run one replica in production. For high availability, use Kubernetes rolling updates and a health-check-based readiness probe to ensure zero-downtime deployments.
---
Namespace and RBAC
yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metabase
  labels:
    app.kubernetes.io/name: metabase
---
Secrets
Never store sensitive values in ConfigMaps. Use Kubernetes Secrets (or an external secrets manager like Vault, AWS Secrets Manager via External Secrets Operator, or Sealed Secrets):
yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: metabase-secrets
  namespace: metabase
type: Opaque
stringData:
  MB_DB_PASS: "your-db-password-here"
  MB_EMBEDDING_SECRET_KEY: "your-64-char-embedding-secret-here"
For production, generate the secret from your secrets manager rather than committing it to git:
bash
# Using kubectl with values from environment
kubectl create secret generic metabase-secrets \
  --namespace metabase \
  --from-literal=MB_DB_PASS="$DB_PASSWORD" \
  --from-literal=MB_EMBEDDING_SECRET_KEY="$EMBEDDING_SECRET"
Or using External Secrets Operator with AWS Secrets Manager:
yaml
# external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: metabase-secrets
  namespace: metabase
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: metabase-secrets
  data:
    - secretKey: MB_DB_PASS
      remoteRef:
        key: metabase/config
        property: db_password
    - secretKey: MB_EMBEDDING_SECRET_KEY
      remoteRef:
        key: metabase/config
        property: embedding_secret
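The manifests above use a 64-character placeholder for MB_EMBEDDING_SECRET_KEY. One way to produce such a value is sketched below with standard Unix tools. The variable name EMBEDDING_SECRET matches the kubectl command above; the generation method itself is an assumption for illustration, not something Metabase prescribes.

```shell
# Read 32 random bytes and hex-encode them, which yields a 64-character string.
EMBEDDING_SECRET="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
export EMBEDDING_SECRET

# Print only the length so the secret itself never lands in the terminal log.
echo "generated ${#EMBEDDING_SECRET} characters"
```

With the variable exported, the kubectl create secret command shown earlier can pick it up directly, or the value can be stored in your secrets manager instead.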
---
ConfigMap
yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metabase-config
  namespace: metabase
data:
  MB_DB_TYPE: "postgres"
  MB_DB_HOST: "your-postgres-host.rds.amazonaws.com"
  MB_DB_PORT: "5432"
  MB_DB_DBNAME: "metabase"
  MB_DB_USER: "metabase_app"
  MB_SITE_URL: "https://analytics.yourapp.com"
  MB_SITE_NAME: "Analytics"
  MB_ANON_TRACKING_ENABLED: "false"
  JAVA_OPTS: "-Xmx2g -Xms512m"
  MB_JETTY_PORT: "3000"
---
Deployment
yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metabase
  namespace: metabase
  labels:
    app: metabase
spec:
  replicas: 1
  selector:
    matchLabels:
      app: metabase
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never take down the old pod before new one is ready
      maxSurge: 1        # allow one extra pod during rollout
  template:
    metadata:
      labels:
        app: metabase
    spec:
      terminationGracePeriodSeconds: 60

      containers:
        - name: metabase
          image: metabase/metabase:v0.50.0  # always pin to a specific version
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: http

          envFrom:
            - configMapRef:
                name: metabase-config

          env:
            - name: MB_DB_PASS
              valueFrom:
                secretKeyRef:
                  name: metabase-secrets
                  key: MB_DB_PASS
            - name: MB_EMBEDDING_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: metabase-secrets
                  key: MB_EMBEDDING_SECRET_KEY

          resources:
            requests:
              memory: "1.5Gi"
              cpu: "500m"
            limits:
              memory: "3Gi"
              cpu: "2000m"

          # Startup probe: allows up to 10 minutes for first startup
          # (JVM init + DB migrations can take 90+ seconds)
          startupProbe:
            httpGet:
              path: /api/health
              port: 3000
            failureThreshold: 30  # 30 × 20s = 10 minutes
            periodSeconds: 20

          # Readiness probe: only send traffic when Metabase is fully ready
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
            successThreshold: 1

          # Liveness probe: restart the container if Metabase stops responding
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 120
            periodSeconds: 30
            failureThreshold: 5
            timeoutSeconds: 10
Probe Configuration Explained
Metabase has an unusually long startup time compared to most web services, which makes probe configuration critical:
startupProbe: Kubernetes will not run liveness or readiness probes until the startup probe succeeds. Setting failureThreshold: 30 with periodSeconds: 20 gives Metabase up to 10 minutes to start (30 × 20 s = 600 s, covering first-start migrations and JVM initialization) before Kubernetes declares it failed.
readinessProbe: Controls when the pod receives traffic from the Service. The initialDelaySeconds: 10 delay is counted after the startup probe succeeds, not from container start. A failing readiness probe removes the pod from the Service's endpoints without restarting it.
livenessProbe: Restarts the container if it becomes unresponsive. Keep it conservative: initialDelaySeconds: 120 adds a buffer after startup, while failureThreshold: 5 and timeoutSeconds: 10 prevent restart loops when the instance is only briefly slow under legitimate heavy queries.
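The time each probe tolerates before Kubernetes acts is roughly failureThreshold multiplied by periodSeconds. The sketch below works that out for the startup and liveness values used in the Deployment; the function name is hypothetical, and the result ignores per-probe timeouts, so treat it as an approximation.

```shell
# Approximate seconds a probe tolerates: failureThreshold x periodSeconds.
probe_budget_seconds() {
  echo $(( $1 * $2 ))
}

startup_budget=$(probe_budget_seconds 30 20)   # startupProbe values from the Deployment
liveness_budget=$(probe_budget_seconds 5 30)   # livenessProbe values from the Deployment

echo "startup budget:  ${startup_budget}s"     # 600s, i.e. 10 minutes
echo "liveness budget: ${liveness_budget}s"    # 150s of consecutive failures before a restart
```

If first-start migrations in your environment take longer than the startup budget, raising failureThreshold on the startup probe is the knob to adjust.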
---
Service
yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metabase
  namespace: metabase
spec:
  selector:
    app: metabase
  ports:
    - port: 80
      targetPort: 3000
      name: http
  type: ClusterIP
---
Ingress
yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: metabase
  namespace: metabase
  annotations:
    # nginx-ingress
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    # cert-manager for automatic TLS
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - analytics.yourapp.com
      secretName: metabase-tls
  rules:
    - host: analytics.yourapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: metabase
                port:
                  name: http
Timeout Configuration
Metabase queries can be long-running. Set proxy timeouts generously (300 seconds) to prevent the Ingress from cutting off legitimate queries. Without this, users running complex queries will see unexpected "connection reset" errors.
---
PodDisruptionBudget
Ensure at least one Metabase instance is always running during cluster maintenance:
yaml
# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metabase
  namespace: metabase
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: metabase
This prevents kubectl drain from evicting the Metabase pod if it would leave zero instances running.
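With one replica and minAvailable: 1, the number of voluntary evictions the budget permits works out to zero, so a drain of the node hosting Metabase waits rather than evicting the pod. The arithmetic is sketched below; the function name is hypothetical.

```shell
# Evictions a PodDisruptionBudget permits: healthy pods minus minAvailable, floored at 0.
allowed_disruptions() {
  healthy=$1
  min_available=$2
  result=$(( healthy - min_available ))
  if [ "$result" -lt 0 ]; then
    result=0
  fi
  echo "$result"
}

allowed_disruptions 1 1   # single replica, minAvailable: 1 -> prints 0
```

On a live cluster, kubectl get pdb metabase -n metabase reports the same figure in its ALLOWED DISRUPTIONS column. Plan node maintenance with this in mind, since the drain will not proceed until the pod has been moved some other way.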
---
Horizontal Pod Autoscaler: Don't Use It
HPA is not appropriate for Metabase. Metabase is not designed to run multiple replicas concurrently. If your load exceeds what a single pod can handle, scale vertically (increase CPU and memory limits) rather than horizontally.
---
Using the Official Helm Chart
Metabase provides an official Helm chart that encapsulates most of the above configuration:
bash
helm repo add metabase https://www.metabase.com/helm-chart
helm repo update

helm install metabase metabase/metabase \
  --namespace metabase \
  --create-namespace \
  --set database.type=postgres \
  --set database.host=your-postgres-host \
  --set database.port=5432 \
  --set database.dbname=metabase \
  --set database.username=metabase_app \
  --set database.password=your-password \
  --set siteUrl=https://analytics.yourapp.com
Or with a values.yaml:
yaml
# values.yaml
replicaCount: 1

image:
  repository: metabase/metabase
  tag: v0.50.0
  pullPolicy: IfNotPresent

database:
  type: postgres
  host: your-postgres-host.rds.amazonaws.com
  port: 5432
  dbname: metabase
  username: metabase_app
  # password via existing secret:
  existingSecret: metabase-secrets
  existingSecretPasswordKey: MB_DB_PASS

siteUrl: https://analytics.yourapp.com

resources:
  requests:
    memory: "1.5Gi"
    cpu: "500m"
  limits:
    memory: "3Gi"
    cpu: "2000m"

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: analytics.yourapp.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: metabase-tls
      hosts:
        - analytics.yourapp.com
bash
helm upgrade --install metabase metabase/metabase \
  --namespace metabase \
  --create-namespace \
  --values values.yaml
---
Upgrading Metabase
bash
# Update the image tag in values.yaml, then:
helm upgrade metabase metabase/metabase \
  --namespace metabase \
  --values values.yaml

# Or for raw manifests, update the image tag in deployment.yaml and apply:
kubectl apply -f deployment.yaml
Kubernetes performs a rolling deployment: the new pod starts and must become Ready (passing both the startup and readiness probes) before the old pod is terminated. With maxUnavailable: 0, there's no downtime.
Monitor the rollout:
bash
kubectl rollout status deployment/metabase -n metabase
Rollback if needed:
bash
kubectl rollout undo deployment/metabase -n metabase
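The monitor and rollback commands above can be combined into one step, so a rollout that never becomes Ready is reverted automatically. This is a minimal sketch: the function name and the 10m default (chosen to match the startup probe budget) are assumptions, while --timeout is a standard flag of kubectl rollout status.

```shell
# Hypothetical helper: wait for the Metabase rollout and undo it if it does not finish in time.
rollout_or_undo() {
  timeout="${1:-10m}"  # assumed default, matching the 10-minute startup probe budget
  if kubectl rollout status deployment/metabase -n metabase --timeout="$timeout"; then
    echo "rollout complete"
  else
    echo "rollout did not complete within $timeout, rolling back" >&2
    kubectl rollout undo deployment/metabase -n metabase
    return 1
  fi
}
```

Call it right after helm upgrade or kubectl apply, for example rollout_or_undo 15m. A non-zero return code lets a CI pipeline mark the upgrade as failed.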
---
Troubleshooting
Pod is stuck in Pending
Usually a resource scheduling issue: no node has enough free CPU or memory to satisfy the pod's requests. Check: kubectl describe pod <pod-name> -n metabase
Pod crashes immediately (CrashLoopBackOff)
Check logs: kubectl logs <pod-name> -n metabase. Common causes:
- Can't connect to the application database (wrong host, credentials, or network policy)
Pod starts but readiness probe fails indefinitely
Metabase started but /api/health isn't returning 200. This often means the database migration is still running or failed. Check logs for migration errors.
Queries time out at exactly 60 seconds
The Ingress proxy timeout is too short. Increase proxy-read-timeout to 300 seconds in the Ingress annotations.
OOMKilled
The container's total memory use (JVM heap plus off-heap memory such as metaspace and thread stacks) exceeded the container memory limit. Increase the memory limit and adjust JAVA_OPTS: the -Xmx value (2g here) should be comfortably below the container memory limit.
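To judge whether the heap leaves enough headroom, it can help to express -Xmx as a share of the container limit. The sketch below does that for the values used in this guide; the function name is hypothetical, and how much headroom is enough depends on your workload.

```shell
# Share of the container memory limit taken by the JVM heap, as a whole percent (rounded down).
heap_percent_of_limit() {
  heap_mib=$1
  limit_mib=$2
  echo $(( heap_mib * 100 / limit_mib ))
}

# -Xmx2g = 2048 MiB, limits.memory 3Gi = 3072 MiB (values from the manifests above)
heap_percent_of_limit 2048 3072   # prints 66
```

Here about a third of the limit remains for off-heap memory. If the pod is still OOMKilled at that ratio, raise the limit before raising -Xmx.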
---
Summary
Running Metabase on Kubernetes uses a single-replica Deployment (Metabase doesn't support active-active horizontal scaling), a ClusterIP Service, an Ingress with generous proxy timeouts, and secrets injected from Kubernetes Secrets or an external secrets manager. The most critical configuration details are the startup probe (Metabase takes 90+ seconds to start), maxUnavailable: 0 for zero-downtime rolling updates, and memory limits set with enough headroom above the JVM heap size. The official Helm chart handles most of this correctly and is the recommended approach for teams already using Helm.