Running Metabase on Kubernetes

Running Metabase on Kubernetes means deploying it as a Deployment with a single replica, backed by an external PostgreSQL database, exposed through a Service and Ingress, with configuration managed via ConfigMaps and Secrets. Metabase keeps its state in that application database rather than on local disk, so the pod itself needs no persistent volume. Because it runs as a single instance rather than scaling horizontally, the Kubernetes setup focuses on reliability, zero-downtime upgrades, and clean secret management rather than autoscaling.

This guide covers a production Kubernetes manifest set, Helm chart usage, common configuration mistakes, and the specific Metabase behaviors that affect how you configure liveness and readiness probes.

---

Architecture Overview

```
             Ingress (HTTPS)
                    │
                    ▼
     Service (ClusterIP :80 → :3000)
                    │
                    ▼
         Deployment (1 replica)
         ┌──────────────────────┐
         │  metabase container  │
         │  image: metabase/    │
         │  metabase:v0.50.0    │
         └──────────────────────┘
                    │
          ┌─────────┴──────────┐
          ▼                    ▼
 External PostgreSQL    Your Data Sources
 (RDS, Cloud SQL,       (via network policy
  or in-cluster PG)      or VPN)
```

Why Single Replica

Metabase is not designed for active-active horizontal scaling. Multiple replicas sharing the same application database will cause conflicts: both instances attempt to run scheduled jobs, cache warming, and schema syncs simultaneously. Run one replica in production. A single replica is not highly available on its own (if the pod fails, Metabase is down until Kubernetes starts a replacement). You can still get zero-downtime deployments by combining Kubernetes rolling updates with a health-check-based readiness probe.

---

Namespace and RBAC

```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metabase
  labels:
    app.kubernetes.io/name: metabase
```
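Metabase never calls the Kubernetes API, so the RBAC side of this setup is about granting it nothing. The sketch below assumes a dedicated ServiceAccount named metabase; that name is our choice, not something Metabase requires. No Role or RoleBinding is created for the account, and its API token is not mounted into the pod. To use it, the Deployment's pod spec would also need serviceAccountName: metabase.

```yaml
# serviceaccount.yaml (sketch; the name "metabase" is an assumption)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metabase
  namespace: metabase
# Metabase doesn't use the Kubernetes API, so don't mount a token into the pod.
# No Role/RoleBinding references this account, so it only has cluster defaults.
automountServiceAccountToken: false
```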

---

Secrets

Never store sensitive values in ConfigMaps. Use Kubernetes Secrets (or an external secrets manager like Vault, AWS Secrets Manager via External Secrets Operator, or Sealed Secrets):

```yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: metabase-secrets
  namespace: metabase
type: Opaque
stringData:
  MB_DB_PASS: "your-db-password-here"
  MB_EMBEDDING_SECRET_KEY: "your-64-char-embedding-secret-here"
```

For production, create the Secret from values kept outside the repository (for example, values pulled from your secrets manager into environment variables) rather than committing it to git:

```bash
# Using kubectl with values from environment
kubectl create secret generic metabase-secrets \
  --namespace metabase \
  --from-literal=MB_DB_PASS="$DB_PASSWORD" \
  --from-literal=MB_EMBEDDING_SECRET_KEY="$EMBEDDING_SECRET"
```

Or using External Secrets Operator with AWS Secrets Manager:

```yaml
# external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: metabase-secrets
  namespace: metabase
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: metabase-secrets
  data:
    - secretKey: MB_DB_PASS
      remoteRef:
        key: metabase/config
        property: db_password
    - secretKey: MB_EMBEDDING_SECRET_KEY
      remoteRef:
        key: metabase/config
        property: embedding_secret
```

---

ConfigMap

```yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metabase-config
  namespace: metabase
data:
  MB_DB_TYPE: "postgres"
  MB_DB_HOST: "your-postgres-host.rds.amazonaws.com"
  MB_DB_PORT: "5432"
  MB_DB_DBNAME: "metabase"
  MB_DB_USER: "metabase_app"
  MB_SITE_URL: "https://analytics.yourapp.com"
  MB_SITE_NAME: "Analytics"
  MB_ANON_TRACKING_ENABLED: "false"
  JAVA_OPTS: "-Xmx2g -Xms512m"
  MB_JETTY_PORT: "3000"
```

---

Deployment

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metabase
  namespace: metabase
  labels:
    app: metabase
spec:
  replicas: 1
  selector:
    matchLabels:
      app: metabase
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never take down the old pod before the new one is ready
      maxSurge: 1        # allow one extra pod during rollout
  template:
    metadata:
      labels:
        app: metabase
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: metabase
          image: metabase/metabase:v0.50.0  # always pin to a specific version
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: http
          envFrom:
            - configMapRef:
                name: metabase-config
          env:
            - name: MB_DB_PASS
              valueFrom:
                secretKeyRef:
                  name: metabase-secrets
                  key: MB_DB_PASS
            - name: MB_EMBEDDING_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: metabase-secrets
                  key: MB_EMBEDDING_SECRET_KEY
          resources:
            requests:
              memory: "1.5Gi"
              cpu: "500m"
            limits:
              memory: "3Gi"
              cpu: "2000m"
          # Startup probe: allows up to 10 minutes for first startup
          # (JVM init + DB migrations can take 90+ seconds)
          startupProbe:
            httpGet:
              path: /api/health
              port: 3000
            failureThreshold: 30  # 30 × 20s = 10 minutes
            periodSeconds: 20
          # Readiness probe: only send traffic when Metabase is fully ready
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
            successThreshold: 1
          # Liveness probe: restart the container if Metabase stops responding
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 120
            periodSeconds: 30
            failureThreshold: 5
            timeoutSeconds: 10
```

Probe Configuration Explained

Metabase has an unusually long startup time compared to most web services, which makes probe configuration critical:

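startupProbe: Kubernetes does not run liveness or readiness probes until the startup probe succeeds. Setting failureThreshold: 30 with periodSeconds: 20 gives Metabase up to 10 minutes (30 × 20 s = 600 s) to start. That covers JVM initialization and first-start migrations. After that, Kubernetes gives up and restarts the container.

readinessProbe: Controls when the pod receives traffic from the Service. Its initialDelaySeconds: 10 can be short because readiness checks only begin after the startup probe has succeeded. A failing readiness probe removes the pod from the Service's endpoints without restarting it.

livenessProbe: Restarts the container if it becomes unresponsive. The conservative initialDelaySeconds: 120 only delays the first check; it does not help once Metabase is running. What prevents restart loops during legitimate slow queries is the failure window. Each check may take up to timeoutSeconds: 10, and the container is restarted only after failureThreshold: 5 consecutive failures spaced periodSeconds: 30 apart. That means at least two minutes of continuous unresponsiveness.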

---

Service

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metabase
  namespace: metabase
spec:
  selector:
    app: metabase
  ports:
    - port: 80
      targetPort: 3000
      name: http
  type: ClusterIP
```

---

Ingress

```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: metabase
  namespace: metabase
  annotations:
    # nginx-ingress
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    # cert-manager for automatic TLS
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - analytics.yourapp.com
      secretName: metabase-tls
  rules:
    - host: analytics.yourapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: metabase
                port:
                  name: http
```

Timeout Configuration

Metabase queries can be long-running. Set proxy timeouts generously (300 seconds) to prevent the Ingress from cutting off legitimate queries. Without this, ingress-nginx stops waiting for Metabase after its 60-second default. Users running complex queries then see them fail with 504 Gateway Timeout errors.

---

PodDisruptionBudget

Protect the Metabase pod from voluntary disruptions, such as node drains during cluster maintenance:

```yaml
# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metabase
  namespace: metabase
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: metabase
```

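This makes kubectl drain (and any other client of the Eviction API) refuse to evict the Metabase pod if doing so would leave zero instances running. With a single replica that is always the case. Draining the node that hosts Metabase therefore blocks until you delete the pod yourself at a convenient moment. Because drain has already cordoned the node, the Deployment then recreates the pod on another node.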
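A related catch: under the default policy, the budget also protects a pod that is running but not Ready. A crash-looping Metabase pod would therefore block drains in the same way. Some clusters support the unhealthyPodEvictionPolicy field (beta and enabled by default since Kubernetes 1.27, stable in 1.31). On those, the sketch below keeps the same budget but lets such a pod be evicted. Check your cluster version before relying on it.

```yaml
# pdb.yaml (variant): still protects a healthy pod, but lets a
# running-but-not-Ready pod be evicted instead of blocking the drain
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metabase
  namespace: metabase
spec:
  minAvailable: 1
  unhealthyPodEvictionPolicy: AlwaysAllow  # default is IfHealthyBudget
  selector:
    matchLabels:
      app: metabase
```

A pod that isn't Ready isn't serving traffic anyway, so letting drain evict it costs little. A healthy Metabase pod stays protected exactly as before.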

---

Horizontal Pod Autoscaler — Don't Use It

HPA is not appropriate for Metabase. Metabase is not designed to run multiple replicas concurrently. If your load exceeds what a single pod can handle, scale vertically (increase CPU and memory limits) rather than horizontally.

---

Using the Official Helm Chart

Metabase provides an official Helm chart that encapsulates most of the above configuration:

```bash
helm repo add metabase https://www.metabase.com/helm-chart
helm repo update

helm install metabase metabase/metabase \
  --namespace metabase \
  --create-namespace \
  --set database.type=postgres \
  --set database.host=your-postgres-host \
  --set database.port=5432 \
  --set database.dbname=metabase \
  --set database.username=metabase_app \
  --set database.password=your-password \
  --set siteUrl=https://analytics.yourapp.com
```

Or with a values.yaml:

```yaml
# values.yaml
replicaCount: 1

image:
  repository: metabase/metabase
  tag: v0.50.0
  pullPolicy: IfNotPresent

database:
  type: postgres
  host: your-postgres-host.rds.amazonaws.com
  port: 5432
  dbname: metabase
  username: metabase_app
  # password via existing secret:
  existingSecret: metabase-secrets
  existingSecretPasswordKey: MB_DB_PASS

siteUrl: https://analytics.yourapp.com

resources:
  requests:
    memory: "1.5Gi"
    cpu: "500m"
  limits:
    memory: "3Gi"
    cpu: "2000m"

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: analytics.yourapp.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: metabase-tls
      hosts:
        - analytics.yourapp.com
```

```bash
helm upgrade --install metabase metabase/metabase \
  --namespace metabase \
  --create-namespace \
  --values values.yaml
```

---

Upgrading Metabase

```bash
# Update the image tag in values.yaml, then:
helm upgrade metabase metabase/metabase \
  --namespace metabase \
  --values values.yaml

# Or for raw manifests, update the image tag in deployment.yaml and apply:
kubectl apply -f deployment.yaml
```

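Kubernetes performs a rolling update: the new pod must pass its startup and readiness probes before the old pod is terminated. With maxUnavailable: 0, there's no downtime. For that short window both versions run against the same application database, and the new version applies any schema migrations as it starts.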

Monitor the rollout:

```bash
kubectl rollout status deployment/metabase -n metabase
```

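Rollback if needed. This only reverts the Deployment to the previous image. If the new version has already migrated the application database, the older version may not run correctly against the migrated schema, so plan to restore the application database from a pre-upgrade backup as well: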

```bash
kubectl rollout undo deployment/metabase -n metabase
```

---

Troubleshooting

Pod is stuck in Pending: Usually a scheduling issue. No node has enough free CPU or memory to satisfy the pod's requests. Check the Events section of kubectl describe pod -n metabase.

Pod crashes immediately (CrashLoopBackOff): Check the logs of the crashed container with kubectl logs deployment/metabase -n metabase --previous. Common causes:

  • Can't connect to the application database (wrong host, credentials, or network policy)
  • Insufficient memory (OOMKilled)

Pod is Running but never becomes Ready: The container started, but /api/health isn't returning 200, so the startup probe (and therefore the readiness probe) never passes. This often means the database migration is still running or has failed. Check the logs for migration errors.

Queries time out at exactly 60 seconds: The Ingress proxy timeout is too short; 60 seconds is the ingress-nginx default. Increase proxy-read-timeout to 300 seconds in the Ingress annotations.

OOMKilled: The container's total memory use exceeded its limit. The JVM uses memory beyond the heap (metaspace, thread stacks, buffers), so the heap ceiling must leave room for it. Increase the memory limit and adjust JAVA_OPTS so that -Xmx stays comfortably below the container memory limit.
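To make that rule concrete: the manifests above pair -Xmx2g with a 3Gi limit, leaving roughly 1 GiB for the JVM's off-heap memory. If you raise the container limit to 4Gi, a sketch that keeps the same margin would raise the heap to 3 GiB. The 4Gi and 3g figures are illustrative, not a Metabase recommendation.

```yaml
# configmap.yaml (excerpt): heap raised together with the container limit.
# Pair with resources.limits.memory: "4Gi" in deployment.yaml.
data:
  # 3 GiB heap under a 4Gi limit keeps ~1 GiB for off-heap JVM memory,
  # the same headroom as -Xmx2g under the original 3Gi limit
  JAVA_OPTS: "-Xmx3g -Xms512m"
```

Environment variables from a ConfigMap are read only when the container starts. If you change JAVA_OPTS without also editing the Deployment, restart it, for example with kubectl rollout restart deployment/metabase -n metabase.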

---

Summary

Running Metabase on Kubernetes uses a single-replica Deployment (Metabase doesn't support active-active horizontal scaling), a ClusterIP Service, an Ingress with generous proxy timeouts, and secrets injected from Kubernetes Secrets or an external secrets manager. The most critical configuration details are:

  • the startup probe (Metabase can take 90+ seconds to start)
  • maxUnavailable: 0 for zero-downtime rolling updates
  • memory limits set with enough headroom above the JVM heap size

The official Helm chart handles most of this correctly and is the recommended approach for teams already using Helm.