Performance & Scale

Optimizing Metabase Query Performance for Large Datasets

Query performance in Metabase is almost entirely determined by the performance of your underlying database — Metabase is a query interface, not a quer...

📅
📖9 min read

Optimizing Metabase Query Performance for Large Datasets

Query performance in Metabase is almost entirely determined by the performance of your underlying database — Metabase is a query interface, not a query engine. When a dashboard loads slowly, the bottleneck is almost never Metabase itself; it's the SQL being executed against your database. Optimizing Metabase query performance means optimizing the queries Metabase generates and the database structures it queries against, combined with strategic use of Metabase's caching layer to reduce how often expensive queries need to run.

---

Diagnosing Slow Queries

Before optimizing, identify what's actually slow.

Step 1: Find the Slow Questions

In Admin → Troubleshooting → Query log, Metabase shows recent queries with execution times. Look for:

  • Queries taking more than 5 seconds
  • Queries running frequently that could be cached
  • Queries that fan out to many database calls
  • Step 2: Inspect the Generated SQL

    For query builder questions, view the SQL Metabase generates:

  • Open the question
  • Click the database icon in the bottom left (or View the SQL)
  • Copy the SQL and run it directly in your database client with EXPLAIN ANALYZE
  • sql
    

    EXPLAIN ANALYZE SELECT date_trunc('week', created_at) AS week, COUNT(*) AS order_count, SUM(amount) AS revenue FROM orders WHERE created_at > NOW() - INTERVAL '1 year' AND organization_id = 99 GROUP BY 1 ORDER BY 1;

    The EXPLAIN ANALYZE output shows:

  • Which indexes are being used (or not used)
  • Row estimates vs. actual rows
  • Which parts of the query are most expensive
  • Sequential scans on large tables (a common culprit)
  • Step 3: Check Database-Level Slow Query Logs

    Enable slow query logging in your database to catch expensive queries regardless of whether they originate from Metabase:

    sql
    

    -- PostgreSQL: log queries taking more than 1 second ALTER SYSTEM SET log_min_duration_statement = 1000; SELECT pg_reload_conf();

    For RDS, enable Performance Insights — it surfaces the top SQL statements by CPU time and provides execution plan analysis without manual EXPLAIN calls.

    ---

    Indexing for Metabase Query Patterns

    Metabase's query builder generates SQL with predictable patterns. Most Metabase queries involve:

  • Filtering (WHERE) on date columns and foreign key columns
  • Grouping (GROUP BY) on dimension columns
  • Aggregating (COUNT, SUM, AVG) on metric columns
  • Joining to related tables via foreign keys
  • Index all columns that appear in these roles.

    Essential Indexes for Metabase Workloads

    sql
    

    -- Date column used in time-series filtering and breakout CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at DESC);

    -- Tenant identifier used in sandbox filters and dashboard parameters CREATE INDEX CONCURRENTLY idx_orders_org_id ON orders(organization_id);

    -- Compound index for the most common query pattern: -- filter by org + date, group by date CREATE INDEX CONCURRENTLY idx_orders_org_date ON orders(organization_id, created_at DESC);

    -- Foreign key for JOIN performance CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

    -- Status column used in WHERE filters CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status) WHERE status != 'completed'; -- partial index if most rows are 'completed'

    Composite Index Strategy

    Metabase sandbox queries add a tenant filter (WHERE organization_id = X) to every query. A composite index on (organization_id, date_column) dramatically outperforms two separate indexes for the common case of "filter by org, group by date":

    sql
    

    -- Without composite index: two index scans + merge -- With composite index: single index scan CREATE INDEX CONCURRENTLY idx_orders_org_created ON orders(organization_id, created_at);

    -- For a dashboard showing revenue by product within an org: CREATE INDEX CONCURRENTLY idx_orders_org_product ON orders(organization_id, product_id, created_at);

    BRIN Indexes for Time-Series Tables

    For very large append-only time-series tables where data is inserted in chronological order, BRIN (Block Range INdex) indexes are much smaller than B-tree and nearly as effective for date range queries:

    sql
    

    -- BRIN index for a 100M+ row events table CREATE INDEX CONCURRENTLY idx_events_created_brin ON events USING brin(created_at) WITH (pages_per_range = 128);

    ---

    Optimizing Metabase-Generated SQL

    Avoid Querying Unsummarized Tables

    Metabase's query builder generates SELECT * when users browse a table. For large tables, this is expensive. Limit what users can browse:

  • Set table permissions to restrict which groups can browse which tables
  • Create views that pre-aggregate or pre-filter data, and point Metabase at the views instead of base tables
  • sql
    

    -- A view Metabase can query efficiently instead of the raw events table CREATE VIEW daily_events_summary AS SELECT organization_id, date_trunc('day', created_at) AS event_date, event_type, COUNT(*) AS event_count FROM events GROUP BY 1, 2, 3;

    -- Querying this view returns pre-aggregated data — much faster

    Use Native SQL for Complex Questions

    The query builder is great for simple aggregations but generates suboptimal SQL for complex multi-step logic. For questions that are consistently slow in the query builder, rewrite them as native SQL questions with optimized query structure:

    sql
    

    -- Query builder might generate multiple subqueries -- Native SQL lets you write optimized CTEs WITH monthly_revenue AS ( SELECT organization_id, date_trunc('month', created_at) AS month, SUM(amount) AS revenue FROM orders WHERE created_at >= NOW() - INTERVAL '12 months' GROUP BY 1, 2 ), previous_month AS ( SELECT organization_id, month, revenue, LAG(revenue) OVER (PARTITION BY organization_id ORDER BY month) AS prev_revenue FROM monthly_revenue ) SELECT organization_id, month, revenue, ROUND(((revenue - prev_revenue) / NULLIF(prev_revenue, 0)) * 100, 1) AS growth_pct FROM previous_month ORDER BY organization_id, month;

    Limit Default Row Counts

    Metabase's query builder adds a default LIMIT 2000 to raw table queries. For aggregated questions this doesn't apply, but ensure your aggregated questions don't return more rows than needed — large result sets increase network transfer time and browser rendering time.

    ---

    Pre-Aggregation with dbt or Materialized Views

    For dashboards that run expensive aggregations over large datasets, pre-aggregate the data so Metabase queries fast summary tables rather than raw data.

    Materialized Views (PostgreSQL)

    sql
    

    -- Create a materialized view of weekly revenue by org CREATE MATERIALIZED VIEW weekly_revenue_by_org AS SELECT organization_id, date_trunc('week', created_at) AS week, COUNT(*) AS order_count, SUM(amount) AS revenue, COUNT(DISTINCT customer_id) AS unique_customers FROM orders GROUP BY 1, 2 WITH DATA;

    -- Index the materialized view CREATE INDEX ON weekly_revenue_by_org(organization_id, week DESC);

    -- Refresh on a schedule (e.g., hourly via pg_cron) REFRESH MATERIALIZED VIEW CONCURRENTLY weekly_revenue_by_org;

    Metabase queries weekly_revenue_by_org instead of orders — a 100M row table becomes a 50K row summary.

    dbt Models for Pre-Aggregation

    If your stack includes dbt, define aggregation models that Metabase queries:

    sql
    

    -- models/marts/analytics/weekly_revenue.sql {{ config( materialized='incremental', unique_key='week_org_key', on_schema_change='append_new_columns' ) }}

    SELECT organization_id, date_trunc('week', created_at)::date AS week, MD5(organization_id::text || date_trunc('week', created_at)::text) AS week_org_key, COUNT(*) AS order_count, SUM(amount) AS revenue, COUNT(DISTINCT customer_id) AS unique_customers FROM {{ ref('stg_orders') }} {% if is_incremental() %} WHERE created_at >= (SELECT MAX(week) FROM {{ this }}) {% endif %} GROUP BY 1, 2, 3

    Point Metabase at the weekly_revenue model instead of stg_orders. When dbt runs, it refreshes the model; Metabase always queries fresh pre-aggregated data.

    ---

    Data Warehouse-Specific Optimizations

    Snowflake

    sql
    

    -- Use clustering keys for columns commonly used in Metabase filters ALTER TABLE orders CLUSTER BY (organization_id, DATE_TRUNC('day', created_at));

    -- Use appropriate warehouse size — scale up for heavy dashboards, scale down after -- Metabase queries can be routed to a dedicated warehouse via connection settings

    -- Enable result caching (on by default in Snowflake) -- Snowflake caches query results for 24 hours; identical queries reuse results

    Warehouse sizing: A single Metabase dashboard may fire 5–15 queries simultaneously when it loads. Size the Snowflake warehouse to handle the peak concurrent query load, not just individual query cost.

    BigQuery

    sql
    

    -- Partition tables by date column to reduce bytes scanned per query CREATE TABLE project.dataset.orders PARTITION BY DATE(created_at) CLUSTER BY organization_id AS SELECT * FROM source_orders;

    -- Metabase date filters translate to partition pruning -- WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31' -- scans only the relevant partitions, not the full table

    Cost control: BigQuery charges per byte scanned. Partitioning and clustering dramatically reduce scan costs for Metabase's date-filtered queries. Enable Require partition filter on large tables to prevent accidental full-table scans.

    Redshift

    sql
    

    -- Define sort keys for Metabase query patterns CREATE TABLE orders ( id BIGINT, organization_id BIGINT, created_at TIMESTAMP, amount DECIMAL(10,2) ) SORTKEY (organization_id, created_at) DISTKEY (organization_id); -- distribute by tenant for co-location

    -- Run VACUUM and ANALYZE after bulk loads VACUUM orders; ANALYZE orders;

    ---

    Read Replicas

    Direct Metabase to a read replica of your production database to isolate analytical query load from your application's write traffic:

    bash
    

    <h1 class="text-4xl font-bold mb-6 text-slate-900">Point Metabase at the read replica endpoint, not the primary</h1> MB_DB_HOST=your-db-read-replica.rds.amazonaws.com

    Or in Admin → Databases, add the read replica as a separate database connection. Read replicas typically lag the primary by milliseconds to seconds — acceptable for most analytics use cases.

    ---

    Connection Pooling

    Metabase maintains a connection pool to each connected database. For databases with strict connection limits (especially if many Metabase users run queries concurrently), consider adding PgBouncer between Metabase and PostgreSQL:

    yaml
    

    <h1 class="text-4xl font-bold mb-6 text-slate-900">docker-compose addition: PgBouncer for connection pooling</h1> pgbouncer: image: pgbouncer/pgbouncer:latest environment: DATABASE_URL: "postgres://metabase_reader:password@postgres:5432/analytics" POOL_MODE: transaction MAX_CLIENT_CONN: 100 DEFAULT_POOL_SIZE: 20

    Point Metabase at PgBouncer's port rather than PostgreSQL directly. This is most valuable when Metabase is embedded in a product with many concurrent users.

    ---

    Query Timeout Configuration

    Long-running queries block the connection pool and degrade performance for other users. Set a query timeout in Metabase:

    Admin → Settings → General → Query timeout

    A 5-minute (300 second) timeout is a reasonable default. Users running queries that exceed this limit receive an error message and should optimize their query or use caching.

    ---

    Summary

    Metabase query performance is primarily a database performance problem. The optimization sequence is: diagnose with EXPLAIN ANALYZE and slow query logs, add indexes on columns used in Metabase filters and groupings (especially compound indexes combining tenant ID and date), pre-aggregate large tables using materialized views or dbt models, use Metabase's caching layer to serve frequently-accessed dashboards from cache, and direct Metabase to a read replica to isolate analytical load. The caching guide in this series covers Metabase's caching configuration in depth.