Performance & Scale

Connecting Metabase to a Data Warehouse (Snowflake, BigQuery, Redshift)

Connecting Metabase to a data warehouse — Snowflake, BigQuery, or Amazon Redshift — is the recommended architecture for analytics at scale. Operationa...

📅
📖9 min read

Connecting Metabase to a Data Warehouse (Snowflake, BigQuery, Redshift)

Connecting Metabase to a data warehouse — Snowflake, BigQuery, or Amazon Redshift — is the recommended architecture for analytics at scale. Operational databases (PostgreSQL, MySQL) are optimized for transactional workloads: fast point lookups, concurrent writes, and small result sets. Data warehouses are optimized for analytical workloads: full-table scans, complex aggregations, and large result sets across billions of rows. Using Metabase with a warehouse instead of directly against your production database separates analytical load from application load and unlocks query performance that operational databases can't match.

---

When to Add a Data Warehouse

Teams typically reach for a data warehouse when:

  • Analytical queries are slowing down the production database
  • You need to query across multiple data sources in a single dashboard
  • Query history or time-series analysis requires data older than what's practical to keep in operational databases
  • You're running dbt transformations and want Metabase to query transformed models
  • You need sub-second aggregation over 100M+ rows
  • A data warehouse is not required for Metabase to function — many teams run Metabase directly against PostgreSQL or MySQL. Add a warehouse when the operational database becomes the bottleneck.

    ---

    Snowflake

    Connection Setup

    In Admin → Databases → Add Database → Snowflake:

    FieldValueNotes
    Account nameyourorg.snowflakecomputing.comFound in your Snowflake URL
    UsernameMETABASE_USERDedicated read-only user
    PasswordYour passwordOr use key pair auth
    DatabaseANALYTICSThe Snowflake database to connect
    WarehouseMETABASE_WHThe virtual warehouse for queries
    SchemaPUBLICOptional: restrict to one schema
    RoleMETABASE_ROLEThe role granted to the user

    Creating a Dedicated Snowflake User and Role

    sql
    

    -- Create a dedicated role for Metabase CREATE ROLE METABASE_ROLE;

    -- Create a dedicated warehouse (size based on query complexity) CREATE WAREHOUSE METABASE_WH WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 -- suspend after 60s of inactivity AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;

    -- Grant warehouse usage GRANT USAGE ON WAREHOUSE METABASE_WH TO ROLE METABASE_ROLE;

    -- Grant database and schema access GRANT USAGE ON DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT USAGE ON ALL SCHEMAS IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT SELECT ON ALL TABLES IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT SELECT ON ALL VIEWS IN DATABASE ANALYTICS TO ROLE METABASE_ROLE;

    -- Grant access to future tables and views (critical for dbt-created models) GRANT SELECT ON FUTURE TABLES IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT SELECT ON FUTURE VIEWS IN DATABASE ANALYTICS TO ROLE METABASE_ROLE;

    -- Create the user CREATE USER METABASE_USER PASSWORD = 'a-strong-password' DEFAULT_ROLE = METABASE_ROLE DEFAULT_WAREHOUSE = METABASE_WH;

    GRANT ROLE METABASE_ROLE TO USER METABASE_USER;

    Snowflake Performance Considerations

    Warehouse sizing: Snowflake warehouses scale in T-shirt sizes (X-Small, Small, Medium, Large...). Query execution time scales roughly proportionally to warehouse size — a Medium runs queries about 2x faster than a Small, at 2x the credit cost.

    For Metabase dashboards, a Small or Medium warehouse is usually sufficient. Dashboard queries are typically aggregations that benefit from more compute, but a single concurrent user rarely needs more than a Small.

    Concurrent usersRecommended warehouse size
    < 10X-Small
    10–50Small
    50–200Medium
    200+Large or multiple warehouses
    Auto-suspend and auto-resume: Set AUTO_SUSPEND = 60 (60 seconds) to avoid paying for idle warehouse time. AUTO_RESUME = TRUE means the warehouse starts automatically when Metabase runs a query. The cold start adds ~2–3 seconds to the first query after suspension — acceptable for most dashboard use cases.

    Result cache: Snowflake caches query results for 24 hours per virtual warehouse. Identical queries return cached results without consuming credits. This complements Metabase's own caching — the two layers together mean most dashboard loads require no compute at all.

    Clustering keys: For frequently-filtered large tables, define clustering keys on columns used in Metabase filters:

    sql
    

    ALTER TABLE orders CLUSTER BY (organization_id, DATE_TRUNC('day', created_at));

    ---

    Google BigQuery

    Connection Setup

    BigQuery uses service account authentication. In Admin → Databases → Add Database → BigQuery:

    FieldValueNotes
    Project IDyour-gcp-project-idGCP project containing the dataset
    Dataset filtersanalytics,reportingOptional: limit which datasets Metabase syncs
    Service account JSONUpload the JSON key fileFrom GCP IAM

    Creating a Service Account

    bash
    

    <h1 class="text-4xl font-bold mb-6 text-slate-900">Create service account</h1> gcloud iam service-accounts create metabase-reader \ --display-name="Metabase Reader" \ --project=your-gcp-project

    <h1 class="text-4xl font-bold mb-6 text-slate-900">Grant BigQuery read access</h1> gcloud projects add-iam-policy-binding your-gcp-project \ --member="serviceAccount:metabase-reader@your-gcp-project.iam.gserviceaccount.com" \ --role="roles/bigquery.dataViewer"

    <h1 class="text-4xl font-bold mb-6 text-slate-900">Grant BigQuery job creation (required to run queries)</h1> gcloud projects add-iam-policy-binding your-gcp-project \ --member="serviceAccount:metabase-reader@your-gcp-project.iam.gserviceaccount.com" \ --role="roles/bigquery.jobUser"

    <h1 class="text-4xl font-bold mb-6 text-slate-900">Create and download the JSON key</h1> gcloud iam service-accounts keys create metabase-key.json \ --iam-account=metabase-reader@your-gcp-project.iam.gserviceaccount.com

    BigQuery Performance Considerations

    Partitioned tables: BigQuery charges per byte scanned. Partition your tables on the date column that Metabase filters on most frequently. Metabase's date filters translate to partition pruning — only the relevant partitions are scanned:

    sql
    

    CREATE TABLE project.analytics.orders PARTITION BY DATE(created_at) CLUSTER BY organization_id, status AS SELECT * FROM source_orders;

    A date-partitioned, clustered table can reduce bytes scanned by 90%+ for typical Metabase queries.

    Slot reservations: By default, BigQuery uses on-demand pricing (per byte scanned). For teams running many concurrent Metabase queries, flat-rate slot reservations can be more cost-effective and provide consistent performance:

    On-demand: $5/TB scanned — unpredictable cost, unlimited concurrency
    

    Slots: $10,000/month for 500 slots — predictable cost, limited by slot count

    Evaluate based on your query volume and cost tolerance.

    Dataset filters in connection: Use the Dataset filters option in Metabase to limit which datasets Metabase syncs. BigQuery projects can contain many datasets; syncing all of them adds unnecessary schema sync overhead and exposes datasets that aren't intended for analytics.

    BigQuery Cost Controls

    sql
    

    -- Set a per-query byte limit to prevent runaway scans -- In BigQuery: set maximum bytes billed per query job -- Via Metabase: not directly configurable, but you can enforce via IAM conditions

    -- Create a custom role with byte billing limits -- gcloud iam roles create MetabaseReader with limited quota

    For SaaS products where embedded users run ad-hoc queries against BigQuery, set a maximum bytes billed to prevent individual queries from becoming expensive:

    javascript
    

    // If using BigQuery directly via API, set bytesBudgetExceeded // For Metabase, use the Admin Query Timeout as a proxy control

    ---

    Amazon Redshift

    Connection Setup

    In Admin → Databases → Add Database → Redshift:

    FieldValueNotes
    Hostyour-cluster.region.redshift.amazonaws.comCluster endpoint
    Port5439Default Redshift port
    DatabaseanalyticsDatabase name
    Usernamemetabase_readerDedicated read-only user
    PasswordYour password
    Redshift speaks PostgreSQL protocol — if you don't see a Redshift option, a PostgreSQL connection often works, though the native Redshift driver handles Redshift-specific features better.

    Creating a Read-Only User

    sql
    

    -- Create group for Metabase access CREATE GROUP metabase_access;

    -- Create the user CREATE USER metabase_reader WITH PASSWORD 'strong-password'; ALTER GROUP metabase_access ADD USER metabase_reader;

    -- Grant schema access GRANT USAGE ON SCHEMA analytics TO GROUP metabase_access; GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO GROUP metabase_access; ALTER DEFAULT PRIVILEGES IN SCHEMA analytics GRANT SELECT ON TABLES TO GROUP metabase_access;

    Redshift Performance Considerations

    Distribution and sort keys: Redshift performance depends heavily on distribution and sort key design. For Metabase workloads:

    sql
    

    -- Distribute fact tables by the most common join key -- (co-locates related rows on the same slice for fast joins) CREATE TABLE orders ( organization_id BIGINT, created_at TIMESTAMP, amount DECIMAL(10,2), ... ) DISTKEY(organization_id) SORTKEY(created_at); -- or compound: SORTKEY(organization_id, created_at)

    Redshift Spectrum: For historical data that's too large and too infrequently accessed to keep in Redshift's managed storage, use Redshift Spectrum to query S3 directly:

    sql
    

    -- Create external schema pointing to S3 CREATE EXTERNAL SCHEMA spectrum FROM DATA CATALOG DATABASE 'analytics_archive' IAM_ROLE 'arn:aws:iam::account-id:role/RedshiftSpectrumRole' CREATE EXTERNAL DATABASE IF NOT EXISTS;

    -- Query S3 data from Metabase as if it were a normal table SELECT * FROM spectrum.orders_archive WHERE year = 2022;

    Concurrency scaling: Redshift Concurrency Scaling automatically adds capacity when query queues build up. Enable it for Metabase workloads with variable concurrency:

    sql
    

    -- Enable concurrency scaling for Metabase's user group ALTER USER metabase_reader WITH QUERY_GROUP 'analytics'; -- Configure Concurrency Scaling in the Redshift parameter group

    ---

    Metabase Data Model Features for Warehouses

    When connecting to a warehouse, take advantage of Metabase's data model features to make the warehouse more navigable for non-technical users:

    Field Renaming

    Warehouse column names are often abbreviated or technical. Rename them in Metabase:

    Admin → Data Model → [Database] → [Table] → [Field]:

  • org_id → "Organization ID"
  • gmv_usd → "Gross Revenue (USD)"
  • cnt_mau → "Monthly Active Users"
  • Field Visibility

    Hide technical or internal columns from the query builder:

  • _dbt_source_relation → Hidden
  • _fivetran_synced → Hidden
  • surrogate_key → Hidden
  • Metrics

    Define reusable metrics in the data model so users can select them from the query builder without knowing the aggregation formula:

  • "Revenue" = SUM(amount_usd)
  • "AOV" = SUM(amount_usd) / COUNT(DISTINCT order_id)
  • "Retention Rate" = (custom SQL metric)
  • Table Relationships

    Define foreign key relationships between tables so Metabase's query builder can automatically suggest JOINs:

  • orders.customer_idcustomers.id
  • orders.product_idproducts.id
  • ---

    Connecting Multiple Warehouses

    Metabase can connect to multiple databases and warehouses simultaneously. A common pattern:

  • Operational database (PostgreSQL): real-time data, used for live-updating dashboards
  • Data warehouse (Snowflake): historical analysis, used for trend dashboards and long-running queries
  • Specialized warehouse (BigQuery): ML feature store or event data
  • Users choose which database to query when creating a question. Dashboards can mix questions from multiple data sources.

    ---

    Summary

    Connecting Metabase to a data warehouse separates analytical query load from your operational database, unlocking performance for large-scale analytics. Snowflake, BigQuery, and Redshift each require a dedicated read-only service account or user; follow the principle of least privilege, granting only SELECT on the specific databases, schemas, and tables Metabase needs. Key performance decisions are warehouse sizing (Snowflake), partitioning and clustering (BigQuery), and distribution/sort keys (Redshift). After connecting, configure Metabase's data model with human-readable field names, hide technical columns, and define reusable metrics to make the warehouse accessible to non-technical users without requiring them to write SQL.