Connecting Metabase to a Data Warehouse (Snowflake, BigQuery, Redshift)

Connecting Metabase to a data warehouse — Snowflake, BigQuery, or Amazon Redshift — is the recommended architecture for analytics at scale. Operational databases (PostgreSQL, MySQL) are optimized for transactional workloads: fast point lookups, concurrent writes, and small result sets. Data warehouses are optimized for analytical workloads: full-table scans, complex aggregations, and large result sets across billions of rows. Using Metabase with a warehouse instead of directly against your production database separates analytical load from application load and unlocks query performance that operational databases can't match.

---

When to Add a Data Warehouse

Teams typically reach for a data warehouse when:

Analytical queries are slowing down the production database

You need to query across multiple data sources in a single dashboard

Query history or time-series analysis requires data older than what's practical to keep in operational databases

You're running dbt transformations and want Metabase to query transformed models

You need sub-second aggregation over 100M+ rows

A data warehouse is not required for Metabase to function — many teams run Metabase directly against PostgreSQL or MySQL. Add a warehouse when the operational database becomes the bottleneck.

---

Snowflake

Connection Setup

In Admin → Databases → Add Database → Snowflake:

Field	Value	Notes
Account name	`yourorg.snowflakecomputing.com`	Found in your Snowflake URL
Username	`METABASE_USER`	Dedicated read-only user
Password	Your password	Or use key pair auth
Database	`ANALYTICS`	The Snowflake database to connect
Warehouse	`METABASE_WH`	The virtual warehouse for queries
Schema	`PUBLIC`	Optional: restrict to one schema
Role	`METABASE_ROLE`	The role granted to the user

Creating a Dedicated Snowflake User and Role

sql
-- Create a dedicated role for Metabase CREATE ROLE METABASE_ROLE; 
-- Create a dedicated warehouse (size based on query complexity) CREATE WAREHOUSE METABASE_WH WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60        -- suspend after 60s of inactivity AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE; 
-- Grant warehouse usage GRANT USAGE ON WAREHOUSE METABASE_WH TO ROLE METABASE_ROLE; 
-- Grant database and schema access GRANT USAGE ON DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT USAGE ON ALL SCHEMAS IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT SELECT ON ALL TABLES IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT SELECT ON ALL VIEWS IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; 
-- Grant access to future tables and views (critical for dbt-created models) GRANT SELECT ON FUTURE TABLES IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; GRANT SELECT ON FUTURE VIEWS IN DATABASE ANALYTICS TO ROLE METABASE_ROLE; 
-- Create the user CREATE USER METABASE_USER PASSWORD = 'a-strong-password' DEFAULT_ROLE = METABASE_ROLE DEFAULT_WAREHOUSE = METABASE_WH; GRANT ROLE METABASE_ROLE TO USER METABASE_USER;

Snowflake Performance Considerations

Warehouse sizing: Snowflake warehouses scale in T-shirt sizes (X-Small, Small, Medium, Large...). Query execution time scales roughly proportionally to warehouse size — a Medium runs queries about 2x faster than a Small, at 2x the credit cost.

For Metabase dashboards, a Small or Medium warehouse is usually sufficient. Dashboard queries are typically aggregations that benefit from more compute, but a single concurrent user rarely needs more than a Small.

Concurrent users	Recommended warehouse size
< 10	X-Small
10–50	Small
50–200	Medium
200+	Large or multiple warehouses

Auto-suspend and auto-resume: Set AUTO_SUSPEND = 60 (60 seconds) to avoid paying for idle warehouse time. AUTO_RESUME = TRUE means the warehouse starts automatically when Metabase runs a query. The cold start adds ~2–3 seconds to the first query after suspension — acceptable for most dashboard use cases.

Result cache: Snowflake caches query results for 24 hours per virtual warehouse. Identical queries return cached results without consuming credits. This complements Metabase's own caching — the two layers together mean most dashboard loads require no compute at all.

Clustering keys: For frequently-filtered large tables, define clustering keys on columns used in Metabase filters:

sql
ALTER TABLE orders CLUSTER BY (organization_id, DATE_TRUNC('day', created_at));

---

Google BigQuery

Connection Setup

BigQuery uses service account authentication. In Admin → Databases → Add Database → BigQuery:

Field	Value	Notes
Project ID	`your-gcp-project-id`	GCP project containing the dataset
Dataset filters	`analytics,reporting`	Optional: limit which datasets Metabase syncs
Service account JSON	Upload the JSON key file	From GCP IAM

Creating a Service Account

bash
<h1 class="text-4xl font-bold mb-6 text-slate-900">Create service account</h1> gcloud iam service-accounts create metabase-reader \ --display-name="Metabase Reader" \ --project=your-gcp-project 
<h1 class="text-4xl font-bold mb-6 text-slate-900">Grant BigQuery read access</h1> gcloud projects add-iam-policy-binding your-gcp-project \ --member="serviceAccount:metabase-reader@your-gcp-project.iam.gserviceaccount.com" \ --role="roles/bigquery.dataViewer" 
<h1 class="text-4xl font-bold mb-6 text-slate-900">Grant BigQuery job creation (required to run queries)</h1> gcloud projects add-iam-policy-binding your-gcp-project \ --member="serviceAccount:metabase-reader@your-gcp-project.iam.gserviceaccount.com" \ --role="roles/bigquery.jobUser" <h1 class="text-4xl font-bold mb-6 text-slate-900">Create and download the JSON key</h1> gcloud iam service-accounts keys create metabase-key.json \ --iam-account=metabase-reader@your-gcp-project.iam.gserviceaccount.com

BigQuery Performance Considerations

Partitioned tables: BigQuery charges per byte scanned. Partition your tables on the date column that Metabase filters on most frequently. Metabase's date filters translate to partition pruning — only the relevant partitions are scanned:

sql
CREATE TABLE project.analytics.orders PARTITION BY DATE(created_at) CLUSTER BY organization_id, status AS SELECT * FROM source_orders;

A date-partitioned, clustered table can reduce bytes scanned by 90%+ for typical Metabase queries.

Slot reservations: By default, BigQuery uses on-demand pricing (per byte scanned). For teams running many concurrent Metabase queries, flat-rate slot reservations can be more cost-effective and provide consistent performance:

On-demand: $5/TB scanned — unpredictable cost, unlimited concurrency

Slots: $10,000/month for 500 slots — predictable cost, limited by slot count

Evaluate based on your query volume and cost tolerance.

Dataset filters in connection: Use the Dataset filters option in Metabase to limit which datasets Metabase syncs. BigQuery projects can contain many datasets; syncing all of them adds unnecessary schema sync overhead and exposes datasets that aren't intended for analytics.

BigQuery Cost Controls

sql -- Set a per-query byte limit to prevent runaway scans -- In BigQuery: set maximum bytes billed per query job -- Via Metabase: not directly configurable, but you can enforce via IAM conditions

-- Create a custom role with byte billing limits -- gcloud iam roles create MetabaseReader with limited quota

For SaaS products where embedded users run ad-hoc queries against BigQuery, set a maximum bytes billed to prevent individual queries from becoming expensive:

javascript

// If using BigQuery directly via API, set bytesBudgetExceeded // For Metabase, use the Admin Query Timeout as a proxy control

---

Amazon Redshift

Connection Setup

In Admin → Databases → Add Database → Redshift:

Field	Value	Notes
Host	`your-cluster.region.redshift.amazonaws.com`	Cluster endpoint
Port	`5439`	Default Redshift port
Database	`analytics`	Database name
Username	`metabase_reader`	Dedicated read-only user
Password	Your password

Redshift speaks PostgreSQL protocol — if you don't see a Redshift option, a PostgreSQL connection often works, though the native Redshift driver handles Redshift-specific features better.

Creating a Read-Only User

sql
-- Create group for Metabase access CREATE GROUP metabase_access; 
-- Create the user CREATE USER metabase_reader WITH PASSWORD 'strong-password'; ALTER GROUP metabase_access ADD USER metabase_reader; -- Grant schema access GRANT USAGE ON SCHEMA analytics TO GROUP metabase_access; GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO GROUP metabase_access; ALTER DEFAULT PRIVILEGES IN SCHEMA analytics GRANT SELECT ON TABLES TO GROUP metabase_access;

Redshift Performance Considerations

Distribution and sort keys: Redshift performance depends heavily on distribution and sort key design. For Metabase workloads:

sql
-- Distribute fact tables by the most common join key -- (co-locates related rows on the same slice for fast joins) CREATE TABLE orders ( organization_id BIGINT, created_at TIMESTAMP, amount DECIMAL(10,2), ... ) DISTKEY(organization_id) SORTKEY(created_at);  -- or compound: SORTKEY(organization_id, created_at)

Redshift Spectrum: For historical data that's too large and too infrequently accessed to keep in Redshift's managed storage, use Redshift Spectrum to query S3 directly:

sql
-- Create external schema pointing to S3 CREATE EXTERNAL SCHEMA spectrum FROM DATA CATALOG DATABASE 'analytics_archive' IAM_ROLE 'arn:aws:iam::account-id:role/RedshiftSpectrumRole' CREATE EXTERNAL DATABASE IF NOT EXISTS; -- Query S3 data from Metabase as if it were a normal table SELECT * FROM spectrum.orders_archive WHERE year = 2022;

Concurrency scaling: Redshift Concurrency Scaling automatically adds capacity when query queues build up. Enable it for Metabase workloads with variable concurrency:

sql

-- Enable concurrency scaling for Metabase's user group ALTER USER metabase_reader WITH QUERY_GROUP 'analytics'; -- Configure Concurrency Scaling in the Redshift parameter group

---

Metabase Data Model Features for Warehouses

When connecting to a warehouse, take advantage of Metabase's data model features to make the warehouse more navigable for non-technical users:

Field Renaming

Warehouse column names are often abbreviated or technical. Rename them in Metabase:

Admin → Data Model → [Database] → [Table] → [Field]:

org_id → "Organization ID"

gmv_usd → "Gross Revenue (USD)"

cnt_mau → "Monthly Active Users"

Field Visibility

Hide technical or internal columns from the query builder:

_dbt_source_relation → Hidden

_fivetran_synced → Hidden

surrogate_key → Hidden

Metrics

Define reusable metrics in the data model so users can select them from the query builder without knowing the aggregation formula:

"Revenue" = SUM(amount_usd)

"AOV" = SUM(amount_usd) / COUNT(DISTINCT order_id)

"Retention Rate" = (custom SQL metric)

Table Relationships

Define foreign key relationships between tables so Metabase's query builder can automatically suggest JOINs:

orders.customer_id → customers.id

orders.product_id → products.id

---

Connecting Multiple Warehouses

Metabase can connect to multiple databases and warehouses simultaneously. A common pattern:

Operational database (PostgreSQL): real-time data, used for live-updating dashboards

Data warehouse (Snowflake): historical analysis, used for trend dashboards and long-running queries

Specialized warehouse (BigQuery): ML feature store or event data

Users choose which database to query when creating a question. Dashboards can mix questions from multiple data sources.

---

Summary

Connecting Metabase to a data warehouse separates analytical query load from your operational database, unlocking performance for large-scale analytics. Snowflake, BigQuery, and Redshift each require a dedicated read-only service account or user; follow the principle of least privilege, granting only SELECT on the specific databases, schemas, and tables Metabase needs. Key performance decisions are warehouse sizing (Snowflake), partitioning and clustering (BigQuery), and distribution/sort keys (Redshift). After connecting, configure Metabase's data model with human-readable field names, hide technical columns, and define reusable metrics to make the warehouse accessible to non-technical users without requiring them to write SQL.