Data

Do you need a data warehouse? A dev's guide to the honest answer

Do you need a data warehouse? Understand when traditional databases stop scaling and when a warehouse becomes essential for analytics.

📅
📖6 min read

Do you need a data warehouse? A dev's guide to the honest answer

Quick answers

QuestionAnswer
What is a data warehouse?A separate database optimized for analytical queries rather than transactional workloads. Data is copied from your production systems and transformed before being loaded.
Do I need one right now?Probably not yet. Most teams can query their production database directly with Metabase and get useful, current answers. A warehouse becomes necessary when analytical queries are affecting production performance, or when you need to regularly join data from many different sources.
What's the difference between a data warehouse and a production database?Production databases are optimized for fast reads and writes on individual records — transactional workloads. Warehouses are optimized for aggregations across large datasets and complex joins — analytical workloads. Different engines, different query patterns.
When does Metabase work without a warehouse?Always, for the right use cases. Metabase connects directly to Postgres, MySQL, and most production databases. If your analytical queries are fast enough and aren't affecting production performance, you don't need a warehouse before you get value.
What are the popular data warehouses?BigQuery (Google), Snowflake, Redshift (AWS), and Databricks are the main cloud options. All connect directly to Metabase.
What's ELT vs ETL?ETL transforms data before loading it. ELT loads raw data first, then transforms it inside the warehouse — typically with dbt. ELT is the modern default. Metabase's ETL/ELT guide covers the full tradeoffs.
---

"Do we need a data warehouse?" is one of the most common questions growing data teams ask — and the answer is almost always "not yet, and maybe never."

Here's how to think through it without cargo-culting the modern data stack.

What a data warehouse actually does

A data warehouse is a separate database built for analytical queries: aggregations across large datasets, joins across multiple sources, historical trend analysis. Unlike a production database — which is optimized for fast reads and writes on individual records — a warehouse is designed around different query patterns entirely.

The classic architecture: data flows from your production systems into a warehouse via ETL or ELT, gets transformed into analytics-friendly structures, and then gets queried by your BI tool.

The question is whether you actually need that middle layer.

When you don't need a warehouse yet

You can skip a warehouse and query your production database directly with Metabase if:

Your analytical queries don't affect production performance. The real concern isn't query speed — it's whether analytical queries are competing with user-facing queries for database resources. If they're not, you're fine.

You're mostly querying one database. A small SaaS with simple data: users table, subscriptions table, usage events in the same DB. You don't need to split services; you can query everything directly.

You're still figuring out what questions to ask. Building a warehouse before you know what metrics matter is premature. The patterns that actually need a warehouse only become clear once you've been querying your data for a while.

Metabase connects directly to your production database — Postgres, MySQL, and 20+ others — and queries it in real time. No warehouse required to get started.

When a warehouse starts making sense

Analytical queries are affecting your production app. Long-running aggregations that scan large tables can compete with user-facing queries for resources. A read replica solves this at first; a warehouse is the right answer when you outgrow that.

You need to join data from multiple systems regularly. Product DB + Stripe + Salesforce + marketing data is a sign. If you're writing one-off scripts to combine data from different sources before you can answer a question, a warehouse gives you one place to load and join them. Metabase's SQL editor queries the warehouse directly once it's in place.

You need historical snapshots. Production databases store the current state of records. If you need to answer "what did our churn look like six months ago?", you need a system that records history — which is a core warehouse pattern.

Query performance is genuinely bottlenecked at scale. Cloud data warehouses separate storage and compute, letting you scale each independently. For very large datasets and complex aggregations, that architecture wins. But this is a scale problem, not a day-one problem.

What a read replica gets you first

Before committing to a full warehouse, a read replica of your production database is often the right intermediate step:

  • Analytical queries go to the replica, not production
  • No data movement or transformation pipeline needed
  • Metabase connects to it identically — just point it at the replica
  • This buys you significant headroom without the overhead of a full ELT pipeline.

    The practical path

  • Start with direct production queries via Metabase. See what questions you can answer and where you hit limits.
  • Add a read replica if analytical load is affecting production. Same data, less risk.
  • Add a warehouse when you need cross-source joins regularly, you need historical snapshots, or your read replica isn't keeping up with query complexity.
  • Connect Metabase to the warehouse using the same database connection setup. Your existing questions still work.
  • For the data transformation layer once you have a warehouse, Metabase's ETL/ELT guide covers the landscape. For the full picture of how the data stack fits together, the data landscape guide is a good foundation.

    Key takeaways

    PointDetail
    Don't add a warehouse before you need itQuery production directly first; the constraints become clear over time
    The real signal is production load, not data sizeAnalytical queries competing with user queries is the trigger, not row count
    A read replica is the right intermediate stepSame data, no transformation overhead, analytical load goes elsewhere
    Metabase works with or without a warehouseConnects to Postgres, MySQL, BigQuery, Redshift, Snowflake, and 20+ others the same way
    Add a warehouse for cross-source joins, historical snapshots, or scaleNot as a default — as a solution to a specific problem you've hit