Common data model mistakes that break your analytics
Avoid common data model mistakes: improper normalization, inconsistent definitions, and architectural decisions that tank analytics performance.
Common data model mistakes that break your analytics
Quick answers
| Question | Answer |
|---|---|
| What is a data model in analytics? | The structure of your tables and how they relate — what columns exist, what they mean, how they join. A good data model makes analytics easy. A bad one makes every query a puzzle. |
| What's the most common data model mistake? | Storing everything in one wide table and adding columns as features ship. Works until it doesn't — then every query is a JOIN nightmare or a column soup nobody understands. |
| Why do metrics drift? | Because the same concept is calculated differently in different queries. No shared definition, no authoritative source. Metabase Data Studio fixes this by defining metrics once. |
| What's wrong with using the production database for analytics? | Nothing, until analytical queries start competing with user-facing queries for resources. The fix is either read replicas or a separate analytics layer. |
| How do I know if my data model is bad? | If every new analytics question requires a custom join or a long explanation of which table to use, the model is working against you. |
| Where can I learn more? | Metabase's common data model mistakes guide and the analytics engineering for fact tables article. |
Most analytics problems aren't visualization problems or tooling problems. They're data model problems. The queries are slow, the joins are confusing, the metrics are inconsistent — and all of it traces back to decisions made early about how data was structured.
Here are the most common mistakes, and how to fix them.
Mistake 1: The ever-widening table
The pattern: start with a users table, add columns as features ship. Activation flag, plan, last_login, referral_source, feature_x_enabled, feature_y_enabled...
Eventually you have a table with 80 columns, half of them null for most users, and nobody is sure what any of them mean or when they were populated.
The fix: separate behavioral data from user attributes. User properties belong in a users table. Events and actions belong in an events table with an event_name column. This makes both simpler and makes analytics significantly easier.
Mistake 2: No shared metric definitions
The pattern: every analyst writes their own version of the activation query. Marketing's version counts email confirmation. Product's version counts first core action. Finance's version counts first payment.
Three "activation rates" from the same database. All defensible. None the same.
The fix: define canonical metrics in Metabase Data Studio. Write the definition once, document what it means and why, and mark it as trusted. Every question and dashboard that references the metric draws from the same calculation.
Mistake 3: Storing state, not history
The pattern: your subscriptions table stores the current subscription status. When a user upgrades, you update the row. The previous state is gone.
You can answer "what plan are users on now?" You can't answer "what plan were users on three months ago?" or "how long did users stay on the free plan before upgrading?"
The fix: log state changes as events. Either append-only tables or a separate subscription_events table with a change_type column and timestamps. This is the foundation of cohort analysis, MRR tracking, and churn calculation.
Mistake 4: Null as a meaningful value
The pattern: activated_at is null for users who haven't activated and also null for users who were imported before the column existed and also null for users where the tracking broke.
Three different situations, same null. Every query has to guess which null means what.
The fix: use separate columns or separate tables for separate concepts. Null should mean "unknown" not "one of several possible things."
Mistake 5: Analytical queries hitting production directly under load
The pattern: Metabase is connected to production. As the team grows and more people run queries, production slows down for users.
The fix options: read replica (easiest), query scheduling off-peak, or a separate analytics database. Metabase connects to read replicas the same way it connects to primary databases — just point it at the replica in the connection settings.
Mistake 6: No documentation
The pattern: you know what dim_user_state_v3 means. Nobody else does. The original author left six months ago. Now every new analyst has to reverse-engineer the table before they can use it.
The fix: Metabase Data Studio's data structure features let you add table descriptions, column descriptions, and business term definitions directly in the tool — where people will actually find them when they're trying to build a query.
Key takeaways
| Mistake | Fix |
|---|---|
| Ever-widening table | Separate user attributes from events; use an events table |
| No shared metric definitions | Define once in Metabase Data Studio |
| Storing state, not history | Log state changes as events with timestamps |
| Null means multiple things | Use separate columns or tables for separate concepts |
| Analytics queries hitting production | Use a read replica; same Metabase connection setup |
| No documentation | Add descriptions in Data Studio; they surface in the query builder |