Custom Models¶
Build intermediate layers and data marts for analytics.
Overview¶
Custom models are where you implement business logic, create reusable transformations, and build final analytics tables. Unlike staging models (which are auto-generated templates), custom models are written by you.
Two Types:
- Intermediate Models - Reusable business logic and joins
- Marts Models - Final tables optimized for BI and reporting
Key Features:
- Full SQL and dbt functionality
- Version controlled with Git
- Tested and documented
- Materialized as tables (required for Metabase compatibility)
- Build on top of staging models
Creating Custom Models:
Use `dango model add` to create new models via an interactive wizard:
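# interactive wizard: prompts for the model layer and name (exact prompts may vary by version)
dango model add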
This creates a template file you can edit with your SQL logic.
Quick Start¶
Create Your First Mart¶
Step 1: Create the model file
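Run the wizard and choose the marts layer, naming the model customer_metrics (the exact prompts may differ by version):

dango model add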
Step 2: Edit the generated template
Open dbt/models/marts/customer_metrics.sql and replace the generated template with your SQL:
-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}
WITH customer_orders AS (
SELECT
customer_id,
COUNT(*) as order_count,
SUM(amount / 100.0) as total_spent,
MIN(created) as first_order_date,
MAX(created) as last_order_date
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
GROUP BY customer_id
),
customers AS (
SELECT
id,
email,
created as customer_since
FROM {{ ref('stg_stripe_customers') }}
)
SELECT
c.id as customer_id,
c.email,
c.customer_since,
COALESCE(co.order_count, 0) as lifetime_orders,
COALESCE(co.total_spent, 0) as lifetime_value,
co.first_order_date,
co.last_order_date,
DATEDIFF('day', co.first_order_date, co.last_order_date) as customer_lifespan_days
FROM customers c
LEFT JOIN customer_orders co ON c.id = co.customer_id
Step 3: Run the model
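Build the project so the new mart is created as a table:

dango run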
Query in Metabase or DuckDB:
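For a quick check from the DuckDB CLI (the schema that the mart lands in depends on your dbt profile, so adjust the table reference if yours differs):

-- sanity-check the new mart; adjust the schema prefix to match your dbt profile
SELECT customer_id, email, lifetime_orders, lifetime_value
FROM customer_metrics
ORDER BY lifetime_value DESC
LIMIT 10;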
Project Structure¶
Directory Layout¶
dbt/models/
├── staging/ # Auto-generated (don't modify)
│ └── stg_*.sql
├── intermediate/ # Your reusable logic
│ ├── int_customer_orders.sql
│ ├── int_product_performance.sql
│ └── schema.yml
└── marts/ # Your final analytics tables
├── finance/
│ ├── revenue_by_month.sql
│ ├── mrr_analysis.sql
│ └── schema.yml
├── marketing/
│ ├── campaign_performance.sql
│ ├── customer_acquisition.sql
│ └── schema.yml
└── operations/
├── fulfillment_metrics.sql
└── schema.yml
Naming Conventions¶
| Layer | Prefix | Example | Materialization |
|---|---|---|---|
| Staging | stg_ | stg_stripe_charges | table |
| Intermediate | int_ | int_customer_orders | table |
| Marts | (any) | customer_metrics, revenue_by_month | table |
Materialization
All models use table materialization in Dango for Metabase compatibility. The `int_` prefix is auto-added when you create intermediate models via `dango model add`.
Intermediate Models¶
Purpose¶
Intermediate models encapsulate reusable business logic that:
- Joins multiple staging tables
- Applies business rules consistently
- Creates reusable building blocks
- Reduces code duplication
Materialization: table (required for Metabase compatibility)
Creating intermediate models:
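# choose the intermediate layer when prompted; the int_ prefix is added automatically
dango model add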
Example: Customer Orders¶
-- dbt/models/intermediate/int_customer_orders.sql
{{ config(
materialized='table',
schema='intermediate'
) }}
WITH customers AS (
SELECT
id,
email,
created as customer_since
FROM {{ ref('stg_stripe_customers') }}
),
charges AS (
SELECT
id as charge_id,
customer,
amount / 100.0 as amount_usd,
currency,
status,
created as charge_date
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
),
joined AS (
SELECT
c.id as customer_id,
c.email,
c.customer_since,
ch.charge_id,
ch.amount_usd,
ch.currency,
ch.charge_date
FROM customers c
INNER JOIN charges ch ON c.id = ch.customer
)
SELECT * FROM joined
Example: Product Performance¶
-- dbt/models/intermediate/int_product_performance.sql
{{ config(materialized='table', schema='intermediate') }}
WITH products AS (
SELECT
id as product_id,
name,
category,
price / 100.0 as price_usd
FROM {{ ref('stg_shopify_products') }}
),
order_items AS (
SELECT
product_id,
quantity,
price / 100.0 as item_price_usd,
order_date
FROM {{ ref('stg_shopify_order_items') }}
),
aggregated AS (
SELECT
oi.product_id,
p.name,
p.category,
p.price_usd,
COUNT(DISTINCT oi.order_date) as days_with_sales,
SUM(oi.quantity) as total_units_sold,
SUM(oi.item_price_usd) as total_revenue
FROM order_items oi
INNER JOIN products p ON oi.product_id = p.product_id
GROUP BY 1, 2, 3, 4
)
SELECT
*,
total_revenue / NULLIF(total_units_sold, 0) as avg_sale_price
FROM aggregated
When to Use Intermediate¶
Use intermediate models when:
- ✅ Logic is reused in multiple marts
- ✅ Complex joins need to be standardized
- ✅ Business rules apply across domains
- ✅ You want to simplify downstream models
Don't use intermediate when:
- ❌ Logic is only used once (put it in the mart)
- ❌ Model is simple enough for a CTE
- ❌ The result is something analysts will query directly (build it as a mart instead)
Marts Models¶
Purpose¶
Marts are the final analytics tables designed for:
- Business intelligence tools (Metabase, Tableau, etc.)
- SQL queries by analysts
- Reporting and dashboards
- Data exports
Materialization: table (required for Metabase compatibility)
Creating marts models:
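# choose the marts layer when prompted (and a subfolder such as finance, if you organize marts by domain)
dango model add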
Fact Tables¶
Fact tables store measurable events or transactions.
-- dbt/models/marts/finance/fct_revenue.sql
{{ config(
materialized='table',
schema='marts'
) }}
WITH daily_revenue AS (
SELECT
DATE_TRUNC('day', charge_date) as date,
currency,
COUNT(*) as transaction_count,
SUM(amount_usd) as revenue_usd,
COUNT(DISTINCT customer_id) as unique_customers
FROM {{ ref('int_customer_orders') }}
GROUP BY 1, 2
)
SELECT
date,
currency,
transaction_count,
revenue_usd,
unique_customers,
revenue_usd / NULLIF(transaction_count, 0) as avg_transaction_value
FROM daily_revenue
ORDER BY date DESC
Dimension Tables¶
Dimension tables store attributes about entities.
-- dbt/models/marts/operations/dim_customers.sql
{{ config(
materialized='table',
schema='marts'
) }}
WITH customer_base AS (
    SELECT
        customer_id,
        email,
        customer_since,
        COUNT(charge_id) as lifetime_orders,
        SUM(amount_usd) as lifetime_value,
        MIN(charge_date) as first_order_date,
        MAX(charge_date) as last_order_date
    FROM {{ ref('int_customer_orders') }}
    GROUP BY customer_id, email, customer_since
),
customer_segments AS (
SELECT
*,
CASE
WHEN lifetime_value >= 10000 THEN 'vip'
WHEN lifetime_value >= 1000 THEN 'high_value'
WHEN lifetime_value >= 100 THEN 'medium_value'
ELSE 'low_value'
END as customer_segment,
CASE
WHEN lifetime_orders = 1 THEN 'one_time'
WHEN lifetime_orders <= 5 THEN 'occasional'
WHEN lifetime_orders <= 20 THEN 'regular'
ELSE 'power_user'
END as customer_type
FROM customer_base
)
SELECT * FROM customer_segments
Aggregate Tables¶
Aggregate tables pre-compute metrics for performance.
-- dbt/models/marts/marketing/monthly_cohort_retention.sql
{{ config(materialized='table') }}
WITH customer_cohorts AS (
SELECT
customer_id,
DATE_TRUNC('month', customer_since) as cohort_month,
DATE_TRUNC('month', charge_date) as activity_month
FROM {{ ref('int_customer_orders') }}
),
cohort_sizes AS (
SELECT
cohort_month,
COUNT(DISTINCT customer_id) as cohort_size
FROM customer_cohorts
GROUP BY cohort_month
),
cohort_activity AS (
SELECT
c.cohort_month,
c.activity_month,
COUNT(DISTINCT c.customer_id) as active_customers
FROM customer_cohorts c
GROUP BY 1, 2
)
SELECT
ca.cohort_month,
ca.activity_month,
cs.cohort_size,
ca.active_customers,
ca.active_customers::FLOAT / cs.cohort_size as retention_rate,
DATEDIFF('month', ca.cohort_month, ca.activity_month) as months_since_cohort
FROM cohort_activity ca
INNER JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
ORDER BY ca.cohort_month, ca.activity_month
Common Patterns¶
Time Series Analysis¶
-- dbt/models/marts/finance/revenue_by_month.sql
{{ config(materialized='table') }}
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', charge_date) as month,
SUM(amount_usd) as revenue,
COUNT(DISTINCT customer_id) as customers,
COUNT(*) as transactions
FROM {{ ref('int_customer_orders') }}
GROUP BY 1
),
with_previous AS (
SELECT
month,
revenue,
customers,
transactions,
LAG(revenue) OVER (ORDER BY month) as prev_month_revenue
FROM monthly_revenue
)
SELECT
month,
revenue,
customers,
transactions,
prev_month_revenue,
revenue - prev_month_revenue as revenue_change,
(revenue - prev_month_revenue) / NULLIF(prev_month_revenue, 0) * 100 as revenue_growth_pct
FROM with_previous
ORDER BY month DESC
Funnel Analysis¶
-- dbt/models/marts/marketing/conversion_funnel.sql
{{ config(materialized='table') }}
WITH funnel_events AS (
SELECT
user_id,
MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as viewed,
MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as signed_up,
MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as purchased
FROM {{ ref('stg_analytics_events') }}
WHERE event_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY user_id
)
SELECT
COUNT(*) as total_users,
SUM(viewed) as step_1_viewed,
SUM(signed_up) as step_2_signed_up,
SUM(purchased) as step_3_purchased,
SUM(signed_up)::FLOAT / NULLIF(SUM(viewed), 0) * 100 as conversion_rate_1_to_2,
SUM(purchased)::FLOAT / NULLIF(SUM(signed_up), 0) * 100 as conversion_rate_2_to_3,
SUM(purchased)::FLOAT / NULLIF(SUM(viewed), 0) * 100 as overall_conversion_rate
FROM funnel_events
Slowly Changing Dimensions (SCD Type 2)¶
-- dbt/models/marts/operations/dim_products_scd.sql
{{ config(materialized='table') }}
WITH product_changes AS (
SELECT
id,
name,
price,
_dlt_extracted_at as valid_from,
LEAD(_dlt_extracted_at) OVER (PARTITION BY id ORDER BY _dlt_extracted_at) as valid_to,
CASE
WHEN LEAD(_dlt_extracted_at) OVER (PARTITION BY id ORDER BY _dlt_extracted_at) IS NULL
THEN TRUE
ELSE FALSE
END as is_current
FROM {{ ref('stg_shopify_products') }}
)
SELECT
{{ dbt_utils.generate_surrogate_key(['id', 'valid_from']) }} as product_key,
id as product_id,
name,
price,
valid_from,
COALESCE(valid_to, '9999-12-31'::TIMESTAMP) as valid_to,
is_current
FROM product_changes
Advanced Techniques¶
Using dbt Macros¶
Create reusable SQL snippets:
-- dbt/macros/currency_conversion.sql
{% macro convert_to_usd(amount_column, currency_column) %}
CASE {{ currency_column }}
WHEN 'usd' THEN {{ amount_column }}
WHEN 'eur' THEN {{ amount_column }} * 1.10
WHEN 'gbp' THEN {{ amount_column }} * 1.27
ELSE {{ amount_column }}
END
{% endmacro %}
Use in models:
SELECT
id,
amount,
currency,
{{ convert_to_usd('amount', 'currency') }} as amount_usd
FROM {{ ref('stg_stripe_charges') }}
Incremental Models¶
For large, append-only datasets:
-- dbt/models/marts/events/fct_user_events.sql
{{ config(
materialized='incremental',
unique_key='event_id',
on_schema_change='append_new_columns'
) }}
SELECT
event_id,
user_id,
event_type,
event_timestamp,
properties
FROM {{ ref('stg_analytics_events') }}
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
Dynamic SQL with Jinja¶
-- dbt/models/marts/finance/revenue_by_product_line.sql
{{ config(materialized='table') }}
{% set product_lines = ['electronics', 'clothing', 'home_goods'] %}
SELECT
date,
{% for line in product_lines %}
SUM(CASE WHEN product_line = '{{ line }}' THEN revenue ELSE 0 END) as {{ line }}_revenue,
{% endfor %}
SUM(revenue) as total_revenue
FROM {{ ref('int_product_performance') }}
GROUP BY date
Using dbt Packages¶
Install dbt_utils:
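# dbt/packages.yml (standard dbt package spec; run `dbt deps` from the dbt/ directory afterwards,
# or use Dango's equivalent command if it provides one)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]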
Use in models:
SELECT
{{ dbt_utils.generate_surrogate_key(['customer_id', 'order_id']) }} as unique_key,
customer_id,
order_id
FROM {{ ref('int_customer_orders') }}
Materialization Strategies¶
Tables (Default in Dango)¶
All Dango models use table materialization by default for Metabase compatibility.
Why tables?
Metabase requires tables for reliable schema discovery. Views can work in some cases but may have inconsistent behavior with Metabase features.
Incremental (For Large Tables)¶
Use for:
- Large event logs (millions of rows)
- Append-only data
- Historical archives
Pros: Efficient updates, handles scale
Cons: More complex logic, harder to debug
Ephemeral (Advanced)¶
Use for:
- Helper CTEs that shouldn't create tables
- Small utility models
- Reducing table clutter
Pros: No tables created, clean namespace
Cons: Can't query directly in Metabase
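A minimal sketch using the standard dbt config (the model name int_customer_last_charge is made up for illustration); an ephemeral model is inlined as a CTE into whatever models ref it, rather than built as a table:

-- dbt/models/intermediate/int_customer_last_charge.sql (hypothetical helper)
{{ config(materialized='ephemeral') }}
SELECT
    customer_id,
    MAX(charge_date) as last_charge_date
FROM {{ ref('int_customer_orders') }}
GROUP BY customer_id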
Documentation¶
Schema YAML¶
Document all custom models:
# dbt/models/marts/finance/schema.yml
version: 2
models:
- name: fct_revenue
description: |
Daily revenue metrics aggregated from successful Stripe charges.
Updates daily via `dango sync && dango run`.
columns:
- name: date
description: Revenue date (UTC timezone)
tests:
- not_null
- unique
- name: revenue_usd
description: Total revenue in USD (converted from cents)
tests:
- not_null
- name: transaction_count
description: Number of successful transactions
tests:
- not_null
- name: unique_customers
description: Count of distinct customers who made purchases
Inline Documentation¶
-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}
/*
Customer Lifetime Value Calculation
====================================
This model calculates key metrics for each customer:
- Total orders and revenue
- Average order value
- Customer lifespan
Updated: Daily
Owner: Analytics Team
*/
WITH customer_orders AS (
-- Aggregate all successful charges per customer
SELECT ...
Testing Custom Models¶
Schema Tests¶
# dbt/models/marts/schema.yml
models:
- name: customer_metrics
tests:
# Model-level tests
- dbt_utils.expression_is_true:
expression: "lifetime_value >= 0"
columns:
- name: customer_id
tests:
- unique
- not_null
- relationships:
to: ref('stg_stripe_customers')
field: id
- name: lifetime_orders
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
Custom SQL Tests¶
-- dbt/tests/assert_revenue_matches_charges.sql
WITH revenue_from_mart AS (
SELECT SUM(revenue_usd) as total
FROM {{ ref('fct_revenue') }}
),
revenue_from_source AS (
SELECT SUM(amount / 100.0) as total
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
)
SELECT *
FROM revenue_from_mart
CROSS JOIN revenue_from_source
WHERE ABS(revenue_from_mart.total - revenue_from_source.total) > 0.01
Run tests:
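# assuming you invoke dbt directly from the dbt/ directory
# (use Dango's test command instead if your version provides one)
dbt test --select customer_metrics    # test a single model
dbt test                              # run every test in the project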
Performance Optimization¶
Optimize Query Performance¶
-- Bad: Multiple scans of same table
SELECT
c.id as customer_id,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id) as order_count,
(SELECT SUM(amount) FROM orders WHERE customer_id = c.id) as total
FROM customers c
-- Good: Single scan with aggregation
WITH order_stats AS (
SELECT
customer_id,
COUNT(*) as order_count,
SUM(amount) as total
FROM orders
GROUP BY customer_id
)
SELECT
c.id as customer_id,
COALESCE(o.order_count, 0) as order_count,
COALESCE(o.total, 0) as total
FROM customers c
LEFT JOIN order_stats o ON c.id = o.customer_id
Use Indexes (if supported)¶
{{ config(
materialized='table',
post_hook=[
"CREATE INDEX IF NOT EXISTS idx_customer_email ON {{ this }} (email)",
"CREATE INDEX IF NOT EXISTS idx_customer_segment ON {{ this }} (customer_segment)"
]
) }}
Partition Large Tables¶
{{ config(
materialized='incremental',
partition_by={
"field": "date",
"data_type": "date",
"granularity": "month"
}
) }}
Best Practices¶
1. Use CTEs for Readability¶
-- Good
WITH base AS (...),
filtered AS (...),
aggregated AS (...)
SELECT * FROM aggregated
-- Avoid
SELECT ... FROM (...) JOIN (...) WHERE ...
2. Layer Your Logic¶
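Keep the dependency flow one-directional, so each layer builds only on the layer below it:

staging (stg_) → intermediate (int_) → marts

Marts should ref staging or intermediate models rather than raw source tables, and intermediate models should never ref marts.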
3. Document Business Rules¶
- name: customer_segment
description: |
Customer segmentation based on lifetime value:
- vip: >= $10,000
- high_value: >= $1,000
- medium_value: >= $100
- low_value: < $100
4. Test Assumptions¶
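Encode assumptions as schema tests so they are verified on every run (see Testing Custom Models above). A minimal example for the revenue_by_month mart:

# dbt/models/marts/finance/schema.yml
version: 2
models:
  - name: revenue_by_month
    columns:
      - name: month
        tests:
          - unique
          - not_null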
5. Keep Models Focused¶
One model = one business concept
Next Steps¶
- Testing - Comprehensive data quality testing guide
- dbt Basics - Learn dbt fundamentals
- Staging Models - Understand the foundation
- dbt Documentation - Official dbt resources