Transformations¶
Transform raw data into analytics-ready tables with dbt and DuckDB.
Overview¶
Transformations in Dango use dbt (data build tool) to convert raw ingested data into clean, modeled tables for analytics. Dango combines the power of dbt with DuckDB to give you:
- Auto-generated staging model templates from raw tables
- Interactive model creation via dango model add
- Custom SQL transformations for business logic
- Auto-generated data quality tests (unique, not_null)
- dbt documentation viewable via Web UI
Tech Stack:
- dbt 1.7+ - SQL transformation framework
- DuckDB 0.10+ - Embedded analytics database
- Jinja - SQL templating for dynamic logic
Data Flow¶
How data flows through the transformation layers:
graph LR
A[Raw Layer] -->|dango sync| B[Staging Models]
B -->|dango model add| C[Intermediate Models]
C -->|dango model add| D[Marts]
D --> E[Metabase]
style A fill:#f3e5f5
style B fill:#fff9c4
style C fill:#e1f5ff
style D fill:#ffebee
style E fill:#e0f2f1
Transformation Layers¶
| Layer | Purpose | Created By | Materialization |
|---|---|---|---|
| Raw | Source of truth from dlt | dango sync | Tables |
| Staging | Template for cleaning data | dango sync (auto) | Tables |
| Intermediate | Reusable business logic | dango model add | Tables |
| Marts | Final analytics tables | dango model add | Tables |
Why Tables?
All models are materialized as tables (not views) for Metabase compatibility.
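This default typically lives in dbt_project.yml. A minimal sketch of the relevant section (the top-level project key, dango here, is an assumption; match it to the name in your own dbt_project.yml):
# dbt/dbt_project.yml (excerpt; project key is illustrative)
models:
  dango:
    staging:
      +materialized: table
    intermediate:
      +materialized: table
    marts:
      +materialized: table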
Quick Start¶
End-to-End Example¶
# 1. Load raw data (also generates staging model templates)
dango sync --source stripe_payments
# 2. Create a custom mart using the wizard
dango model add
# Select "marts", enter "customer_metrics", add description
# 3. Edit the generated model file
# Open dbt/models/marts/customer_metrics.sql in your editor
# Replace the template with your SQL logic
# 4. Run transformations
dango run
# 5. View results in Metabase
dango start
# Open http://localhost:8800 → Metabase
Example Mart SQL¶
After creating a model with dango model add, edit the file:
-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}
WITH customer_orders AS (
SELECT
customer,
COUNT(*) as order_count,
SUM(amount / 100.0) as total_spent
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
GROUP BY customer
)
SELECT
c.id as customer_id,
c.email,
COALESCE(o.order_count, 0) as lifetime_orders,
COALESCE(o.total_spent, 0) as lifetime_value
FROM {{ ref('stg_stripe_customers') }} c
LEFT JOIN customer_orders o ON c.id = o.customer
Transformation Guides¶
- dbt Basics: Learn dbt fundamentals and how it integrates with Dango.
    - dbt project structure
    - Materializations (view, table, incremental)
    - Jinja templating and macros
    - Running and debugging models
- Staging Models: Deep dive into Dango's auto-generated staging model templates.
    - How staging templates are generated
    - Customizing staging models
    - Column naming conventions
    - When to regenerate
- Custom Models: Build intermediate layers and data marts for analytics.
    - Intermediate vs. marts models
    - Common SQL patterns (cohorts, funnels, time series)
    - Advanced techniques (macros, incremental)
    - Performance optimization
- Testing: Comprehensive data quality testing with dbt.
    - Schema tests (unique, not_null, relationships)
    - Custom SQL tests
    - dbt_utils test library
    - Testing best practices
Key Concepts¶
Auto-Generated Staging¶
When you run dango sync, staging model templates are automatically generated:
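# Generates staging templates for each raw table (example source)
dango sync --source stripe_payments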
Creates:
dbt/models/staging/
├── stg_stripe_charges.sql # Staging template
├── stg_stripe_customers.sql # Staging template
├── stg_stripe_subscriptions.sql # Staging template
├── _stg_stripe__sources.yml # Source references
└── _stg_stripe__schema.yml # Tests & documentation
Generated SQL (template):
-- stg_stripe_charges.sql
{{ config(materialized='table', schema='staging') }}
SELECT * FROM {{ source('stripe', 'charges') }}
Template Design
Staging templates are intentionally minimal (SELECT *). You customize them by adding:
- Column selection
- Type casting
- Deduplication logic (if needed)
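For example, a customized stg_stripe_charges.sql might look like the sketch below (the column list and types are assumptions; check what your raw table actually contains):
-- stg_stripe_charges.sql (customized sketch)
{{ config(materialized='table', schema='staging') }}
SELECT
    id,
    customer,
    CAST(amount AS BIGINT) as amount,
    status,
    CAST(created AS TIMESTAMP) as created
FROM {{ source('stripe', 'charges') }}
-- Deduplicate on the primary key, keeping the most recent row (DuckDB QUALIFY)
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY created DESC) = 1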
See the Staging Models Guide for customization patterns.
Custom Business Logic¶
You write intermediate and marts models:
-- dbt/models/marts/revenue_by_month.sql
{{ config(materialized='table') }}
SELECT
DATE_TRUNC('month', created) as month,
COUNT(*) as transactions,
SUM(amount / 100.0) as revenue_usd
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
GROUP BY month
ORDER BY month DESC
Data Testing¶
Add tests in schema YAML:
# dbt/models/marts/schema.yml
models:
- name: revenue_by_month
columns:
- name: month
tests:
- unique
- not_null
- name: revenue_usd
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
Common Workflows¶
Development Workflow¶
graph TD
A[Write SQL] --> B[dango run]
B --> C{Tests Pass?}
C -->|No| D[Debug & Fix]
D --> A
C -->|Yes| E[Commit to Git]
E --> F[Deploy]
Commands:
# 1. Edit model
vim dbt/models/marts/customer_metrics.sql
# 2. Run model
dango run --select customer_metrics
# 3. Test model
dbt test --profiles-dir dbt --project-dir dbt --select customer_metrics
# 4. View in Metabase (via dango start)
# Open http://localhost:8800
Adding a New Mart¶
Option 1: Using the wizard (recommended)
# 1. Create model via wizard
dango model add
# Select "marts", enter "monthly_mrr", add description
# 2. Edit the generated template
vim dbt/models/marts/monthly_mrr.sql
# Replace SELECT * with your SQL logic
# 3. Run and test
dango run --select monthly_mrr
dbt test --profiles-dir dbt --project-dir dbt --select monthly_mrr
Option 2: Manual creation
# 1. Create SQL file
cat > dbt/models/marts/monthly_mrr.sql << 'EOF'
{{ config(materialized='table') }}
SELECT
DATE_TRUNC('month', created) as month,
SUM(amount / 100.0) as mrr
FROM {{ ref('stg_stripe_subscriptions') }}
WHERE status = 'active'
GROUP BY month
EOF
# 2. Run and test
dango run --select monthly_mrr
dbt test --profiles-dir dbt --project-dir dbt --select monthly_mrr
Schema files
dango model add auto-generates basic tests in schema.yml. For manual creation, add your own schema.yml file with documentation and tests.
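For a manually created mart like monthly_mrr, a minimal schema.yml might look like this (the description text is illustrative):
# dbt/models/marts/schema.yml
version: 2
models:
  - name: monthly_mrr
    description: Monthly recurring revenue from active Stripe subscriptions.
    columns:
      - name: month
        tests: [unique, not_null]
      - name: mrr
        tests: [not_null]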
Schema Changes¶
When source data changes:
# 1. Sync new data (auto-regenerates staging templates)
dango sync --source stripe_payments
# 2. Update downstream models if needed
vim dbt/models/marts/customer_metrics.sql
# 3. Run transformations
dango run
Automatic regeneration
dango sync automatically regenerates staging model templates when source schemas change. You only need to run dango generate manually if you want to regenerate without syncing data.
Project Structure¶
Standard dbt Layout¶
dbt/
├── dbt_project.yml # Project configuration
├── profiles.yml # DuckDB connection (auto-configured)
├── packages.yml # dbt package dependencies
├── models/
│ ├── staging/ # Auto-generated (dango sync)
│ │ ├── stg_*.sql
│ │ ├── _stg_*__sources.yml
│ │ └── _stg_*__schema.yml
│ ├── intermediate/ # Your reusable logic (dango model add)
│ │ ├── int_customer_orders.sql
│ │ └── schema.yml
│ └── marts/ # Your final analytics tables (dango model add)
│ ├── finance/
│ │ ├── revenue_by_month.sql
│ │ ├── mrr_analysis.sql
│ │ └── schema.yml
│ ├── marketing/
│ │ ├── customer_acquisition.sql
│ │ └── schema.yml
│ └── operations/
│ └── fulfillment_metrics.sql
├── tests/ # Custom SQL tests
│ ├── assert_revenue_reconciliation.sql
│ └── assert_no_duplicates.sql
├── macros/ # Reusable SQL snippets
│ └── currency_conversion.sql
└── snapshots/ # Slowly changing dimensions
└── customer_snapshot.sql
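The currency_conversion.sql macro shown in the tree could, for example, wrap the cents-to-dollars division the examples repeat (the macro body below is an illustrative sketch, not part of Dango):
-- dbt/macros/currency_conversion.sql (sketch)
{% macro cents_to_dollars(column_name, precision=2) %}
    ROUND({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
A model can then write SUM({{ cents_to_dollars('amount') }}) instead of repeating SUM(amount / 100.0).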
Schema Organization in DuckDB¶
# View schemas
duckdb data/warehouse.duckdb "SHOW SCHEMAS;"
# Output:
raw_stripe -- Raw data from dlt (source-specific, e.g., raw_stripe, raw_hubspot)
staging -- Auto-generated staging models (stg_* tables)
intermediate -- Your intermediate models (int_* tables)
marts -- Your marts models
Schema naming
Raw schemas are prefixed with raw_ followed by the source name (e.g., raw_stripe, raw_hubspot). This keeps data from different sources isolated.
dbt Commands Reference¶
Via Dango (Recommended)¶
# Run all transformations
dango run
# Run specific model
dango run --select customer_metrics
# Create a new model (intermediate or marts)
dango model add
# Remove a model
dango model remove customer_metrics
# Regenerate staging templates (usually not needed - sync does this)
dango generate
Viewing documentation
dbt docs are available at http://localhost:8800/dbt-docs when dango start is running.
Via dbt Directly¶
# Run all models
dbt run --profiles-dir dbt --project-dir dbt
# Run specific model
dbt run --profiles-dir dbt --project-dir dbt --select customer_metrics
# Test all models
dbt test --profiles-dir dbt --project-dir dbt
# Generate and serve docs
dbt docs generate --profiles-dir dbt --project-dir dbt
dbt docs serve --profiles-dir dbt --project-dir dbt --port 8081
Selective Execution¶
# Run one model (via dbt directly)
dbt run --profiles-dir dbt --project-dir dbt --select customer_metrics
# Run model and downstream dependencies
dbt run --profiles-dir dbt --project-dir dbt --select customer_metrics+
# Run model and upstream dependencies
dbt run --profiles-dir dbt --project-dir dbt --select +customer_metrics
# Run all marts
dbt run --profiles-dir dbt --project-dir dbt --select marts.*
# Run by tag
dbt run --profiles-dir dbt --project-dir dbt --select tag:finance
Best Practices¶
1. Keep Staging Simple¶
Staging models should only:
- Select columns
- Rename columns
- Cast data types
- Deduplicate
Don't add business logic in staging.
2. Layer Your Transformations¶
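Each layer should reference only the layers beneath it: staging reads raw data via source(), intermediate models read staging via ref(), and marts read intermediate and staging models. Never query raw_* schemas directly from a mart. A minimal sketch (the model bodies are illustrative; int_customer_orders matches the project layout above):
-- dbt/models/intermediate/int_customer_orders.sql
{{ config(materialized='table') }}
SELECT customer, COUNT(*) as order_count
FROM {{ ref('stg_stripe_charges') }}
GROUP BY customer
-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}
SELECT c.id as customer_id, COALESCE(o.order_count, 0) as lifetime_orders
FROM {{ ref('stg_stripe_customers') }} c
LEFT JOIN {{ ref('int_customer_orders') }} o ON c.id = o.customer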
3. Use CTEs for Readability¶
-- Good
WITH base AS (...),
filtered AS (...),
aggregated AS (...)
SELECT * FROM aggregated
-- Avoid
SELECT ... FROM (...) JOIN (...) WHERE ...
4. Document Everything¶
models:
- name: customer_metrics
description: |
Customer lifetime value and engagement metrics.
Updated daily via dango sync && dango run.
columns:
- name: customer_id
description: Stripe customer ID (primary key)
5. Test Critical Assumptions¶
tests:
- unique
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- dbt_utils.expression_is_true:
expression: "lifetime_value >= 0"
6. Version Control Everything¶
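Commit the entire dbt/ directory (models, tests, macros, schema YAML) so every transformation is reviewable and reproducible, and keep generated artifacts and the warehouse file out of the repository. A typical .gitignore sketch (the paths are assumptions; adjust to your layout):
# .gitignore (sketch)
dbt/target/
dbt/dbt_packages/
dbt/logs/
data/*.duckdb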
Performance Tips¶
Materialization in Dango¶
All Dango models use table materialization by default for Metabase compatibility.
Why tables?
Metabase requires tables for reliable schema discovery. Views can work but may have inconsistent behavior with some Metabase features.
For large datasets with frequent updates, consider incremental materialization:
| Use Case | Materialization | Why |
|---|---|---|
| Most models | table | Metabase compatibility, fast queries |
| Large event tables | incremental | Efficient updates |
Optimize Complex Queries¶
-- Bad: Multiple scans
SELECT
customer_id,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id),
(SELECT SUM(amount) FROM orders WHERE customer_id = c.id)
FROM customers c
-- Good: Single scan
WITH order_stats AS (
SELECT customer_id, COUNT(*) as cnt, SUM(amount) as total
FROM orders GROUP BY customer_id
)
SELECT c.customer_id, o.cnt, o.total
FROM customers c
LEFT JOIN order_stats o ON c.id = o.customer_id
Use Incremental for Scale¶
{{ config(
materialized='incremental',
unique_key='event_id'
) }}
SELECT * FROM {{ ref('stg_events') }}
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
Troubleshooting¶
Model Fails to Run¶
Check dependencies:
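# One way is to list the model with its upstream graph (a sketch using
# dbt ls with the same profile flags as elsewhere on this page):
dbt ls --profiles-dir dbt --project-dir dbt --select +customer_metrics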
View compiled SQL:
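# Compile without running, then inspect the rendered SQL under target/
# (a sketch; the compiled path includes your dbt project name):
dbt compile --profiles-dir dbt --project-dir dbt --select customer_metrics
cat dbt/target/compiled/*/models/marts/customer_metrics.sql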
Staging Models Out of Date¶
If staging templates don't reflect source schema changes:
# Option 1: Re-sync (recommended - also updates data)
dango sync --source stripe_payments
# Option 2: Regenerate templates only
dango generate
Tests Failing¶
Run with debug:
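# dbt's global --debug flag prints verbose logs (a sketch):
dbt --debug test --profiles-dir dbt --project-dir dbt --select customer_metrics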
Check failed rows:
dbt test --profiles-dir dbt --project-dir dbt --store-failures
duckdb data/warehouse.duckdb "SELECT * FROM dbt_test__audit.unique_customer_metrics_customer_id"
Example: Complete Transformation¶
1. Source Configuration¶
# .dango/sources.yml
sources:
- name: stripe_payments
type: stripe
enabled: true
stripe:
stripe_secret_key_env: STRIPE_API_KEY
start_date: 2024-01-01
2. Load Raw Data¶
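dango sync --source stripe_payments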
Creates:
- raw_stripe.charges
- raw_stripe.customers
- raw_stripe.subscriptions
3. Staging Templates (Auto-Generated)¶
dango sync automatically creates staging templates:
- dbt/models/staging/stg_stripe_charges.sql
- dbt/models/staging/stg_stripe_customers.sql
- Schema YAML files with basic tests
4. Create Custom Mart¶
-- dbt/models/marts/customer_lifetime_value.sql
{{ config(materialized='table') }}
WITH customer_orders AS (
SELECT
customer,
COUNT(*) as order_count,
SUM(amount / 100.0) as total_spent,
MIN(created) as first_order,
MAX(created) as last_order
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
GROUP BY customer
),
customers AS (
SELECT id, email, created
FROM {{ ref('stg_stripe_customers') }}
)
SELECT
c.id as customer_id,
c.email,
c.created as customer_since,
COALESCE(o.order_count, 0) as lifetime_orders,
COALESCE(o.total_spent, 0) as lifetime_value,
o.first_order,
o.last_order
FROM customers c
LEFT JOIN customer_orders o ON c.id = o.customer
5. Add Tests¶
# dbt/models/marts/schema.yml
version: 2
models:
- name: customer_lifetime_value
description: Customer LTV metrics
columns:
- name: customer_id
tests: [unique, not_null]
- name: lifetime_value
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
6. Run and Test¶
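# Same pattern as the workflows above
dango run --select customer_lifetime_value
dbt test --profiles-dir dbt --project-dir dbt --select customer_lifetime_value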
7. Query Results¶
-- In Metabase or DuckDB
SELECT * FROM marts.customer_lifetime_value
ORDER BY lifetime_value DESC
LIMIT 10;
Next Steps¶
- Core Concepts: Understand Dango's data architecture and layering strategy.
- Data Sources: Learn how to ingest data from APIs, databases, and files.
- Dashboards: Visualize your transformed data in Metabase.
- dbt Documentation: Explore official dbt resources and advanced features.