Transformations

Transform raw data into analytics-ready tables with dbt and DuckDB.


Overview

Transformations in Dango use dbt (data build tool) to convert raw ingested data into clean, modeled tables for analytics. Dango combines the power of dbt with DuckDB to give you:

  • Auto-generated staging model templates from raw tables
  • Interactive model creation via dango model add
  • Custom SQL transformations for business logic
  • Auto-generated data quality tests (unique, not_null)
  • dbt documentation viewable via Web UI

Tech Stack:

  • dbt 1.7+ - SQL transformation framework
  • DuckDB 0.10+ - Embedded analytics database
  • Jinja - SQL templating for dynamic logic

Data Flow

How data moves through the layers:

graph LR
    A[Raw Layer] -->|dango sync| B[Staging Models]
    B -->|dango model add| C[Intermediate Models]
    C -->|dango model add| D[Marts]
    D --> E[Metabase]

    style A fill:#f3e5f5
    style B fill:#fff9c4
    style C fill:#e1f5ff
    style D fill:#ffebee
    style E fill:#e0f2f1

Transformation Layers

Layer         Purpose                      Created By          Materialization
Raw           Source of truth from dlt     dango sync          Tables
Staging       Template for cleaning data   dango sync (auto)   Tables
Intermediate  Reusable business logic      dango model add     Tables
Marts         Final analytics tables       dango model add     Tables

Why Tables?

All models are materialized as tables (not views) for Metabase compatibility.
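
For reference, this default could be expressed in dbt_project.yml roughly as follows (a sketch, assuming the project name dango seen in the compiled paths later on this page; the per-layer schema keys are an assumption):

# dbt/dbt_project.yml (illustrative sketch, not the full generated file)
name: dango
models:
  dango:
    +materialized: table   # default all models to tables for Metabase
    staging:
      +schema: staging     # matches schema='staging' in the staging templates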


Quick Start

End-to-End Example

# 1. Load raw data (also generates staging model templates)
dango sync --source stripe_payments

# 2. Create a custom mart using the wizard
dango model add
# Select "marts", enter "customer_metrics", add description

# 3. Edit the generated model file
# Open dbt/models/marts/customer_metrics.sql in your editor
# Replace the template with your SQL logic

# 4. Run transformations
dango run

# 5. View results in Metabase
dango start
# Open http://localhost:8800 → Metabase

Example Mart SQL

After creating a model with dango model add, edit the file:

-- dbt/models/marts/customer_metrics.sql
{{ config(materialized='table') }}

WITH customer_orders AS (
    SELECT
        customer,
        COUNT(*) as order_count,
        SUM(amount / 100.0) as total_spent
    FROM {{ ref('stg_stripe_charges') }}
    WHERE status = 'succeeded'
    GROUP BY customer
)

SELECT
    c.id as customer_id,
    c.email,
    COALESCE(o.order_count, 0) as lifetime_orders,
    COALESCE(o.total_spent, 0) as lifetime_value
FROM {{ ref('stg_stripe_customers') }} c
LEFT JOIN customer_orders o ON c.id = o.customer

Transformation Guides

  • dbt Basics


    Learn dbt fundamentals and how it integrates with Dango.

    • dbt project structure
    • Materializations (view, table, incremental)
    • Jinja templating and macros
    • Running and debugging models

    dbt Basics Guide

  • Staging Models


    Deep dive into Dango's auto-generated staging model templates.

    • How staging templates are generated
    • Customizing staging models
    • Column naming conventions
    • When to regenerate

    Staging Models Guide

  • Custom Models


    Build intermediate layers and data marts for analytics.

    • Intermediate vs. marts models
    • Common SQL patterns (cohorts, funnels, time series)
    • Advanced techniques (macros, incremental)
    • Performance optimization

    Custom Models Guide

  • Testing


    Comprehensive data quality testing with dbt.

    • Schema tests (unique, not_null, relationships)
    • Custom SQL tests
    • dbt_utils test library
    • Testing best practices

    Testing Guide


Key Concepts

Auto-Generated Staging

When you run dango sync, staging model templates are automatically generated:

dango sync --source stripe_payments
# Creates raw data AND staging templates

Creates:

dbt/models/staging/
├── stg_stripe_charges.sql          # Staging template
├── stg_stripe_customers.sql        # Staging template
├── stg_stripe_subscriptions.sql    # Staging template
├── _stg_stripe__sources.yml        # Source references
└── _stg_stripe__schema.yml         # Tests & documentation

Generated SQL (template):

-- stg_stripe_charges.sql
{{ config(materialized='table', schema='staging') }}

SELECT * FROM {{ source('stripe', 'charges') }}
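
Alongside the SQL templates, the generated sources file points dbt at the raw schema. Its exact contents come from dango sync, but it plausibly looks like:

# _stg_stripe__sources.yml (sketch of generated content)
version: 2
sources:
  - name: stripe
    schema: raw_stripe
    tables:
      - name: charges
      - name: customers
      - name: subscriptions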

Template Design

Staging templates are intentionally minimal (SELECT *). You customize them by adding:

  • Column selection
  • Type casting
  • Deduplication logic (if needed)

See the Staging Models Guide for customization patterns.
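
As a sketch of what a customized template can become (the column names are assumed from the Stripe examples elsewhere on this page):

-- dbt/models/staging/stg_stripe_charges.sql (customized; illustrative)
{{ config(materialized='table', schema='staging') }}

SELECT
    id,
    customer,
    CAST(amount AS BIGINT) AS amount,   -- pin types explicitly
    status,
    created
FROM {{ source('stripe', 'charges') }}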

Custom Business Logic

You write intermediate and marts models:

-- dbt/models/marts/revenue_by_month.sql
{{ config(materialized='table') }}

SELECT
    DATE_TRUNC('month', created) as month,
    COUNT(*) as transactions,
    SUM(amount / 100.0) as revenue_usd
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
GROUP BY month
ORDER BY month DESC
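
The per-customer aggregation that recurs in the mart examples on this page is a natural candidate for the intermediate layer, so several marts can ref() it instead of repeating it. A sketch, reusing the int_customer_orders name from the project tree below:

-- dbt/models/intermediate/int_customer_orders.sql (illustrative sketch)
{{ config(materialized='table', schema='intermediate') }}

SELECT
    customer,
    COUNT(*) AS order_count,
    SUM(amount / 100.0) AS total_spent
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
GROUP BY customer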

Data Testing

Add tests in schema YAML:

# dbt/models/marts/schema.yml
models:
  - name: revenue_by_month
    columns:
      - name: month
        tests:
          - unique
          - not_null

      - name: revenue_usd
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"

Common Workflows

Development Workflow

graph TD
    A[Write SQL] --> B[dango run]
    B --> C{Tests Pass?}
    C -->|No| D[Debug & Fix]
    D --> A
    C -->|Yes| E[Commit to Git]
    E --> F[Deploy]

Commands:

# 1. Edit model
vim dbt/models/marts/customer_metrics.sql

# 2. Run model
dango run --select customer_metrics

# 3. Test model
dbt test --profiles-dir dbt --project-dir dbt --select customer_metrics

# 4. View in Metabase (via dango start)
# Open http://localhost:8800

Adding a New Mart

Option 1: Using the wizard (recommended)

# 1. Create model via wizard
dango model add
# Select "marts", enter "monthly_mrr", add description

# 2. Edit the generated template
vim dbt/models/marts/monthly_mrr.sql
# Replace SELECT * with your SQL logic

# 3. Run and test
dango run --select monthly_mrr
dbt test --profiles-dir dbt --project-dir dbt --select monthly_mrr

Option 2: Manual creation

# 1. Create SQL file
cat > dbt/models/marts/monthly_mrr.sql << 'EOF'
{{ config(materialized='table') }}

SELECT
    DATE_TRUNC('month', created) as month,
    SUM(amount / 100.0) as mrr
FROM {{ ref('stg_stripe_subscriptions') }}
WHERE status = 'active'
GROUP BY month
EOF

# 2. Run and test
dango run --select monthly_mrr
dbt test --profiles-dir dbt --project-dir dbt --select monthly_mrr

Schema files

dango model add auto-generates basic tests in schema.yml. For manual creation, add your own schema.yml file with documentation and tests.
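
A minimal hand-written schema.yml for the monthly_mrr model above might look like:

# dbt/models/marts/schema.yml (minimal sketch)
version: 2
models:
  - name: monthly_mrr
    description: Monthly recurring revenue from active subscriptions
    columns:
      - name: month
        tests: [unique, not_null]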

Schema Changes

When source data changes:

# 1. Sync new data (auto-regenerates staging templates)
dango sync --source stripe_payments

# 2. Update downstream models if needed
vim dbt/models/marts/customer_metrics.sql

# 3. Run transformations
dango run

Automatic regeneration

dango sync automatically regenerates staging model templates when source schemas change. You only need to run dango generate manually if you want to regenerate without syncing data.


Project Structure

Standard dbt Layout

dbt/
├── dbt_project.yml              # Project configuration
├── profiles.yml                 # DuckDB connection (auto-configured)
├── packages.yml                 # dbt package dependencies
├── models/
│   ├── staging/                 # Auto-generated (dango sync)
│   │   ├── stg_*.sql
│   │   ├── _stg_*__sources.yml
│   │   └── _stg_*__schema.yml
│   ├── intermediate/            # Your reusable logic (dango model add)
│   │   ├── int_customer_orders.sql
│   │   └── schema.yml
│   └── marts/                   # Your final analytics tables (dango model add)
│       ├── finance/
│       │   ├── revenue_by_month.sql
│       │   ├── mrr_analysis.sql
│       │   └── schema.yml
│       ├── marketing/
│       │   ├── customer_acquisition.sql
│       │   └── schema.yml
│       └── operations/
│           └── fulfillment_metrics.sql
├── tests/                       # Custom SQL tests
│   ├── assert_revenue_reconciliation.sql
│   └── assert_no_duplicates.sql
├── macros/                      # Reusable SQL snippets
│   └── currency_conversion.sql
└── snapshots/                   # Slowly changing dimensions
    └── customer_snapshot.sql
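
As an example of the macros layer, the currency_conversion.sql above could hold a small helper like this (hypothetical; its real contents aren't shown here):

-- dbt/macros/currency_conversion.sql (hypothetical sketch)
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)
{% endmacro %}

A model can then write SUM({{ cents_to_dollars('amount') }}) instead of repeating the division everywhere.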

Schema Organization in DuckDB

# View schemas
duckdb data/warehouse.duckdb "SHOW SCHEMAS;"

# Output:
raw_stripe       # Raw data from dlt (source-specific, e.g. raw_stripe, raw_hubspot)
staging          # Auto-generated staging models (stg_* tables)
intermediate     # Your intermediate models (int_* tables)
marts            # Your marts models

Schema naming

Raw schemas are prefixed with raw_ followed by the source name (e.g., raw_stripe, raw_hubspot). This keeps data from different sources isolated.


dbt Commands Reference

# Run all transformations
dango run

# Run specific model
dango run --select customer_metrics

# Create a new model (intermediate or marts)
dango model add

# Remove a model
dango model remove customer_metrics

# Regenerate staging templates (usually not needed - sync does this)
dango generate

Viewing documentation

dbt docs are available at http://localhost:8800/dbt-docs when dango start is running.

Via dbt Directly

# Run all models
dbt run --profiles-dir dbt --project-dir dbt

# Run specific model
dbt run --profiles-dir dbt --project-dir dbt --select customer_metrics

# Test all models
dbt test --profiles-dir dbt --project-dir dbt

# Generate and serve docs
dbt docs generate --profiles-dir dbt --project-dir dbt
dbt docs serve --profiles-dir dbt --project-dir dbt --port 8081

Selective Execution

# Run one model (via dbt directly)
dbt run --profiles-dir dbt --project-dir dbt --select customer_metrics

# Run model and downstream dependencies
dbt run --profiles-dir dbt --project-dir dbt --select customer_metrics+

# Run model and upstream dependencies
dbt run --profiles-dir dbt --project-dir dbt --select +customer_metrics

# Run all marts
dbt run --profiles-dir dbt --project-dir dbt --select marts.*

# Run by tag
dbt run --profiles-dir dbt --project-dir dbt --select tag:finance

Best Practices

1. Keep Staging Simple

Staging models should only:

  • Select columns
  • Rename columns
  • Cast data types
  • Deduplicate

Don't add business logic in staging.
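
For example, deduplication without business logic might use DuckDB's QUALIFY clause (a sketch; the key and timestamp columns are assumed):

-- Keep the latest record per primary key
SELECT *
FROM {{ source('stripe', 'charges') }}
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY created DESC) = 1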

2. Layer Your Transformations

Raw  →  Staging  →  Intermediate  →  Marts
 ↓          ↓             ↓             ↓
sync    sync (auto)   model add     model add

3. Use CTEs for Readability

-- Good
WITH base AS (...),
     filtered AS (...),
     aggregated AS (...)
SELECT * FROM aggregated

-- Avoid
SELECT ... FROM (...) JOIN (...) WHERE ...

4. Document Everything

models:
  - name: customer_metrics
    description: |
      Customer lifetime value and engagement metrics.
      Updated daily via dango sync && dango run.
    columns:
      - name: customer_id
        description: Stripe customer ID (primary key)

5. Test Critical Assumptions

tests:
  - unique
  - not_null
  - relationships:
      to: ref('dim_customers')
      field: customer_id
  - dbt_utils.expression_is_true:
      expression: "lifetime_value >= 0"

6. Version Control Everything

git add dbt/models/
git commit -m "Add customer metrics mart"

Performance Tips

Materialization in Dango

All Dango models use table materialization by default for Metabase compatibility.

Why tables?

Metabase requires tables for reliable schema discovery. Views can work but may have inconsistent behavior with some Metabase features.

For large datasets with frequent updates, consider incremental materialization:

Use Case            Materialization  Why
Most models         table            Metabase compatibility, fast queries
Large event tables  incremental      Efficient updates

Optimize Complex Queries

-- Bad: Multiple scans
SELECT
    customer_id,
    (SELECT COUNT(*) FROM orders WHERE customer_id = c.id),
    (SELECT SUM(amount) FROM orders WHERE customer_id = c.id)
FROM customers c

-- Good: Single scan
WITH order_stats AS (
    SELECT customer_id, COUNT(*) as cnt, SUM(amount) as total
    FROM orders GROUP BY customer_id
)
SELECT c.customer_id, o.cnt, o.total
FROM customers c
LEFT JOIN order_stats o ON c.id = o.customer_id

Use Incremental for Scale

{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

SELECT * FROM {{ ref('stg_events') }}
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
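
When an incremental model's logic changes, rebuild it from scratch with dbt's --full-refresh flag (the model name here is hypothetical):

# Rebuild an incremental model from scratch
dbt run --profiles-dir dbt --project-dir dbt --select events --full-refresh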

Troubleshooting

Model Fails to Run

Check dependencies:

dbt compile --profiles-dir dbt --project-dir dbt --select customer_metrics

View compiled SQL:

cat dbt/target/compiled/dango/models/marts/customer_metrics.sql

Staging Models Out of Date

If staging templates don't reflect source schema changes:

# Option 1: Re-sync (recommended - also updates data)
dango sync --source stripe_payments

# Option 2: Regenerate templates only
dango generate

Tests Failing

Run with debug:

dbt test --profiles-dir dbt --project-dir dbt --select customer_metrics --debug

Check failed rows:

dbt test --profiles-dir dbt --project-dir dbt --store-failures
duckdb data/warehouse.duckdb "SELECT * FROM dbt_test__audit.unique_customer_metrics_customer_id"

Example: Complete Transformation

1. Source Configuration

# .dango/sources.yml
sources:
  - name: stripe_payments
    type: stripe
    enabled: true
    stripe:
      stripe_secret_key_env: STRIPE_API_KEY
      start_date: 2024-01-01

2. Load Raw Data

dango sync --source stripe_payments

Creates:

  • raw_stripe.charges
  • raw_stripe.customers
  • raw_stripe.subscriptions

3. Staging Templates (Auto-Generated)

dango sync automatically creates staging templates:

  • dbt/models/staging/stg_stripe_charges.sql
  • dbt/models/staging/stg_stripe_customers.sql
  • Schema YAML files with basic tests

4. Create Custom Mart

-- dbt/models/marts/customer_lifetime_value.sql
{{ config(materialized='table') }}

WITH customer_orders AS (
    SELECT
        customer,
        COUNT(*) as order_count,
        SUM(amount / 100.0) as total_spent,
        MIN(created) as first_order,
        MAX(created) as last_order
    FROM {{ ref('stg_stripe_charges') }}
    WHERE status = 'succeeded'
    GROUP BY customer
),

customers AS (
    SELECT id, email, created
    FROM {{ ref('stg_stripe_customers') }}
)

SELECT
    c.id as customer_id,
    c.email,
    c.created as customer_since,
    COALESCE(o.order_count, 0) as lifetime_orders,
    COALESCE(o.total_spent, 0) as lifetime_value,
    o.first_order,
    o.last_order
FROM customers c
LEFT JOIN customer_orders o ON c.id = o.customer

5. Add Tests

# dbt/models/marts/schema.yml
version: 2
models:
  - name: customer_lifetime_value
    description: Customer LTV metrics
    columns:
      - name: customer_id
        tests: [unique, not_null]
      - name: lifetime_value
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"

6. Run and Test

dango run
dbt test --profiles-dir dbt --project-dir dbt

7. Query Results

-- In Metabase or DuckDB
SELECT * FROM marts.customer_lifetime_value
ORDER BY lifetime_value DESC
LIMIT 10;

Next Steps

  • Core Concepts


    Understand Dango's data architecture and layering strategy.

    Core Concepts

  • Data Sources


    Learn how to ingest data from APIs, databases, and files.

    Data Sources

  • Dashboards


    Visualize your transformed data in Metabase.

    Dashboards

  • dbt Documentation


    Explore official dbt resources and advanced features.

    dbt Docs