Testing¶
Comprehensive data quality testing with dbt.
Overview¶
Data testing ensures your transformations produce accurate, reliable results. Dango uses dbt's testing framework to validate:
- Schema constraints - Uniqueness, nullability, accepted values
- Referential integrity - Foreign key relationships
- Business logic - Custom validations
- Data freshness - Timeliness checks
- Cross-model consistency - Data reconciliation
Key Benefits:
- Catch errors before they reach dashboards
- Document data quality assumptions
- Build confidence in analytics
- Enable continuous delivery of data
Quick Start¶
Add Basic Tests¶
# dbt/models/marts/schema.yml
version: 2
models:
- name: customer_metrics
columns:
- name: customer_id
tests:
- unique
- not_null
- name: email
tests:
- not_null
- name: lifetime_value
tests:
- not_null
Run Tests¶
# Run all tests
dbt test --profiles-dir dbt --project-dir dbt
# Run tests for specific model
dbt test --profiles-dir dbt --project-dir dbt --select customer_metrics
# Run tests for all marts
dbt test --profiles-dir dbt --project-dir dbt --select marts.*
Test Output¶
Running with dbt=1.7.0
Found 12 models, 8 tests, 0 snapshots, 0 analyses, 0 macros, 0 operations
Completed successfully
Done. PASS=8 WARN=0 ERROR=0 SKIP=0 TOTAL=8
Auto-Generated Tests¶
When dango sync generates staging models, it also creates basic tests for primary key columns:
# Auto-generated in _stg_<source>__schema.yml
models:
- name: stg_stripe_charges
columns:
- name: id
tests:
- unique
- not_null
Dango automatically adds unique and not_null tests for columns that appear to be primary keys (columns named id, uuid, or ending in _id/_key).
Test Types¶
1. Schema Tests¶
Defined in YAML files, applied to columns or models.
Built-in Tests¶
dbt provides four built-in tests:
models:
- name: customer_metrics
columns:
- name: customer_id
tests:
- unique # No duplicate values
- not_null # No NULL values
- name: status
tests:
- accepted_values: # Only allowed values
values: ['active', 'inactive', 'suspended']
- name: subscription_id
tests:
- relationships: # Foreign key constraint
to: ref('stg_stripe_subscriptions')
field: id
Test Descriptions¶
columns:
- name: customer_id
description: Unique Stripe customer identifier
tests:
- unique:
config:
severity: error # error or warn
error_if: ">0" # Fail if any failures
warn_if: ">10" # Warn if >10 failures
2. Data Tests¶
Custom SQL queries that test business logic.
-- dbt/tests/assert_positive_revenue.sql
-- Test fails if query returns any rows
SELECT
customer_id,
lifetime_value
FROM {{ ref('customer_metrics') }}
WHERE lifetime_value < 0
-- dbt/tests/assert_first_before_last_order.sql
SELECT
customer_id,
first_order_date,
last_order_date
FROM {{ ref('customer_metrics') }}
WHERE first_order_date > last_order_date
3. dbt_utils Tests¶
Extended test library from dbt-labs.
First, install dbt_utils:
Use in tests:
models:
- name: customer_metrics
columns:
- name: email
tests:
- dbt_utils.not_null_proportion:
at_least: 0.95 # At least 95% non-null
- name: customer_id
tests:
- dbt_utils.cardinality_equality:
field: id
to: ref('stg_stripe_customers')
tests:
# Model-level tests
- dbt_utils.expression_is_true:
expression: "lifetime_value >= 0"
- dbt_utils.equal_rowcount:
compare_model: ref('stg_stripe_customers')
4. Freshness Tests¶
Ensure data is up-to-date:
# dbt/models/staging/_stg_stripe__sources.yml
version: 2
sources:
- name: stripe
schema: raw_stripe
tables:
- name: charges
loaded_at_field: _dlt_extracted_at
freshness:
warn_after: {count: 6, period: hour}
error_after: {count: 12, period: hour}
Test freshness:
Test Configuration¶
Severity Levels¶
tests:
- unique:
config:
severity: error # Fail build
- not_null:
config:
severity: warn # Warn but continue
Error Thresholds¶
tests:
- unique:
config:
error_if: ">0" # Any failures = error
warn_if: ">10" # >10 failures = warning
Store Test Results¶
Query failed tests:
Common Test Patterns¶
Primary Key Tests¶
models:
- name: customer_metrics
columns:
- name: customer_id
description: Primary key
tests:
- unique
- not_null
Foreign Key Tests¶
models:
- name: fct_orders
columns:
- name: customer_id
description: Foreign key to dim_customers
tests:
- relationships:
to: ref('dim_customers')
field: customer_id
Range Tests¶
models:
- name: customer_metrics
columns:
- name: lifetime_value
tests:
- dbt_utils.expression_is_true:
expression: ">= 0"
- name: discount_percentage
tests:
- dbt_utils.expression_is_true:
expression: "BETWEEN 0 AND 100"
Format Tests¶
models:
- name: dim_customers
columns:
- name: email
tests:
- dbt_utils.not_null_proportion:
at_least: 0.95
- dbt_utils.expression_is_true:
expression: "LIKE '%@%'" # Basic email format
Date Logic Tests¶
models:
- name: customer_metrics
tests:
- dbt_utils.expression_is_true:
expression: "first_order_date <= last_order_date"
- dbt_utils.expression_is_true:
expression: "customer_since <= first_order_date"
Completeness Tests¶
models:
- name: fct_revenue
tests:
# All dates should be present (no gaps)
- dbt_utils.sequential_values:
column_name: date
interval: 1
datepart: day
Advanced Testing¶
Cross-Model Reconciliation¶
Ensure data consistency across models:
-- dbt/tests/assert_revenue_reconciliation.sql
WITH revenue_from_fact AS (
SELECT SUM(revenue_usd) as total
FROM {{ ref('fct_revenue') }}
),
revenue_from_staging AS (
SELECT SUM(amount / 100.0) as total
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
)
SELECT
f.total as fact_revenue,
s.total as staging_revenue,
ABS(f.total - s.total) as difference
FROM revenue_from_fact f
CROSS JOIN revenue_from_staging s
WHERE ABS(f.total - s.total) > 1.00 -- Allow $1 rounding difference
Row Count Validation¶
models:
- name: dim_customers
tests:
# Should have same count as staging
- dbt_utils.equal_rowcount:
compare_model: ref('stg_stripe_customers')
Duplicate Detection¶
-- dbt/tests/assert_no_duplicate_customers.sql
WITH duplicates AS (
SELECT
email,
COUNT(*) as duplicate_count
FROM {{ ref('dim_customers') }}
GROUP BY email
HAVING COUNT(*) > 1
)
SELECT * FROM duplicates
Time Series Validation¶
-- dbt/tests/assert_no_future_dates.sql
SELECT
customer_id,
order_date
FROM {{ ref('fct_orders') }}
WHERE order_date > CURRENT_DATE
Metric Bounds¶
-- dbt/tests/assert_reasonable_metrics.sql
SELECT
customer_id,
lifetime_value
FROM {{ ref('customer_metrics') }}
WHERE lifetime_value > 1000000 -- Flag customers over $1M
Test Organization¶
File Structure¶
dbt/
├── models/
│ ├── staging/
│ │ └── _stg_stripe__schema.yml # Staging tests
│ ├── intermediate/
│ │ └── schema.yml # Intermediate tests
│ └── marts/
│ ├── finance/
│ │ └── schema.yml # Finance mart tests
│ └── marketing/
│ └── schema.yml # Marketing mart tests
└── tests/
├── assert_revenue_reconciliation.sql
├── assert_no_duplicates.sql
└── assert_positive_values.sql
Schema YAML Organization¶
# dbt/models/marts/finance/schema.yml
version: 2
models:
- name: fct_revenue
description: Daily revenue metrics
tests:
# Model-level tests
- dbt_utils.expression_is_true:
expression: "revenue_usd >= 0"
columns:
- name: date
description: Revenue date
tests:
- unique
- not_null
- name: revenue_usd
description: Total revenue in USD
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
- name: fct_mrr
description: Monthly recurring revenue
# Additional tests...
Testing Workflows¶
Development Workflow¶
# 1. Build model
dbt run --select customer_metrics
# 2. Test model
dbt test --select customer_metrics
# 3. Fix issues and repeat
# Edit SQL -> dbt run -> dbt test
CI/CD Workflow¶
#!/bin/bash
# Script: test_and_deploy.sh
# Run all models
dango run
# Run all tests
dbt test --profiles-dir dbt --project-dir dbt
# Check exit code
if [ $? -eq 0 ]; then
echo "All tests passed!"
# Deploy to production
else
echo "Tests failed!"
exit 1
fi
Selective Testing¶
# Test specific model and dependencies
dbt test --profiles-dir dbt --project-dir dbt --select customer_metrics+
# Test all marts
dbt test --profiles-dir dbt --project-dir dbt --select marts.*
# Test modified models only
dbt test --profiles-dir dbt --project-dir dbt --select state:modified+
# Test by tag
dbt test --profiles-dir dbt --project-dir dbt --select tag:daily
Test Results and Debugging¶
View Test Results¶
# Verbose output
dbt test --profiles-dir dbt --project-dir dbt --debug
# Store failures
dbt test --profiles-dir dbt --project-dir dbt --store-failures
# View failures in database
duckdb data/warehouse.duckdb "SELECT * FROM dbt_test__audit.unique_customer_metrics_customer_id"
Investigate Failures¶
-- If unique test fails on customer_id
SELECT
customer_id,
COUNT(*) as occurrence_count
FROM {{ ref('customer_metrics') }}
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY occurrence_count DESC
Test with Limits (Development)¶
Best Practices¶
1. Test Primary Keys¶
Always test primary keys:
2. Test Foreign Keys¶
Verify referential integrity:
3. Test Business Logic¶
Document and test assumptions:
models:
- name: customer_metrics
tests:
# Business rule: customers can't have negative LTV
- dbt_utils.expression_is_true:
expression: "lifetime_value >= 0"
config:
severity: error
4. Test Critical Metrics¶
Add reconciliation tests for important numbers:
5. Use Tags for Organization¶
6. Document Test Rationale¶
tests:
- dbt_utils.expression_is_true:
expression: "gross_margin <= 1.0"
config:
severity: error
# Why: Gross margin can never exceed 100%
7. Set Appropriate Severity¶
tests:
- unique:
config:
severity: error # Critical - must pass
- dbt_utils.not_null_proportion:
at_least: 0.95
config:
severity: warn # Important but not critical
Performance Optimization¶
Limit Test Scope¶
Sample Large Tables¶
Use Incremental Testing¶
Example: Complete Testing Suite¶
Schema Tests¶
# dbt/models/marts/finance/schema.yml
version: 2
models:
- name: fct_revenue
description: Daily revenue aggregated from Stripe charges
# Model-level tests
tests:
- dbt_utils.expression_is_true:
expression: "revenue_usd >= 0"
config:
severity: error
- dbt_utils.expression_is_true:
expression: "transaction_count >= 0"
config:
severity: error
columns:
- name: date
description: Revenue date (UTC)
tests:
- unique
- not_null
- dbt_utils.sequential_values:
interval: 1
datepart: day
- name: revenue_usd
description: Total revenue in USD
tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
- name: transaction_count
description: Number of successful transactions
tests:
- not_null
- dbt_utils.expression_is_true:
expression: "> 0"
- name: unique_customers
description: Count of distinct customers
tests:
- not_null
- dbt_utils.expression_is_true:
expression: "<= transaction_count"
Data Tests¶
-- dbt/tests/finance/assert_revenue_reconciliation.sql
-- Ensures fct_revenue matches raw charges data
WITH revenue_from_fact AS (
SELECT SUM(revenue_usd) as total
FROM {{ ref('fct_revenue') }}
),
revenue_from_staging AS (
SELECT SUM(amount / 100.0) as total
FROM {{ ref('stg_stripe_charges') }}
WHERE status = 'succeeded'
)
SELECT
f.total as fact_total,
s.total as staging_total,
ABS(f.total - s.total) as difference
FROM revenue_from_fact f
CROSS JOIN revenue_from_staging s
WHERE ABS(f.total - s.total) > 1.00
-- dbt/tests/finance/assert_no_future_revenue.sql
-- Revenue dates should not be in the future
SELECT
date,
revenue_usd
FROM {{ ref('fct_revenue') }}
WHERE date > CURRENT_DATE
Troubleshooting¶
Test Failures¶
Problem: Tests fail unexpectedly
Solutions:
-
Check test logic:
-
Examine failed rows:
-
Review recent data changes:
Performance Issues¶
Problem: Tests take too long
Solutions:
-
Add WHERE clauses:
-
Use sampling:
-
Run tests selectively:
False Positives¶
Problem: Tests fail on edge cases
Solutions:
-
Adjust thresholds:
-
Use warnings instead of errors:
Integration with Dango¶
Run Tests After Sync¶
# Load data and run transformations
dango sync && dango run
# Run tests
dbt test --profiles-dir dbt --project-dir dbt
# Check results
if [ $? -eq 0 ]; then
echo "Data quality checks passed"
else
echo "Data quality issues detected"
fi
Complete Pipeline Script¶
#!/bin/bash
# Run full pipeline with tests
# Sync data (auto-generates staging templates)
dango sync --source stripe_payments
# Run transformations
dango run
# Run tests
dbt test --profiles-dir dbt --project-dir dbt
echo "Pipeline complete"
Next Steps¶
- dbt Basics - Learn dbt fundamentals
- Custom Models - Build testable marts
- Staging Models - Understand the foundation
- dbt Testing Documentation - Official dbt testing guide