Skip to content

SQL Queries

DuckDB SQL patterns for querying Dango data in Metabase.


Overview

Metabase's SQL editor provides direct access to your DuckDB warehouse. This guide covers Dango-specific patterns and DuckDB syntax essentials.

Full DuckDB Reference

For complete SQL syntax and functions, see the DuckDB documentation.


Quick Start

Open SQL Editor

  1. Navigate to http://localhost:3000
  2. Click "+ New" → "SQL query"
  3. Select "DuckDB" database
  4. Write SQL and click "Get Answer"

First Queries

-- List all marts tables
SHOW TABLES FROM marts;

-- Query a marts table
SELECT *
FROM marts.customer_metrics
ORDER BY lifetime_value DESC
LIMIT 10;

Dango Data Schemas

Available Schemas

Schema Purpose Example Tables
raw Single-table sources (CSV) raw.csv_uploads
raw_* Multi-table sources (APIs) raw_stripe.charges, raw_stripe.customers
staging Auto-generated staging models staging.stg_stripe_charges
intermediate Custom business logic intermediate.int_customer_orders
marts Analytics-ready tables marts.customer_metrics

Schema-Qualified Names

Always use schema prefix:

-- Correct: Schema.table
SELECT * FROM marts.revenue_by_month

-- Avoid: Unqualified (may fail)
SELECT * FROM revenue_by_month

Explore Tables

-- Tables in specific schema
SHOW TABLES FROM marts;

-- Show columns and types
DESCRIBE marts.customer_metrics;

Querying Dango Data Layers

Raw Data (dlt Source)

Query unprocessed data from dlt:

SELECT
    id,
    customer,
    amount / 100.0 as amount_usd,
    status,
    created,
    _dlt_load_id,
    _dlt_extracted_at
FROM raw_stripe.charges
WHERE created >= CURRENT_DATE - INTERVAL 30 DAY;

dlt Metadata Columns:

Column Description
_dlt_load_id Unique ID for each dlt sync run
_dlt_extracted_at Timestamp when data was extracted
_dlt_id Unique row identifier

Use cases: Debugging source data, auditing data lineage, comparing raw vs. transformed.

Staging Data (Auto-Generated)

Query cleaned, deduplicated data:

SELECT
    id,
    customer_id,
    amount / 100.0 as amount_usd,
    status,
    created
FROM staging.stg_stripe_charges
WHERE status = 'succeeded'
ORDER BY created DESC;

Characteristics: Deduplicated, standardized column names, no business logic.

Marts Data (Analytics-Ready)

Query business-ready tables:

SELECT
    customer_id,
    email,
    lifetime_value,
    lifetime_orders
FROM marts.customer_metrics
WHERE lifetime_value > 1000
ORDER BY lifetime_value DESC;

Best practice: Always prefer marts for dashboards—faster and more meaningful.


DuckDB SQL Essentials

Date Functions

Essential for time-series analysis:

-- Current date/time
SELECT CURRENT_DATE                    -- 2024-12-09
SELECT CURRENT_TIMESTAMP               -- 2024-12-09 14:30:00

-- Date truncation (for grouping)
SELECT DATE_TRUNC('month', created) as month

-- Date arithmetic
SELECT CURRENT_DATE - INTERVAL 7 DAY   -- 7 days ago
SELECT CURRENT_DATE - INTERVAL 1 MONTH -- 1 month ago

-- Extract parts
SELECT EXTRACT(YEAR FROM created) as year
SELECT EXTRACT(MONTH FROM created) as month

Common Patterns

Monthly aggregation:

SELECT
    DATE_TRUNC('month', created) as month,
    COUNT(*) as customer_count,
    SUM(lifetime_value) as total_ltv
FROM marts.customer_metrics
GROUP BY month
ORDER BY month DESC

Date filtering:

SELECT *
FROM marts.customer_metrics
WHERE created >= CURRENT_DATE - INTERVAL 30 DAY
  AND created < CURRENT_DATE

Stripe Amount Conversion

Stripe stores amounts in cents:

SELECT
    id,
    amount / 100.0 as amount_usd
FROM raw_stripe.charges

Type Casting

-- String to integer
SELECT '123'::INTEGER

-- String to date
SELECT DATE '2024-12-09'

Query Variables (Filters)

Make queries interactive with Metabase variables:

Basic Variable

SELECT *
FROM marts.customer_metrics
WHERE email = {{email_address}}

Date Range

SELECT *
FROM marts.customer_metrics
WHERE created >= {{start_date}}
  AND created < {{end_date}}

Optional Filters

Use [[double brackets]] for optional clauses:

SELECT *
FROM marts.customer_metrics
WHERE 1=1
  [[AND region = {{region}}]]
  [[AND created >= {{start_date}}]]
ORDER BY lifetime_value DESC

If variable is empty, clause is omitted.

Query Variables Reference

For detailed variable configuration, see Metabase SQL Parameters.


Common Analytics Queries

Monthly Revenue Trend

SELECT
    DATE_TRUNC('month', created) as month,
    COUNT(DISTINCT customer_id) as customers,
    SUM(amount / 100.0) as revenue
FROM staging.stg_stripe_charges
WHERE status = 'succeeded'
GROUP BY month
ORDER BY month;

Top Customers

SELECT
    customer_id,
    email,
    lifetime_value,
    lifetime_orders
FROM marts.customer_metrics
ORDER BY lifetime_value DESC
LIMIT 10;

Period Comparison

SELECT
    'This Month' as period,
    COUNT(*) as customers,
    SUM(lifetime_value) as total_ltv
FROM marts.customer_metrics
WHERE created >= DATE_TRUNC('month', CURRENT_DATE)

UNION ALL

SELECT
    'Last Month' as period,
    COUNT(*) as customers,
    SUM(lifetime_value) as total_ltv
FROM marts.customer_metrics
WHERE created >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL 1 MONTH)
  AND created < DATE_TRUNC('month', CURRENT_DATE);

Year-over-Year Growth

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', created) as month,
        SUM(amount / 100.0) as revenue
    FROM staging.stg_stripe_charges
    WHERE status = 'succeeded'
    GROUP BY month
)

SELECT
    month,
    revenue,
    LAG(revenue, 12) OVER (ORDER BY month) as revenue_last_year,
    (revenue - LAG(revenue, 12) OVER (ORDER BY month)) /
        LAG(revenue, 12) OVER (ORDER BY month) * 100 as yoy_growth_pct
FROM monthly_revenue
ORDER BY month DESC;

Query Optimization

Use Marts Instead of Raw

-- Bad: Aggregating raw data every time
SELECT DATE_TRUNC('month', created) as month, SUM(amount) as revenue
FROM raw_stripe.charges
GROUP BY month;

-- Good: Query pre-aggregated mart
SELECT month, total_revenue
FROM marts.revenue_by_month;

Rule: Create a dbt mart for repeated aggregations.

Limit Result Sets

-- Always use LIMIT for exploration
SELECT * FROM staging.stg_stripe_charges LIMIT 100;

Filter Early in Joins

-- Good: Filter before join
SELECT c.*, o.*
FROM staging.stg_customers c
JOIN (
    SELECT * FROM staging.stg_orders
    WHERE created >= CURRENT_DATE - INTERVAL 30 DAY
) o ON c.id = o.customer_id;

Troubleshooting

Table Not Found

Error: Table "revenue_by_month" not found

Solution: Add schema prefix:

SELECT * FROM marts.revenue_by_month

Column Not Found

Solution: Check spelling with DESCRIBE:

DESCRIBE marts.customer_metrics;

Type Mismatch

Error: Cannot compare INTEGER with VARCHAR

Solution: Cast to matching type:

WHERE customer_id = 123           -- Not '123'
WHERE customer_id = '123'::INTEGER

Division by Zero

SELECT
    revenue,
    CASE WHEN order_count > 0 THEN revenue / order_count ELSE 0 END as avg_order_value
FROM marts.revenue_by_month;

Best Practices

  1. Always qualify table names: marts.customer_metrics not customer_metrics
  2. Use meaningful aliases: c for customers, o for orders, not a, b
  3. Handle NULLs explicitly: COALESCE(region, 'Unknown')
  4. Comment complex logic: Explain business rules in SQL comments
  5. Format for readability: One clause per line, consistent indentation

Next Steps

  • Creating Dashboards


    Build interactive dashboards using your SQL queries.

    Creating Dashboards

  • Transformations


    Create marts tables with dbt for better query performance.

    Transformations

  • DuckDB Documentation


    Complete SQL reference: window functions, JSON, regex, arrays.

    DuckDB Docs

  • Metabase SQL Guide


    SQL editor features, variables, and native queries.

    Metabase Docs