Data Sources

Connect to CSV files, APIs, and databases through Dango's unified configuration.


Overview

Dango supports multiple types of data sources through dlt (data load tool). Whether you're working with local CSV files, cloud APIs, or existing databases, Dango provides a unified configuration interface.

Wizard vs Manual Sources

Wizard-supported sources (7): These can be added via the dango source add interactive wizard:

  • CSV, Stripe, Google Sheets, Facebook Ads, Google Analytics 4, Google Ads, dlt Native (advanced)

Manual sources: These require a dlt_native configuration in sources.yml:

  • HubSpot, Notion, Asana, databases, and other dlt sources
  • See Custom Sources for manual setup guide

60+ additional sources are available via dlt_native for advanced users.
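For example, a HubSpot source could be declared with a dlt_native block like this sketch (it assumes dlt's hubspot verified source is installed and its credentials are set in .dlt/secrets.toml):

sources:
  - name: my_hubspot
    type: dlt_native
    dlt_native:
      source_module: hubspot      # dlt verified source module (assumed installed)
      source_function: hubspot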

Available Source Types:

  • CSV Files - Upload and auto-sync flat files (wizard-supported)
  • OAuth Sources - Google Sheets, Facebook Ads, GA4, Google Ads (wizard-supported)
  • Stripe - Payment data (wizard-supported)
  • Database Sources - PostgreSQL, MySQL, etc. via dlt_native (experimental)
  • Custom Sources - Build your own or use any dlt verified source via dlt_native

For dlt Users

If you're already familiar with dlt (data load tool), here's how Dango relates:

Dango wraps dlt with:

  • YAML configuration instead of Python scripts
  • Automatic dbt staging model generation
  • Unified CLI (dango sync) for all sources
  • Web UI for monitoring and management

What stays the same:

  • Credentials in .dlt/secrets.toml (same format)
  • All dlt verified sources available via dlt_native
  • Standard dlt decorators (@dlt.source, @dlt.resource)

When to use what:

Scenario                                         Use
Standard sources (Stripe, Google Sheets, etc.)   Dango wizard or YAML config
Custom API with simple logic                     Dango dlt_native + Python file
Complex pipelines, custom destinations           Pure dlt (Dango not needed)


Quick Start

Add Your First Source

Choose your source type and follow the matching example below.

CSV files:

# Recommended: Use the wizard
dango source add
# Select "CSV Files" and follow prompts

Or configure manually in .dango/sources.yml:

sources:
  - name: sales_data
    type: csv
    enabled: true
    csv:
      directory: data/uploads/sales_data
      file_pattern: "*.csv"

Then copy files and sync:

cp my_sales.csv data/uploads/sales_data/
dango sync --source sales_data

See the CSV Files Guide for details.

Google Sheets (OAuth):

# Interactive setup
dango source add
# Select "Google Sheets" from the list
# Follow OAuth flow in browser

# Sync
dango sync --source my_sheets
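This results in an entry in .dango/sources.yml, roughly like the following sketch (the google_sheets type appears again later on this page):

sources:
  - name: my_sheets
    type: google_sheets
    enabled: true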

See the OAuth Sources Guide for details.

Database (via dlt_native):

# Configure .dlt/secrets.toml
[sources.sql_database]
credentials = "postgresql://user:pass@host:5432/db"

# Edit .dango/sources.yml
sources:
  - name: my_postgres
    type: dlt_native
    dlt_native:
      source_module: sql_database
      source_function: sql_database
      function_kwargs:
        schema: "public"

# Sync
dango sync --source my_postgres

See the Database Sources Guide for details.

Custom Python source:

# custom_sources/my_api.py
import dlt
import requests

@dlt.source
def my_api():
    @dlt.resource(name="data")
    def get_data():
        # Fail fast on HTTP errors instead of loading an error payload
        resp = requests.get("https://api.example.com/data")
        resp.raise_for_status()
        yield resp.json()
    return [get_data()]

# .dango/sources.yml
sources:
  - name: my_api
    type: dlt_native
    dlt_native:
      source_module: my_api
      source_function: my_api
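
Then sync it like any other source:

dango sync --source my_api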

See the Custom Sources Guide for details.


Source Type Guides

  • CSV Files


    Upload and sync CSV files with automatic schema detection and file watching.

    • Simple file-based data loading
    • Auto-sync on file changes
    • Multiple delimiters supported

    CSV Files Guide

  • OAuth Sources


    Connect to cloud services using OAuth 2.0 authentication.

    • Google Sheets, GA4, Facebook Ads
    • Automatic token management
    • Browser-based authentication

    OAuth Sources Guide

  • Database Sources


    Connect to PostgreSQL, MySQL, SQL Server via dlt_native.

    • Full table or incremental loading
    • SSL/TLS support
    • Experimental (not fully tested)

    Database Sources Guide

  • Custom Sources


    Build custom integrations using Python and dlt.

    • REST APIs
    • Web scraping
    • Custom data formats

    Custom Sources Guide

  • Built-in Sources


    Explore wizard-supported sources and dlt_native options.

    • 7 wizard-supported sources
    • 60+ available via dlt_native
    • Community-maintained dlt sources

    Built-in Sources Catalog


Common Workflows

Adding a New Source

  1. Choose source type based on your data
  2. Configure credentials (if needed)
  3. Add to sources.yml or use dango source add
  4. Sync with dango sync --source <name>
  5. Verify in Metabase or with SQL (see the example below)
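For the SQL check, the DuckDB CLI works well. A minimal sketch, assuming the warehouse file lives at data/warehouse.duckdb and the source landed in a schema named after it (adjust both to your project):

# Spot-check the loaded rows (path, schema, and table names are illustrative)
duckdb data/warehouse.duckdb "SELECT * FROM sales_data.my_sales LIMIT 5"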

Managing Multiple Sources

# .dango/sources.yml
version: '1.0'
sources:
  # Production Stripe data
  - name: stripe_prod
    type: stripe
    enabled: true
    stripe:
      stripe_secret_key_env: STRIPE_PROD_KEY

  # Google Sheets for manual data
  - name: manual_overrides
    type: google_sheets
    enabled: true

  # PostgreSQL analytics database
  - name: analytics_db
    type: dlt_native
    enabled: true
    dlt_native:
      source_module: sql_database
      source_function: sql_database

  # Custom internal API
  - name: internal_api
    type: dlt_native
    enabled: true
    dlt_native:
      source_module: internal_api
      source_function: internal_api

Sync All Sources

# Sync all enabled sources
dango sync

# Sync specific source
dango sync --source stripe_prod

# List all sources
dango source list

Data Flow

Understanding how data flows from sources to your warehouse:

graph LR
    A[Data Source] --> B[dlt]
    B --> C[Raw Layer]
    C --> D[DuckDB]
    D --> E[dbt Staging]
    E --> F[dbt Marts]
    F --> G[Metabase]

    style A fill:#e1f5ff
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e9
    style E fill:#fff9c4
    style F fill:#ffebee
    style G fill:#e0f2f1

  1. Source - External API, database, or file
  2. dlt - Fetches and normalizes data
  3. Raw Layer - Source data as-loaded in DuckDB
  4. Staging - Clean starting point (auto-generated by Dango)
  5. Marts - Business logic (custom SQL models you write)
  6. Metabase - Dashboards and queries
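
To inspect these layers directly, you can list the schemas in the warehouse with the DuckDB CLI (a sketch; the warehouse filename depends on your project):

# List all schemas - the raw, staging, and mart layers show up here
duckdb data/warehouse.duckdb "SELECT schema_name FROM information_schema.schemata ORDER BY 1"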

Learn more about data layers →


Source Configuration

sources.yml Structure

version: '1.0'
sources:
  - name: unique_source_name      # Identifier
    type: csv                      # Source type
    enabled: true                  # Toggle sync
    description: "Optional description"
    csv:                           # Type-specific config
      directory: data/uploads/unique_source_name
      file_pattern: "*.csv"

Common Parameters

Parameter     Required   Description
name          Yes        Unique identifier for this source
type          Yes        Source type: csv, stripe, dlt_native, etc.
enabled       No         Whether to include in sync (default: true)
description   No         Human-readable description

Credentials Management

Never commit credentials! Use one of these methods:

Recommended: .env file (persists across sessions)

# Create or edit .env file (gitignored by default)
echo 'MY_API_KEY=your-key-here' >> .env

Or .dlt/secrets.toml (gitignored credential storage)

[sources.stripe]
api_key = "sk_live_..."

Or environment variables (current session only)

export MY_API_KEY="your-key-here"


Testing Status

Source Type          Status            Notes
CSV                  ✅ Tested         Production-ready
Stripe               ✅ Tested         All resources supported
Google Sheets        ✅ Tested         OAuth flow verified
Google Analytics 4   ✅ Tested         OAuth flow verified
Facebook Ads         ✅ Tested         OAuth flow verified
Google Ads           🔄 In Progress    Wizard-supported, testing ongoing
dlt_native           ✅ Works          Registry bypass verified
Database sources     ⚠️ Experimental   Uses dlt sql_database, not fully tested
Other dlt sources    ⚠️ Experimental   Available via dlt_native, see Built-in Sources

Best Practices

1. Use Descriptive Names

# Good
- name: stripe_production_payments
- name: marketing_facebook_ads
- name: finance_google_sheets

# Avoid
- name: source1
- name: data

2. Enable Only What You Need

Disable unused sources to speed up sync:

- name: old_source
  enabled: false  # Keeps config but skips sync

3. Document Your Sources

- name: crm_export
  type: csv
  description: "Weekly CRM export from sales team, updated every Monday"
  csv:
    directory: data/uploads/crm_export
    file_pattern: "*.csv"
    notes: "Export from Salesforce > Reports > Weekly CRM"

4. Use Incremental When Possible

For large datasets, configure incremental loading:

- name: large_table
  type: dlt_native
  dlt_native:
    source_module: sql_database
    source_function: sql_database
    function_kwargs:
      incremental:
        cursor_column: "updated_at"
        initial_value: "2024-01-01"

5. Monitor Source Health

# Validate all sources
dango validate

# Check specific source
dango source list

Troubleshooting

Source Not Syncing

  1. Check enabled: true in sources.yml
  2. Verify credentials in .dlt/secrets.toml or environment
  3. Run dango validate to see errors
  4. Check network connectivity

Authentication Failures

  • API keys: Verify not expired, check permissions
  • OAuth: Re-authenticate with dango source add
  • Database: Test connection outside Dango

Schema Mismatches

When APIs change:

  1. Run dango sync (schema auto-updates for API sources, staging models are regenerated)
  2. Update custom dbt models if needed

CSV schema changes

For CSV sources, schema is fixed on first load. If your CSV schema changes, remove and re-add the source.

Performance Issues

  • Use incremental loading for large tables
  • Sync sources individually rather than all at once
  • Check API rate limits
  • Consider upgrading to paid API tiers
