dbt
What it is: Data transformation tool that enables analytics engineers to transform data using SQL. Version control for data transformations with testing and documentation.
What It Does Best
SQL-first transformations. Write SELECT statements, dbt handles the CREATE TABLE. Version control and deploy like software.
Testing built-in. Data quality tests as code. Unique, not-null, relationships, custom tests. Catch bad data early.
Documentation automation. Auto-generate data lineage and documentation from your code. Keep docs in sync with reality.
Key Features
Jinja templating: DRY code with macros and variables
Incremental models: Process only new/changed data for efficiency
Data lineage: Automatic DAG visualization of dependencies
Tests and assertions: Built-in and custom data quality tests
Packages: Reusable transformation libraries from community
Pricing
dbt Core: Free, open source (run locally or on your infra)
dbt Cloud Developer: Free for 1 developer seat
dbt Cloud Team: $100/developer/month
dbt Cloud Enterprise: Custom pricing (SSO, advanced features)
When to Use It
✅ Transforming data in warehouse (Snowflake, BigQuery, Redshift)
✅ Analytics engineering team exists
✅ Need version control and testing for SQL
✅ Building data marts from raw data
✅ Want software engineering practices for data
When NOT to Use It
❌ Need to extract or load data (use Fivetran, Airbyte)
❌ Real-time streaming transformations (use Spark, Flink)
❌ Non-SQL transformations (use Airflow, Python)
❌ Very simple transformations (might be overkill)
❌ No data warehouse (dbt needs a target database)
Common Use Cases
Data modeling: Transform raw data into analytics-ready models
Business metrics: Define metrics once, use everywhere
Data quality: Test data integrity and freshness
Documentation: Self-documenting data warehouse
Incremental processing: Efficient updates of large tables
dbt vs Alternatives
vs Dataform: Similar tools, Dataform acquired by Google (BigQuery focused)
vs SQL in Airflow: dbt better for pure transformations, Airflow better for orchestration
vs stored procedures: dbt version controlled and testable, procedures harder to maintain
Unique Strengths
Analytics engineering focus: Purpose-built for data transformation workflow
Testing framework: Best-in-class data quality testing
Community packages: Pre-built transformations for common tasks
Documentation as code: Docs that stay up to date automatically
Bottom line: Essential tool for modern data stack. Brings software engineering best practices to analytics. The T in ELT. If you're doing SQL transformations in your warehouse, use dbt. Game-changer for data quality and collaboration.