dbt Reference

Free reference guide: dbt Reference

About dbt Reference

The dbt Reference is a concise, searchable cheat sheet covering the essential dbt (data build tool) syntax for analytics engineers working on SQL-based data transformation pipelines. It covers 5 categories: Models (SELECT-based model definitions, ref(), source(), incremental patterns), Tests (unique, not_null, accepted_values, relationships), Macros (Jinja macro creation with {% macro %}), CLI (dbt run, dbt test, dbt build, dbt compile), and Config (config(), snapshot with SCD Type 2, seed). Every entry includes a real YAML or SQL code example you can use directly.

dbt is the standard tool for the Transform step in ELT pipelines, enabling analytics engineers to write modular, version-controlled SQL. This reference is particularly useful for engineers who work with Snowflake, BigQuery, Redshift, or Databricks and use dbt to build staging models, intermediate models, and marts. The Models section covers the ref() function for inter-model dependencies and the source() function for declaring upstream raw tables — both critical for dbt's lineage graph. The incremental model pattern shows how to use is_incremental() to process only new data on each run rather than reprocessing the entire dataset.
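To make the pattern concrete, here is a minimal sketch of the two functions side by side. The model and source names (jaffle_shop, stg_orders, fct_orders) are illustrative, not from this reference:

```sql
-- models/staging/stg_orders.sql: reads a raw table declared with source()
select
    id as order_id,
    customer_id,
    ordered_at
from {{ source("jaffle_shop", "orders") }}

-- models/marts/fct_orders.sql (a separate file): ref() adds a DAG edge,
-- so dbt builds stg_orders before this model
select order_id, customer_id, ordered_at
from {{ ref("stg_orders") }}
```

Because both functions resolve at compile time, dbt can derive the full lineage graph without parsing your raw SQL.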

The Tests section covers dbt's four built-in generic tests as defined in schema.yml: unique (no duplicate values in a column), not_null (no NULL values), accepted_values (column values restricted to a defined list), and relationships (referential integrity across models). The Macros section shows how to write reusable Jinja macros — like cents_to_dollars — that can be called across any model SQL. The CLI section covers the most important dbt commands: dbt run (execute models), dbt test (validate data quality), dbt build (run + test in one step), and dbt compile (preview generated SQL). The Config section covers snapshot for SCD Type 2 history tracking and seed for loading static CSV reference data.

Key Features

  • Covers 5 categories (Models, Tests, Macros, CLI, Config), each with SQL and YAML examples
  • ref() for inter-model dependencies and source() for declaring raw source tables
  • Incremental model pattern with is_incremental() to process only new data on each run
  • All 4 built-in dbt generic tests: unique, not_null, accepted_values, relationships
  • Jinja macro creation with {% macro %} and {% endmacro %} for reusable SQL logic
  • CLI commands: dbt run, dbt test, dbt build (combined), dbt compile for SQL preview
  • config() macro for setting materialization (table, view, incremental), schema, and unique_key
  • Snapshot for SCD Type 2 history tracking and seed for loading static CSV reference data
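As a quick illustration of the config() bullet above, a model's materialization and schema can be set inline at the top of the file. The names here (marts, order_id, stg_orders) are placeholders:

```sql
{{ config(
    materialized="incremental",   -- or "table" / "view"
    schema="marts",               -- custom target schema suffix
    unique_key="order_id"         -- row identity used for incremental merges
) }}

select order_id, customer_id, ordered_at
from {{ ref("stg_orders") }}
```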

Frequently Asked Questions

What is dbt and how is it different from traditional ETL tools?

dbt (data build tool) handles the Transform step of ELT pipelines. Unlike traditional ETL tools that move and transform data between systems, dbt runs entirely inside your data warehouse — you write SELECT statements, and dbt compiles and runs them as CREATE TABLE or CREATE VIEW statements. dbt brings software engineering practices to data transformation: version control with Git, automated testing with schema.yml, documentation, and a clear lineage graph showing how data flows between models.
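A minimal sketch of that compile step, with hypothetical table and schema names (raw.customers, analytics):

```sql
-- models/stg_customers.sql: you write only a SELECT...
select
    id as customer_id,
    name
from raw.customers

-- ...and on `dbt run`, dbt wraps it in DDL for you, roughly:
-- create view analytics.stg_customers as (
--     select id as customer_id, name from raw.customers
-- );
```

The default materialization is a view; setting materialized="table" in config() switches the wrapper to CREATE TABLE AS.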

What is the difference between ref() and source() in dbt?

ref() is used to reference another dbt model — it creates a dependency in the DAG and ensures the referenced model runs first. For example, SELECT * FROM {{ ref("stg_orders") }} references the stg_orders.sql model. source() is used to reference a raw table that exists in your data warehouse but is not managed by dbt — it comes from an external ingestion tool like Fivetran. You declare sources in a sources.yml file. Using source() enables dbt to include raw tables in the lineage graph and run source freshness checks.
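A minimal sources.yml sketch, using illustrative names (jaffle_shop, raw, _etl_loaded_at):

```yaml
version: 2

sources:
  - name: jaffle_shop            # namespace used as source("jaffle_shop", "orders")
    schema: raw                  # schema where the ingestion tool lands the data
    tables:
      - name: orders
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
```

With freshness configured, dbt source freshness warns when the newest _etl_loaded_at value is older than the threshold.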

How do incremental models work in dbt?

An incremental model processes only new or changed data on each run, instead of reprocessing the entire dataset. You wrap the filter condition inside {% if is_incremental() %}...{% endif %} so it only applies on incremental runs (not on the initial build). For example: WHERE event_date > (SELECT MAX(event_date) FROM {{ this }}) filters to only new events. Set materialized="incremental" and unique_key in the config() block. On first run, dbt builds the full table. On subsequent runs, it merges (UPSERT) only new rows.
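Putting those pieces together, a minimal incremental model might look like this (fct_events, stg_events, and the column names are illustrative):

```sql
{{ config(materialized="incremental", unique_key="event_id") }}

select
    event_id,
    user_id,
    event_date
from {{ ref("stg_events") }}

{% if is_incremental() %}
  -- only applied on incremental runs; {{ this }} is the already-built table
  where event_date > (select max(event_date) from {{ this }})
{% endif %}
```

On the first run the if-block is skipped and the full table is built; afterwards only rows newer than the current maximum are merged in.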

What are dbt generic tests and how do I define them?

dbt generic tests are data quality checks defined in schema.yml YAML files alongside your models. The four built-in generic tests are: unique (checks no duplicate values in a column), not_null (checks no NULL values), accepted_values (checks values are from a specified list), and relationships (checks referential integrity — that every value in a column exists as a key in another model). You define them under a column's tests: key. Run them with dbt test. A failed test has severity error by default, which fails the run; configure severity: warn on a test to downgrade its failures to warnings instead.
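A schema.yml sketch covering all four built-in tests, with illustrative model and column names (stg_orders, stg_customers, status):

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["placed", "shipped", "completed", "returned"]
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')   # every customer_id must exist there
              field: customer_id
```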

What is a dbt macro and when should I use one?

A dbt macro is a reusable piece of Jinja templating code defined with {% macro name(args) %} ... {% endmacro %} and stored in the macros/ directory. Call it in any model SQL with {{ macro_name(args) }}. Macros eliminate SQL repetition across models — for example, a cents_to_dollars macro converts a column from integer cents to decimal dollars and can be called from any model. Use macros for: unit conversions, repeated CASE statements, dynamic column generation, and cross-database SQL abstraction.
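A sketch of the cents_to_dollars macro mentioned above (the precision argument is an illustrative addition):

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- called from any model, e.g.:
-- select {{ cents_to_dollars("amount_cents") }} as amount_usd
-- from {{ ref("stg_payments") }}
```

At compile time the call expands to plain SQL (round(amount_cents / 100.0, 2)), so the warehouse never sees the Jinja.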

What is the difference between dbt run, dbt test, and dbt build?

dbt run executes all model SQL files and materializes them as tables or views in your warehouse. dbt test runs all tests defined in schema.yml files — both generic tests (unique, not_null, etc.) and any custom singular tests. dbt build runs both dbt run and dbt test together in the correct order: it runs each model and immediately tests it before proceeding to downstream models. Use dbt build in CI/CD pipelines for the most thorough execution. Add --select model_name to target a specific model or subgraph.
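A few illustrative invocations showing the --select flag and dbt's graph-operator syntax (stg_orders and fct_orders are placeholder model names):

```shell
dbt run --select stg_orders        # build one model
dbt run --select stg_orders+       # ...plus everything downstream of it
dbt build --select +fct_orders     # upstream deps first, testing each model as it goes
dbt test --select stg_orders       # only the tests attached to that model
dbt compile --select stg_orders    # write the generated SQL to target/compiled/
```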

What is a dbt snapshot and when should I use it?

A dbt snapshot implements SCD (Slowly Changing Dimension) Type 2 history tracking. It takes a periodic "snapshot" of a source table and tracks how records have changed over time by adding dbt_valid_from and dbt_valid_to columns. When a row changes (detected by a timestamp or a hash of all columns), the snapshot inserts a new row and closes the old one. Use snapshots to preserve history for source tables that do not maintain their own change history — like a CRM contacts table that overwrites records on update.
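A minimal snapshot sketch using the timestamp strategy; the snapshot name, source, and columns (contacts_snapshot, crm.contacts, updated_at) are illustrative:

```sql
-- snapshots/contacts_snapshot.sql
{% snapshot contacts_snapshot %}

{{
    config(
      target_schema="snapshots",
      unique_key="contact_id",
      strategy="timestamp",      -- or "check" to compare a list of columns
      updated_at="updated_at"
    )
}}

select * from {{ source("crm", "contacts") }}

{% endsnapshot %}
```

Run it with dbt snapshot; dbt adds dbt_valid_from/dbt_valid_to and closes out the old row whenever updated_at advances for a given contact_id.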

What are dbt seeds and when should I use them?

dbt seeds load static CSV files from the seeds/ directory into your data warehouse as tables. Run dbt seed to load them. Reference them in models with {{ ref("seed_name") }}. Seeds are ideal for small, static lookup tables that change rarely — such as country codes, state abbreviations, category mappings, or cost center codes. Do not use seeds for large datasets (dbt recommends keeping seeds under a few MB) or for data that changes frequently. For frequently updated lookup data, use a source table ingested by your ELT pipeline instead.
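For example, a small lookup seed might be nothing more than a CSV file (seeds/country_codes.csv, illustrative):

```csv
country_code,country_name
US,United States
DE,Germany
JP,Japan
```

After dbt seed loads it as a table, any model can join against it via {{ ref("country_codes") }}, exactly like a regular model.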