How to Set Up dbt Core: A Beginner's Guide

Arkzero Research · Apr 23, 2026 · 7 min read

Last updated Apr 23, 2026

To set up dbt Core, install dbt-core and the adapter for your data warehouse with pip, configure a profiles.yml file with your connection credentials, run dbt init to scaffold a project folder, then write SQL SELECT statements in the models/ directory and execute dbt run to materialize them. dbt Core 1.11.6 is the current stable release as of April 2026, compatible with dbt-postgres 1.10.0 and dbt-bigquery 1.11.0. The full setup takes under 30 minutes for most warehouse connections.

What dbt Does (and When to Use It)

Most analysts start by writing SQL queries directly in a warehouse console: pull from orders, join to customers, filter by date. This works until queries proliferate, logic gets duplicated across five dashboards, and nobody can tell which version is correct.

dbt solves the duplication problem by making transformations explicit and testable. Instead of ad-hoc queries, you write SQL SELECT statements and save them as model files. dbt executes them against your warehouse and materializes each result as a table or view. Because each model can reference another with a {{ ref() }} call, dbt builds a dependency graph and runs models in the correct order automatically.
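
To make the dependency graph concrete, here is a minimal Python sketch of the idea: scan each model's SQL for ref() calls, then sort the models so that every model runs after the ones it references. This is an illustration only, not dbt's own implementation; the model names and SQL strings are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files, keyed by model name (not taken from a real project).
MODELS = {
    "revenue_report": "select * from {{ ref('orders_summary') }}",
    "orders_summary": (
        "select * from {{ ref('stg_orders') }} "
        "join {{ ref('stg_customers') }} using (customer_id)"
    ),
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

# Matches {{ ref('name') }} or {{ ref("name") }} and captures the name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the model names referenced through ref() in one SQL string."""
    return set(REF_PATTERN.findall(sql))


def run_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after every model it references."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = run_order(MODELS)
print(order)
```

The two staging models have no dependencies, so their relative order may vary; what is guaranteed is that both come before orders_summary, which in turn comes before revenue_report.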

Use dbt Core when your team writes SQL regularly, you have a cloud data warehouse such as BigQuery, Snowflake, Redshift, or Postgres, and you want transformations that are version-controlled, tested, and reusable. If you are cleaning a one-off CSV file, a simpler tool is faster. dbt pays dividends when the same transformation logic serves multiple reports or teams.

dbt Core is the open-source CLI version, free to use, and runs locally or inside any CI pipeline. dbt Cloud is the managed SaaS platform built on top of dbt Core that adds a browser IDE, scheduled jobs, and a hosted lineage graph. This guide covers dbt Core only.

Prerequisites

Before starting, confirm you have:

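  • Python 3.10 or later (check with python --version); recent dbt Core releases no longer support Python 3.8, so confirm the supported range for your release in the dbt documentation
  • pip (bundled with current Python installers)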
  • Access to a data warehouse: local or hosted Postgres, BigQuery, Snowflake, or Redshift
  • A terminal application

A local Postgres instance works well for learning if you do not have a cloud warehouse account.

Step 1: Install dbt Core

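Each warehouse adapter is a separate pip package. Since dbt 1.8, installing an adapter no longer installs dbt-core automatically, so install both together, choosing the adapter that matches your warehouse:

# For Postgres
pip install dbt-core dbt-postgres

# For BigQuery
pip install dbt-core dbt-bigquery

# For Snowflake
pip install dbt-core dbt-snowflake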

Confirm the installation:

dbt --version

You should see Core: 1.11.x and the adapter version listed below it.

A virtual environment prevents dependency conflicts with other Python projects. Create and activate it first, then install inside it:

python -m venv dbt-env
source dbt-env/bin/activate   # Windows: dbt-env\Scripts\activate
pip install dbt-core dbt-postgres

Step 2: Configure Your Connection Profile

dbt reads warehouse credentials from ~/.dbt/profiles.yml. This file lives outside your project directory so credentials stay out of version control.

For a local Postgres connection:

my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: your_username
      password: your_password
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4

For BigQuery using a service account key file:

my_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project-id
      dataset: dbt_dev
      keyfile: /path/to/keyfile.json
      threads: 4

The schema (Postgres) or dataset (BigQuery) value is where dbt materializes your models during development. Using a name like dbt_dev keeps development output clearly separated from production tables.

Step 3: Initialize Your Project

Run dbt init from the directory where the project folder should be created, passing the project name as an argument (dbt prompts for a name if you omit it):

dbt init my_project

dbt creates a project folder. The parts you will use most are:

my_project/
  dbt_project.yml    # project-level configuration
  models/            # SQL transformation files live here
  tests/             # custom singular tests
  macros/            # reusable Jinja functions
  seeds/             # CSV files dbt can load as tables

Open dbt_project.yml and confirm that the profile field matches the name in your profiles.yml:

name: 'my_project'
profile: 'my_project'

Test the connection before writing any models:

cd my_project
dbt debug

A successful run ends with All checks passed!. If you see a connection error, recheck the host, port, and credentials in profiles.yml.
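
A mismatch between the profile name in dbt_project.yml and the top-level key in profiles.yml is a common cause of a failed dbt debug. The lookup chain can be sketched in Python as follows. This is a simplified illustration, not dbt's code: the dictionaries stand in for the parsed YAML files, and resolve_connection is a hypothetical helper.

```python
# Stand-ins for the parsed contents of dbt_project.yml and profiles.yml.
PROJECT = {"name": "my_project", "profile": "my_project"}
PROFILES = {
    "my_project": {
        "target": "dev",
        "outputs": {
            "dev": {"type": "postgres", "host": "localhost", "schema": "dbt_dev"},
        },
    }
}


def resolve_connection(project: dict, profiles: dict) -> dict:
    """Follow project -> profile -> target -> output, failing loudly on a mismatch."""
    profile_name = project["profile"]
    if profile_name not in profiles:
        raise KeyError(
            f"profile {profile_name!r} not found in profiles.yml; found {sorted(profiles)}"
        )
    profile = profiles[profile_name]
    target = profile["target"]
    if target not in profile["outputs"]:
        raise KeyError(
            f"target {target!r} not defined under outputs; found {sorted(profile['outputs'])}"
        )
    return profile["outputs"][target]


connection = resolve_connection(PROJECT, PROFILES)
print(connection["type"], connection["schema"])  # prints: postgres dbt_dev
```

The same chain explains why changing target from dev to another output name switches the schema dbt writes to without touching any model file.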

Step 4: Write Your First Model

Remove the sample models in models/example/ and create models/orders_summary.sql:

with orders as (
    select
        customer_id,
        order_date,
        total_amount,
        status
    from {{ source('raw', 'orders') }}
),

completed_orders as (
    select
        customer_id,
        date_trunc('month', order_date) as month,
        sum(total_amount)               as monthly_revenue,
        count(*)                        as order_count
    from orders
    where status = 'completed'
    group by 1, 2
)

select * from completed_orders
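
To preview what this query returns before pointing it at a warehouse, the same aggregation can be run against a few sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in. Two assumptions apply: the sample rows are invented, and SQLite has no date_trunc, so strftime('%Y-%m-01', ...) approximates the month truncation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (customer_id int, order_date text, total_amount real, status text)"
)
# Invented sample rows: one pending order that the filter should exclude.
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, "2026-01-05", 40.0, "completed"),
        (1, "2026-01-20", 60.0, "completed"),
        (1, "2026-02-03", 25.0, "pending"),
        (2, "2026-01-11", 15.0, "completed"),
        (2, "2026-02-14", 30.0, "completed"),
    ],
)

# Same shape as the completed_orders CTE, with strftime replacing date_trunc.
SUMMARY_SQL = """
select
    customer_id,
    strftime('%Y-%m-01', order_date) as month,
    sum(total_amount)                as monthly_revenue,
    count(*)                         as order_count
from orders
where status = 'completed'
group by 1, 2
order by 1, 2
"""


def orders_summary(connection):
    """Return one row per customer and month, counting completed orders only."""
    return connection.execute(SUMMARY_SQL).fetchall()


for row in orders_summary(conn):
    print(row)
# (1, '2026-01-01', 100.0, 2)
# (2, '2026-01-01', 15.0, 1)
# (2, '2026-02-01', 30.0, 1)
```

Customer 1 has no February row because the only February order is still pending.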

The {{ source() }} function references raw warehouse tables. Define it in models/sources.yml:

version: 2

sources:
  - name: raw
    schema: public
    tables:
      - name: orders

Run the model:

dbt run

dbt executes the SQL and creates a view named orders_summary in your development schema. To materialize as a physical table instead, add a config block at the top of the file:

{{ config(materialized='table') }}

When one model needs the output of another, use {{ ref('model_name') }} instead of a hardcoded table name. dbt resolves the reference to the correct schema and ensures the upstream model runs first.
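
What "resolves the reference" means can be shown with a small Python sketch that swaps each ref() call for a fully qualified table name. This is a simplification under stated assumptions: render_refs is a hypothetical helper, the quoting style mimics Postgres, and dbt's real compiler handles much more (custom schemas, aliases, packages).

```python
import re

# Matches {{ ref('name') }} or {{ ref("name") }} and captures the name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def render_refs(sql: str, database: str, schema: str) -> str:
    """Replace every ref() call with a quoted database.schema.model name."""

    def replace(match: re.Match) -> str:
        return f'"{database}"."{schema}"."{match.group(1)}"'

    return REF_PATTERN.sub(replace, sql)


model_sql = "select * from {{ ref('orders_summary') }}"

# The same model file compiles differently depending on the active target.
print(render_refs(model_sql, "analytics", "dbt_dev"))
# select * from "analytics"."dbt_dev"."orders_summary"
print(render_refs(model_sql, "analytics", "dbt_prod"))
# select * from "analytics"."dbt_prod"."orders_summary"
```

Because the schema comes from the target rather than from the SQL file, the same model can be built in development and production without edits.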

Step 5: Add Tests

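dbt includes four built-in generic tests: not_null, unique, accepted_values, and relationships (foreign key checks). Define them in a YAML file alongside your model. The examples below use the tests: key; since dbt 1.8 the preferred name is data_tests:, and tests: remains supported as an alias.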

Create models/orders_summary.yml:

version: 2

models:
  - name: orders_summary
    columns:
      - name: customer_id
        tests:
          - not_null
      - name: month
        tests:
          - not_null
      - name: monthly_revenue
        tests:
          - not_null

Add tests on the source table inside models/sources.yml:

sources:
  - name: raw
    schema: public
    tables:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique
              - not_null
          - name: status
            tests:
              - accepted_values:
                  values: ['completed', 'pending', 'cancelled']

Run the tests:

dbt test

dbt generates and runs a SQL query for each assertion. A test passes when its query returns zero rows; a failing test reports how many rows violated the rule. Adding source tests from the start means upstream data problems surface in dbt output rather than silently flowing into dashboards.
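
The queries behind these assertions are ordinary SELECT statements that return the offending rows. The Python sketch below runs simplified equivalents against invented sample data using the built-in sqlite3 module. The SQL is an approximation of what dbt compiles, not a copy of it, and the check names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id int, status text)")
# Invented rows containing one duplicate id, one null id, and one unexpected status.
conn.executemany(
    "insert into orders values (?, ?)",
    [
        (1, "completed"),
        (2, "pending"),
        (2, "cancelled"),
        (None, "completed"),
        (5, "refunded"),
    ],
)

# Each check selects the rows that break the rule; zero rows means the test passes.
CHECKS = {
    "not_null_order_id": "select * from orders where order_id is null",
    "unique_order_id": (
        "select order_id from orders where order_id is not null "
        "group by order_id having count(*) > 1"
    ),
    "accepted_values_status": (
        "select * from orders "
        "where status not in ('completed', 'pending', 'cancelled')"
    ),
}


def failing_rows(connection, checks):
    """Run every check and return the number of violating rows per check."""
    return {name: len(connection.execute(sql).fetchall()) for name, sql in checks.items()}


results = failing_rows(conn, CHECKS)
for name, count in results.items():
    print(name, "PASS" if count == 0 else f"FAIL ({count} rows)")
# not_null_order_id FAIL (1 rows)
# unique_order_id FAIL (1 rows)
# accepted_values_status FAIL (1 rows)
```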

Running in Production

For local work, dbt run is sufficient. For automated production runs, two straightforward paths exist.

GitHub Actions is free for public repos. A workflow file that runs dbt deps && dbt run && dbt test on a daily cron schedule and posts failures to Slack covers the needs of most small teams without additional infrastructure.

dbt Cloud is the fastest option if you want a managed scheduler, a browser-based IDE, and a lineage graph UI without configuring CI yourself. The free Developer tier supports one project and one daily scheduled job, which is enough for a single analyst or small team getting started.

To generate documentation locally, run:

dbt docs generate
dbt docs serve

This opens a browser with an interactive data lineage graph and a searchable catalog of all your models, columns, and tests.

What to Build Next

Once the first project is working, staging models and snapshots deliver the most value next. Staging models sit between raw source tables and downstream transformation models, handling column renaming, type casting, and basic cleaning in one reusable place. Snapshots capture slowly-changing data, such as a customer subscription tier, as it changes over time.
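
The idea behind a snapshot can be sketched in a few lines of Python: when a tracked value changes, close the currently open row and append a new one. This is a conceptual sketch, not how dbt implements snapshots. The apply_snapshot helper and the sample data are hypothetical; the dbt_valid_from and dbt_valid_to column names follow the ones dbt adds to snapshot tables.

```python
from datetime import date


def apply_snapshot(history, current_tiers, today):
    """Return a new history with changed rows closed out and new versions appended.

    history: list of dicts with customer_id, tier, dbt_valid_from, dbt_valid_to
             (dbt_valid_to is None for the row that is currently valid).
    current_tiers: dict mapping customer_id to the tier in the source right now.
    """
    updated = [dict(row) for row in history]  # copy so the input is not mutated
    open_rows = {row["customer_id"]: row for row in updated if row["dbt_valid_to"] is None}

    for customer_id, tier in current_tiers.items():
        open_row = open_rows.get(customer_id)
        if open_row is not None and open_row["tier"] == tier:
            continue  # unchanged, nothing to record
        if open_row is not None:
            open_row["dbt_valid_to"] = today  # close the outdated version
        updated.append(
            {
                "customer_id": customer_id,
                "tier": tier,
                "dbt_valid_from": today,
                "dbt_valid_to": None,
            }
        )
    return updated


history = [
    {"customer_id": 1, "tier": "free", "dbt_valid_from": date(2026, 1, 1), "dbt_valid_to": None}
]
# Customer 1 upgrades to pro and customer 2 appears for the first time.
snapshot = apply_snapshot(history, {1: "pro", 2: "free"}, date(2026, 4, 1))
for row in snapshot:
    print(row)
```

After the run, customer 1 has two rows: the free tier valid from January 1 to April 1, and the pro tier valid from April 1 with no end date.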

For teams that want to explore the transformed output without writing more SQL, connecting VSLZ directly to the warehouse lets analysts ask questions in plain English and get charts and summaries from the dbt-built tables without additional setup.

FAQ

What is dbt Core and how is it different from dbt Cloud?

dbt Core is the open-source command-line tool that runs SQL transformations against your data warehouse. It is free to install and use locally or in any CI environment. dbt Cloud is the managed SaaS platform built on top of dbt Core that adds a browser IDE, built-in job scheduling, and a hosted lineage graph. Most teams start with dbt Core and move to dbt Cloud when they need automated scheduling or a shared UI for the whole analytics team.

Which data warehouses does dbt Core support?

dbt Core supports all major cloud data warehouses through adapter packages installed with pip: dbt-postgres for PostgreSQL, dbt-redshift for Amazon Redshift, dbt-bigquery for Google BigQuery, dbt-snowflake for Snowflake, dbt-duckdb for DuckDB, dbt-spark for Apache Spark, and dbt-databricks for Databricks. Each adapter handles the warehouse connection and generates warehouse-specific SQL for dbt's materializations. As of April 2026, over 30 community and official adapters are available.

Do I need to know Python to use dbt Core?

No. dbt Core is installed with pip and run from the command line, but the transformation logic you write is standard SQL. Custom macros and custom generic tests are written in Jinja and SQL, not Python. Python is only needed if you choose to write Python models, which some adapters support as an optional alternative to SQL. Most analysts work entirely in SQL and YAML for model definitions and test configurations.

Where does dbt store credentials and connection details?

dbt reads credentials from a profiles.yml file located at ~/.dbt/profiles.yml on your local machine. This file lives outside the project directory so it is not committed to version control. Each profile defines one or more targets such as dev and prod, with the warehouse type, host, database name, schema, and authentication details. The dbt_project.yml file inside your project references the profile by name.

How do I run dbt transformations automatically on a schedule?

The two most common approaches are GitHub Actions and dbt Cloud. With GitHub Actions, create a workflow file that installs dbt, runs dbt deps && dbt run && dbt test on a cron schedule, and stores the profiles.yml credentials as encrypted repository secrets. dbt Cloud offers a managed scheduler with a UI, error notifications, and run history, with a free Developer tier supporting one project and one daily scheduled job.
