
How to Set Up Coalesce for Data Transformation

Arkzero Research · Apr 28, 2026 · 9 min read

Last updated Apr 28, 2026

Coalesce is a visual data transformation platform for Snowflake, Databricks, and Microsoft Fabric that builds SQL pipelines inside your data warehouse without requiring you to write raw SQL by hand. You connect your warehouse, drag source tables onto a canvas, add stage and dimension nodes, and Coalesce generates and runs the underlying SQL. According to Coalesce's own figures, teams reach production in days rather than the weeks a comparable hand-coded dbt project takes.

Coalesce is a GUI-first data transformation platform that generates and runs SQL pipelines directly inside Snowflake, Databricks, or Microsoft Fabric. Unlike writing dbt models in a text editor, Coalesce gives you a visual node graph where each step — source, stage, dimension, fact — is a draggable object that Coalesce compiles into warehouse-native SQL.

This guide walks you through connecting your warehouse, setting up your first transformation pipeline, deploying to production, and understanding when Coalesce fits your stack better than alternatives.

What Coalesce Does Differently

Most transformation tools fall into one of two camps: pure code (dbt, SQLMesh) or visual ETL tools (such as Informatica or Talend) that hide the underlying logic behind a drag-and-drop interface. Coalesce sits between them. You work in a visual canvas, but every node is backed by real SQL that you can inspect, override, and version-control via Git.

The practical difference shows up in team size and skill distribution. A dbt project requires every contributor to be comfortable writing Jinja-templated SQL models in a code editor. Coalesce lets an analyst with working SQL knowledge build pipelines visually while an engineer reviews the generated SQL in Git before promotion to production. According to Coalesce's own benchmark data, teams building their first pipeline in Coalesce hit production in an average of 3 days versus 2 to 3 weeks for comparable dbt projects built from scratch.

Coalesce currently supports three warehouse targets: Snowflake (its primary platform), Databricks, and Microsoft Fabric. This matters for teams running mixed workloads or planning a migration: the same pipeline graph can be retargeted largely without rewriting transformation logic, as long as its transforms avoid functions specific to one warehouse's SQL dialect.

Prerequisites

Before starting, you need three things: a Coalesce account (free trial available at coalesce.io), a Snowflake, Databricks, or Microsoft Fabric account, and Google Chrome. Coalesce's build interface officially supports only Chrome; Safari or Firefox may cause rendering issues in the node graph.

If you are testing without real company data, Snowflake provides sample data (SNOWFLAKE_SAMPLE_DATA) in every account by default. The quickstart in this guide uses that dataset.

Step 1: Connect Your Warehouse

After signing in to Coalesce, you land on the Projects Dashboard. You will see a default project and a Development Workspace. Click Launch to open the workspace.

To connect your data warehouse:

  1. Click the cogwheel icon labeled Build Settings in the top navigation.
  2. Go to Development Workspaces and click the pencil icon next to your workspace.
  3. Under Settings, enter your Snowflake account identifier. You find this in Snowflake's account selector in the lower-left corner — it looks like xy12345.us-east-1.
  4. Under User Credentials, enter your Snowflake username and password.
  5. Click Test Connection. A green checkmark confirms a working connection. Click Save.

For Databricks, the process is the same but you supply a Databricks host URL, HTTP path (from your SQL warehouse settings), and a personal access token instead of a password.

Step 2: Configure Storage Locations

Storage locations are the logical names Coalesce uses to reference databases and schemas in your warehouse. They keep pipeline logic portable — changing the underlying database name means updating the storage mapping once, not hunting through dozens of SQL files.

For a new Snowflake setup:

  1. Go to Build Settings > Storage Mappings.
  2. You will see two default locations: SAMPLE and WORK.
  3. Map SAMPLE to the SNOWFLAKE_SAMPLE_DATA database and schema TPCH_SF1.
  4. Map WORK to a target database and schema you control (for example, ANALYTICS.DEV).

The SAMPLE location is read-only source data. The WORK location is where Coalesce will write your transformed tables during development.
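The idea behind storage locations can be modeled in a few lines of Python. This is a minimal sketch, not Coalesce's implementation: the `{{ ref('LOCATION', 'TABLE') }}` placeholder format, the `resolve_refs` helper, and the two mapping dictionaries are assumptions for illustration. Node SQL refers only to logical locations, and a per-workspace (or per-environment) mapping turns each reference into a fully qualified table name.

```python
import re

# Hypothetical storage mappings: logical location -> (database, schema).
# Each workspace or environment carries its own copy of this table.
DEV_MAPPINGS = {
    "SAMPLE": ("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF1"),
    "WORK": ("ANALYTICS", "DEV"),
}
PROD_MAPPINGS = {
    "SAMPLE": ("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF1"),
    "WORK": ("ANALYTICS", "PROD"),
}

# Matches placeholders such as {{ ref('WORK', 'STG_NATION') }} (assumed syntax).
REF_PATTERN = re.compile(r"\{\{\s*ref\('(\w+)',\s*'(\w+)'\)\s*\}\}")


def resolve_refs(sql: str, mappings: dict) -> str:
    """Replace each logical ref with a fully qualified, quoted table name."""

    def substitute(match: re.Match) -> str:
        location, table = match.group(1), match.group(2)
        database, schema = mappings[location]  # KeyError if a location is unmapped
        return f'"{database}"."{schema}"."{table}"'

    return REF_PATTERN.sub(substitute, sql)


node_sql = "SELECT * FROM {{ ref('WORK', 'STG_NATION') }}"
print(resolve_refs(node_sql, DEV_MAPPINGS))   # SELECT * FROM "ANALYTICS"."DEV"."STG_NATION"
print(resolve_refs(node_sql, PROD_MAPPINGS))  # SELECT * FROM "ANALYTICS"."PROD"."STG_NATION"
```

Swapping `DEV_MAPPINGS` for `PROD_MAPPINGS` retargets the same node SQL without touching it, which is the property that keeps a single pipeline graph portable across workspaces and environments.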

Step 3: Add Source Nodes

The Node Graph is Coalesce's canvas. Each box is a node: a source, a stage, a dimension, or a fact. Arrows show dependencies and data flow.

To add source tables:

  1. In the left sidebar, expand Nodes.
  2. Click the + button and choose Add Sources.
  3. Browse your mapped storage locations, select the tables you want (for example, CUSTOMER, ORDERS, and NATION from TPCH_SF1), and click Add Sources.

You will see three source nodes appear on the canvas. Source nodes are read-only references — Coalesce does not write to them.

Step 4: Create a Stage Node

Stage nodes are intermediate transformation steps. This is where most SQL logic lives: cleaning, renaming, casting, filtering, and joining.

To create a stage node from NATION:

  1. Right-click the NATION source node and choose Add Node > Stage Node.
  2. Double-click the new node to open the Node Editor.
  3. In the Mapping grid, find the column N_NAME.
  4. Double-click the Transform cell for that column and enter LOWER("NATION"."N_NAME").
  5. Set Storage Location to WORK.
  6. Click Create, then Run.

Clicking Create issues the CREATE TABLE statement for the node, and Run populates the table by executing the generated SELECT with your transformation applied. You can preview the result directly in the Node Editor output panel, with no need to open a separate SQL client.

Any Snowflake-native function works here: date truncation, JSON extraction with PARSE_JSON, conditional logic with CASE WHEN, and window functions like ROW_NUMBER() OVER (PARTITION BY...). The transform field accepts standard SQL expressions, not a proprietary syntax.
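For a local feel of what the stage node does, the sketch below replays it against an in-memory SQLite database from Python's standard library. It assumes that Create issues the table DDL and Run issues an `INSERT ... SELECT` applying each column's Transform cell; the Snowflake SQL Coalesce actually generates will differ in detail. Besides the `LOWER` transform from the steps above, it adds a `CASE WHEN` label and a `ROW_NUMBER()` window to show that ordinary SQL expressions are all a transform needs. The NATION rows, the `REGION_LABEL` and `REGION_NATION_NO` columns, and their logic are hypothetical, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# SQLite stands in for the warehouse; the rows are a small made-up sample
# shaped like TPCH_SF1.NATION.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "NATION" ("N_NATIONKEY" INTEGER, "N_NAME" TEXT, "N_REGIONKEY" INTEGER)')
conn.executemany(
    'INSERT INTO "NATION" VALUES (?, ?, ?)',
    [(0, "ALGERIA", 0), (1, "ARGENTINA", 1), (2, "BRAZIL", 1)],
)

# "Create" (assumed): DDL that defines the stage table.
conn.execute(
    'CREATE TABLE "STG_NATION" ('
    '"N_NATIONKEY" INTEGER, "N_NAME" TEXT, "REGION_LABEL" TEXT, "REGION_NATION_NO" INTEGER)'
)

# "Run" (assumed): INSERT ... SELECT with each Transform cell applied to its column.
conn.execute("""
INSERT INTO "STG_NATION" ("N_NATIONKEY", "N_NAME", "REGION_LABEL", "REGION_NATION_NO")
SELECT
    "NATION"."N_NATIONKEY",
    LOWER("NATION"."N_NAME"),  -- the transform from step 4
    CASE WHEN "NATION"."N_REGIONKEY" = 1 THEN 'americas' ELSE 'other' END,
    ROW_NUMBER() OVER (PARTITION BY "NATION"."N_REGIONKEY" ORDER BY "NATION"."N_NAME")
FROM "NATION"
""")

rows = conn.execute('SELECT * FROM "STG_NATION" ORDER BY "N_NATIONKEY"').fetchall()
print(rows)  # [(0, 'algeria', 'other', 1), (1, 'argentina', 'americas', 1), (2, 'brazil', 'americas', 2)]
```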

Step 5: Build a Dimension Table

Dimension tables track entity data — customers, products, regions. Coalesce has a built-in Dimension node type that handles Type 1 (overwrite) and Type 2 (slowly changing, historical) patterns automatically.

To create a Type 2 dimension for customers:

  1. Right-click CUSTOMER and create a Stage Node named STG_CUSTOMER. Click Create and Run.
  2. From the canvas, right-click STG_CUSTOMER and choose Add Node > Dimension Node.
  3. Open the DIM_CUSTOMER node editor.
  4. Under Options > Business Key, add C_CUSTKEY. This is the unique identifier for each customer.
  5. Under Change Tracking, add C_ADDRESS and C_PHONE. These are the columns Coalesce will track for changes over time.
  6. Click Create and Run.

With those two settings, Coalesce generates a Type 2 SCD table automatically: a new row whenever an address or phone number changes, with effective date columns and a surrogate key. Writing this logic manually in SQL takes roughly 40 to 60 lines; in Coalesce it comes down to two configuration fields.
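The logic the Dimension node takes off your hands looks roughly like the sketch below: a pure-Python model of a Type 2 merge keyed on `C_CUSTKEY` and tracking `C_ADDRESS` and `C_PHONE`. The `DimRow` structure and its `start_date`, `end_date`, and `is_current` fields are hypothetical stand-ins for the system columns Coalesce generates, and for brevity the model stores only the tracked columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

TRACKED = ("C_ADDRESS", "C_PHONE")  # the Change Tracking columns from the steps above


@dataclass
class DimRow:
    surrogate_key: int        # hypothetical stand-in for the generated surrogate key
    business_key: int         # C_CUSTKEY
    attrs: dict               # tracked column values for this version
    start_date: date
    end_date: Optional[date]  # None while the version is current
    is_current: bool


def apply_type2(dim: list, incoming: list, load_date: date) -> None:
    """Merge a batch of source rows into a Type 2 dimension, in place."""
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    current = {r.business_key: r for r in dim if r.is_current}
    for src in incoming:
        key = src["C_CUSTKEY"]
        attrs = {col: src[col] for col in TRACKED}
        existing = current.get(key)
        if existing is not None and existing.attrs == attrs:
            continue  # no tracked column changed: keep the current version
        if existing is not None:
            # A tracked column changed: expire the old version...
            existing.end_date = load_date
            existing.is_current = False
        # ...and append a new current version (this also handles new customers).
        new_row = DimRow(next_key, key, attrs, load_date, None, True)
        dim.append(new_row)
        current[key] = new_row
        next_key += 1


# Usage: one customer moves between two loads.
dim = []
apply_type2(dim, [{"C_CUSTKEY": 1, "C_ADDRESS": "12 Oak St", "C_PHONE": "555-0100"}], date(2026, 1, 1))
apply_type2(dim, [{"C_CUSTKEY": 1, "C_ADDRESS": "9 Elm Ave", "C_PHONE": "555-0100"}], date(2026, 3, 1))
for row in dim:
    print(row.surrogate_key, row.attrs["C_ADDRESS"], row.start_date, row.end_date, row.is_current)
```

The first version is closed out with an end date rather than overwritten, so a report can still join an older order to the address that was valid when it was placed.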

Step 6: Create a Fact Table

Fact tables store transactional data linked to dimensions via foreign keys. The Coalesce Fact node handles merge logic (insert-or-update based on a business key) without custom SQL.

To build a fact table for orders:

  1. Create a Stage Node from ORDERS named STG_ORDERS.
  2. In the STG_ORDERS editor, delete all columns except O_ORDERKEY, O_CUSTKEY, and O_TOTALPRICE.
  3. From the left panel, drag DIM_CUSTOMER_KEY from your DIM_CUSTOMER node into the STG_ORDERS mapping grid.
  4. Go to the Join tab, clear any existing SQL, and click Generate Join. Replace the /*COLUMN*/ placeholder with O_CUSTKEY.
  5. Click Create and Run on STG_ORDERS.
  6. Right-click STG_ORDERS and add a Fact Node named FCT_ORDERS.
  7. Under Options > Business Key, add O_ORDERKEY.
  8. Click Create and Run.

Coalesce generates a MERGE statement that upserts rows based on O_ORDERKEY. You can verify the output by running this query directly in Snowflake:

SELECT DIM.C_NAME AS CUSTOMER_NAME,
       SUM(FCT.O_TOTALPRICE) AS TOTAL_PRICE
FROM ANALYTICS.DEV.FCT_ORDERS FCT
INNER JOIN ANALYTICS.DEV.DIM_CUSTOMER DIM
  ON FCT.DIM_CUSTOMER_KEY = DIM.DIM_CUSTOMER_KEY
GROUP BY DIM.C_NAME;
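The whole fact-table flow can be rehearsed locally before touching Snowflake. The sketch below uses SQLite in place of Snowflake, made-up rows, and an `IS_CURRENT` flag standing in for the dimension's current-record marker (an assumption of this sketch). It looks up each order's surrogate key from the current dimension row, upserts into FCT_ORDERS keyed on O_ORDERKEY using SQLite's `INSERT ... ON CONFLICT DO UPDATE` in place of Snowflake's MERGE, re-runs the load after a price correction, and then runs the verification query.

```python
import sqlite3

# SQLite stands in for Snowflake. Table and column names follow the guide,
# but the rows are made up and the SQL is a simplified sketch, not the
# statements Coalesce actually generates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DIM_CUSTOMER (
    DIM_CUSTOMER_KEY INTEGER PRIMARY KEY,
    C_CUSTKEY INTEGER, C_NAME TEXT, IS_CURRENT INTEGER
);
INSERT INTO DIM_CUSTOMER VALUES
    (1, 100, 'Customer#100', 0),  -- expired version of customer 100
    (2, 100, 'Customer#100', 1),  -- current version of customer 100
    (3, 200, 'Customer#200', 1);
CREATE TABLE ORDERS (O_ORDERKEY INTEGER, O_CUSTKEY INTEGER, O_TOTALPRICE REAL);
INSERT INTO ORDERS VALUES (1, 100, 50.0), (2, 100, 25.0), (3, 200, 10.0);
CREATE TABLE FCT_ORDERS (
    O_ORDERKEY INTEGER PRIMARY KEY,  -- business key that drives the upsert
    DIM_CUSTOMER_KEY INTEGER,
    O_TOTALPRICE REAL
);
""")

# STG_ORDERS: attach the surrogate key of each customer's current dimension row.
stage_sql = """
SELECT o.O_ORDERKEY, d.DIM_CUSTOMER_KEY, o.O_TOTALPRICE
FROM ORDERS o
JOIN DIM_CUSTOMER d ON d.C_CUSTKEY = o.O_CUSTKEY
WHERE d.IS_CURRENT = 1
"""

# FCT_ORDERS: insert new order keys and update existing ones
# (SQLite upsert syntax in place of Snowflake's MERGE).
upsert_sql = f"""
INSERT INTO FCT_ORDERS (O_ORDERKEY, DIM_CUSTOMER_KEY, O_TOTALPRICE)
{stage_sql}
ON CONFLICT (O_ORDERKEY) DO UPDATE SET
    DIM_CUSTOMER_KEY = excluded.DIM_CUSTOMER_KEY,
    O_TOTALPRICE = excluded.O_TOTALPRICE
"""

conn.execute(upsert_sql)
# A corrected price arrives for order 2; re-running the load updates it in place.
conn.execute("UPDATE ORDERS SET O_TOTALPRICE = 30.0 WHERE O_ORDERKEY = 2")
conn.execute(upsert_sql)

# The verification query from this step, minus the ANALYTICS.DEV prefix.
totals = conn.execute("""
SELECT DIM.C_NAME AS CUSTOMER_NAME, SUM(FCT.O_TOTALPRICE) AS TOTAL_PRICE
FROM FCT_ORDERS FCT
INNER JOIN DIM_CUSTOMER DIM ON FCT.DIM_CUSTOMER_KEY = DIM.DIM_CUSTOMER_KEY
GROUP BY DIM.C_NAME
ORDER BY DIM.C_NAME
""").fetchall()
print(totals)  # [('Customer#100', 80.0), ('Customer#200', 10.0)]
```

Re-running the load updates order 2 in place rather than duplicating it, which is what a business-key upsert is meant to guarantee. The `WHERE` clause on the staging SELECT also sidesteps a parsing ambiguity that SQLite documents for `INSERT ... SELECT ... ON CONFLICT` when the SELECT contains a join.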

Step 7: Deploy to Production

Everything above runs in your Development Workspace. To promote to production:

  1. Click the Deploy tab in the top navigation.
  2. Create a Production Environment, mapping storage locations to your production database and schema (for example, ANALYTICS.PROD).
  3. Select the nodes you want to deploy and click Deploy.

Coalesce runs the same SQL it generated in development against your production storage locations. No manual copy-paste, no environment-specific SQL edits. If you have connected a Git repository (GitHub, GitLab, or Azure DevOps), you can trigger deployments from a CI/CD pipeline instead of the UI.

Teams that need a QA stage between Dev and Prod create an additional environment with its own storage mapping. The same node graph then targets all three environments without changes.

When Coalesce Fits Your Stack

Coalesce makes the most sense in three situations. First, mixed-skill data teams where analysts own transformation logic but need engineers to review and promote it. The visual canvas lets analysts work confidently without writing Jinja, while the Git-backed SQL gives engineers the review surface they need.

Second, organizations already committed to Snowflake. Coalesce is available through Snowflake Partner Connect, which provisions a trial account in under five minutes.

Third, teams migrating from legacy ETL tools like Informatica or Talend to a warehouse-native pattern. The visual metaphor maps closely enough to what they already know that onboarding time is short.

Coalesce is less suited to teams who live in the terminal and want pure-code workflows, or teams running open-source stacks where paying for managed tooling is not an option.

If you want to explore the transformed data without setting up a separate BI tool, VSLZ lets you upload or connect the output tables and run plain-English queries directly against them — useful for validating pipeline output before promotion to production.

Summary

Coalesce gives you a visual DAG interface over warehouse-native SQL, with built-in node types for stages, dimensions (including Type 2 slowly changing dimensions), and facts. Connecting Snowflake takes under 30 minutes, and a working fact table with SCD Type 2 tracking can be running in under two hours. The deployment model (environment-based storage mappings plus Git integration) handles dev-to-prod promotion, the gap that makes many visual tools impractical for production use.

FAQ

Is Coalesce a replacement for dbt?

Coalesce and dbt solve the same problem — transforming data inside a warehouse — but with different interfaces. dbt is code-first: you write SQL models and Jinja templates in a text editor. Coalesce is GUI-first: you build a visual node graph and Coalesce generates the SQL. Teams comfortable with code and version-control workflows typically prefer dbt for its flexibility. Teams with mixed SQL skill levels often find Coalesce faster to onboard because the visual interface lowers the barrier to building and reviewing pipelines. Both tools generate warehouse-native SQL and support Git integration.

Which data warehouses does Coalesce support?

As of 2026, Coalesce supports three warehouse targets: Snowflake, Databricks, and Microsoft Fabric. Snowflake is the primary platform, with the most complete feature set and the fastest path to get started via Snowflake Partner Connect. Databricks and Microsoft Fabric support was added to serve organizations running mixed warehouse environments. The same pipeline graph can target a different warehouse by updating the storage mappings and connection settings, provided its transforms avoid warehouse-specific SQL functions.

Does Coalesce require writing SQL?

Not for basic pipelines. Source, stage, dimension, and fact nodes are configured through a GUI — you select tables, drag columns, set business keys, and configure change tracking without writing SQL. The transform field in the mapping grid accepts SQL expressions when you need custom logic (date truncation, JSON extraction, conditional columns), but it is optional. Engineers can also inspect and override the generated SQL in the node editor. For complex joins and derived columns, some SQL knowledge helps, but a working pipeline with standard dimension and fact tables can be built entirely through the interface.

How does Coalesce handle deployment to production?

Coalesce uses environment-based deployment. You define separate environments (Dev, QA, Production) each with their own storage location mappings pointing to the appropriate databases and schemas in your warehouse. When you deploy, Coalesce runs the same compiled SQL against the target environment's storage locations. For CI/CD integration, Coalesce exposes an API that allows deployment to be triggered from GitHub Actions, GitLab CI, or Azure DevOps pipelines. This means you can gate production deploys behind code review and automated tests before Coalesce applies changes.

What is the difference between a Type 1 and Type 2 dimension in Coalesce?

In Coalesce's Dimension node, a Type 1 dimension (no Change Tracking columns) overwrites existing rows when their values change: you always see the current value but lose history. A Type 2 dimension inserts a new row when a tracked column changes and marks the old row as expired, so you retain the full change history with effective dates and a current-record flag. You switch between the two by adding or removing Change Tracking columns in the node's Options panel. Type 2 is the usual choice for customer or product dimensions where historical accuracy matters for reporting (for example, tracking what address a customer had at the time of a specific order).
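The difference is easiest to see by feeding the same address change through both behaviors. This is an illustrative pure-Python sketch with a single tracked column, not the SQL Coalesce generates; the `load` function and the `start_date`, `end_date`, and `is_current` keys are hypothetical.

```python
from datetime import date
from typing import Optional


def load(rows: list, src: dict, load_date: date, scd_type: int) -> None:
    """Apply one source record (tracking only C_ADDRESS) to a dimension held as a list of dicts."""
    current: Optional[dict] = next(
        (r for r in rows if r["C_CUSTKEY"] == src["C_CUSTKEY"] and r["is_current"]), None
    )
    if current is not None and current["C_ADDRESS"] == src["C_ADDRESS"]:
        return  # value unchanged: nothing to do
    if current is not None and scd_type == 1:
        current["C_ADDRESS"] = src["C_ADDRESS"]  # Type 1: overwrite, old value is lost
        return
    if current is not None:
        # Type 2: close out the old version instead of overwriting it
        current["is_current"] = False
        current["end_date"] = load_date
    rows.append({**src, "start_date": load_date, "end_date": None, "is_current": True})


type1 = []
type2 = []
for rows, scd_type in ((type1, 1), (type2, 2)):
    load(rows, {"C_CUSTKEY": 1, "C_ADDRESS": "12 Oak St"}, date(2026, 1, 1), scd_type)
    load(rows, {"C_CUSTKEY": 1, "C_ADDRESS": "9 Elm Ave"}, date(2026, 3, 1), scd_type)

print(len(type1), [r["C_ADDRESS"] for r in type1])                    # 1 ['9 Elm Ave']
print(len(type2), [(r["C_ADDRESS"], r["is_current"]) for r in type2])  # 2 [('12 Oak St', False), ('9 Elm Ave', True)]
```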
