How to Set Up Cube for Agentic Analytics
Last updated Apr 25, 2026

What is Cube and Why It Matters Now
Cube is an open-source semantic layer for data analytics. It connects to your database, lets you define measures and dimensions in YAML or JavaScript, and exposes those definitions via REST, GraphQL, and SQL APIs to any BI tool, application, or AI agent that needs them.
The core idea is consistency. Without a semantic layer, every analyst and every AI prompt re-derives the same metric from raw SQL, often getting slightly different answers depending on how the joins are written. Cube forces one canonical definition per metric. When you change how "monthly revenue" is calculated, every dashboard and every agent query inherits the update automatically.
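The drift described above is easy to reproduce. The following is a small illustrative sketch using Python's built-in sqlite3 module, not Cube itself; the tables and values are invented for the example. Two analysts compute "revenue" from the same data, and the one who joins to a second table first gets an inflated number.

```python
import sqlite3

# In-memory database with made-up sample data (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    -- Order 1 shipped in two parcels, so it has two shipment rows.
    INSERT INTO shipments (order_id) VALUES (1), (1), (2);
""")

# Analyst A sums the orders table directly.
revenue_a = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Analyst B joins to shipments first; the join repeats order 1 and inflates the sum.
revenue_b = conn.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN shipments s ON s.order_id = o.id"
).fetchone()[0]

print(revenue_a)  # 150.0
print(revenue_b)  # 250.0
```

Both queries look reasonable in isolation, which is why a single shared definition of the measure matters.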
In February 2026, Gartner named Cube a Representative Vendor in its Market Guide for Agentic Analytics. The guide's key finding was that "Semantic and policy alignment is foundational for effective agentic analytics," and it predicted that 60% of agentic analytics projects relying solely on Model Context Protocol (MCP) without a semantic layer would fail by 2028. That finding has driven a surge of interest from ops teams and analysts who want AI agents to query their data reliably rather than hallucinate metrics.
This guide covers setting up Cube Core (the open-source version) locally with Docker, connecting it to a Postgres database, generating a first data model, and testing queries against it. The final step looks at natural-language queries through the Cube D3 agentic interface, which requires Cube Cloud or an enterprise license.
Prerequisites
Before starting, you need:
- Docker Desktop installed and running (version 24 or later)
- A Postgres database with at least one table of real data — a local Postgres instance works fine
- Node.js 18 or later (only needed if you run the Cube CLI; the Playground-based model generation used in this guide does not require it)
- About 15 minutes
If you do not have Postgres running locally, you can spin one up in Docker:
docker run --name pg-demo -e POSTGRES_PASSWORD=demo -e POSTGRES_DB=analytics -p 5432:5432 -d postgres:16
Load a sample dataset, such as the classic orders table, with a quick seed script before continuing.
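One possible seed script is sketched below. It only generates SQL text, so it has no dependencies beyond the Python standard library. The column names (status, amount, created_at) are chosen to match the orders model shown later in this guide; the status values, row count, and date range are assumptions made up for the demo.

```python
import random
from datetime import datetime, timedelta

# Hypothetical status values; replace them with whatever your business uses.
STATUSES = ["new", "processing", "shipped", "completed"]


def build_seed_sql(rows: int = 200, seed: int = 42) -> str:
    """Return SQL that creates public.orders and inserts `rows` random orders."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    start = datetime(2026, 1, 1)
    lines = [
        "CREATE TABLE IF NOT EXISTS public.orders (",
        "  id SERIAL PRIMARY KEY,",
        "  status TEXT NOT NULL,",
        "  amount NUMERIC(10, 2) NOT NULL,",
        "  created_at TIMESTAMP NOT NULL",
        ");",
    ]
    for _ in range(rows):
        status = rng.choice(STATUSES)
        amount = rng.randint(500, 50000) / 100  # 5.00 to 500.00
        created = start + timedelta(minutes=rng.randint(0, 90 * 24 * 60))
        lines.append(
            "INSERT INTO public.orders (status, amount, created_at) "
            f"VALUES ('{status}', {amount:.2f}, '{created:%Y-%m-%d %H:%M:%S}');"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_seed_sql())
```

Saved as seed_orders.py (a hypothetical file name), the output can be piped into the demo container with: python seed_orders.py | docker exec -i pg-demo psql -U postgres -d analytics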
Step 1: Run Cube with Docker
Create a new empty folder for your Cube project:
mkdir cube-demo && cd cube-demo
Then start Cube with a single Docker command:
docker run -p 4000:4000 \
  -p 15432:15432 \
  -v ${PWD}:/cube/conf \
  -e CUBEJS_DEV_MODE=true \
  cubejs/cube
Port 4000 is the Cube API and the Developer Playground UI. Port 15432 is Cube's SQL API, which lets tools like Metabase or any Postgres-compatible client query Cube as if it were a database.
Open http://localhost:4000 in your browser. The Developer Playground loads.
Step 2: Connect Your Database
The Playground prompts you to select a data source. Choose PostgreSQL. Enter your connection details:
- Host: host.docker.internal (if Postgres is running on your Mac or Windows host machine) or the container name if it is in the same Docker network
- Port: 5432
- Database: your database name
- Username / Password: your Postgres credentials
Click Apply and Cube writes the connection settings to a .env file in your project folder. From this point on, Cube is reading live from your Postgres instance.
For production deployments, these credentials go into environment variables (CUBEJS_DB_HOST, CUBEJS_DB_NAME, CUBEJS_DB_USER, CUBEJS_DB_PASS) in a .env file or your secrets manager. Never hardcode them.
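A small preflight check can catch a missing setting before the container starts. The sketch below is one way to do it, checking only the four variables named above; the function name is hypothetical, and a real deployment will likely need further variables that this guide does not cover.

```python
import os

# The four connection variables named in this guide.
REQUIRED_DB_VARS = (
    "CUBEJS_DB_HOST",
    "CUBEJS_DB_NAME",
    "CUBEJS_DB_USER",
    "CUBEJS_DB_PASS",
)


def missing_db_vars(env=None) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DB_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_db_vars()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All database settings are present.")
```

Passing a plain dictionary instead of os.environ makes the check easy to reuse in tests or deployment scripts.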
Step 3: Generate Your Data Model
Cube can inspect your database schema and generate a starter data model automatically. In the Developer Playground, open the Data Model tab (labeled Schema in older releases), select the tables you want to include, and generate the model. Cube produces YAML files in the model/cubes/ folder of your project.
A generated model for an orders table might look like this:
cubes:
  - name: orders
    sql_table: public.orders
    measures:
      - name: count
        type: count
      - name: total_revenue
        sql: amount
        type: sum
    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
This is a starting point, not a final definition. The important step is renaming measures to match how your business actually uses the terms and adding any calculated metrics your team cares about. A measure called total_revenue with clear SQL behind it is the kind of canonical definition that prevents the metric drift problem described earlier.
Edit the YAML directly in your project folder. Cube hot-reloads model changes without a restart.
Step 4: Test Queries in the Playground
Back in the Playground, click Build to run your first query. You can select measures and dimensions from a dropdown and Cube generates the underlying SQL, executes it, and shows results in table or chart form.
The SQL API (port 15432) lets you connect Metabase, Tableau, or any Postgres-compatible client directly. In Metabase, add a new database connection of type PostgreSQL, point it at localhost:15432, and use the username and password configured for Cube's SQL API (CUBEJS_SQL_USER and CUBEJS_SQL_PASSWORD). From that point, Metabase sees all your cubes as database tables.
You can also query via the REST API directly:
curl -G http://localhost:4000/cubejs-api/v1/load \
--data-urlencode 'query={"measures":["orders.total_revenue"],"dimensions":["orders.status"]}' \
-H 'Authorization: YOUR_API_TOKEN'
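The API token is a JSON Web Token (JWT) signed with the secret set in CUBEJS_API_SECRET, not the secret itself. In development mode (CUBEJS_DEV_MODE=true), Cube does not enforce the token check, which makes local testing simpler.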
Step 5: Enable Agentic Queries with Cube D3
Cube's D3 layer, announced in mid-2025 and now generally available, adds AI agents on top of the semantic model. The key difference from ad-hoc LLM-to-SQL approaches is that D3 agents query the semantic model, not the raw database. That means the agent cannot invent a revenue calculation — it must use the total_revenue measure you defined.
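The constraint can be pictured with a short sketch. This is not how D3 is implemented; it is a simplified illustration, with a hypothetical function name and a hand-written copy of the orders model from Step 3, of why a query restricted to defined members cannot introduce a new metric.

```python
# Hand-written mirror of the orders cube from Step 3 (illustration only).
MODEL = {
    "orders": {
        "measures": {"count", "total_revenue"},
        "dimensions": {"status", "created_at"},
    }
}


def unknown_members(query: dict, model: dict = MODEL) -> list[str]:
    """Return every member in the query that the model does not define."""
    unknown = []
    for kind in ("measures", "dimensions"):
        for member in query.get(kind, []):
            cube, _, name = member.partition(".")
            if name not in model.get(cube, {}).get(kind, set()):
                unknown.append(member)
    return unknown


defined = {"measures": ["orders.total_revenue"], "dimensions": ["orders.status"]}
# "net_revenue_guess" is a made-up measure an unconstrained agent might invent.
invented = {"measures": ["orders.net_revenue_guess"], "dimensions": ["orders.status"]}

print(unknown_members(defined))   # []
print(unknown_members(invented))  # ['orders.net_revenue_guess']
```

A query that references only defined members passes; anything else is rejected before any SQL is generated, which is the property that ad-hoc LLM-to-SQL approaches lack.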
For Cube Cloud users, D3 activates from your account dashboard under Agentic Analytics. The free tier includes a limited number of agent requests per month, enough to test the feature with a real dataset.
Once enabled, the D3 interface lets you type questions like "What was total revenue by status last month?" and the agent resolves the query against your semantic model, returns results, and explains how it arrived at the answer. Because the semantic layer enforces consistent definitions, the answer from a D3 agent matches the answer from your Metabase dashboard using the same Cube connection.
For self-hosted Cube Core, D3 is not yet available as open source. The agentic features require Cube Cloud or an enterprise license.
What This Setup Gives You
After completing these steps, you have a working semantic layer that:
- Serves consistent metric definitions to any BI tool via SQL, REST, or GraphQL
- Generates SQL from your cubes automatically with caching built in
- Provides a foundation for AI agents to query data without hallucinating metrics
The next practical steps are defining more cubes for the tables your team queries most, adding role-based access control for row-level security, and connecting your primary BI tool to the SQL API.
If you want to skip the model-building phase and get to data questions immediately, VSLZ lets you upload a CSV or connect a data source and ask questions in plain English without writing YAML or configuring infrastructure — useful for ad-hoc analysis while your Cube semantic layer is still taking shape.
Practical Notes
Cube performs best as a long-running service, not a per-query container. For production, deploy it as a dedicated service backed by Cube Store for caching and queue management (CUBEJS_CACHE_AND_QUEUE_DRIVER=cubestore). Recent Cube releases no longer support Redis in this role. The default in-memory driver works for development but does not persist across restarts.
Keep your cube YAML files in version control. Model changes that redefine a measure should go through a review process the same way schema migrations do — a change to total_revenue SQL affects every dashboard and every agent query that uses it.
FAQ
What is Cube used for in data analytics?
Cube is a semantic layer that sits between your database and your analytics tools. It lets you define metrics like revenue, churn rate, or order count once in YAML or JavaScript, then exposes those definitions to BI tools, applications, and AI agents via REST, GraphQL, and SQL APIs. The main benefit is consistency: every tool and every query uses the same metric definition instead of each analyst re-writing the same SQL independently.
Can I run Cube locally without a cloud account?
Yes. Cube Core is fully open source and runs locally with a single Docker command: `docker run -p 4000:4000 -p 15432:15432 -v ${PWD}:/cube/conf -e CUBEJS_DEV_MODE=true cubejs/cube`. You do not need a Cube Cloud account for the core semantic layer, REST API, GraphQL API, or SQL API. Cube Cloud adds managed infrastructure, the D3 agentic analytics layer, and enterprise access controls.
How does Cube's semantic layer differ from just writing SQL views?
SQL views are static and live inside the database. Cube's semantic layer is dynamic, version-controlled, and accessible via multiple APIs outside the database. Cube also adds caching, role-based access control, multi-tenancy, and the ability to connect the same model to multiple BI tools simultaneously. AI agents querying Cube are constrained to the defined measures and dimensions, preventing them from generating ad-hoc SQL that could return inconsistent results.
What databases does Cube support?
Cube Core works with all major SQL data sources, including PostgreSQL, MySQL, Snowflake, BigQuery, Databricks, Redshift, ClickHouse, Amazon Athena, and Presto. The connection is configured via environment variables. You can also use multiple data sources in a single Cube deployment with data source routing configured in the cube.js file.
What is Cube D3 and who can use it?
Cube D3 is Cube's agentic analytics platform, announced in 2025 and recognized in the 2026 Gartner Market Guide for Agentic Analytics. It adds AI agents that query your semantic model using natural language and return explainable, governance-compliant results. D3 is available to Cube Cloud users, including a free tier with limited agent requests. Self-hosted Cube Core does not include D3 as of early 2026.


