How to Set Up a Databricks Genie Space
Last updated Apr 29, 2026

Databricks Genie Spaces let business users query structured data in Unity Catalog using plain English, without writing SQL. An analyst sets the space up once by connecting tables, writing sample queries, and adding business context through a knowledge store. After that, anyone on the team with the right permissions can ask questions directly from the Databricks UI and receive SQL-backed answers. This guide walks through every step, with particular focus on the knowledge store configuration that most tutorials skip.
Prerequisites
Three things must be in place before you create a Genie Space.
First, the Databricks SQL entitlement on your workspace. This is separate from the All-Purpose Compute entitlement and must be granted by a workspace admin. Without it, the Genie Spaces option will not appear in the sidebar.
Second, at least CAN USE permission on a Pro or Serverless SQL Warehouse. Genie routes all generated queries through this warehouse, so choose one that can handle the concurrent load from your business users. Serverless is the better default because it scales automatically without requiring manual warehouse sizing.
Third, SELECT privileges on the Unity Catalog tables or views you plan to include. Genie works exclusively with structured data registered to Unity Catalog. It cannot answer questions about unstructured files such as PDFs or Word documents. If your data is not yet registered, register it as external tables or Delta tables in Unity Catalog before proceeding. If your use case is simpler and does not require the full Databricks ecosystem, VSLZ handles natural language data queries directly from a CSV or Excel upload with no Unity Catalog configuration needed.
A single Genie Space supports up to 30 tables or views. If your use case spans more, build pre-joined views in Unity Catalog to reduce the count before setup.
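The data prerequisite can be scripted. The sketch below is one way to generate the GRANT statements for a list of tables and to check the list against the 30-table limit before setup. The helper name, the table names, and the `analysts` group are hypothetical. The USE CATALOG and USE SCHEMA grants are included on the assumption that the principal does not already hold them, since Unity Catalog requires both in addition to SELECT.

```python
MAX_TABLES_PER_SPACE = 30  # limit stated in this guide


def build_grants(tables, principal):
    """Return GRANT statements giving `principal` read access to each table.

    tables:    fully qualified names in catalog.schema.table form (hypothetical).
    principal: user or group name (hypothetical).
    """
    if len(tables) > MAX_TABLES_PER_SPACE:
        raise ValueError(
            f"{len(tables)} tables exceeds the {MAX_TABLES_PER_SPACE}-table limit; "
            "consider pre-joined views"
        )
    catalogs = sorted({t.split(".")[0] for t in tables})
    schemas = sorted({".".join(t.split(".")[:2]) for t in tables})

    statements = []
    for catalog in catalogs:
        statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`;")
    for schema in schemas:
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO `{principal}`;")
    for table in tables:
        statements.append(f"GRANT SELECT ON TABLE {table} TO `{principal}`;")
    return statements


if __name__ == "__main__":
    for line in build_grants(["main.sales.orders", "main.sales.customers"], "analysts"):
        print(line)
```

Run the generated statements in a SQL editor as a user who is allowed to grant privileges on those objects.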
Step 1: Create the Space
In the Databricks sidebar, click Genie Spaces, then click New in the upper right corner. The setup page has three fields.
Title and description. Name the space after its domain. "Sales Operations Q&A" or "Logistics Shipment Tracker" tells business users what to expect. The description appears on the space landing page, so use it to state which questions the space can answer and, equally important, which it cannot.
Default warehouse. Select the Pro or Serverless SQL Warehouse Genie will use to run all generated queries. Serverless is recommended for teams that want low-latency responses without managing compute sizing.
Tables. Add the Unity Catalog tables or views that contain the data for this space. Start narrow. A focused space with five well-annotated tables consistently outperforms a broad one with twenty. You can always add more tables later.
Click Create to open the space editor.
Step 2: Annotate Tables in Unity Catalog
Before building the knowledge store, review your Unity Catalog table and column descriptions. Genie uses this metadata when translating natural language questions into SQL. Missing or vague descriptions on key columns are a common cause of wrong answers in practice.
Focus on three categories of columns.
Columns with ambiguous names. If a column is called status, add a description that enumerates valid values: "Order status. Values: pending, processing, shipped, delivered, cancelled." Without this, Genie has no way to know what the stored values mean or which ones a user intends when they ask about "active orders."
Date and timestamp columns. Specify timezone and granularity: "Transaction date in UTC, stored as a DATE type. Does not include time of day." This prevents incorrect date range calculations when users ask for "this week" or "last quarter."
Columns that store internal codes. If cust_tier stores "G" for Gold and "S" for Silver, document both the column purpose and the value encoding. Genie cannot infer these mappings from column names alone.
These annotations live in Unity Catalog and carry over to every Genie Space that references those tables, so the investment compounds across future spaces.
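Column descriptions like the ones above can be applied in bulk with SQL. The following is a minimal sketch that turns a dictionary of descriptions into ALTER TABLE ... ALTER COLUMN ... COMMENT statements. The table name `main.sales.orders` and the helper names are hypothetical, and the backslash escaping of quotes is an assumption about how your warehouse parses string literals, so check one generated statement by hand before running the rest.

```python
def comment_statement(table, column, description):
    """Build one statement that sets a column description in Unity Catalog."""
    # Assumption: backslash-escaped quotes are accepted in string literals.
    escaped = description.replace("\\", "\\\\").replace("'", "\\'")
    return f"ALTER TABLE {table} ALTER COLUMN {column} COMMENT '{escaped}';"


def comment_statements(table, descriptions):
    """Build statements for every (column, description) pair, in order."""
    return [comment_statement(table, col, text) for col, text in descriptions.items()]


# Descriptions taken from the three column categories discussed above.
DESCRIPTIONS = {
    "status": "Order status. Values: pending, processing, shipped, delivered, cancelled.",
    "transaction_date": "Transaction date in UTC, stored as a DATE type. Does not include time of day.",
    "cust_tier": "Customer tier code. G = Gold, S = Silver.",
}

if __name__ == "__main__":
    for statement in comment_statements("main.sales.orders", DESCRIPTIONS):
        print(statement)
```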
Step 3: Build the Knowledge Store
This is where a working Genie Space is made or broken. Most setup guides cover table configuration and stop there. The knowledge store is where the real calibration work happens.
It has four components.
General instructions are plain-text rules that encode business logic Genie cannot infer from column metadata. Write them as concrete, specific statements. For example: "Active customers are defined as customers with at least one completed order in the past 90 days. Cancelled and refunded orders do not count toward this definition." Or: "Fiscal year runs February through January. Q1 is February, March, and April." Or: "Revenue is always calculated net of refunds. Do not include orders with a status of cancelled or returned."
Instructions like "be precise" or "use common sense" have no effect. Each instruction should encode a rule that would otherwise require a SQL comment in every analyst-written query.
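A useful check on whether an instruction is concrete enough is whether it can be written as code without guessing. The fiscal year rule above passes that check. The sketch below encodes it in a few lines; the function name is hypothetical, and it deliberately returns only the quarter, because the rule as written does not say how the fiscal year itself is labeled.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 2  # "Fiscal year runs February through January."


def fiscal_quarter(d):
    """Return the fiscal quarter (1-4) for a calendar date."""
    # Months elapsed since the start of the fiscal year: Feb -> 0, ..., Jan -> 11.
    months_into_year = (d.month - FISCAL_YEAR_START_MONTH) % 12
    return months_into_year // 3 + 1


if __name__ == "__main__":
    for sample in (date(2025, 2, 1), date(2025, 5, 15), date(2026, 1, 31)):
        print(sample.isoformat(), "-> Q", fiscal_quarter(sample))
```

An instruction such as "use common sense" cannot be translated this way, which is a good sign that it will not help Genie either.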
SQL examples are the single highest-value element in the knowledge store. Databricks recommends at least five verified examples per space. Each example pairs a question title with a complete, tested SQL query. The question title should be written exactly as a business user would type it, not in analyst shorthand.
When Genie answers a question by using a parameterized SQL example or a SQL function from the knowledge store exactly as written, the response carries a Trusted label. The label signals to business users that the result follows a validated calculation rather than a dynamically generated query. Examples still guide Genie when a question only resembles an example title, but those responses are generated and do not carry the label. Trusted responses build substantially more confidence than unlabeled generated queries.
In practice, teams that add ten or more well-tested SQL examples tend to see fewer incorrect answers. Each example reduces the surface area where Genie has to reason from scratch about business logic. Write examples for the most common questions first, test each one against known results, and add more iteratively as users encounter new question patterns.
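Testing an example against known results can be done before it ever reaches the knowledge store. The sketch below pairs a question title with a query and checks the query against a tiny fixture whose correct answer is known by inspection. It uses an in-memory SQLite database purely as a stand-in for the SQL warehouse, and the `orders` table, its columns, and the sample rows are all hypothetical. SQL dialects differ, so treat a pass here as a logic check, not as proof the query runs unchanged on Databricks.

```python
import sqlite3

# One knowledge-store example: a title phrased as a business user would type it,
# paired with the query that encodes the "revenue excludes cancelled or returned" rule.
EXAMPLE = {
    "title": "What is our total revenue by region?",
    "sql": (
        "SELECT region, SUM(amount) AS revenue "
        "FROM orders "
        "WHERE status NOT IN ('cancelled', 'returned') "
        "GROUP BY region ORDER BY region"
    ),
}

# Hypothetical fixture rows: (region, amount, status).
FIXTURE = [
    ("EMEA", 100.0, "delivered"),
    ("EMEA", 50.0, "cancelled"),
    ("APAC", 70.0, "shipped"),
    ("APAC", 30.0, "returned"),
]


def run_example(sql, rows):
    """Load the fixture into an in-memory database and run the example query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL, status TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(EXAMPLE["title"], "->", run_example(EXAMPLE["sql"], FIXTURE))
```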
SQL expressions let you define reusable business metrics as named snippets. Define terms like gross_margin, churn_rate, and active_customers once. Any user question using those terms then draws from the named expression rather than requiring Genie to interpret the phrase each time. This prevents the same business question from returning different calculations depending on how it is phrased.
Column synonyms map business vocabulary to column names. If your sales team calls order_net_value_usd "deal size," add that mapping. If cust_tier values are internally "G" and "S" but your team always says "Gold" and "Silver," add synonyms for both.
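Synonyms are easier to maintain when they are kept in a reviewable file before being entered in the space. The sketch below is one possible pre-flight check, not part of Genie itself: it flags any business phrase that has been mapped to more than one column, which would leave the phrase ambiguous. The column `order_gross_value_usd` and the helper name are hypothetical.

```python
def find_conflicts(synonyms):
    """Return phrases that are mapped to more than one column.

    synonyms: dict of column name -> list of business phrases.
    Phrases are compared case-insensitively with surrounding spaces removed.
    """
    owners = {}
    for column, phrases in synonyms.items():
        for phrase in phrases:
            owners.setdefault(phrase.strip().lower(), set()).add(column)
    return {phrase: sorted(cols) for phrase, cols in owners.items() if len(cols) > 1}


SYNONYMS = {
    "order_net_value_usd": ["deal size"],
    "order_gross_value_usd": ["Deal Size", "gross value"],  # conflicting entry
}

if __name__ == "__main__":
    print(find_conflicts(SYNONYMS))
```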
Step 4: Test Before Sharing
Open the Genie chat interface within the space editor and run ten to fifteen test questions before sharing. Include questions where you know the correct answer so you can verify the generated SQL. Include questions that use every business term you defined in the knowledge store. Include edge cases with fiscal year date boundaries, multi-table lookups, and metrics that have specific business rules.
For any response that looks wrong, click Inspect. Inspect triggers an additional reasoning pass that reviews the generated SQL, runs targeted verification sub-queries to check filters, date range logic, and join conditions, then returns an improved query when it identifies issues. It is particularly effective for complex questions involving custom date windows or aggregations across multiple tables.
If Inspect still returns an incorrect answer, the fix is almost always in the knowledge store. Add a SQL example for the specific question pattern, tighten a general instruction to encode the relevant rule, or clarify the column description in Unity Catalog.
Do not share the space until your core test questions pass consistently. Users who receive one wrong answer are unlikely to return without significant effort to restore their confidence in the tool.
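Keeping the core test questions in a small script makes "pass consistently" something you can measure. The sketch below is a minimal harness under stated assumptions: `ask` is a hypothetical callable that sends a question to the space and returns the answer in a form you can compare, and the questions and expected values are invented for illustration.

```python
def run_checks(cases, ask):
    """Ask each question once and compare the answer with the known result.

    cases: list of (question, expected_answer) pairs.
    ask:   hypothetical callable that takes a question and returns an answer.
    Returns (pass_rate, failures), where failures holds
    (question, expected, actual) for every mismatch.
    """
    failures = []
    for question, expected in cases:
        actual = ask(question)
        if actual != expected:
            failures.append((question, expected, actual))
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures


if __name__ == "__main__":
    # Stand-in for a real space: canned answers keyed by question.
    canned = {
        "How many active customers do we have?": 120,
        "What was revenue in fiscal Q1?": 9999,  # deliberately wrong
    }
    cases = [
        ("How many active customers do we have?", 120),
        ("What was revenue in fiscal Q1?", 10500),
    ]
    rate, failed = run_checks(cases, canned.get)
    print(f"pass rate: {rate:.0%}")
    for question, expected, actual in failed:
        print(f"FAILED: {question!r} expected {expected}, got {actual}")
```

Each failure points at a knowledge store fix: a missing SQL example, a loose instruction, or an unclear column description.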
Step 5: Share with Business Users
Click Share in the space settings and assign permissions at the appropriate level.
CAN VIEW is for business users who will ask questions and consume results. CAN EDIT is for analysts who need to update the knowledge store as business rules change. IS OWNER is for the team member responsible for long-term maintenance.
Business users do not need direct warehouse permissions. Genie runs all queries using the author-configured warehouse credentials. Unity Catalog row filters and column masks are applied automatically to each user's results, so users can only see data they are authorized to access. Row-level security policies on the underlying tables carry through without additional configuration in the space.
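For readers who have not yet defined a row filter, the sketch below generates the two SQL statements involved: a function that decides which rows a user may see, and an ALTER TABLE that attaches it to a table. The table, column, function, and group names are hypothetical, and the filter logic (admins see everything, everyone else sees one region) is only an illustration of the pattern.

```python
def row_filter_statements(table, column, function_name, admin_group, allowed_value):
    """Build the SQL to create a row filter function and attach it to a table."""
    create_function = (
        f"CREATE OR REPLACE FUNCTION {function_name}({column} STRING) "
        f"RETURN IF(is_account_group_member('{admin_group}'), true, "
        f"{column} = '{allowed_value}');"
    )
    attach_filter = f"ALTER TABLE {table} SET ROW FILTER {function_name} ON ({column});"
    return [create_function, attach_filter]


if __name__ == "__main__":
    for statement in row_filter_statements(
        table="main.sales.orders",
        column="region",
        function_name="main.sales.us_only",
        admin_group="admin",
        allowed_value="US",
    ):
        print(statement)
```

Once a filter like this is attached to the table, it applies to queries Genie generates for each user, with no extra setting in the space.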
For teams that work with uploaded CSV or Excel files alongside Unity Catalog data, Databricks offers a File Uploads feature currently in Public Preview that lets users blend local files with space data during a session. Contact your Databricks account team to enable it.
Maintaining the Space Over Time
A Genie Space is not a one-time setup. Schemas change, business rules evolve, and new question patterns emerge. When a question type repeatedly returns wrong answers, add a SQL example for it. When business rules change, update the relevant instructions. When your team adopts new terminology, add synonyms.
In 2026, Databricks added a Benchmarks feature that lets you build a library of test questions with expected SQL answers and run them in bulk to score accuracy. Setting up a benchmark suite is the most reliable way to catch regressions when underlying table schemas change.
For teams that need to embed Genie into internal applications or agent workflows without requiring users to open Databricks directly, the Genie Conversation API provides programmatic access to any configured space.
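As a rough illustration of what programmatic access looks like, the sketch below builds the HTTP requests for starting a conversation and for fetching a message, using only the Python standard library. The endpoint paths and the `content` field reflect the Genie Conversation API as understood at the time of writing and should be confirmed against the current REST API reference. The host, token, and IDs are placeholders, and the function names are hypothetical. The code only constructs the requests; sending them is left to the caller.

```python
import json
import urllib.request


def _headers(token):
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def build_start_conversation_request(host, token, space_id, question):
    """Build a POST request that starts a conversation with a question."""
    # Assumed path; verify against the current API reference.
    url = f"{host.rstrip('/')}/api/2.0/genie/spaces/{space_id}/start-conversation"
    body = json.dumps({"content": question}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST", headers=_headers(token))


def build_get_message_request(host, token, space_id, conversation_id, message_id):
    """Build a GET request that fetches a message, used to poll for the answer."""
    # Assumed path; verify against the current API reference.
    url = (
        f"{host.rstrip('/')}/api/2.0/genie/spaces/{space_id}"
        f"/conversations/{conversation_id}/messages/{message_id}"
    )
    return urllib.request.Request(url, method="GET", headers=_headers(token))


if __name__ == "__main__":
    request = build_start_conversation_request(
        "https://example.cloud.databricks.com",  # placeholder host
        "<personal-access-token>",               # placeholder token
        "<space-id>",                            # placeholder space ID
        "What is our total revenue by region?",
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request), then poll the message
    # endpoint until the response reports that processing has finished.
```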
FAQ
What data can a Databricks Genie Space use?
Genie Spaces work with structured data registered to Unity Catalog, including managed tables, external tables, views, and materialized views. Genie cannot directly answer questions about unstructured data such as PDFs or Word documents. A separate feature, Chat in Genie, connects to external document sources like Google Drive or SharePoint. File Uploads, currently in Public Preview, allows users to blend local CSV and Excel files with Unity Catalog data during a session.
Do business users need a Databricks SQL license to use a Genie Space?
Business users need either the consumer access entitlement or the Databricks SQL entitlement, plus SELECT privileges on the Unity Catalog tables used in the space. They do not need direct SQL Warehouse permissions. Genie runs all queries using the author-configured warehouse credentials, and Unity Catalog row-level security is applied automatically to each user's results.
How many tables can I add to a Genie Space?
A single Genie Space supports up to 30 tables or views. For use cases that span more tables, the recommended approach is to create pre-joined views in Unity Catalog before adding them to the space. Keeping the table count low and scoped to the space's specific domain generally produces better query accuracy than adding all available tables.
What is a Trusted response in Databricks Genie?
A Trusted response occurs when Genie's generated query exactly matches a parameterized SQL example or SQL function defined in the knowledge store. Databricks marks these responses with a Trusted label to indicate that the result follows a verified, pre-approved calculation. Adding well-tested SQL examples to the knowledge store is the primary way to increase the rate of Trusted responses.
What should I do when Genie returns an incorrect answer?
Click the Inspect button on the response. Inspect triggers an advanced reasoning pass that reviews the generated SQL, runs verification sub-queries to check filters, date ranges, and join conditions, and returns an improved query when it finds issues. If Inspect does not resolve the problem, the fix is almost always in the knowledge store: add a SQL example for the specific question pattern, tighten a general instruction, or clarify a column description in Unity Catalog.


