How to Set Up a Databricks Genie Space
Last updated Apr 3, 2026

Setting up a Databricks Genie space involves four main phases: register your data in Unity Catalog, create the space using a pro or serverless SQL warehouse, build a knowledge store with table descriptions and example SQL, and share the space with business users. This guide breaks those phases into six steps, treating example queries and benchmark testing as steps of their own. A basic space can be running in under an hour for a team already on Databricks, but the configuration phase, specifically the knowledge store, takes longer to do properly and is where most setups succeed or fail.
What Databricks Genie Does
Databricks Genie converts plain English questions into SQL queries against Unity Catalog data and returns the results as charts or summaries. Business users, such as sales managers, logistics coordinators, or financial analysts, interact with a chat interface. They ask questions and receive answers without writing SQL or opening a separate BI tool.
Genie is not a general-purpose chatbot. Each Genie space is scoped to a curated set of tables and configured with domain-specific knowledge by a data analyst. That distinction matters for accuracy: a well-configured Genie space gives consistent, reliable answers to common business questions; a poorly configured one gives unreliable results that erode user trust quickly.
As of 2026, Databricks has raised the table limit per space from 25 to 30 and enabled Genie by default on published dashboards via an Ask Genie button. Those changes reduce the number of separate spaces most teams need to maintain.
Prerequisites
Before creating a Genie space, confirm you have:
- A Databricks workspace with Unity Catalog enabled
- A pro or serverless SQL warehouse with CAN USE permissions
- Source tables registered in Unity Catalog, not external tables with unsynced metadata
- Partner-powered AI features enabled at the account level by an administrator
- At minimum CAN EDIT permissions on the Genie space during setup; end users need CAN VIEW or CAN RUN plus SELECT on the underlying tables
If you are on a trial SKU, partner-powered AI features are not available. You need a paid workspace (F2 or higher for Azure, or an equivalent AWS or GCP tier).
Step 1: Register Your Data in Unity Catalog
Genie only queries data registered in Unity Catalog. If your tables live in legacy metastores or external catalogs, migrate them first. Foreign or federated tables are a common pitfall: their column descriptions do not always sync locally, which means Genie may never see the metadata it needs. The fix is to edit the metadata directly in the Genie space or wrap the source with a materialized view that includes proper column comments.
Before creating the space, review your table and column descriptions in Unity Catalog. Clear, accurate column names and descriptions are among the largest factors in Genie's response quality. A column named rev_usd_q is opaque to both Genie and users. Annotating it as "quarterly revenue in USD" in the column description gives Genie the context it needs to pick the right column for revenue questions.
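One way to apply descriptions in bulk is to generate the SQL from a reviewed mapping. The sketch below builds ALTER TABLE ... ALTER COLUMN ... COMMENT statements as plain strings; the table name main.finance.revenue and the helper name are hypothetical, and the sketch rejects comments containing quotes or backslashes rather than guessing at escaping rules.

```python
def column_comment_sql(table: str, comments: dict[str, str]) -> list[str]:
    """Build one ALTER TABLE ... ALTER COLUMN ... COMMENT statement per column."""
    statements = []
    for column, text in comments.items():
        # Keep the sketch simple: refuse text that would need SQL escaping.
        if "'" in text or "\\" in text:
            raise ValueError(f"comment for {column} contains a quote or backslash")
        statements.append(
            f"ALTER TABLE {table} ALTER COLUMN {column} COMMENT '{text}'"
        )
    return statements


# Hypothetical table and column names, for illustration only.
statements = column_comment_sql(
    "main.finance.revenue",
    {"rev_usd_q": "Quarterly revenue in USD"},
)
for statement in statements:
    print(statement)
```

The printed statements can be reviewed and then run in a SQL editor by someone with permission to modify the table.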
If your domain uses non-standard conventions, for example fiscal quarters that start in February, add those mappings now. Genie cannot infer fiscal year boundaries without being told explicitly.
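To make the fiscal-quarter example concrete, here is a small sketch of the arithmetic for a fiscal year that starts in February. It assumes a fiscal year is labeled by the calendar year in which it starts; your organization may label it by the ending year instead, so treat the convention as an assumption to confirm.

```python
from datetime import date

FISCAL_YEAR_START_MONTH = 2  # February, matching the example in the text


def fiscal_period(day: date, start_month: int = FISCAL_YEAR_START_MONTH) -> tuple[int, int]:
    """Return (fiscal_year, fiscal_quarter) for a calendar date.

    Assumption: a fiscal year is labeled by the calendar year in which it starts.
    """
    months_into_year = (day.month - start_month) % 12  # 0 for the first fiscal month
    fiscal_year = day.year if day.month >= start_month else day.year - 1
    return fiscal_year, months_into_year // 3 + 1


print(fiscal_period(date(2026, 2, 1)))   # (2026, 1)
print(fiscal_period(date(2026, 1, 31)))  # (2025, 4)
```

Writing the rule out like this makes it easier to state the same mapping unambiguously in a column description or instruction.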
Step 2: Create the Genie Space
- In the Databricks sidebar, click Genie.
- In the upper-right corner, click New.
- Select the tables or views you want to include. Start with five or fewer tables for your first space.
- Choose a default SQL warehouse.
- Click Create.
Databricks recommends keeping spaces focused. A space covering five well-described tables outperforms a space with 25 partially documented ones. If your use case spans multiple domains, such as sales and inventory, build two separate spaces rather than combining them into one.
Give the space a descriptive title and a brief plain-English description. These appear in the Ask Genie panel on published dashboards and help users understand what questions the space is designed to answer.
Step 3: Build the Knowledge Store
The knowledge store is the core configuration layer. It tells Genie how your data is structured and how your organization describes it. There are three types of entries to add:
Table and column descriptions: Write descriptions that explain what each table tracks and what each column measures. Describe the business meaning, not just the data type. For a column tracking monthly active users, a description like "count of unique users who performed at least one session in the calendar month" gives Genie the context to use it accurately.
Synonyms: Map business terms to column names. If your sales team calls a metric "bookings" but your column is named new_arr, add a synonym mapping bookings to new_arr. Without this mapping, Genie may return incorrect results or report that it cannot find the data.
Join relationships: Define how tables connect using column equality syntax. Genie cannot infer joins from table names alone. Missing join definitions are a leading cause of incorrect multi-table query results.
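Before entering these three kinds of entries in the space, it can help to draft them as data and check them against each other. The sketch below is a planning aid only: KnowledgeDraft and find_gaps are hypothetical names, not Databricks API objects, and the table and column names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeDraft:
    """Hypothetical planning structure; not a Databricks API object."""
    column_descriptions: dict[str, str] = field(default_factory=dict)  # "table.column" -> description
    synonyms: dict[str, str] = field(default_factory=dict)             # business term -> "table.column"
    joins: list[str] = field(default_factory=list)                     # "a.col = b.col"


def find_gaps(draft: KnowledgeDraft) -> list[str]:
    """List synonyms and joins that point at columns with no description."""
    gaps = []
    for term, column in draft.synonyms.items():
        if column not in draft.column_descriptions:
            gaps.append(f"synonym '{term}' targets undescribed column {column}")
    for join in draft.joins:
        left, _, right = (part.strip() for part in join.partition("="))
        for column in (left, right):
            if column not in draft.column_descriptions:
                gaps.append(f"join '{join}' uses undescribed column {column}")
    return gaps


draft = KnowledgeDraft(
    column_descriptions={
        "deals.new_arr": "New annual recurring revenue in USD",
        "deals.account_id": "Account that signed the deal",
    },
    synonyms={"bookings": "deals.new_arr"},
    joins=["deals.account_id = accounts.id"],
)
gaps = find_gaps(draft)
print(gaps)
```

Here the check flags accounts.id, which is used in a join but has no description yet.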
You can add up to 100 total instructions across example queries, SQL functions, and text guidance. Databricks' published documentation notes that overlong or overlapping instructions get deprioritized when context capacity fills, which means the most important join and metric definitions may never reach the model. Prioritize in this order: SQL expressions for KPIs, example SQL queries for common business questions, and text instructions only for edge cases that cannot be expressed as SQL.
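The prioritization above can be sketched as a simple sort and cut. The kind labels (sql_expression, example_query, text) and the function name are assumptions made for this illustration, and whether every kind counts toward the cap is something to verify against the current documentation.

```python
MAX_INSTRUCTIONS = 100  # total cap stated in the text

# Lower number = higher priority, following the recommended order.
PRIORITY = {"sql_expression": 0, "example_query": 1, "text": 2}


def plan_instructions(
    candidates: list[tuple[str, str]], limit: int = MAX_INSTRUCTIONS
) -> list[tuple[str, str]]:
    """Keep at most `limit` (kind, name) entries, highest-priority kinds first."""
    # sorted() is stable, so entries of the same kind keep their input order.
    ordered = sorted(candidates, key=lambda item: PRIORITY[item[0]])
    return ordered[:limit]


candidates = [
    ("text", "Fiscal year starts in February"),
    ("example_query", "Revenue by region last quarter"),
    ("sql_expression", "bookings"),
]
kept = plan_instructions(candidates, limit=2)
print(kept)
```

With a limit of two, the text instruction is the entry that gets dropped, which mirrors the advice to reserve text for edge cases.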
Step 4: Add Example SQL Queries
Example SQL queries are the most effective tool for improving accuracy on domain-specific questions. For each common question your users are likely to ask, write the corresponding SQL and use the typical phrasing of that question as the query title.
If finance regularly asks for revenue by region last quarter, create an example query titled "Revenue by region last quarter" with the correct SQL including the right table, date filter, and aggregation. When a user asks something similar, Genie uses this query as a reference rather than generating SQL from scratch.
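As an illustration, an example query of this kind might be drafted as a title plus vetted SQL. Everything specific here is an assumption: the table main.finance.orders and its columns are hypothetical, "last quarter" is taken to mean the previous calendar quarter, and the SQL should be run and checked against your own data before it is added to the space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleQuery:
    title: str  # phrased the way users ask the question
    sql: str    # the vetted SQL that answers it


# Table and column names are hypothetical.
revenue_by_region = ExampleQuery(
    title="Revenue by region last quarter",
    sql="""
SELECT region, SUM(revenue_usd) AS total_revenue_usd
FROM main.finance.orders
WHERE order_date >= add_months(date_trunc('QUARTER', current_date()), -3)
  AND order_date <  date_trunc('QUARTER', current_date())
GROUP BY region
ORDER BY total_revenue_usd DESC
""".strip(),
)
print(revenue_by_region.title)
print(revenue_by_region.sql)
```

If your organization uses fiscal quarters, the date filter would need to encode that boundary instead of the calendar one.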
Add five to ten example queries covering your most common request types before sharing the space. These also surface gaps in your knowledge store before business users encounter them.
Step 5: Test Before Sharing
Use the benchmark feature to test accuracy at scale before opening the space to users. Create a set of 20 to 30 representative questions and run them through the benchmark tool. Review which answers are wrong and trace each failure back to a missing synonym, join definition, or ambiguous column description.
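Once the benchmark answers have been reviewed, tallying them by root cause shows where to spend configuration effort. The sketch below assumes you have recorded each reviewed answer by hand as a dictionary with a correct flag and, for failures, a cause label; that record format is hypothetical and is not an export format of the benchmark tool.

```python
from collections import Counter


def summarize(results: list[dict]) -> tuple[float, Counter]:
    """Return (accuracy, failure counts by root cause) for reviewed benchmark answers."""
    if not results:
        raise ValueError("no benchmark results to summarize")
    failures = [r for r in results if not r["correct"]]
    accuracy = (len(results) - len(failures)) / len(results)
    return accuracy, Counter(r["cause"] for r in failures)


# Hand-recorded review outcomes (made-up data for illustration).
results = [
    {"question": "Revenue by region last quarter", "correct": True},
    {"question": "Bookings this month", "correct": False, "cause": "missing synonym"},
    {"question": "Top accounts by revenue", "correct": False, "cause": "missing join"},
    {"question": "Orders per day", "correct": True},
]
accuracy, causes = summarize(results)
print(accuracy)  # 0.5
print(causes.most_common())
```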
The most common accuracy problems, based on Databricks' troubleshooting documentation, are:
Filtering on wrong values: Genie tries to match "California" when the column stores "CA". Fix this by enabling example values on the column or adding a value dictionary entry.
Wrong table selection: Genie picks an overlapping column from the wrong table. Fix this by removing redundant columns or tables, or by adding explicit instructions specifying which table to use for particular metrics.
Timezone errors: Genie defaults to UTC when the business expects local time. Fix this by adding an explicit text instruction specifying the source timezone, conversion function, and target timezone for all date-related queries.
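The timezone problem is easiest to see with a timestamp near midnight. The sketch below assumes, purely for illustration, that the business reports in a fixed UTC-8 offset; a real setup would use a named time zone so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the business reports in a fixed UTC-8 offset.
BUSINESS_TZ = timezone(timedelta(hours=-8))

event_utc = datetime(2026, 1, 1, 3, 30, tzinfo=timezone.utc)
event_local = event_utc.astimezone(BUSINESS_TZ)

print(event_utc.date())    # 2026-01-01
print(event_local.date())  # 2025-12-31
```

The same event lands in a different day, month, quarter, and year depending on which zone is used, which is why the instruction should name the source zone, the conversion, and the target zone explicitly.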
Do not share the space until benchmark accuracy reaches a level your team considers acceptable for self-serve reporting. Launching too early creates the impression that Genie does not work, which is difficult to reverse even after the underlying configuration is corrected.
Step 6: Share With Business Users
In the space settings, share access with individual users, groups, or all account users. Business users need:
- Consumer or Databricks SQL workspace entitlement
- SELECT privileges on the tables used in the space
- At minimum CAN VIEW or CAN RUN access on the Genie space itself
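The table privileges can be prepared as a reviewed batch of GRANT statements. The sketch below only builds the SQL strings; the group name and table names are hypothetical. It also adds USE CATALOG and USE SCHEMA grants, which the list above does not mention but which Unity Catalog generally requires on the parent objects before SELECT on a table takes effect.

```python
def grant_statements(principal: str, tables: list[str]) -> list[str]:
    """Build Unity Catalog GRANTs that give `principal` read access to each table."""
    statements = []
    seen = set()
    for table in tables:
        catalog, schema, _ = table.split(".")  # expects catalog.schema.table
        for statement in (
            f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
            f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
            f"GRANT SELECT ON TABLE {table} TO `{principal}`",
        ):
            if statement not in seen:  # avoid repeating catalog and schema grants
                seen.add(statement)
                statements.append(statement)
    return statements


# Hypothetical group and tables.
grants = grant_statements("sales-analysts", ["main.sales.deals", "main.sales.accounts"])
for grant in grants:
    print(grant)
```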
Set rate limit expectations before launch: Genie spaces in the UI support 20 questions per minute per workspace, shared across all spaces. For teams with heavy concurrent usage, plan to distribute load through the Genie API, which supports additional sessions outside the UI throughput cap.
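If you build a client application that forwards questions on behalf of users, a client-side throttle helps it stay under a per-minute cap. The sketch below is a generic sliding-window limiter, not part of any Databricks SDK; QuestionThrottle is a hypothetical name, and the limiter only sees its own traffic, not questions sent by other users in the workspace.

```python
import time
from collections import deque


class QuestionThrottle:
    """Sliding-window limiter: allow at most `limit` questions per `window` seconds."""

    def __init__(self, limit: int = 20, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock    # injectable so the limiter can be tested without sleeping
        self.sent = deque()   # timestamps of recently allowed questions

    def try_acquire(self) -> bool:
        """Return True and record the question if it fits in the current window."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # drop timestamps that have left the window
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False


throttle = QuestionThrottle()  # defaults match the 20-per-minute figure in the text
allowed = sum(throttle.try_acquire() for _ in range(25))
print(allowed)  # 20
```

A caller that gets False back can queue the question and retry after a short delay instead of sending it immediately.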
After sharing, monitor user feedback through the Monitoring tab. Downvoted responses are the fastest signal for knowledge store gaps. Treat each downvote as a configuration bug and address it promptly.
Practical Summary
A production-ready Genie space requires three things most setup guides skip: annotated Unity Catalog metadata before you create the space, a knowledge store built around SQL expressions rather than text instructions, and benchmark testing before the space goes live. The click-through creation takes ten minutes. The knowledge store curation, done properly, takes several hours but results in a space that business users can trust for recurring questions.
For teams without a Databricks environment, VSLZ AI provides natural language querying against uploaded CSV or Excel files with no warehouse or Unity Catalog setup required.
FAQ
What permissions do I need to create a Databricks Genie space?
To create a Genie space, you need at minimum CAN EDIT permissions on the space and CAN USE access on the SQL warehouse you select. You also need SELECT privileges on the Unity Catalog tables you include. At the account level, partner-powered AI features must be enabled by an administrator before any Genie spaces can be created or used.
How many tables can a Databricks Genie space include?
A Genie space supports up to 30 tables or views as of 2026, increased from the previous limit of 25. Databricks recommends starting with five or fewer tables for your first space and only expanding as needed. Spaces with fewer, well-documented tables consistently outperform spaces that include many partially described tables.
Why is my Databricks Genie space giving wrong answers?
The most common causes are missing or ambiguous column descriptions in Unity Catalog, absent join relationship definitions in the knowledge store, and filters that don't match actual data values (for example, filtering by 'California' when the column stores 'CA'). Check the Monitoring tab for downvoted responses, trace each failure to a missing synonym or join definition, and add SQL example queries to anchor responses for your most common question types.
Can I use Databricks Genie without Unity Catalog?
No. Genie requires all data sources to be registered in Unity Catalog. Tables in legacy Hive metastores or external catalogs are not queryable by Genie directly. You must migrate those tables to Unity Catalog first, or wrap them in a Unity Catalog view with proper column descriptions before adding them to a Genie space.
How do I share a Genie space with non-technical users?
In the Genie space settings, share access with specific users, groups, or all account users. Non-technical end users need the Consumer or Databricks SQL workspace entitlement, SELECT privileges on the underlying tables, and at minimum CAN VIEW or CAN RUN access on the Genie space. Once shared, users can access the space via a direct link or through the Ask Genie button on any published dashboard that includes your data.


