
How to Set Up Databricks Genie

Arkzero Research · Mar 29, 2026 · 8 min read

Last updated Mar 29, 2026

Databricks Genie is a natural language analytics interface built into the Databricks platform that lets business users query data by asking questions in plain English instead of writing SQL. Setting it up requires registering tables in Unity Catalog, configuring a Genie space with curated metadata and sample queries, and connecting a SQL warehouse. This guide walks through each step, including how to build the knowledge store that determines whether Genie gives accurate or misleading results.

Databricks Genie turns a configured data space into a self-serve analytics interface that business users operate with plain English questions. Setup takes about 30 minutes for a data analyst who knows the underlying tables. The accuracy of responses depends almost entirely on the quality of metadata and sample queries added during configuration, not on Genie's AI model itself.

What Genie Does

Genie sits inside the Databricks workspace as a conversational interface layered over SQL. When a business user asks "Which regions had the highest revenue last quarter?" Genie translates that question into a SQL query, runs it against your registered tables, and returns a table or chart. Users never see the SQL unless they choose to inspect it.

The interface is useful for recurring operational questions that currently require back-and-forth with a data analyst. Instead of opening a ticket or waiting for a weekly report, a sales manager can open Genie, ask their question, and download the result.

Accuracy depends on configuration. A poorly configured Genie space produces plausible-looking but incorrect SQL queries. A well-configured space, with clear table descriptions, defined joins, and tested sample queries, produces reliable results for most common business questions.

Prerequisites

Before setting up a Genie space, confirm the following:

Data in Unity Catalog. All tables Genie will query must be registered in Unity Catalog. Genie cannot access tables outside Unity Catalog, including those in the legacy Hive Metastore. If your organization is still on Hive Metastore, migrate the relevant tables to Unity Catalog first; there is no workaround.

A pro or serverless SQL warehouse. Genie runs its generated queries on a SQL warehouse, and that warehouse must be pro or serverless. Classic SQL warehouses are not supported. Serverless is the most practical option since it scales to zero when unused and avoids idle costs.

Permissions. The person creating the Genie space needs Databricks SQL entitlement, CAN USE access on a SQL warehouse, and SELECT privileges on the tables to be included. Account administrators must also enable partner-powered AI features at the account and workspace level before any Genie spaces can be created.
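The first two prerequisites can be checked before anyone opens the Genie UI. The sketch below is a minimal, offline pre-flight check, not a Databricks API call: the function name, the report structure, and the idea of passing the warehouse type as a plain string are all assumptions made for illustration. It flags table names that are not in catalog.schema.table form, tables that still live under the hive_metastore catalog, and unsupported warehouse types.

```python
from dataclasses import dataclass

# Warehouse types this guide lists as supported; "classic" is not.
SUPPORTED_WAREHOUSE_TYPES = {"pro", "serverless"}


@dataclass
class PrereqReport:
    ok: bool
    problems: list


def check_prerequisites(table_names, warehouse_type):
    """Return a report listing anything that would block Genie setup.

    table_names: fully qualified names, expected as catalog.schema.table.
    warehouse_type: a plain string such as "pro", "serverless", or "classic".
    """
    problems = []
    for name in table_names:
        parts = name.split(".")
        if len(parts) != 3 or not all(parts):
            problems.append(f"{name}: expected catalog.schema.table")
        elif parts[0].lower() == "hive_metastore":
            problems.append(f"{name}: legacy Hive Metastore table, migrate first")
    if warehouse_type.lower() not in SUPPORTED_WAREHOUSE_TYPES:
        problems.append(f"warehouse type '{warehouse_type}' is not supported")
    return PrereqReport(ok=not problems, problems=problems)


# Hypothetical inputs: one Unity Catalog table, one Hive table, a classic warehouse.
report = check_prerequisites(
    ["main.sales.orders", "hive_metastore.default.customers"],
    "classic",
)
for problem in report.problems:
    print(problem)
```

Permissions are deliberately left out of this check, since verifying entitlements and grants requires querying the workspace itself.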

Step 1: Enable Genie in Your Workspace

Open the Databricks account console (not the workspace). Navigate to Settings, then AI features. Toggle on partner-powered AI features. A workspace administrator then needs to confirm the same toggle at the workspace level under Admin Settings.

This step is easy to miss and causes confusion when users do not see Genie in the left sidebar. If the sidebar option is absent, check both the account-level and workspace-level toggles before debugging further.

Step 2: Create a Genie Space

Once Genie is visible in the sidebar, click it, then select New. You will be prompted to select data sources. Add up to 30 tables or views. For a first Genie space, start with five or fewer tables focused on a single business domain. A narrowly scoped space covering sales data or support tickets is far easier to configure accurately than a broad space covering all company data.

After selecting tables, click Create. Databricks will scan the table metadata and create the space. The initial scan takes one to two minutes.
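If you plan the table list in a script or a review document before clicking through the UI, the scoping advice above can be expressed as a small check. This is only a sketch: the function name is hypothetical, and treating "one schema" as a stand-in for "one business domain" is an assumption that may not match how your catalog is organized.

```python
MAX_TABLES_PER_SPACE = 30    # limit stated in this guide
RECOMMENDED_FIRST_SPACE = 5  # this guide's suggestion for a first space


def review_space_plan(tables):
    """Classify a proposed table list as 'ok', 'warn', or 'error'."""
    unique = sorted(set(tables))
    if len(unique) > MAX_TABLES_PER_SPACE:
        return "error", f"{len(unique)} tables exceeds the limit of {MAX_TABLES_PER_SPACE}"
    # Assumption: tables that share catalog.schema belong to the same domain.
    schemas = {name.rsplit(".", 1)[0] for name in unique}
    if len(unique) > RECOMMENDED_FIRST_SPACE or len(schemas) > 1:
        return "warn", (
            f"{len(unique)} tables across {len(schemas)} schema(s); "
            "consider a narrower first space"
        )
    return "ok", f"{len(unique)} tables in one schema"


# Hypothetical plan that mixes two domains.
print(review_space_plan(["main.sales.orders", "main.support.tickets"]))
```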

Step 3: Configure Your Data Sources

Open Configure, then Data. For each table, review the Overview and Sample Data tabs. Databricks automatically pulls column names and data types, but it does not know what the columns mean in your business context.

Add descriptions to every table and column that requires business context. For example, the "revenue" column might need a note that it captures "net revenue after discounts and returns, in USD, recorded at invoice close date." A "status" column needs all possible values documented with their meanings. A foreign key column needs a note explaining which table and column it joins to.

The descriptions do not need to be long. They need to be precise. Ambiguous column names without descriptions are the leading cause of incorrect Genie responses.
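Some teams prefer to keep descriptions under version control and apply them as Unity Catalog column comments rather than typing them into the UI. The sketch below assumes that workflow and generates ALTER TABLE ... ALTER COLUMN ... COMMENT statements from a dictionary. The table name, the status values, and the helper names are hypothetical, and you should confirm that your Genie space actually picks up catalog-level comments before relying on this approach.

```python
def sql_literal(text):
    """Quote a Python string as a SQL string literal using backslash escapes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def column_comment_statements(table, descriptions):
    """Build one ALTER TABLE ... ALTER COLUMN ... COMMENT statement per column."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} COMMENT {sql_literal(text)}"
        for column, text in descriptions.items()
    ]


statements = column_comment_statements(
    "main.sales.invoices",  # hypothetical table name
    {
        "revenue": "Net revenue after discounts and returns, in USD, "
                   "recorded at invoice close date.",
        # Hypothetical status values, listed with their meaning.
        "status": "Invoice state: open (unpaid), paid (settled), void (cancelled).",
    },
)
for statement in statements:
    print(statement + ";")
```

The script only prints the statements; running them against a warehouse is left to whatever SQL client your team already uses.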

Next, define table join relationships under the Relationships section. Genie can construct basic joins if relationships are described in column descriptions, but explicit join definitions are more reliable. Define the join as column equality (for example, orders.customer_id = customers.id) or as a custom SQL expression for more complex conditions.
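Before defining a join such as orders.customer_id = customers.id, it helps to confirm that the keys actually line up. The sketch below uses Python's built-in sqlite3 module and a few made-up rows purely as a stand-in; on Databricks you would run equivalent queries against the real tables. It counts orders with no matching customer and customer ids that appear more than once, both of which tend to produce silently wrong aggregates after a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 99);
    """
)

# Orders whose customer_id has no match in customers.id.
orphan_orders = conn.execute(
    """
    SELECT COUNT(*)
    FROM orders
    LEFT JOIN customers ON orders.customer_id = customers.id
    WHERE customers.id IS NULL
    """
).fetchone()[0]

# Customer ids that occur more than once; a clean join key should have none.
duplicate_keys = conn.execute(
    "SELECT COUNT(*) FROM (SELECT id FROM customers GROUP BY id HAVING COUNT(*) > 1)"
).fetchone()[0]

print(f"orphan orders: {orphan_orders}, duplicate customer ids: {duplicate_keys}")
```

A non-zero orphan count is worth documenting in the column description, since it tells Genie users that an inner join will drop rows.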

Step 4: Build the Knowledge Store

This step is the most important part of configuration because it largely determines whether Genie's answers are accurate. Most setup guides skip it or mention it only briefly.

The knowledge store contains three types of objects: SQL expressions, example queries, and text instructions.

SQL expressions encode business logic that Genie would otherwise have to infer. If "active customers" means "customers with at least one order in the last 90 days and no cancelled account flag," write that as a SQL expression named "active_customers." Genie will use this expression when users ask about active customers rather than attempting to derive the logic from column descriptions alone.

Example queries are sample SQL statements paired with the natural language questions they answer. Add at least five example queries covering the most common questions your business users will ask. Databricks recommends example queries over text instructions for business-specific logic because SQL is unambiguous.

Text instructions are plain English guidelines for Genie's behavior. Limit these to formatting rules, date conventions, and scope boundaries. For example: "When the user asks about revenue, always use the net_revenue column, not gross_revenue." Keep text instructions under 100 total. They are less reliable than SQL expressions for logic-heavy guidance.

An internal analysis referenced in Databricks documentation notes that Genie spaces configured with explicit SQL expressions and example queries consistently outperform those relying primarily on text instructions for accuracy.
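To make the "active customers" definition above concrete, here is one way it could be written and tested as SQL. This is a sketch under several assumptions: the table and column names (customers.is_cancelled, orders.order_date) are invented, the data is made up, and the date arithmetic uses SQLite syntax so the example runs locally with Python's sqlite3 module. The Databricks SQL version would use that dialect's date functions instead.

```python
import sqlite3

AS_OF = "2026-03-29"  # fixed reference date so the example is reproducible

# "At least one order in the last 90 days and no cancelled account flag."
ACTIVE_CUSTOMERS_SQL = """
SELECT c.id
FROM customers AS c
WHERE c.is_cancelled = 0
  AND EXISTS (
      SELECT 1
      FROM orders AS o
      WHERE o.customer_id = c.id
        AND o.order_date >= date(:as_of, '-90 days')
        AND o.order_date <= :as_of
  )
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, is_cancelled INTEGER);
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 0), (2, 0), (3, 1);
    INSERT INTO orders VALUES
        (1, '2026-03-01'),
        (2, '2025-10-15'),
        (3, '2026-03-20');
    """
)
# Customer 1: recent order, not cancelled. Customer 2: last order too old.
# Customer 3: recent order, but the account is cancelled.
active_ids = [row[0] for row in conn.execute(ACTIVE_CUSTOMERS_SQL, {"as_of": AS_OF})]
print(active_ids)
```

Writing the definition against a tiny fixture like this forces the edge cases (stale orders, cancelled accounts) into the open before the logic is registered in the knowledge store.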

Step 5: Add Sample Questions and Settings

Under Configure, then Settings, add sample questions. These appear on the Genie landing page and give business users a starting point. Write them exactly as a non-technical user would ask. For example: "What were total sales by region last month?" or "How many open support tickets are older than 7 days?"

Also set a default SQL warehouse here. If no default is set, users are prompted to select one each time they open Genie, which creates unnecessary friction.

Enable the file upload option if your business users commonly blend internal data with personal spreadsheets. This allows users to upload a CSV or Excel file and reference it alongside database tables in their questions.

Step 6: Test Before Sharing

Ask 20 to 30 realistic business questions using the Genie chat interface. For each response, open the SQL inspector to confirm the generated query is correct, not just plausible. Pay particular attention to aggregation logic, date filtering, join accuracy across multiple tables, and handling of null values.

When a response is wrong, add a targeted fix. If Genie joined the wrong tables, add an explicit join definition. If it used the wrong aggregation, add an example query. Do not add more text instructions unless the issue is behavioral rather than data-structural.
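One way to make "correct, not just plausible" checkable is to compare the rows returned by Genie's generated SQL with the rows from a reference query you trust. The sketch below shows the comparison only, using sqlite3 and made-up data as a local stand-in. Copying the generated SQL out of the inspector is a manual step here, and the "generated" query is a hand-written example of a plausible mistake, not actual Genie output.

```python
import sqlite3


def results_match(conn, generated_sql, reference_sql):
    """True when both queries return the same rows, ignoring row order."""
    generated = sorted(conn.execute(generated_sql).fetchall())
    reference = sorted(conn.execute(reference_sql).fetchall())
    return generated == reference


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, amount REAL, status TEXT);
    INSERT INTO sales VALUES
        ('East', 100.0, 'closed'),
        ('East', 50.0, 'cancelled'),
        ('West', 70.0, 'closed');
    """
)

# Trusted answer to "What were total sales by region?"
reference = "SELECT region, SUM(amount) FROM sales WHERE status = 'closed' GROUP BY region"
# Plausible but wrong: forgets to exclude cancelled sales.
generated = "SELECT region, SUM(amount) FROM sales GROUP BY region"

print(results_match(conn, generated, reference))
```

A mismatch like this one points to a missing filter, which is the kind of gap an example query or SQL expression is meant to close.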

Invite one or two business users to test the space before broad rollout. Their questions will differ from yours and will expose gaps you did not anticipate.

Step 7: Share with Business Users

Share the Genie space from the sidebar menu using standard Databricks folder permissions or direct user sharing. Assign CAN RUN permission for users who only need to ask questions. Reserve CAN EDIT for analysts who help maintain the knowledge store.

Communicate the scope of the space clearly. Genie works best when users understand what it covers. A brief description on the Genie landing page prevents questions outside the space's domain and reduces frustrated feedback when Genie cannot answer something it was never designed for.

Monitoring and Improvement

Open the Monitoring tab to review questions users have asked, the responses Genie generated, and any flagged responses. Look for patterns: repeated questions that get wrong answers indicate missing example queries or unclear column descriptions.
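If you copy monitoring data out for review, spotting repeated failures can be done with a few lines of Python. The record shape below (a "question" string and a "feedback" value of "negative" or "positive") is an assumption for illustration and does not reflect any particular export format.

```python
from collections import Counter


def normalize(question):
    """Lowercase, collapse whitespace, and drop a trailing question mark."""
    return " ".join(question.lower().split()).rstrip("?")


def recurring_failures(records, min_count=2):
    """Return (question, count) pairs flagged negative at least min_count times."""
    counts = Counter(
        normalize(r["question"]) for r in records if r["feedback"] == "negative"
    )
    return [(q, n) for q, n in counts.most_common() if n >= min_count]


# Hypothetical monitoring records.
records = [
    {"question": "What was revenue last month?", "feedback": "negative"},
    {"question": "what was  revenue last month", "feedback": "negative"},
    {"question": "Open tickets older than 7 days?", "feedback": "positive"},
    {"question": "Top customers by revenue", "feedback": "negative"},
]
print(recurring_failures(records))
```

Exact-match grouping after light normalization is crude, but it is usually enough to surface the handful of questions that deserve a new example query.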

Each workspace handles up to 20 questions per minute across all Genie spaces. For teams with heavy concurrent usage, create additional scoped spaces to distribute load rather than trying to fit everything into one space.
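If questions are ever sent from a script, for example a batch of test questions, a client-side throttle keeps the script under the workspace limit. The sketch below is a generic sliding-window limiter with the 20-per-minute figure as its default. How questions are actually submitted is outside its scope, and the class and method names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period seconds."""

    def __init__(self, max_calls=20, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()

    def try_acquire(self):
        """Record a call and return True if under the limit, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()
sent = sum(1 for _ in range(25) if limiter.try_acquire())
print(f"sent {sent} of 25 questions in the first window")
```

Because the limit applies across all Genie spaces in a workspace, a script should leave headroom for interactive users rather than consume the full allowance.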

Review the monitoring data monthly and update the knowledge store based on what you find. A space that handles 60 percent of questions accurately at launch can reach 85 to 90 percent with three to four improvement cycles.

For teams that need ad-hoc data analysis without a Databricks setup, tools like VSLZ let users upload a file and ask questions directly without configuring a warehouse or building a knowledge store.

FAQ

What data sources does Databricks Genie support?

Genie supports tables and views registered in Unity Catalog only. You can add up to 30 tables or views per Genie space. Genie also supports file uploads in preview, allowing users to blend CSV or Excel files with database tables. Tables outside Unity Catalog, including those in the legacy Hive Metastore, are not accessible to Genie.

Does Databricks Genie work without Unity Catalog?

No. All tables queried by Genie must be registered in Unity Catalog. If your organization uses the legacy Hive Metastore, you must migrate the relevant tables to Unity Catalog before setting up Genie. There is no workaround that allows Genie to access Hive Metastore tables directly.

How do I improve Databricks Genie accuracy?

Accuracy improves most from adding SQL expressions for key business metrics, example queries for common business questions, and explicit join definitions between tables. Avoid relying heavily on text instructions for logic-heavy guidance. Review the Monitoring tab regularly to identify questions that produce wrong answers, then add targeted fixes to the knowledge store.

What SQL warehouse type does Databricks Genie require?

Genie requires a pro or serverless SQL warehouse. Classic SQL warehouses are not supported. Serverless is recommended for most teams because it scales to zero when unused, minimizing idle compute costs. Set a default warehouse in the Genie space settings to avoid prompting users to select one each time they open the interface.

Can non-technical users use Databricks Genie without SQL knowledge?

Yes. Business users interact with Genie entirely through plain English questions. They do not need to write or understand SQL. Generated queries are read-only, so users cannot modify data through Genie. The quality of responses depends on how well the Genie space has been configured by a data analyst, not on the technical skill of the end user.
