
How to Set Up Apache Superset with Docker

Arkzero Research · Apr 23, 2026 · 8 min read

Last updated Apr 23, 2026

Apache Superset is a free, open-source business intelligence platform that lets teams build interactive dashboards and explore data without writing application code. The fastest way to get it running is with Docker Compose, which sets up the full stack including a metadata database, cache layer, and background workers in a single command. This guide covers the complete setup process, how to connect a real data source, and the first steps to building a working dashboard.

Apache Superset is one of the most widely deployed open-source BI tools in production. Originally built at Airbnb in 2015 and donated to the Apache Software Foundation, it now powers analytics for companies ranging from early-stage startups to large enterprises. As of April 2026, the project has more than 62,000 GitHub stars and an active contributor base releasing updates monthly.

Most setup guides either walk through a local pip install, which involves managing Python virtual environments and several manual configuration steps, or assume you already have a production server ready. This guide takes the middle path: a Docker Compose setup that runs the full Superset stack on your machine in under ten minutes, then covers how to connect it to a real database and build a chart you can share with your team.

What You Need Before Starting

You need Docker with the Compose plugin installed and running. On macOS, Docker Desktop 4.x or later works without additional configuration. On Linux, install Docker Engine and the Docker Compose plugin separately. On Windows, use Docker Desktop with WSL 2 enabled.

You also need Git. Run git --version to confirm. If it is not installed, download it from git-scm.com.

No Python installation is required. The Docker images handle the runtime environment entirely.
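A quick preflight check can confirm these prerequisites before you clone anything. The following is a minimal sketch in POSIX shell; the function names have_cmd and check_prereqs are made up for this example, and the script only reports what it finds rather than installing anything.

```shell
#!/bin/sh
# Preflight sketch: report which of the tools this guide relies on are on PATH.

# have_cmd NAME: succeeds if the shell can resolve NAME to a command.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# check_prereqs NAME...: print one status line per tool; return 1 if any is missing.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if have_cmd "$tool"; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Report only; "|| true" keeps the script going when something is missing.
check_prereqs git docker || true

# The Compose plugin is a docker subcommand, so probe it separately.
if have_cmd docker && docker compose version >/dev/null 2>&1; then
  echo "ok       docker compose"
else
  echo "missing  docker compose"
fi
```

This does not check that the Docker daemon is actually running; running docker info is one way to confirm that.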

Step 1: Clone the Superset Repository

Open your terminal and run:

git clone --depth=1 https://github.com/apache/superset.git
cd superset

The --depth=1 flag downloads only the latest commit rather than the full history, which cuts the download size significantly.

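Next, check out the latest stable release tag. As of April 2026, that is version 4.1.x. A shallow clone does not download release tags, so list them from the remote to find the exact latest tag:

git ls-remote --tags --refs origin | sed 's#.*refs/tags/##' | sort -V | tail -5

Then fetch that tag and check it out:

git fetch --depth=1 origin tag 4.1.2
git checkout tags/4.1.2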

Pinning to a release tag rather than running from the main branch means you get a tested, stable build rather than work-in-progress code.
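A raw tag listing usually mixes final releases with release candidates and other tags, so a small filter helps when picking the version to pin. This is a sketch, assuming final releases are tagged as plain X.Y.Z (like the 4.1.2 used above) and that your sort supports -V; latest_stable_tag is a hypothetical helper name.

```shell
# latest_stable_tag: read tag names on stdin, keep plain X.Y.Z tags,
# and print the highest one according to version ordering.
latest_stable_tag() {
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Example with a hand-written list; release candidates and other tags are ignored.
printf '%s\n' 4.0.2 4.1.1 4.1.2rc1 4.1.2 | latest_stable_tag
# prints 4.1.2

# Against the real repository (needs network access):
#   git ls-remote --tags --refs https://github.com/apache/superset.git \
#     | sed 's#.*refs/tags/##' | latest_stable_tag
```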

Step 2: Start the Stack with Docker Compose

Superset ships with a Docker Compose file designed specifically for local development and evaluation. Run:

docker compose -f docker-compose-image-tag.yml up

This command pulls the official Superset image along with PostgreSQL for metadata storage, Redis for caching, and Celery workers for asynchronous query execution. The first run takes several minutes while images download. Subsequent starts are fast because Docker caches the images locally.

When the output settles and you see lines indicating the web server is listening, Superset is ready. The default address is http://localhost:8088.
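Instead of watching the log output, you can poll the web server until it answers. The sketch below assumes curl is installed and that Superset exposes a health endpoint at /health on the default port; wait_for_http is a hypothetical helper, and the retry count and delay are arbitrary defaults.

```shell
# wait_for_http URL [TRIES] [DELAY]: poll URL until it responds without an
# HTTP error, or give up after TRIES attempts spaced DELAY seconds apart.
wait_for_http() {
  url=$1
  tries=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures, -sS: quiet, -o: discard the body
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $url after $tries attempts" >&2
  return 1
}

# Usage, from a second terminal while the stack is starting:
#   wait_for_http http://localhost:8088/health
```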

What the compose file sets up:

  • superset_app: the main web server on port 8088
  • superset_worker: Celery worker for async queries and chart caching
  • superset_worker_beat: scheduler for periodic jobs
  • db: PostgreSQL 15 storing dashboards, users, and chart definitions
  • redis: Redis 7 handling the task queue and query cache

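All five long-running services are configured to talk to each other automatically. A one-shot init container also runs the database migrations, creates the admin user, and loads the example data before exiting. You do not need to touch any network or connection string settings to get a working local environment.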

Step 3: Log In and Explore the Interface

Navigate to http://localhost:8088. The default credentials are:

  • Username: admin
  • Password: admin

The first thing to do after logging in is change the admin password. Go to Settings in the top right corner, select User Info, and update the password. The default credentials are well-known and should not stay in place even on a local instance if it is accessible beyond your own machine.

Superset's interface has four main sections visible in the top navigation. Dashboards shows your published collections of charts. Charts is where individual visualizations live. Datasets is where you define which tables Superset can query. SQL Lab is a full SQL editor with autocomplete and result export.

A set of example charts and dashboards is loaded automatically on first run. In the Docker Compose setup, the sample data behind them is loaded into the bundled PostgreSQL container, so you can explore the interface immediately without connecting to an external source.

Step 4: Connect Your Own Database

Superset can connect to more than 40 database types through SQLAlchemy drivers. The Docker image includes drivers for PostgreSQL and MySQL out of the box. For other databases such as Snowflake, BigQuery, Redshift, or DuckDB, you need to install the driver inside the running container.

To add a database connection, go to Settings in the top navigation and select Database Connections. Click the plus icon and choose your database type from the dropdown. Superset will show you the required connection string format for that database.

For a PostgreSQL database, the connection string follows this pattern:

postgresql://username:password@hostname:5432/database_name

For a local PostgreSQL instance running on your machine (not inside Docker), use host.docker.internal as the hostname on macOS and Windows, or your machine's network IP on Linux:

postgresql://myuser:mypassword@host.docker.internal:5432/mydb
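One detail the pattern hides: a password containing characters such as @, : or / has to be percent-encoded, otherwise the URI is parsed incorrectly. The following sketch builds the string in POSIX shell. The helper names urlencode and pg_uri are made up for this example, and the encoder assumes ASCII input only.

```shell
# urlencode STRING: percent-encode everything except RFC 3986 unreserved
# characters. Assumes ASCII input; multi-byte characters are not handled.
urlencode() (
  rest=$1
  out=
  while [ -n "$rest" ]; do
    rem=${rest#?}           # everything after the first character
    c=${rest%"$rem"}        # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    rest=$rem
  done
  printf '%s\n' "$out"
)

# pg_uri USER PASSWORD HOST PORT DATABASE: print a SQLAlchemy PostgreSQL URI
# with the user and password percent-encoded.
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

pg_uri myuser 'p@ss:word/1' host.docker.internal 5432 mydb
# prints postgresql://myuser:p%40ss%3Aword%2F1@host.docker.internal:5432/mydb
```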

After entering the connection string, click Test Connection to verify it works before saving.

Installing additional drivers: If you need a driver that is not included in the base image, exec into the running container and install it:

docker exec -it superset_app bash
pip install snowflake-sqlalchemy

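This change lives only in that container's filesystem, so it is lost whenever the container is recreated, for example after docker compose down or after pulling a new image. For a durable setup, create a custom Dockerfile that extends the official Superset image and adds your required packages.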
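A custom image along those lines could be generated as follows. This is a sketch under assumptions rather than a vetted production build: it assumes the official image is published as apache/superset with a tag matching the release, and that it runs as an unprivileged user named superset. SUPERSET_TAG, BUILD_DIR, and the image name superset-custom are illustrative.

```shell
# Sketch: generate a Dockerfile that layers one extra driver on the official image.
SUPERSET_TAG=${SUPERSET_TAG:-4.1.2}
BUILD_DIR=${BUILD_DIR:-./superset-custom}

mkdir -p "$BUILD_DIR"
cat > "$BUILD_DIR/Dockerfile" <<EOF
FROM apache/superset:${SUPERSET_TAG}
# Switch to root to install packages, then drop back to the unprivileged user.
USER root
RUN pip install --no-cache-dir snowflake-sqlalchemy
USER superset
EOF

echo "wrote $BUILD_DIR/Dockerfile"

# Build it (needs Docker and network access):
#   docker build -t superset-custom:"$SUPERSET_TAG" "$BUILD_DIR"
```

To use the result, the compose file would need to reference the custom image instead of the official one.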

Step 5: Create a Dataset and Build a Chart

Once a database connection is saved, you can register a table as a dataset. Go to Datasets, click the plus icon, select your database and schema, then choose the table. Superset creates a dataset entry that stores column metadata and makes the table available in the chart builder.

With a dataset registered, go to Charts, click the plus icon, and select your dataset. Choose a visualization type from the gallery. Superset offers more than 40 chart types including bar charts, line charts, scatter plots, maps, pivot tables, and time-series forecasts.

The chart builder interface has three panels: the dataset's columns and metrics on the left, configuration controls (including the visualization type) in the center, and a live preview on the right. Set your metric (which column to aggregate and how), your dimension (how to group results), and any filters, then click Create Chart. Save the chart and add it to a dashboard to make it accessible to other users with a shared URL.

Common Issues and How to Fix Them

Port 8088 already in use. Another service is listening on that port. Either stop the conflicting service or change Superset's port by editing the ports entry in the compose file from "8088:8088" to "8090:8088" and restarting.
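The port edit can also be scripted. The sketch below assumes the mapping appears in the compose file as 8088:8088, with or without quotes; remap_host_port is a hypothetical helper, and it keeps a .bak copy of the original file.

```shell
# remap_host_port FILE NEW_PORT: rewrite the host side of the 8088:8088
# mapping in FILE. The original is preserved as FILE.bak.
remap_host_port() {
  file=$1
  new_port=$2
  sed -i.bak "s/8088:8088/${new_port}:8088/" "$file"
}

# Usage, from the repository root:
#   remap_host_port docker-compose-image-tag.yml 8090
```

After the change, Superset is reachable at http://localhost:8090 while still listening on 8088 inside the container.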

Database connection refused. When connecting to a database running on your host machine from inside Docker, localhost refers to the container, not your machine. Use host.docker.internal on macOS and Windows. On Linux, add --add-host=host.docker.internal:host-gateway to the docker run command, or set the equivalent extra_hosts entry for the service in the compose file.
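On Linux, one way to apply the host mapping without editing the shipped compose file is a small override file. This is a sketch: the service keys superset and superset-worker are assumptions about the compose file (you can list the real ones with docker compose -f docker-compose-image-tag.yml config --services), and the override file name is arbitrary.

```shell
# Sketch: write a Compose override that maps host.docker.internal to the host.
OVERRIDE_FILE=${OVERRIDE_FILE:-docker-compose.host-gateway.yml}

# Quoted 'EOF' so the YAML is written literally, with no shell expansion.
cat > "$OVERRIDE_FILE" <<'EOF'
services:
  superset:            # assumed service key for the web server
    extra_hosts:
      - "host.docker.internal:host-gateway"
  superset-worker:     # assumed service key for the Celery worker
    extra_hosts:
      - "host.docker.internal:host-gateway"
EOF

echo "wrote $OVERRIDE_FILE"

# Start the stack with both files (needs Docker):
#   docker compose -f docker-compose-image-tag.yml -f "$OVERRIDE_FILE" up
```

The worker is included because asynchronous queries run there and need to reach the same database host.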

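Changes not appearing after restart. The compose file mounts a local config directory. If you edited configuration files, run docker compose -f docker-compose-image-tag.yml restart superset rather than a full down-and-up cycle to apply changes faster. Note that restart takes the Compose service name (superset), not the container name (superset_app).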

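Image is out of date. Run docker compose -f docker-compose-image-tag.yml pull before bringing the stack up to ensure you have the latest version of each image matching the tag in the compose file. Pass the same -f flag to every Compose command in this guide; without it, Compose reads the repository's default docker-compose.yml instead.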

What Superset Does Not Handle Out of the Box

Superset is powerful for teams that already have structured data in a database or warehouse. If your data lives in spreadsheets, CSV exports, or ad-hoc files, you need to load it into a database first before Superset can query it. For exploratory work with raw files, a tool like VSLZ lets you upload a file directly and get charts and analysis without any database setup or connection strings.

For production deployments, the Docker Compose setup used in this guide is not recommended. Apache Superset's documentation covers Helm chart deployment for Kubernetes environments, which adds proper secret management, horizontal scaling, and persistent storage configuration for production use.

Summary

Getting Apache Superset running locally takes three commands: clone the repo, check out a stable tag, and run Docker Compose. From there, connecting a real database and building your first chart takes another ten to fifteen minutes. The result is a fully functional, self-hosted BI platform with no per-seat licensing cost and no data leaving your infrastructure.

FAQ

Is Apache Superset free to use?

Yes. Apache Superset is released under the Apache 2.0 license, which means it is free to use, modify, and distribute for any purpose including commercial use. There is no paid tier or feature gating in the open-source version. Preset.io offers a managed hosted version with paid plans for teams that want cloud hosting and enterprise support without managing the infrastructure themselves.

Can Apache Superset connect to Google Sheets or Excel files?

Superset connects to databases and data warehouses through SQLAlchemy drivers. It does not have a built-in connector for Google Sheets or Excel files directly. To use spreadsheet data, you would first need to load it into a supported database such as PostgreSQL, SQLite, or a cloud warehouse. The Google Sheets API can be accessed through a third-party SQLAlchemy driver, but this requires additional configuration.

What databases does Apache Superset support?

Superset supports more than 40 databases through SQLAlchemy, including PostgreSQL, MySQL, SQLite, Snowflake, BigQuery, Redshift, Databricks, DuckDB, ClickHouse, Presto, Trino, and many others. Support for each database requires the corresponding Python driver to be installed in the Superset environment. The official documentation lists every supported database and the required pip package for each one.

How do I update Apache Superset to a newer version?

For a Docker Compose setup, update by pulling the newer image tag. Check the Superset releases page on GitHub for the latest stable tag, update the image tag references in your compose file or environment variable, run `docker compose -f docker-compose-image-tag.yml pull` to download the new images, then run `docker compose -f docker-compose-image-tag.yml up` to restart with the updated version. Always check the release notes for migration steps before upgrading, as some versions require database schema migrations.

What is the difference between Superset and Metabase?

Both are open-source BI tools that connect to databases and let users build dashboards without writing application code. Superset is more technically oriented, with a full SQL editor, support for custom SQL metrics, and a wider range of chart types. Metabase is designed for non-technical users with a simpler question-building interface and easier initial setup. Superset requires more configuration to get running but gives data teams more control over how data is defined and queried.
