
How to Get Started with Mage AI

Arkzero Research · Apr 24, 2026 · 6 min read

Last updated Apr 24, 2026

Mage AI is an open-source data pipeline orchestration tool that lets you build, run, and monitor ETL workflows through a visual notebook interface. Unlike heavier tools such as Apache Airflow, Mage runs locally via Docker with no configuration overhead, and pipelines are built from modular blocks in Python or SQL. Most teams can have a working pipeline loading, transforming, and exporting data within 30 minutes of installation.


Mage AI is an open-source pipeline orchestration tool that reduces data engineering setup to a Docker command and a browser. You define your pipeline as a sequence of blocks, each in its own file, and run the whole thing on a schedule without hand-writing pipeline configuration files. This guide walks through installation, building a first pipeline, and scheduling automated runs.

What Mage AI Does

Most data teams spend hours configuring Airflow before writing a single line of pipeline logic. Mage takes a different approach: the UI is the editor. Every block, whether a data loader, transformer, or exporter, is a standalone Python or SQL file that you write directly in the browser, execute step by step, and debug with real output at each stage.

Mage integrates with more than 150 data sources including Postgres, BigQuery, Snowflake, S3, and standard REST APIs. Its GitHub repository has over 7,500 stars and the tool is actively maintained by a funded company. The core unit is the pipeline, built from three block types: Data Loader (pulls data from a source), Transformer (cleans or reshapes data), and Data Exporter (sends the output to a destination).

Step 1: Install Mage with Docker

Docker is the recommended installation method. It runs the Mage server in a container so there is no Python environment to configure manually.

Create a project directory and save this as docker-compose.yml:

version: '3'
services:
  magic:
    image: mageai/mageai:latest
    command: mage start my_project
    ports:
      - 6789:6789
    volumes:
      - .:/home/src/
    restart: on-failure:5

Then run:

docker compose up

Open http://localhost:6789 in your browser. The Mage dashboard loads within a few seconds.

If you prefer pip instead of Docker:

pip install mage-ai
mage start my_project

Both methods start the server at the same port. Docker is preferred for team environments because it locks the runtime version.
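If you start Mage from a script rather than by hand, it can help to wait until the server answers before opening the browser or calling it. The following is a minimal sketch using only the Python standard library; the function name wait_for_server and its defaults are illustrative and not part of Mage. It assumes the server listens on port 6789, as in the steps above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:6789",
                    timeout: float = 60.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it answers over HTTP or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # the server returned a successful response
        except urllib.error.HTTPError:
            return True  # the server is up, even if this path returns an error status
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_server() right after docker compose up -d returns True once the dashboard is reachable, or False if nothing answers within the timeout.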

Step 2: Create a New Pipeline

From the Mage dashboard, click New pipeline and choose Standard (batch) for a basic ETL job. Give it a descriptive name like daily_sales_sync.

You land in the pipeline editor with three empty sections: Data Loader, Transformer, and Data Exporter. Each section can hold one or more blocks chained in sequence.

Step 3: Add a Data Loader

Click Data loader and select a source type. Mage provides templates for common sources. To load a local CSV, select Python > File:

import pandas as pd

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

@data_loader
def load_data(*args, **kwargs):
    return pd.read_csv('data/sales.csv')

Click Run block in the top right corner. Mage executes the block and displays the first 10 rows of the resulting dataframe directly below the code. No separate test script, no terminal window.

For database sources, select Python > Postgres or Python > BigQuery and fill in connection credentials. Mage stores these in io_config.yaml, a YAML config file inside the project directory.

Step 4: Add a Transformer

Click Transformer and select Python > Generic (no template). The transformer receives the upstream block output as a DataFrame and returns a modified version.

A typical cleaning transformer:

import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def transform(data, *args, **kwargs):
    # Normalize column names first so later steps can rely on lowercase names
    data.columns = [c.lower().replace(' ', '_') for c in data.columns]
    # Drop rows missing revenue
    data = data.dropna(subset=['revenue'])
    # Parse dates, then keep rows dated 2025 or later
    data['date'] = pd.to_datetime(data['date'])
    data = data[data['date'].dt.year >= 2025]
    return data

Run the block and the cleaned output appears immediately. You can chain multiple transformers for complex logic without defining dependencies manually.
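To see what each cleaning step does, the same logic can be tried on a tiny in-memory DataFrame outside Mage. This is a sketch under assumptions: the helper name clean_sales, the min_year parameter, and the sample rows are all made up for illustration, and column names are normalized before the other steps so that headers such as Revenue match.

```python
import pandas as pd


def clean_sales(data: pd.DataFrame, min_year: int = 2025) -> pd.DataFrame:
    """Normalize headers, drop rows without revenue, keep rows from min_year on."""
    data = data.copy()
    data.columns = [c.lower().replace(' ', '_') for c in data.columns]
    data = data.dropna(subset=['revenue'])
    data = data.assign(date=pd.to_datetime(data['date']))
    return data[data['date'].dt.year >= min_year].reset_index(drop=True)


# Hypothetical sample: one old row, one row missing revenue, one valid row.
sample = pd.DataFrame({
    'Date': ['2024-12-31', '2025-01-15', '2025-02-01'],
    'Revenue': [100.0, None, 250.0],
    'Sales Rep': ['Ana', 'Ben', 'Cy'],
})
cleaned = clean_sales(sample)
```

Only the 2025-02-01 row survives: the 2025-01-15 row has no revenue and the 2024-12-31 row falls before the cutoff year.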

Step 5: Add a Data Exporter

Click Data exporter and select a destination. For writing a CSV file:

import os

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter

@data_exporter
def export_data(data, *args, **kwargs):
    # Create the output directory if it does not exist yet
    os.makedirs('output', exist_ok=True)
    data.to_csv('output/cleaned_sales.csv', index=False)

For writing to a Postgres table, select the Postgres template and configure the connection. Mage injects connection strings from environment variables automatically when you reference them in the block.
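The general idea of keeping credentials in environment variables can be illustrated in plain Python. This sketch is not Mage's own mechanism; the variable names (POSTGRES_USER and so on) and the helper postgres_url_from_env are hypothetical and chosen only for the example.

```python
import os
from typing import Mapping
from urllib.parse import quote

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ('POSTGRES_USER', 'POSTGRES_PASSWORD', 'POSTGRES_HOST', 'POSTGRES_DB')


def postgres_url_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build a postgresql:// URL from environment variables, failing loudly if any are unset."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    user = quote(env['POSTGRES_USER'], safe='')
    password = quote(env['POSTGRES_PASSWORD'], safe='')  # escape characters such as '@'
    host = env['POSTGRES_HOST']
    port = env.get('POSTGRES_PORT', '5432')  # default Postgres port
    database = env['POSTGRES_DB']
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Failing early on a missing variable tends to produce a clearer error than a connection attempt with a half-empty URL.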

Step 6: Run the Full Pipeline

Click Run pipeline from the pipeline view. Mage executes each block in order and shows status per block in real time: queued, running, completed, or failed.

If a block fails, the pipeline stops at that point and the error appears inline next to the failing block. Click the block to expand the output panel and read the Python traceback. Most failures are mismatched column names or missing file paths that the output makes immediately obvious.
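The stop-on-failure behaviour described above can be pictured with a small conceptual model. This is not Mage's implementation, only a sketch of sequential execution in which each block receives the previous block's output; the function run_blocks and the status strings it uses are illustrative.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Block = Tuple[str, Callable[[Any], Any]]


def run_blocks(blocks: List[Block], initial: Any = None) -> Tuple[Dict[str, str], Optional[Exception]]:
    """Run blocks in order, passing each output downstream; stop at the first failure."""
    statuses = {name: 'queued' for name, _ in blocks}
    data = initial
    for name, func in blocks:
        statuses[name] = 'running'
        try:
            data = func(data)
        except Exception as exc:
            statuses[name] = 'failed'
            return statuses, exc  # blocks after this one are never started
        statuses[name] = 'completed'
    return statuses, None


def _broken_transform(rows):
    # Simulates a mismatched column name in a transformer.
    raise KeyError('revenue')


statuses, error = run_blocks([
    ('load', lambda _: [1, 2, 3]),
    ('transform', _broken_transform),
    ('export', lambda rows: rows),
])
```

Here the load block completes, the transform block fails with the KeyError, and the export block never leaves the queued state in this model.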

Step 7: Schedule Automated Runs

Click the Triggers tab in the left sidebar. Click New trigger and select Schedule.

Configure the schedule:

Frequency: Daily, weekly, or a custom cron expression such as 0 6 * * * for 6 AM daily
Start time: The first execution datetime in UTC
Timeout: Maximum runtime before the run is force-stopped

Click Enable trigger. Mage begins executing the pipeline on schedule as long as the server is running.
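A cron expression has five fields: minute, hour, day of month, month, and day of week. To make 0 6 * * * concrete, the sketch below computes the next run time in UTC for simple daily expressions only. It is an illustration, not how Mage evaluates schedules, and the helper name next_daily_run is made up.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Return the next UTC run time strictly after `now` for a 'M H * * *' expression."""
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ('*', '*', '*'):
        raise ValueError('this sketch only handles daily expressions of the form "M H * * *"')
    if now.tzinfo is None:
        raise ValueError('now must be timezone-aware')
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, so use tomorrow's
    return candidate
```

For example, with now set to 05:30 UTC the next run of 0 6 * * * is 06:00 UTC the same day; at exactly 06:00 UTC it is 06:00 UTC the following day.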

For production deployments, Mage provides official Terraform modules for AWS ECS, GCP Cloud Run, and Kubernetes. The same pipeline code runs unchanged in production with no modifications to block logic.

Monitoring Runs

The Runs tab shows every pipeline execution with duration, status, and block-level logs. Failed runs highlight the exact block that errored, and logs are retained for debugging without any external log aggregation setup.

For alerting, Mage supports Slack and email notifications on run failure. Configure these in Settings > Notification by entering your Slack webhook URL or SMTP credentials.
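Before entering a Slack webhook URL, it can be worth confirming that the webhook accepts messages. Slack incoming webhooks take a JSON body with a text field sent by HTTP POST; the sketch below builds and sends such a request with the standard library. The function names are illustrative, and this is a manual check rather than the code Mage runs when it sends alerts.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack message as JSON."""
    body = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code (200 on success)."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling send_slack_message with your own webhook URL and a short test string should post that string to the channel the webhook is bound to.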

When Mage Is the Right Tool

Mage works well for teams with fewer than 50 pipelines that need a low-overhead alternative to Airflow. The visual interface and block-level test execution reduce iteration time significantly for analysts who are comfortable writing Python but do not want to manage a full Airflow deployment.

For teams with hundreds of production pipelines and dedicated data engineering staff, tools such as Dagster or Prefect provide more mature observability, versioning, and DAG management features.

If your goal after the pipeline runs is to extract insights and generate charts from the output data rather than build additional transformations, VSLZ can connect to the resulting CSV or database table and produce analysis from a plain English prompt without requiring a separate analytics tool.

Key Takeaways

Mage AI reduces the barrier to building scheduled ETL pipelines through a notebook-style interface that tests each step with real data before running the full job. Installation takes one Docker command, pipeline blocks run and debug independently in the browser, and scheduling requires only a settings form. For an analyst or operations manager who needs to automate a recurring data extraction and loading job, Mage is one of the lowest-friction paths from a blank project to a running production pipeline.

FAQ

Is Mage AI free to use?

Mage AI is open-source and free to self-host under the Apache 2.0 license. The core pipeline builder, scheduler, and monitoring features are all available without a paid plan. Mage also offers a paid cloud-hosted version called Mage Pro for teams that want managed infrastructure, SSO, and enterprise support. For most individual analysts and small teams, the open-source version running locally via Docker covers all standard ETL needs.

How is Mage AI different from Apache Airflow?

Apache Airflow defines pipelines as Python DAG files that you deploy to a scheduler, which makes it powerful but complex to set up and debug. Mage AI uses a block-based notebook interface where you write, run, and inspect each step directly in the browser without touching configuration files. Mage is generally faster to get running for simple to medium ETL jobs. Airflow has a larger ecosystem, more mature monitoring, and is better suited to organizations with dedicated data engineering teams managing many complex workflows.

Can Mage AI connect to cloud databases like BigQuery or Snowflake?

Yes. Mage includes pre-built data loader and exporter templates for BigQuery, Snowflake, Redshift, Postgres, MySQL, and more than 150 other sources. Connection credentials are stored in a YAML config file inside your Mage project directory and referenced in blocks through environment variables. You can also write custom SQL queries directly in the Mage SQL block editor against any connected database.

Can I run Mage AI in production on cloud infrastructure?

Yes. Mage provides official Terraform modules for deploying to AWS ECS, GCP Cloud Run, and Kubernetes. The pipeline code is identical between local and production environments, so there is no rewriting required when moving from development to deployment. The Mage documentation includes step-by-step guides for each cloud provider. For persistent storage and multi-user access, a production deployment typically uses a shared Postgres database as the Mage backend alongside cloud block storage for data files.

What programming languages does Mage AI support?

Mage supports Python, SQL, and R for writing pipeline blocks. Python is the most commonly used option and has access to the full Mage decorator API. SQL blocks connect directly to a configured database and execute queries, with the results passed to the next block as a DataFrame. R support covers basic transformation use cases. The majority of community tutorials and pre-built templates use Python.