How to Set Up RisingWave for Real-Time Analytics

Arkzero Research · Apr 29, 2026 · 8 min read

Last updated Apr 29, 2026

RisingWave is a PostgreSQL-compatible streaming database that lets analysts define real-time materialized views using standard SQL. It replaces the traditional Kafka-plus-Flink streaming stack with a single system that ingests, transforms, and serves live data through any PostgreSQL client. This guide covers local installation, connecting a data source via PostgreSQL CDC or the built-in datagen connector, writing materialized views, and querying live results with zero new tooling beyond SQL you already know.

What RisingWave Is and When to Use It

Most real-time analytics setups follow a predictable pattern: a message queue like Kafka, a stream processor like Flink or Spark Streaming, and a separate serving database for queries. You end up maintaining three systems, each with its own operational overhead, and data must travel through all three before it becomes queryable.

RisingWave replaces that entire stack with a single system. It ingests event streams from Kafka, PostgreSQL CDC, MySQL CDC, or webhooks; transforms data incrementally using SQL; and serves results through the standard PostgreSQL wire protocol. Any tool that connects to PostgreSQL -- psql, DBeaver, Metabase, Grafana -- connects to RisingWave without modification.

The key difference from materialized views in a traditional database is that RisingWave updates its views incrementally as each new event arrives. They are not cached snapshots refreshed manually or on a schedule. A dashboard reading a RisingWave materialized view sees results that reflect events from the last few seconds, not the last batch run.

According to RisingWave's 2026 streaming database landscape guide, a typical aggregation view over a 10,000 events-per-second stream shows end-to-end latency under 3 seconds. Equivalent hourly batch jobs on the same data introduce 5 to 60 minutes of latency depending on scheduling and queue depth.

Step 1: Install RisingWave

The fastest path on macOS is Homebrew:

brew tap risingwavelabs/risingwave
brew install risingwave
risingwave playground

The playground command starts a single-node, in-memory instance suitable for development. Nothing persists across restarts in this mode, which makes it safe for testing without touching production data.

For Docker on any platform:

docker run -it --pull=always -p 4566:4566 -p 5691:5691 \
  risingwavelabs/risingwave:latest playground

RisingWave listens on port 4566 by default. Connect with psql:

psql -h localhost -p 4566 -d dev -U root

No password is required in playground mode. You are now in a SQL session that looks and behaves much like a PostgreSQL prompt.
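A quick sanity check once the prompt appears can confirm that the session is talking to RisingWave and not to a local PostgreSQL server that happens to be running. This is a minimal sketch; the exact version string depends on the release you installed.

```sql
-- Should report a PostgreSQL-compatible version string that mentions RisingWave.
SELECT version();

-- The playground starts with a database named dev, the one passed to psql above.
SHOW DATABASES;
```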

Step 2: Create a Source

Sources define where data enters RisingWave. For testing, the built-in datagen connector generates synthetic events with no external dependencies. For production, the primary options are Kafka topics, PostgreSQL CDC, and MySQL CDC.

Create a datagen source that simulates 100 order events per second:

CREATE SOURCE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL,
  region      VARCHAR,
  created_at  TIMESTAMPTZ
) WITH (
  connector = 'datagen',
  datagen.rows.per.second = '100'
) FORMAT PLAIN ENCODE JSON;

Verify rows are flowing:

SELECT * FROM orders LIMIT 5;
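Unless told otherwise, datagen fills columns with randomly generated values, which can make aggregates hard to eyeball. The sketch below assumes the per-column fields.* options described in the datagen connector documentation; the table name orders_demo, the value ranges, and the seeds are illustrative choices, not part of the original setup. It uses CREATE TABLE so the generated rows are stored and can be counted directly, and DOUBLE PRECISION for amount to keep the example to types with min and max settings.

```sql
-- Hypothetical variant of the orders source with per-column generator settings.
CREATE TABLE orders_demo (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE PRECISION,
  region      VARCHAR,
  created_at  TIMESTAMPTZ
) WITH (
  connector = 'datagen',
  fields.order_id.kind       = 'sequence',  -- 1, 2, 3, ...
  fields.order_id.start      = '1',
  fields.customer_id.kind    = 'random',
  fields.customer_id.min     = '1',
  fields.customer_id.max     = '500',
  fields.customer_id.seed    = '1',
  fields.amount.kind         = 'random',
  fields.amount.min          = '5',
  fields.amount.max          = '500',
  fields.amount.seed         = '2',
  fields.region.kind         = 'random',
  fields.region.length       = '2',         -- short random strings stand in for region codes
  fields.region.seed         = '3',
  fields.created_at.kind     = 'random',
  fields.created_at.max_past = '1 day',     -- timestamps fall within the last day
  fields.created_at.seed     = '4',
  datagen.rows.per.second    = '100'
) FORMAT PLAIN ENCODE JSON;

-- Run this twice a few seconds apart; the count should grow.
SELECT COUNT(*) FROM orders_demo;
```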

To connect to a live PostgreSQL database using change data capture:

CREATE SOURCE production_orders WITH (
  connector     = 'postgres-cdc',
  hostname      = 'your-db-host',
  port          = '5432',
  username      = 'replication_user',
  password      = 'your-password',
  database.name = 'production',
  schema.name   = 'public',
  table.name    = 'orders'
);

This captures every INSERT, UPDATE, and DELETE from your existing orders table in real time. No application code changes are required. On the Postgres side, the prerequisites are enabling logical replication (wal_level = logical in postgresql.conf), granting the replication role to the connecting user, and making sure that user can read the table.
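The Postgres-side preparation can be sketched in plain PostgreSQL. The statements below assume a self-managed server where you have superuser access, and reuse the replication_user, production, and public.orders names from the example above; managed services usually expose wal_level through their own parameter settings instead of ALTER SYSTEM, and the RisingWave connector documentation may list additional privileges for your version.

```sql
-- Run these on the upstream PostgreSQL server (not in RisingWave).

-- 1. Enable logical decoding. This takes effect only after a PostgreSQL restart.
ALTER SYSTEM SET wal_level = 'logical';

-- 2. After the restart, confirm the setting reads "logical".
SHOW wal_level;

-- 3. Give the connecting user the REPLICATION attribute and read access.
--    Assumes the role already exists; otherwise create it first with
--    CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD '...';
ALTER ROLE replication_user WITH REPLICATION LOGIN;
GRANT CONNECT ON DATABASE production TO replication_user;
GRANT USAGE ON SCHEMA public TO replication_user;
GRANT SELECT ON public.orders TO replication_user;
```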

Step 3: Create a Materialized View

Materialized views are the core abstraction in RisingWave. Unlike a traditional database where a materialized view is a static snapshot you refresh with REFRESH MATERIALIZED VIEW, RisingWave maintains the view incrementally as each new event lands. You define it once and RisingWave keeps it current.

CREATE MATERIALIZED VIEW revenue_by_hour AS
SELECT
  DATE_TRUNC('hour', created_at) AS hour,
  region,
  COUNT(*)                        AS order_count,
  SUM(amount)                     AS total_revenue,
  AVG(amount)                     AS avg_order_value
FROM orders
GROUP BY DATE_TRUNC('hour', created_at), region;

After this statement runs, RisingWave begins maintaining the aggregates. No cron job schedules the refresh. No batch job re-scans the table.
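To watch incremental maintenance happen one step at a time, it can help to drive a view by hand instead of from a generator. The sketch below uses hypothetical scratch objects (orders_manual and revenue_by_region_manual) that are not part of the setup above, and relies on RisingWave's FLUSH command to make the preceding writes visible before the next read.

```sql
-- Hypothetical scratch table fed by manual INSERTs instead of a connector.
CREATE TABLE orders_manual (
  order_id BIGINT PRIMARY KEY,
  region   VARCHAR,
  amount   DECIMAL
);

CREATE MATERIALIZED VIEW revenue_by_region_manual AS
SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_revenue
FROM orders_manual
GROUP BY region;

INSERT INTO orders_manual VALUES (1, 'emea', 10.00), (2, 'emea', 5.50);
FLUSH;  -- wait until the inserts have been applied to downstream views
SELECT * FROM revenue_by_region_manual;  -- one emea row covering 2 orders

INSERT INTO orders_manual VALUES (3, 'emea', 4.50);
FLUSH;
SELECT * FROM revenue_by_region_manual;  -- the same row now covers 3 orders

-- Clean up the scratch objects (view first, since it depends on the table).
DROP MATERIALIZED VIEW revenue_by_region_manual;
DROP TABLE orders_manual;
```

No refresh statement appears anywhere between the two SELECTs; the second result differs only because a new row arrived.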

You can build views on top of views using the same SQL:

CREATE MATERIALIZED VIEW top_regions_weekly AS
SELECT
  region,
  SUM(total_revenue) AS weekly_revenue,
  SUM(order_count)   AS weekly_orders
FROM revenue_by_hour
WHERE hour >= NOW() - INTERVAL '7 days'
GROUP BY region
ORDER BY weekly_revenue DESC;

Window functions work as expected:

CREATE MATERIALIZED VIEW order_velocity AS
SELECT
  customer_id,
  COUNT(*) OVER (
    PARTITION BY customer_id
    ORDER BY created_at
    RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
  ) AS orders_last_hour
FROM orders;

This is standard SQL throughout. There is no streaming DSL, no new API, and no event-processing framework to learn.

Step 4: Query Live Results

Querying a materialized view looks identical to querying a table:

SELECT * FROM revenue_by_hour
ORDER BY hour DESC
LIMIT 10;

In a test run with the datagen source generating 100 orders per second, revenue_by_hour showed accurate totals within 2 to 3 seconds of new events arriving. A batch job refreshing every 15 minutes on the same data returns results that are 0 to 15 minutes stale depending on timing.

Because RisingWave speaks the PostgreSQL protocol, any BI tool that connects to PostgreSQL connects to RisingWave without modification. In Metabase, add a new database connection: type PostgreSQL, host localhost, port 4566, database dev, username root. Grafana, Redash, and Apache Superset work the same way. No RisingWave-specific plugin is required in any of them.

Step 5: Push Results Downstream with Sinks

Once you have a materialized view, you can push its results to external systems using sinks. This enables live data to flow into an existing analytics database, a Kafka topic for downstream consumers, or S3 for archival.

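CREATE SINK revenue_to_analytics FROM revenue_by_hour
WITH (
  connector   = 'jdbc',
  jdbc.url    = 'jdbc:postgresql://analytics-db:5432/reports',
  table.name  = 'revenue_hourly_live',
  type        = 'upsert',      -- aggregates change over time, so rows are updated in place
  primary_key = 'hour,region'  -- should match the key of the target table
);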

After this statement runs, RisingWave writes every incremental update to reports.revenue_hourly_live automatically. Any existing dashboard pointing at that table now receives live data without any pipeline changes on the consuming side.
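The JDBC sink writes into a table that is expected to exist already on the target side. A minimal sketch of that table in PostgreSQL is shown below, assuming the column types produced by revenue_by_hour and assuming the sink should keep one row per hour and region; adjust the types if your source columns differ.

```sql
-- Run on the analytics PostgreSQL database (reports), not in RisingWave.
CREATE TABLE revenue_hourly_live (
  hour            TIMESTAMPTZ NOT NULL,
  region          VARCHAR     NOT NULL,
  order_count     BIGINT,
  total_revenue   NUMERIC,
  avg_order_value NUMERIC,
  PRIMARY KEY (hour, region)  -- lets updates for the same hour and region overwrite the old row
);
```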

What Most Setup Guides Skip

Most RisingWave tutorials are written for data engineers already running Kafka. Two things that matter for analyst-first teams rarely appear.

First, you do not need Kafka to start. The datagen connector and PostgreSQL CDC are both operational without a message queue. Many analytics workloads begin with a live application database rather than purpose-built event streams, and CDC delivers streaming results from data you already have.

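Second, late-arriving data calls for a deliberate choice. Streaming aggregations are vulnerable to out-of-order events: a transaction timestamped 10 minutes ago that arrives now. Because RisingWave maintains a materialized view incrementally, a late event is folded into the aggregate it belongs to and the stored result is corrected in place, whereas a batch job that has already closed the period can miss the event until the next full re-run. RisingWave also supports watermarks, which track event-time progress and bound how late an event may arrive. Events older than the watermark can be dropped, so set the allowed delay to match how late your data realistically gets.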
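A watermark is declared on the source, next to the column definitions. The sketch below follows the WATERMARK FOR and EMIT ON WINDOW CLOSE syntax from the RisingWave documentation; the object names and the 10-minute allowance are illustrative assumptions. Datagen timestamps are random, so treat this as a syntax illustration and try it against a source with real event times before relying on the results.

```sql
-- Hypothetical source that tolerates events arriving up to 10 minutes late.
CREATE SOURCE orders_with_watermark (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL,
  region      VARCHAR,
  created_at  TIMESTAMPTZ,
  WATERMARK FOR created_at AS created_at - INTERVAL '10 minutes'
) WITH (
  connector = 'datagen',
  datagen.rows.per.second = '100'
) FORMAT PLAIN ENCODE JSON;

-- Emit each hourly window once, after the watermark has passed its end.
CREATE MATERIALIZED VIEW revenue_by_hour_final AS
SELECT
  window_start,
  region,
  SUM(amount) AS total_revenue
FROM TUMBLE(orders_with_watermark, created_at, INTERVAL '1 hour')
GROUP BY window_start, region
EMIT ON WINDOW CLOSE;
```

The tradeoff is latency for finality: a view like revenue_by_hour shows running totals right away, while the window-close variant waits out the allowed delay and then emits a result that should not change afterwards.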

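A practical monitoring note: the SHOW JOBS; command in a psql session lists streaming jobs that are still being created, such as a materialized view backfilling its initial state, along with their progress. SHOW MATERIALIZED VIEWS; and SHOW SINKS; list the objects that already exist. Together these make a quick first health check that requires no external dashboard.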
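A short checklist of catalog commands to run from the same psql session, as a starting point and not a complete monitoring setup:

```sql
SHOW SOURCES;             -- connectors defined with CREATE SOURCE
SHOW MATERIALIZED VIEWS;  -- views RisingWave is maintaining
SHOW SINKS;               -- outbound connections created with CREATE SINK
SHOW JOBS;                -- streaming jobs currently in progress
```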

Practical Summary

Getting from installation to a live-updating query takes roughly 20 minutes for a local playground setup. Install via Homebrew or Docker, create a source (datagen for testing, PostgreSQL CDC for production data), define a materialized view in SQL, and connect your BI tool using the PostgreSQL port at 4566.

For state that persists across restarts, run risingwave standalone instead of risingwave playground. Standalone mode writes to local disk while keeping single-node simplicity.

The main constraint at single-node scale is very high-cardinality aggregations across tens of millions of distinct keys, which benefit from the distributed cluster deployment documented at docs.risingwave.com. For most ops and analytics workloads -- hourly revenue breakdowns, funnel stage counts, active user totals -- standalone handles the load without tuning.

If your team already works in SQL but is frustrated by batch-refresh delays, RisingWave is the fastest path to live-updating queries using tools you already know. For teams that want real-time analysis without managing any infrastructure, VSLZ connects directly to a database or file upload and runs end-to-end analysis from a single prompt.

FAQ

What is RisingWave used for?

RisingWave is used for real-time streaming analytics. It continuously ingests data from sources like Kafka, PostgreSQL CDC, or webhooks, transforms it using SQL, and serves results through a standard PostgreSQL connection. Common use cases include live revenue dashboards, real-time marketing funnel analysis, fraud detection, and product usage monitoring where batch-refresh delays are a problem.

Does RisingWave require Kafka to get started?

No. RisingWave includes a built-in datagen connector for generating test data locally and a PostgreSQL CDC connector that streams changes directly from an existing Postgres database. You can run a full real-time analytics setup with PostgreSQL CDC and no Kafka cluster. Kafka is supported as a source and sink but is not a prerequisite.

How does RisingWave differ from Kafka and Flink?

Kafka is a message queue for transporting events; Flink is a stream processing engine for transforming them; both require a separate serving database for queries. RisingWave combines all three functions in one system. It ingests events, maintains incremental materialized views using SQL, and serves query results via the PostgreSQL wire protocol. The tradeoff is that RisingWave is optimized for read-heavy analytics queries rather than low-latency message delivery, which Kafka handles better.

Can I connect Metabase or Grafana to RisingWave?

Yes. RisingWave is wire-compatible with PostgreSQL, so any BI tool that supports a PostgreSQL connection works without modification. In Metabase, add a database using the PostgreSQL type with host localhost and port 4566. In Grafana, use the PostgreSQL data source. Superset, Redash, and DBeaver also connect using the same standard PostgreSQL settings.

What is a RisingWave materialized view and how does it update?

A materialized view in RisingWave is a SQL query that RisingWave maintains incrementally as new events arrive. Unlike a standard database materialized view that must be manually refreshed, RisingWave computes incremental updates automatically. When a new row lands in a source, RisingWave updates only the affected aggregates in the view rather than recomputing from scratch. Results typically reflect new events within 2 to 5 seconds depending on query complexity and data volume.
