How to Set Up Airbyte Cloud for Data Syncing
Last updated Apr 29, 2026

Getting your business data into one place used to require a data engineer, a Kubernetes cluster, or both. Airbyte Cloud eliminates that. Sign up, pick a source, pick a destination, and your first sync runs in under 30 minutes. The cloud version handles infrastructure, connector updates, and scaling automatically, so operations teams and analysts can move data without touching a server.
What Airbyte Does
Airbyte is an open-source data integration platform that moves data from sources to destinations using pre-built connectors. A source is any system where your data lives: a SaaS app like Salesforce or Stripe, a database like PostgreSQL or MySQL, a file store like S3, or a spreadsheet tool like Google Sheets. A destination is where you want the data to go, typically a cloud data warehouse like BigQuery, Snowflake, or Redshift.
The platform performs ELT (Extract, Load, Transform), which means it copies data as-is into your destination and lets your SQL or analytics layer handle transformation. This differs from the older ETL pattern, which required transforming data in transit. ELT keeps the raw data intact and separates concerns cleanly: Airbyte handles movement, your warehouse handles modeling.
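To make the ELT split concrete, here is a minimal sketch of the transform half happening inside the warehouse, after Airbyte has loaded the raw table as-is. The table, view, column names, and connection details are hypothetical placeholders, not anything Airbyte creates for you.

```python
# ELT in miniature: Airbyte has already loaded raw Stripe data untouched;
# modeling happens inside the warehouse as SQL. All names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-warehouse-host", port=5432,
    dbname="analytics", user="analyst", password="your-password",
)

with conn, conn.cursor() as cur:
    # The raw table stays intact; the model is just a view on top of it.
    cur.execute("""
        CREATE OR REPLACE VIEW monthly_revenue AS
        SELECT date_trunc('month', created) AS month,
               SUM(amount) / 100.0          AS revenue_usd
        FROM raw_stripe_charges             -- loaded as-is by Airbyte
        WHERE status = 'succeeded'
        GROUP BY 1;
    """)
conn.close()
```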
As of 2026, Airbyte maintains over 600 connectors. Community contributions add new ones weekly. The GitHub repository has over 16,000 stars, and the platform is used by organizations including Cisco, Red Bull, and Typeform to power production data stacks.
Airbyte Cloud vs. Self-Hosted
Airbyte offers two deployment options: Airbyte Cloud, which is fully managed, and self-hosted, which runs on Docker Compose or Kubernetes on your own infrastructure.
Self-hosted is the right choice if you need full data residency control, have strict compliance requirements, or already run Kubernetes. It requires a machine with at least 8 GB of RAM and comfort with Docker Compose. The default self-hosted setup runs all services locally, including a Postgres instance for the internal catalog.
Airbyte Cloud requires none of that. There is no Docker to configure, no server to provision, and no updates to manage. Airbyte handles availability, connector version upgrades, and autoscaling automatically. The cloud version starts with a free 30-day trial, and pricing after that is usage-based, starting at roughly $10 per month for light workloads. For most small teams and startups syncing a handful of sources daily, the cloud version is the faster and cheaper path.
This guide covers Airbyte Cloud.
Step 1: Create Your Account
Go to airbyte.com and click "Get started free." You can sign up with a GitHub or Google account. No credit card is required for the 30-day trial.
Once logged in, Airbyte places you in a default workspace. A workspace is an isolated environment for your connections. Most teams use one workspace. Larger organizations with separate staging and production environments may create additional workspaces to keep configurations distinct.
Step 2: Add a Source
A source is the system you want to pull data from. In the Airbyte Cloud UI, click "Sources" in the left sidebar, then "New source."
The connector catalog opens. Use the search bar to find your source by name. If your company uses HubSpot for CRM, search "HubSpot." For transactional data from Stripe, search "Stripe." The catalog lists connectors organized by type: SaaS applications, databases, file stores, and developer tools.
After selecting a connector, Airbyte prompts for credentials. What it needs depends on the source. For HubSpot, you authenticate via OAuth by clicking "Authenticate with HubSpot" and granting access in your browser. For Postgres databases, you enter the host, port, username, password, and database name. For Google Sheets, you share the spreadsheet with a service account email that Airbyte provides.
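For database sources, it is worth confirming the credentials work from outside your own network before pasting them into Airbyte, since Airbyte Cloud connects over the public internet (or an SSH tunnel if you configure one). A minimal pre-check in Python, with all connection values as placeholders:

```python
# Pre-check Postgres source credentials before handing them to Airbyte.
# Every connection value below is a placeholder for your own.
import psycopg2

try:
    conn = psycopg2.connect(
        host="db.example.com", port=5432,
        dbname="app_production", user="airbyte_reader",
        password="your-password", connect_timeout=10,
    )
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print("Connected:", cur.fetchone()[0])
    conn.close()
except psycopg2.OperationalError as err:
    print("Connection failed:", err)
```

Using a dedicated read-only user (like the hypothetical airbyte_reader above) is a sensible default, since Airbyte only needs to read from a source.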
After entering credentials, click "Set up source." Airbyte tests the connection and returns a list of available streams. A stream is one logical data set within the source. For HubSpot, streams include Contacts, Companies, Deals, and Email Events. For Stripe, streams include Charges, Customers, Invoices, and Subscriptions.
This step takes under five minutes for most SaaS sources.
Step 3: Add a Destination
A destination is where Airbyte writes the synced data. Click "Destinations" in the left sidebar, then "New destination."
Common choices for teams getting started include BigQuery (Google Cloud's serverless warehouse with a free tier), Snowflake (enterprise-grade, with usage-based compute pricing), and Postgres (a relational database you can run on Supabase or Neon for free). BigQuery's free tier covers 10 GB of storage and 1 TB of queries per month, which is enough to run a real analytics operation for a small team.
To add BigQuery as a destination, you need a Google Cloud project and a service account with BigQuery Data Editor and Job User roles. Airbyte's documentation walks through creating the service account and downloading the JSON key file. Paste the key contents into the Airbyte destination form, specify a dataset name, and click "Set up destination."
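Before pasting the key into Airbyte, you can confirm it works with a short script. This sketch assumes the google-cloud-bigquery Python package is installed; the key path and dataset name are placeholders:

```python
# Verify the service account key before configuring the Airbyte destination.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("key.json")

# A successful trivial query confirms the key is valid and can run jobs
# (the Job User role).
print(list(client.query("SELECT 1").result()))

# Creating the target dataset up front also exercises the Data Editor role.
client.create_dataset("airbyte_raw", exists_ok=True)
```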
For teams not yet using a warehouse, Postgres on Supabase is the fastest path: create a free Supabase project, copy the connection string from the Supabase dashboard, and paste it into the Airbyte destination form.
Step 4: Configure Your Connection
With a source and destination defined, create a connection to link them. Airbyte guides you through three configuration decisions.
Sync frequency. Choose how often Airbyte runs the sync: every 24 hours, every 6 hours, hourly, or on-demand. For most operational reporting use cases, a daily sync at off-peak hours is sufficient. More frequent schedules increase credit consumption.
Stream selection. Pick which streams to sync. Enable only the streams you need. Syncing every available table from a Salesforce instance wastes storage and slows pipelines. A typical sales reporting setup needs Accounts, Contacts, Opportunities, and Activities.
Sync mode. Each stream can run in one of several modes. Full refresh replaces the destination table entirely on each run, which works well for small, slow-changing data sets. Incremental append adds only new or changed records without modifying existing rows, which is more efficient for large tables like event logs or transactions. Incremental deduped history updates rows in place, keeping one record per primary key, which is the right mode for entity tables like Customers or Products where attributes change over time.
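The difference between the modes is easiest to see on a toy batch of records. The sketch below is a conceptual illustration of the three behaviors, not Airbyte's actual implementation:

```python
# Conceptual illustration of the three sync modes, keyed by primary key.
existing = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
incoming = [{"id": 2, "plan": "team"}, {"id": 3, "plan": "free"}]

# Full refresh: the incoming batch is the entire source snapshot, and it
# replaces the destination table wholesale.
full_refresh = incoming

# Incremental append: only new or changed records arrive, and old rows
# stay put, so id 2 now appears twice (once per version).
incremental_append = existing + incoming

# Incremental deduped history: one row per primary key, latest version wins.
deduped = {row["id"]: row for row in existing}
deduped.update({row["id"]: row for row in incoming})
incremental_deduped = list(deduped.values())

print(incremental_deduped)
# [{'id': 1, 'plan': 'free'}, {'id': 2, 'plan': 'team'}, {'id': 3, 'plan': 'free'}]
```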
Click "Set up connection." Airbyte triggers a test sync to verify the full pipeline end to end.
Step 5: Run Your First Sync
After the connection is created, the first sync runs automatically. Completion time depends on the source and data volume. A HubSpot full refresh covering 10,000 contacts and 5,000 deals typically finishes in two to five minutes. A Stripe sync covering a full year of transaction history may take 15 to 30 minutes.
The connection dashboard shows sync status in real time: records extracted, records loaded, bytes transferred, and any errors. Airbyte logs each sync run with full details, so failed syncs are straightforward to diagnose. Common failure causes include expired OAuth tokens, changed credentials, or schema drift when an upstream source adds or removes a field.
Once the sync completes, the destination tables are populated and ready to query. In BigQuery, tables appear under the dataset you specified. In Postgres, they appear as new tables in the target schema.
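A quick row count makes a reasonable smoke test that data actually landed. This sketch assumes the BigQuery destination from Step 3; the project, dataset, and table names are placeholders for whatever your sync produced:

```python
# Smoke-test the first sync: count rows in a synced table.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("key.json")
query = """
    SELECT 'contacts' AS table_name, COUNT(*) AS row_count
    FROM `your-project.airbyte_raw.contacts`
"""
for row in client.query(query).result():
    print(row.table_name, row.row_count)
```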
What to Do With the Data
After your first sync, the data sits in your destination as raw tables. From here, most teams take one of two paths: direct analysis or a transformation layer.
For direct analysis, connect a BI tool to your warehouse. Metabase and Looker Studio both connect to BigQuery and Postgres in under five minutes and let non-technical teammates build dashboards on the synced data without writing SQL.
For teams that want to query the data in plain English without setting up a dashboard, tools like VSLZ connect directly to your data source and return charts and statistical summaries from a single prompt.
For teams that want modeled data, dbt Core sits on top of Airbyte's raw tables and transforms them into clean, typed, tested models. This is the standard modern data stack: Airbyte for ingestion, dbt for transformation, and a BI tool for presentation.
Common Mistakes to Avoid
Syncing too many streams on launch is the most common setup error. Start with three or four tables that answer a specific business question: customer acquisition, pipeline health, or revenue by segment. Add streams incrementally as the team builds confidence in the pipeline.
Ignoring schema changes is the second mistake. SaaS vendors add and remove fields without warning. Airbyte detects schema changes and can alert you or handle them automatically by adding columns and nullifying removed fields. Configure schema change handling explicitly in the connection settings to avoid silent failures.
Skipping the incremental sync mode for large tables is the third mistake. Running full refresh on a table with 500,000 rows every hour is expensive and slow. Set large transaction and event tables to incremental mode from the start.
Mixing environments in one workspace creates maintenance overhead later. Create separate Airbyte connections for staging and production sources, even if the destinations are different schemas within the same warehouse.
FAQ
How do I connect Salesforce to BigQuery using Airbyte?
In Airbyte Cloud, add Salesforce as a source by entering your Salesforce credentials or authenticating via OAuth. Then add BigQuery as a destination by providing your Google Cloud project ID, dataset name, and a service account JSON key with BigQuery Data Editor and Job User roles. Create a connection between the two, select the Salesforce objects you want to sync (Accounts, Contacts, Opportunities, etc.), choose incremental deduped history as the sync mode for entity tables, and trigger the first sync. It will populate the selected tables in your BigQuery dataset within minutes.
Is Airbyte Cloud free to use?
Airbyte Cloud offers a free 30-day trial with no credit card required. After the trial, pricing is usage-based starting at approximately $10 per month, which includes a small credit allocation. Additional data synced beyond the included credits is billed at a per-record rate. For light workloads such as a few sources syncing daily, monthly costs are typically under $50. The self-hosted version of Airbyte is free to run but requires you to manage the infrastructure.
What is the difference between full refresh and incremental sync in Airbyte?
Full refresh replaces the entire destination table with fresh data on every sync run. This is simple and reliable but expensive for large tables, since all records are re-transferred every time. Incremental append adds only records created or updated since the last sync, which is faster and cheaper. Incremental deduped history also adds only new and changed records but maintains one row per primary key in the destination, so updated records replace old ones instead of accumulating duplicates. Use full refresh for small static tables, incremental append for event logs, and incremental deduped history for entity tables like customers and products.
Can I use Airbyte without a data warehouse?
Yes. Airbyte supports Postgres as a destination, which you can run for free on Supabase or Neon without setting up a dedicated warehouse. This is a practical starting point for teams that want to centralize data but are not yet ready to pay for Snowflake or BigQuery. The trade-off is that Postgres is a transactional database, not an analytical one, so query performance on large analytical workloads will be slower than a purpose-built warehouse. Most teams start with Postgres on Supabase and migrate to BigQuery or Snowflake once data volumes grow.
How do I fix a failed Airbyte sync?
In Airbyte Cloud, open the connection that failed and review its sync history. Each run shows a status log with the exact error message. The most common causes are expired OAuth tokens (fix by clicking re-authenticate on the source), changed database credentials (update the source configuration), network connectivity issues (check firewall rules for self-hosted destinations), and schema drift where a source added or removed fields (configure schema change handling to auto-propagate or pause on change). After fixing the root cause, trigger a manual sync from the connection dashboard to verify the fix before relying on the next scheduled run.
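If your pipeline checks are scripted, the verification sync can also be triggered through Airbyte's public API. As with the connection sketch above, the endpoint and body follow the v1 reference as best understood here; the token and connection ID are placeholders.

```python
# Trigger a manual sync for a connection via Airbyte's public API.
# Endpoint and body are assumptions from the v1 API reference;
# the bearer token and connection UUID are placeholders.
import requests

resp = requests.post(
    "https://api.airbyte.com/v1/jobs",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={"connectionId": "CONNECTION_UUID", "jobType": "sync"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # the response describes the new job, including its status
```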