Implementation Guide

Cloud: Not available | Self-Managed Community (OSS): Not available | Self-Managed Enterprise: Available

Airbyte Self-Managed Enterprise is in an early access stage for select priority users. Once you are qualified for a Self-Managed Enterprise license key, you can deploy Airbyte with the following instructions.

Airbyte Self-Managed Enterprise must be deployed using Kubernetes to enable Airbyte's best performance and scale. The core components (API server, scheduler, etc.) run as deployments, while the scheduler launches connector-related pods on different nodes.

Prerequisites

Infrastructure Prerequisites

For a production-ready deployment of Self-Managed Enterprise, various infrastructure components are required. We recommend deploying to Amazon EKS or Google Kubernetes Engine. The following diagram illustrates a typical Airbyte deployment running on AWS:

[Figure: AWS Architecture Diagram]

Prior to deploying Self-Managed Enterprise, we recommend having each of the following infrastructure components ready to go. When possible, it's easiest to have all components running in the same VPC. The provided recommendations are for customers deploying to AWS:

Component: Recommendation
Kubernetes Cluster: Amazon EKS cluster running in 2 or more availability zones on a minimum of 6 nodes.
Ingress: Amazon ALB and a URL for users to access the Airbyte UI or make API requests.
Object Storage: Amazon S3 bucket with two directories for log and state storage.
Dedicated Database: Amazon RDS Postgres with at least one read replica.
External Secrets Manager: Amazon Secrets Manager for storing connector secrets.

We require you to install and configure the following Kubernetes tooling:

  1. Install helm by following these instructions.
  2. Install kubectl by following these instructions.
  3. Configure kubectl to connect to your cluster by using kubectl config use-context my-cluster-name. On Amazon EKS:
     1. Configure your AWS CLI to connect to your project.
     2. Install eksctl.
     3. Run eksctl utils write-kubeconfig --cluster=$CLUSTER_NAME to make the context available to kubectl.
     4. Use kubectl config get-contexts to show the available contexts.
     5. Run kubectl config use-context $EKS_CONTEXT to access the cluster with kubectl.

We also require you to create a Kubernetes namespace for your Airbyte deployment:

kubectl create namespace airbyte

Configure Kubernetes Secrets

Sensitive credentials such as AWS access keys are required to be made available in Kubernetes Secrets during deployment. The Kubernetes secret store and secret keys are referenced in your values.yml file. Ensure all required secrets are configured before deploying Airbyte Self-Managed Enterprise.

You may apply your Kubernetes secrets by applying the example manifests below to your cluster, or using kubectl directly. If your Kubernetes cluster already has permissions to make requests to an external entity via an instance profile, credentials are not required. For example, if your Amazon EKS cluster has been assigned a sufficient AWS IAM role to make requests to AWS S3, you do not need to specify access keys.

External Log Storage

For Self-Managed Enterprise deployments, we recommend spinning up standalone log storage for additional reliability, using tools such as S3 and GCS instead of the default internal Minio storage (airbyte/minio).

Secrets for External Log Storage
apiVersion: v1
kind: Secret
metadata:
  name: airbyte-config-secrets
type: Opaque
stringData:
  ## Storage Secrets
  # S3
  s3-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
  s3-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Overriding name, s3-access-key-id or s3-secret-access-key allows you to store these secrets in the location of your choice. If you do this, you will also need to specify the secret location in the bucket config for your values.yml file.

Using kubectl to create the secret directly:

kubectl create secret generic airbyte-config-secrets \
--from-literal=s3-access-key-id='' \
--from-literal=s3-secret-access-key='' \
--namespace airbyte
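
The command above leaves both values blank. As a minimal sketch, the function below fills them from environment variables and refuses to run if either is empty. The function name create_s3_secret and the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumptions chosen for illustration; the secret name, keys, and namespace match the command above.

```shell
# Sketch: create the log-storage secret from environment variables.
# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed variable names.
create_s3_secret() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set" >&2
    return 1
  fi
  kubectl create secret generic airbyte-config-secrets \
    --from-literal=s3-access-key-id="$AWS_ACCESS_KEY_ID" \
    --from-literal=s3-secret-access-key="$AWS_SECRET_ACCESS_KEY" \
    --namespace airbyte
}
```

Call create_s3_secret after exporting both variables. Reading the values from the environment keeps the credentials out of the command text you type or paste.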

Ensure your access key is tied to an IAM user with the following policies, allowing the cluster to access S3 storage:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*"
    }
  ]
}
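
The policy above contains the placeholder YOUR-S3-BUCKET-NAME in two places. As a small sketch, assuming you have saved the policy to a local file, the function below substitutes your bucket name in both. The function name fill_bucket_name and the file names in the usage note are hypothetical.

```shell
# Sketch: replace the YOUR-S3-BUCKET-NAME placeholder in a saved policy file.
# $1: path to the policy template, $2: bucket name.
# S3 bucket names cannot contain "/", so "/" is safe as the sed delimiter.
fill_bucket_name() {
  if [ "$#" -ne 2 ]; then
    echo "usage: fill_bucket_name POLICY_FILE BUCKET_NAME" >&2
    return 1
  fi
  sed "s/YOUR-S3-BUCKET-NAME/$2/g" "$1"
}
```

For example, fill_bucket_name policy-template.json my-airbyte-bucket > policy.json writes a policy scoped to my-airbyte-bucket.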

External Connector Secret Management

Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may opt to instead store connector secrets in an external secret manager of your choosing (AWS Secrets Manager, Google Secrets Manager or HashiCorp Vault).

Secrets for External Connector Secret Management

To store connector secrets in AWS Secrets Manager via a manifest:

apiVersion: v1
kind: Secret
metadata:
  name: airbyte-config-secrets
type: Opaque
stringData:
  aws-secret-manager-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
  aws-secret-manager-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Overriding name, aws-secret-manager-access-key-id or aws-secret-manager-secret-access-key allows you to store these secrets in the location of your choice. If you do this, you will also need to specify the secret location in the secret manager config for your values.yml file.

Alternatively, you may choose to use kubectl to create the secret directly:

kubectl create secret generic airbyte-config-secrets \
--from-literal=aws-secret-manager-access-key-id='' \
--from-literal=aws-secret-manager-secret-access-key='' \
--namespace airbyte

Installation Steps

Step 1: Add Airbyte Helm Repository

Follow these instructions to add the Airbyte helm repository:

  1. Run helm repo add airbyte https://airbytehq.github.io/helm-charts, where airbyte is the name of the repository that will be indexed locally.
  2. Perform the repo indexing process, and ensure your helm repository is up-to-date by running helm repo update.
  3. You can then browse all charts uploaded to your repository by running helm search repo airbyte.

Step 2: Create your Enterprise License File

  1. Create a new airbyte directory. Inside, create an empty airbyte.yml file.

  2. Paste the following into your newly created airbyte.yml file:

Template airbyte.yml file
webapp-url: # example: http://localhost:8080

initial-user:
  email:
  first-name:
  last-name:
  username: # your existing Airbyte instance username
  password: # your existing Airbyte instance password

license-key: # license key provided by Airbyte team

  3. Fill in the contents of the initial-user block. These credentials create an initial user with admin permissions. You should store these credentials in a secure location.

  4. Add your Airbyte Self-Managed Enterprise license key to your airbyte.yml in the license-key field.

  5. To enable SSO authentication, add SSO auth details to your airbyte.yml file.

Configuring auth in your airbyte.yml file

To configure SSO with Okta, add the following at the end of your airbyte.yml file:

auth:
  identity-providers:
    - type: okta
      domain: $OKTA_DOMAIN
      app-name: $OKTA_APP_INTEGRATION_NAME
      client-id: $OKTA_CLIENT_ID
      client-secret: $OKTA_CLIENT_SECRET

See the following guide on how to collect this information for Okta.

To modify auth configurations on an existing deployment (after Airbyte has been installed at least once), you will need to helm upgrade Airbyte with the additional environment variable --set keycloak-setup.env_vars.KEYCLOAK_RESET_REALM=true. As this also resets the list of Airbyte users and permissions, please use this with caution.

To deploy Self-Managed Enterprise without SSO, exclude the entire auth: section from your airbyte.yml file. You will authenticate with the instance admin user and password included in your airbyte.yml. Without SSO, you cannot currently have unique logins for multiple users.
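
Before moving on to deployment configuration, it can help to confirm that no required field in airbyte.yml was left blank. The following is a heuristic sketch, not a YAML parser: it flags any of the template's keys that are followed only by whitespace or a comment. The function name check_airbyte_yml is an assumption; the key names come from the template above.

```shell
# Sketch: report template keys in airbyte.yml that have no value yet.
# This is a line-based heuristic and does not parse YAML.
check_airbyte_yml() {
  file="${1:-./airbyte.yml}"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # A key counts as empty when only whitespace or a "# comment" follows the colon.
  pattern='^[[:space:]]*(webapp-url|email|first-name|last-name|username|password|license-key):[[:space:]]*(#.*)?$'
  if grep -nE "$pattern" "$file"; then
    echo "The fields listed above are empty in $file" >&2
    return 1
  fi
  echo "All required fields in $file have values"
}
```

Running check_airbyte_yml ./airbyte.yml prints the line numbers of any empty fields and returns a non-zero status, so it can gate a deployment script.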

Step 3: Configure your Deployment

  1. Inside your airbyte directory, create an empty values.yml file.

  2. Paste the following into your newly created values.yml file. This is required to deploy Airbyte Self-Managed Enterprise:

global:
  edition: enterprise

  3. The following subsections help you customize your deployment to use an external database, log storage, dedicated ingress, and more. To skip this and deploy a minimal, local version of Self-Managed Enterprise, jump to Step 4.

Configuring the Airbyte Database

For Self-Managed Enterprise deployments, we recommend using a dedicated database instance (such as AWS RDS or GCP Cloud SQL) for better reliability and backups, instead of the default internal Postgres database (airbyte/db) that Airbyte spins up within the Kubernetes cluster.

We assume in the following that you've already configured a Postgres instance:

External database setup steps
  1. Add external database details to your values.yml file. This disables the default internal Postgres database (airbyte/db), and configures the external Postgres database:

postgresql:
  enabled: false

externalDatabase:
  host: ## Database host
  user: ## Non-root username for the Airbyte database
  database: db-airbyte ## Database name
  port: 5432 ## Database port number

  2. For the password of the non-root user with database access, you may use password, existingSecret or jdbcUrl. We recommend using existingSecret, or injecting sensitive fields from your own external secret store. These parameters are mutually exclusive:

postgresql:
  enabled: false

externalDatabase:
  ...
  password: ## Password for non-root database user
  existingSecret: ## The name of an existing Kubernetes secret containing the password.
  existingSecretPasswordKey: ## The Kubernetes secret key containing the password.
  jdbcUrl: "jdbc:postgresql://<user>:<password>@localhost:5432/db-airbyte" ## Full database JDBC URL. You can also add additional arguments.

The optional jdbcUrl field should be entered in the following format: jdbc:postgresql://localhost:5432/db-airbyte. We recommend against using this unless you need to pass additional arguments to the JDBC driver (e.g. to handle SSL).
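
As an illustration of the jdbcUrl format, the sketch below assembles the URL from its parts and appends optional driver arguments as a query string. The function name airbyte_jdbc_url is hypothetical, and sslmode=require is shown only as an example of a PostgreSQL JDBC driver argument; whether you need it depends on your database setup.

```shell
# Sketch: build a JDBC URL in the format described above.
# $1: host, $2: port, $3: database name, $4: optional driver arguments.
airbyte_jdbc_url() {
  if [ "$#" -lt 3 ]; then
    echo "usage: airbyte_jdbc_url HOST PORT DATABASE [ARGS]" >&2
    return 1
  fi
  url="jdbc:postgresql://$1:$2/$3"
  # Driver arguments, if any, follow a "?" and are joined with "&".
  if [ -n "${4:-}" ]; then
    url="${url}?$4"
  fi
  printf '%s\n' "$url"
}
```

For example, airbyte_jdbc_url localhost 5432 db-airbyte 'sslmode=require' prints a URL you could paste into the jdbcUrl field.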

Configuring External Logging

For Self-Managed Enterprise deployments, we recommend spinning up standalone log storage for additional reliability, using tools such as S3 and GCS instead of the default internal Minio storage (airbyte/minio). It's then a common practice to configure additional log forwarding from external log storage into your observability tool.

External log storage setup steps

Ensure you've already created a Kubernetes secret containing both your S3 access key ID and secret access key. By default, secrets are expected in the airbyte-config-secrets Kubernetes secret, under the s3-access-key-id and s3-secret-access-key keys. Steps to configure these are in the above prerequisites.

Add external log storage details to your values.yml file. This disables the default internal Minio instance (airbyte/minio), and configures the external log storage:

global:
  storage:
    type: "S3"
    storageSecretName: airbyte-config-secrets # Name of your Kubernetes secret.
    bucket: ## S3 bucket names that you've created. We recommend storing the following all in one bucket.
      log: airbyte-bucket
      state: airbyte-bucket
      workloadOutput: airbyte-bucket
    s3:
      region: "" ## e.g. us-east-1
      authenticationType: credentials ## Use "credentials" or "instanceProfile"

Set authenticationType to instanceProfile if the compute infrastructure running Airbyte has pre-existing permissions (e.g. IAM role) to read and write from the appropriate buckets.

Configuring Ingress

To access the Airbyte UI, you will need to manually attach an ingress configuration to your deployment. The following is a slimmed-down definition of an ingress resource you could use for Self-Managed Enterprise:

Ingress configuration setup steps
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: # ingress name, example: enterprise-demo
  annotations:
    ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  rules:
    - host: # host, example: enterprise-demo.airbyte.com
      http:
        paths:
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-webapp-svc
                name: airbyte-enterprise-airbyte-webapp-svc
                port:
                  number: 80 # service port, example: 8080
            path: /
            pathType: Prefix
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-keycloak-svc
                name: airbyte-enterprise-airbyte-keycloak-svc
                port:
                  number: 8180
            path: /auth
            pathType: Prefix
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-server-svc
                name: airbyte-enterprise-airbyte-server-svc
                port:
                  number: 8001
            path: /api/public
            pathType: Prefix

Once this is complete, ensure that the value of the webapp-url field in your airbyte.yml is configured to match the ingress URL.

You may configure ingress using a load balancer or an API Gateway. We do not currently support most service meshes (such as Istio). If you are having networking issues after fully deploying Airbyte, please verify that firewalls or missing permissions are not interfering with pod-to-pod communication. Please also verify that deployed pods have the right permissions to make requests to your external database.
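
The backend service names in the ingress definition follow the ${RELEASE_NAME}-airbyte-...-svc format noted in its comments. As a small sketch, the function below prints the three expected names for a given Helm release name, defaulting to the airbyte-enterprise release used in this guide. The function name airbyte_service_names is hypothetical.

```shell
# Sketch: derive the ingress backend service names from the Helm release name.
# $1: release name (defaults to airbyte-enterprise, as used in this guide).
airbyte_service_names() {
  release="${1:-airbyte-enterprise}"
  printf '%s-airbyte-webapp-svc\n' "$release"
  printf '%s-airbyte-keycloak-svc\n' "$release"
  printf '%s-airbyte-server-svc\n' "$release"
}
```

After deploying, you can compare this output against the services listed by kubectl get services --namespace airbyte to confirm that the ingress points at names that exist.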

Configuring External Connector Secret Management

Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may opt to instead store connector secrets in an external secret manager such as AWS Secrets Manager, Google Secrets Manager or HashiCorp Vault. Upon creating a new connector, secrets (e.g. OAuth tokens, database passwords) will be written to, then read from, the configured secrets manager.

Configuring external connector secret management

Modifying the configuration of connector secret storage will cause all existing connectors to fail. You will need to recreate these connectors to ensure they are reading from the appropriate secret store.

If authenticating with credentials, ensure you've already created a Kubernetes secret containing both your AWS Secrets Manager access key ID and secret access key. By default, secrets are expected in the airbyte-config-secrets Kubernetes secret, under the aws-secret-manager-access-key-id and aws-secret-manager-secret-access-key keys. Steps to configure these are in the above prerequisites.

secretsManager:
  type: awsSecretManager
  awsSecretManager:
    region: <aws-region>
    authenticationType: credentials ## Use "credentials" or "instanceProfile"
    tags: ## Optional - You may add tags to new secrets created by Airbyte.
      - key: ## e.g. team
        value: ## e.g. deployments
      - key: business-unit
        value: engineering
    kms: ## Optional - ARN for KMS Decryption.

Set authenticationType to instanceProfile if the compute infrastructure running Airbyte has pre-existing permissions (e.g. IAM role) to read and write from AWS Secrets Manager.

To decrypt secrets in the secret manager with AWS KMS, configure the kms field, and ensure your Kubernetes cluster has pre-existing permissions to read and decrypt secrets.

Step 4: Deploy Self-Managed Enterprise

Install Airbyte Self-Managed Enterprise with helm using the following command:

helm install \
--namespace airbyte \
--values ./values.yml \
--set-file airbyteYml="./airbyte.yml" \
airbyte-enterprise \
airbyte/airbyte

To uninstall Self-Managed Enterprise, run helm uninstall airbyte-enterprise --namespace airbyte.

Updating Self-Managed Enterprise

Upgrade Airbyte Self-Managed Enterprise by:

  1. Running helm repo update. This pulls an up-to-date version of our helm charts, which is tied to a version of the Airbyte platform.
  2. Re-installing Airbyte Self-Managed Enterprise:
helm upgrade \
--namespace airbyte \
--values ./values.yml \
--set-file airbyteYml="./airbyte.yml" \
--install airbyte-enterprise \
airbyte/airbyte
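
Because helm upgrade with the --install flag installs the release when it does not exist yet, the same command can serve for both the first deployment and later upgrades. The sketch below wraps it in a function that first checks that both configuration files are present. The function name deploy_airbyte_enterprise and the directory argument are assumptions; the helm arguments are the ones used above.

```shell
# Sketch: run the install/upgrade command only if both config files exist.
# $1: directory containing values.yml and airbyte.yml (defaults to ".").
deploy_airbyte_enterprise() {
  dir="${1:-.}"
  for f in values.yml airbyte.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  helm upgrade \
    --namespace airbyte \
    --values "$dir/values.yml" \
    --set-file airbyteYml="$dir/airbyte.yml" \
    --install airbyte-enterprise \
    airbyte/airbyte
}
```

Run deploy_airbyte_enterprise from inside your airbyte directory, or pass the directory path as the first argument.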

Customizing your Deployment

To customize your deployment, create an additional values.yaml file in your airbyte directory, and populate it with configuration override values. A thorough values.yaml example including many configurations can be found in the charts/airbyte folder of the Airbyte repository.

After specifying your own configuration, run the following command:

helm upgrade \
--namespace airbyte \
--values path/to/values.yaml \
--values ./values.yml \
--set-file airbyteYml="./airbyte.yml" \
--install airbyte-enterprise \
airbyte/airbyte

Customizing your Service Account

You may choose to use your own service account instead of the Airbyte default, airbyte-sa. This may allow for better audit trails and resource management specific to your organizational policies and requirements.

To do this, add the following to your values.yml:

serviceAccount:
  name: