Backing Up the Arthur Platform

Once all the prerequisites have been met, the Arthur platform components can be backed up. The steps for manually backing up each component are detailed below; they can also be scripted.

Backing Up ClickHouse Data

By default, the Arthur Platform ships with a Kubernetes CronJob that backs up ClickHouse each day at midnight.

To manually back up ClickHouse data, you can run the following commands:

arthur_namespace="Put your Arthur namespace here"
$ kubectl get cronjobs -n $arthur_namespace | grep -i clickhouse
NAME                              SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
clickhouse-backup-cronjob         0 0 * * *   False     0        14h             2d18h
$ kubectl create job clickhouse-backup --from=cronjob/clickhouse-backup-cronjob -n $arthur_namespace
job.batch/clickhouse-backup created
$ kubectl get jobs -n $arthur_namespace
NAME                                       COMPLETIONS   DURATION   AGE
clickhouse-backup-cronjob-27735840         1/1           8m35s      14m
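
The manual steps above can be wrapped in a small helper. This is a minimal sketch, not part of the Arthur tooling: the function name `run_clickhouse_backup`, the timestamped job name, and the default timeout are illustrative assumptions, while the CronJob name is taken from the listing above. A timestamp is appended because `kubectl create job` fails if a Job with the same name already exists.

```shell
#!/usr/bin/env bash
# Sketch: trigger a one-off ClickHouse backup from the CronJob and wait for it.
# Assumes the CronJob is named clickhouse-backup-cronjob, as in the listing above.
run_clickhouse_backup() {
  local namespace="$1"
  local timeout="${2:-30m}"   # hypothetical default; adjust to your data size
  local job_name
  # Unique, lowercase, DNS-compatible Job name.
  job_name="clickhouse-backup-$(date +%Y%m%d-%H%M%S)"

  kubectl create job "$job_name" \
    --from=cronjob/clickhouse-backup-cronjob -n "$namespace" || return 1

  # Blocks until the Job reports the Complete condition or the timeout expires.
  # A Job that fails is only reported here as a timeout.
  kubectl wait --for=condition=complete "job/$job_name" \
    -n "$namespace" --timeout="$timeout"
}
```

Example usage: `run_clickhouse_backup "$arthur_namespace" 20m`.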

Backing Up Enrichments

The Arthur Platform uses Velero to back up the Enrichments infrastructure and the Enrichments workflows. The two are orchestrated as separate backups, so each requires its own command.

Backing Up Enrichments Infrastructure

To manually back up the Enrichments infrastructure, run the following commands:

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ velero backup create $name-enrichments \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --selector='component in (kafka-mover-init-connector, model_server)' \
    --include-resources=deployments,services \,,controllerrevisions.apps,,,secrets,configmaps \
    --storage-location=$storage_location \
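
After a backup has been created, it can be useful to confirm that Velero actually finished it. The following is a minimal sketch that polls the `status.phase` field of the Velero Backup object through `kubectl`. The function name `wait_for_velero_backup`, the attempt count, and the polling interval are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: poll a Velero Backup object until it reaches a terminal phase.
# The attempt count and interval defaults are assumptions, not Arthur defaults.
wait_for_velero_backup() {
  local backup_name="$1"
  local velero_namespace="$2"
  local attempts="${3:-60}"
  local interval="${4:-10}"   # seconds between polls
  local phase=""
  local i=0

  while [ "$i" -lt "$attempts" ]; do
    phase="$(kubectl get backups.velero.io "$backup_name" \
      -n "$velero_namespace" -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed)
        echo "backup $backup_name completed"
        return 0
        ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "backup $backup_name ended in phase $phase" >&2
        return 1
        ;;
    esac
    # Any other phase (for example New or InProgress) means keep waiting.
    sleep "$interval"
    i=$((i + 1))
  done

  echo "timed out waiting for backup $backup_name" >&2
  return 2
}
```

Example usage: `wait_for_velero_backup "$name-enrichments" "$velero_namespace"`. The same information is available interactively from `velero backup describe`.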

Backing Up Enrichments Workflows

To manually back up the Enrichments Workflows, run the following commands:

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ velero backup create $name-workflows \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --include-resources=workflows \,,controllerrevisions.apps,,,secrets,configmaps \
    --storage-location=$storage_location \
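
Because each section recomputes `backup_date`, backups taken a few seconds apart end up with different timestamps. If you would like the Enrichments infrastructure and workflows backups to share one prefix so they are easy to pair up later, one option is to compute the name once and derive both from it. This is only a sketch; the variables `enrichments_backup` and `workflows_backup` are illustrative names, not something the platform requires.

```shell
#!/usr/bin/env bash
# Sketch: compute the timestamped prefix once and reuse it for related backups.
backup_date="$(date +%Y-%m-%d-%H-%M-%S)"
name="arthur-backup-$backup_date"

# Hypothetical variable names; the suffixes match the commands in this guide.
enrichments_backup="$name-enrichments"
workflows_backup="$name-workflows"

echo "$enrichments_backup"
echo "$workflows_backup"
```

The two variables can then be passed to the respective `velero backup create` commands in place of `$name-enrichments` and `$name-workflows`.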

Backing Up Messaging Infrastructure

The Arthur Platform uses Velero to back up the Kafka (and ZooKeeper) deployment state and EBS volumes. To manually back up Kafka, run the following commands:

# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"

arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ velero backup create $name-messaging \
    --namespace=$velero_namespace \
    --include-namespaces=$arthur_namespace \
    --selector='app in (cp-zookeeper,cp-kafka)' \,,controllerrevisions.apps,,,services,endpoints,configmaps,poddisruptionbudgets \
    --storage-location=$storage_location \

Backing Up RDS Postgres

RDS database backups are called snapshots. To manually create a snapshot of an RDS database, run the following commands, replacing RDS_DB_NAME, AWS_PROFILE_NAME, and AWS_REGION with your own values:

$ backup_date=$(date +%Y-%m-%d-%H-%M-%S)
$ name=arthur-backup-$backup_date
$ aws rds create-db-cluster-snapshot \
    --db-cluster-snapshot-identifier $name-snapshot \
    --db-cluster-identifier RDS_DB_NAME \
    --profile AWS_PROFILE_NAME \
    --region AWS_REGION


This command applies only to RDS DB clusters (for example, Amazon Aurora or Multi-AZ DB clusters). If you are using a single RDS DB instance, use aws rds create-db-snapshot instead.
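
The two snapshot commands can be combined in one helper that also waits until the snapshot is available. This is a sketch under assumptions: the function name `create_rds_snapshot` and its `cluster`/`instance` switch are hypothetical, and any extra arguments (such as `--profile` and `--region`) are simply passed through to the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: snapshot either a DB cluster or a single DB instance, then wait
# until the snapshot is available. The cluster/instance switch is hypothetical.
create_rds_snapshot() {
  if [ "$#" -lt 3 ]; then
    echo "usage: create_rds_snapshot cluster|instance DB_ID SNAPSHOT_ID [aws options]" >&2
    return 2
  fi
  local kind="$1"
  local db_id="$2"
  local snapshot_id="$3"
  shift 3   # remaining arguments (e.g. --profile, --region) pass through

  case "$kind" in
    cluster)
      aws rds create-db-cluster-snapshot \
        --db-cluster-snapshot-identifier "$snapshot_id" \
        --db-cluster-identifier "$db_id" "$@" || return 1
      aws rds wait db-cluster-snapshot-available \
        --db-cluster-snapshot-identifier "$snapshot_id" "$@"
      ;;
    instance)
      aws rds create-db-snapshot \
        --db-snapshot-identifier "$snapshot_id" \
        --db-instance-identifier "$db_id" "$@" || return 1
      aws rds wait db-snapshot-available \
        --db-snapshot-identifier "$snapshot_id" "$@"
      ;;
    *)
      echo "unknown kind: $kind (expected cluster or instance)" >&2
      return 2
      ;;
  esac
}
```

Example usage: `create_rds_snapshot cluster RDS_DB_NAME "$name-snapshot" --profile AWS_PROFILE_NAME --region AWS_REGION`.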

For more information, please refer to the AWS Documentation: