Appendix#
Running the Velero CLI#
Velero provides a command-line interface (CLI) for taking backups and performing restores. You can install the CLI locally, or invoke it with kubectl exec on the Velero Backup Controller pod.
Local Installation#
Refer to the Velero Documentation for installing Velero on your platform.
Velero uses your KUBECONFIG file to connect to the cluster.
$ velero --help
Velero is a tool for managing disaster recovery, specifically for Kubernetes
cluster resources. It provides a simple, configurable, and operationally robust
way to back up your application state and associated data.
If you're familiar with kubectl, Velero supports a similar model, allowing you to
execute commands such as 'velero get backup' and 'velero create schedule'. The same
operations can also be performed as 'velero backup get' and 'velero schedule create'.
Usage:
velero [command]
Available Commands:
backup Work with backups
backup-location Work with backup storage locations
bug Report a Velero bug
client Velero client related commands
completion Generate completion script
create Create velero resources
debug Generate debug bundle
delete Delete velero resources
describe Describe velero resources
get Get velero resources
help Help about any command
install Install Velero
plugin Work with plugins
restic Work with restic
restore Work with restores
schedule Work with schedules
snapshot-location Work with snapshot locations
uninstall Uninstall Velero
version Print the velero version and associated image
Flags:
--add_dir_header If true, adds the file directory to the header
--alsologtostderr log to standard error as well as files
--colorized optionalBool Show colored output in TTY. Overrides 'colorized' value from $HOME/.config/velero/config.json if present. Enabled by default
--features stringArray Comma-separated list of features to enable for this Velero process. Combines with values from $HOME/.config/velero/config.json if present
-h, --help help for velero
--kubeconfig string Path to the kubeconfig file to use to talk to the Kubernetes apiserver. If unset, try the environment variable KUBECONFIG, as well as in-cluster configuration
--kubecontext string The context to use to talk to the Kubernetes apiserver. If unset defaults to whatever your current-context is (kubectl config current-context)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_file string If non-empty, use this log file
--log_file_max_size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
-n, --namespace string The namespace in which Velero should operate (default "velero")
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
Use "velero [command] --help" for more information about a command.
Executing on the Velero Backup Controller Pod#
If it is not possible to install the Velero CLI on the local workstation, you can still run Velero commands directly on the Velero pod as follows:
$ velero_namespace="Put your Velero namespace here"
$ kubectl exec velero-699dc869d4-r24bh -n $velero_namespace -c velero -- /velero help
Velero is a tool for managing disaster recovery, specifically for Kubernetes
cluster resources. It provides a simple, configurable, and operationally robust
way to back up your application state and associated data.
<<<output-truncated-for-brevity>>>
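The pod name in the command above changes every time the Velero pod is recreated. A minimal sketch of a wrapper that targets the Deployment instead is shown here. It assumes the Deployment is named `velero` (override it with the hypothetical `VELERO_DEPLOYMENT` variable) and that the container is named `velero`, as in the command above. The function name `velero_exec` is made up for this example.

```shell
# Run a Velero command inside the Velero server pod, addressed via its Deployment.
# Usage: velero_exec <velero-namespace> <velero arguments...>
velero_exec() {
  ns="$1"
  shift
  # Assumption: the Deployment is called "velero" unless VELERO_DEPLOYMENT is set.
  kubectl exec "deploy/${VELERO_DEPLOYMENT:-velero}" -n "$ns" -c velero -- /velero "$@"
}

# Example (not executed here):
#   velero_exec "$velero_namespace" backup get
```

kubectl resolves `deploy/<name>` to one of the Deployment's pods, so the wrapper keeps working after a pod restart.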
Backups using Velero#
Creating a Backup#
To take a backup of Arthur, you would invoke the CLI as follows.
$ arthur_namespace="Put your Arthur namespace here"
$ velero_namespace="Put your Velero namespace here"
$ velero backup create <some-unique-name> \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--storage-location=docs-demo-backup-location-velero
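Backup names must be unique, so one option is to derive the name from a timestamp. The sketch below uses a hypothetical helper, `unique_backup_name`, with an assumed default prefix of `arthur-backup`. The commented usage adds `--wait`, which makes the CLI block until the backup finishes.

```shell
# Build a unique backup name from a prefix and the current UTC time.
# Usage: unique_backup_name [prefix]
unique_backup_name() {
  printf '%s-%s\n' "${1:-arthur-backup}" "$(date -u +%Y-%m-%d-%H-%M-%S)"
}

# Example (not executed here):
#   velero backup create "$(unique_backup_name)" \
#     --namespace="$velero_namespace" \
#     --include-namespaces="$arthur_namespace" \
#     --storage-location=docs-demo-backup-location-velero \
#     --wait
```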
Listing all Backups#
You can list all backups using the Velero CLI:
$ velero_namespace="Put your Velero namespace here"
$ velero backup get -n $velero_namespace
Describing a Backup#
You can get an overview of the backup using the Velero CLI:
$ velero_namespace="Put your Velero namespace here"
$ velero backup describe <insert-backup-name> -n $velero_namespace
Debugging a Backup#
For debugging a backup, you can access the backup’s logs using the Velero CLI:
$ velero_namespace="Put your Velero namespace here"
$ velero backup logs <insert-backup-name> -n $velero_namespace | head
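Before reading the full logs, it can help to check the phase recorded on the Backup resource. The following sketch reads it with kubectl. The function name `backup_phase` is an assumption made for this example, and the phase values mentioned in the comment are examples only.

```shell
# Print the phase of a Velero Backup resource (for example Completed or PartiallyFailed).
# Usage: backup_phase <backup-name> <velero-namespace>
backup_phase() {
  kubectl get backups.velero.io "$1" -n "$2" -o jsonpath='{.status.phase}'
}

# Example (not executed here):
#   backup_phase <insert-backup-name> "$velero_namespace"
```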
Restores using Velero#
As with backups, restores are performed using the Velero CLI. A restore takes a Backup object as its input and then executes the restore procedure.
Attempting a Restore#
You can execute a restore with the following Velero CLI command:
$ velero_namespace="Put your Velero namespace here"
$ velero restore create \
--from-backup <insert-backup-name> \
--namespace $velero_namespace \
--restore-volumes=true
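Restoring from an incomplete backup can leave the platform in a partial state. The sketch below shows one way to guard against that: it refuses to restore unless the backup's phase is `Completed`, and it adds `--wait` so the command blocks until the restore finishes. The function name `safe_restore` is hypothetical. Treating only `Completed` as acceptable is an assumption, and you may want to relax it.

```shell
# Restore from a backup only if the backup finished in the Completed phase.
# Usage: safe_restore <backup-name> <velero-namespace>
safe_restore() {
  backup="$1"
  ns="$2"
  phase=$(kubectl get backups.velero.io "$backup" -n "$ns" -o jsonpath='{.status.phase}')
  if [ "$phase" != "Completed" ]; then
    echo "Backup $backup is in phase '${phase:-unknown}'; not restoring" >&2
    return 1
  fi
  velero restore create \
    --from-backup "$backup" \
    --namespace "$ns" \
    --restore-volumes=true \
    --wait
}
```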
Listing all Restore attempts#
As with backups, Velero creates a Restore resource for each restore attempt, which you can inspect with the Velero CLI:
$ velero_namespace="Put your Velero namespace here"
$ velero restore get -n $velero_namespace
Describing a Restore attempt#
You can get an overview of the restore attempt using the Velero CLI:
$ velero_namespace="Put your Velero namespace here"
$ velero restore describe <insert-restore-name> -n $velero_namespace
Debugging a Restore attempt#
For debugging a restore attempt, you can access the logs using the Velero CLI:
$ velero_namespace="Put your Velero namespace here"
$ velero restore logs <insert-restore-name> -n $velero_namespace | head
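Piping the logs through `head` shows only the first few lines. To surface problems instead, the output can be filtered by log level. This sketch assumes the log lines carry a `level=<severity>` field. The function name `velero_log_problems` is made up for this example.

```shell
# Keep only warning and error lines from Velero log output read on stdin.
# A run with no matching lines is not treated as a failure.
velero_log_problems() {
  grep -E 'level=(warning|error)' || true
}

# Example (not executed here):
#   velero restore logs <insert-restore-name> -n "$velero_namespace" | velero_log_problems
```

The same filter can be applied to the output of `velero backup logs`.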
Running Backups on a Schedule#
There are two ways the Arthur platform can be backed up on a schedule:
Scripting the entire backup process (see example), and executing it on a fixed schedule from a job runner (Jenkins, GitLab CI, etc.)
Leveraging native schedulers to back up the individual components of the platform:
ClickHouse is backed up out of the box at midnight (by default) by a Kubernetes CronJob.
Use Velero Schedules to create Velero Backups:
Messaging infrastructure
# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
$ storage_location="Put your storage location here"
$ arthur_namespace="Put your Arthur namespace here"
$ velero_namespace="Put your Velero namespace here"
$ velero schedule create messaging-infra-backup-nightly \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--selector='app in (cp-zookeeper,cp-kafka)' \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,services,endpoints,configmaps,poddisruptionbudgets \
--storage-location=$storage_location \
--schedule "0 0 * * *" \
--ttl 720h0m0s
Enrichments (infrastructure and workflows)
# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
$ storage_location="Put your storage location here"
$ arthur_namespace="Put your Arthur namespace here"
$ velero_namespace="Put your Velero namespace here"
$ velero schedule create enrichments-workflows-backup-nightly \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--include-resources=workflows \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
--storage-location=$storage_location \
--schedule "0 0 * * *" \
--ttl 720h0m0s
$ velero schedule create qa-enrichments-infra-backup-nightly \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--selector='component in (kafka-mover-init-connector, model_server)' \
--include-resources=deployments,services \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
--storage-location=$storage_location \
--schedule "0 0 * * *" \
--ttl 720h0m0s
RDS databases can be backed up automatically on a schedule, but not at an exact time: AWS takes the snapshot at an unpredictable point within a 30-minute backup window. Because of this AWS limitation, make sure that no operations (such as model CRUD) run on the Arthur platform during the backup window.
$ aws rds modify-db-instance \
--db-instance-identifier RDS_DB_NAME \
--backup-retention-period 14 \
--preferred-backup-window 23:45-00:15 \
--profile AWS_PROFILE_NAME \
--region AWS_REGION \
--apply-immediately
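A Velero Schedule created as above can also be triggered on demand, which is a quick way to confirm that it works without waiting for midnight. A minimal sketch follows. The function name `run_schedule_now` is hypothetical, and the schedule name in the commented usage is the one created above.

```shell
# Create a one-off backup from an existing Velero Schedule and wait for it to finish.
# Velero generates the backup name from the schedule name.
# Usage: run_schedule_now <schedule-name> <velero-namespace>
run_schedule_now() {
  velero backup create --from-schedule "$1" --namespace "$2" --wait
}

# Example (not executed here):
#   run_schedule_now messaging-infra-backup-nightly "$velero_namespace"
```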
Sample Backup Script (manual)#
The following script can be used to run all the backup steps together:
#!/bin/bash
set -euo pipefail
IFS=$'\n\t'
# You need to configure this by getting the name of the backup storage location
# eg: `velero backup-location get` or `kubectl get backupstoragelocation -n <velero-namespace>`
storage_location="Put your storage location here"
arthur_namespace="Put your Arthur namespace here"
velero_namespace="Put your Velero namespace here"
backup_date=$(date +%Y-%m-%d-%H-%M-%S);
name=arthur-backup-$backup_date
echo "Creating a new backup with name $name"
echo "Taking a backup of CH data"
kubectl create job $name-clickhouse-backup \
--namespace=$arthur_namespace \
--from=cronjob/clickhouse-backup-cronjob
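# Resolve the job's resource name (job.batch/<name>) so that kubectl wait can target it
ch_backup_jobname=$(kubectl get jobs -o name -n "$arthur_namespace" | grep "$name-clickhouse-backup")
# Block until the ClickHouse backup job completes, or fail after 30 minutes
kubectl wait "$ch_backup_jobname" \
--namespace="$arthur_namespace" \
--for=condition=complete \
--timeout=30m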
echo "Taking a backup of the enrichments infrastructure"
velero backup create $name-enrichments \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--selector='component in (kafka-mover-init-connector, model_server)' \
--include-resources=deployments,services \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
--storage-location=$storage_location \
--wait
echo "Taking a backup of workflows"
velero backup create $name-workflows \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--include-resources=workflows \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,secrets,configmaps \
--storage-location=$storage_location \
--wait
echo "Taking a backup of Kafka/Kafka-ZK StatefulSets, their EBS Volumes, and related components"
velero backup create $name-messaging \
--namespace=$velero_namespace \
--include-namespaces=$arthur_namespace \
--selector='app in (cp-zookeeper,cp-kafka)' \
--exclude-resources=clusterrolebindings.rbac.authorization.k8s.io,clusterroles.rbac.authorization.k8s.io,controllerrevisions.apps,endpointslices.discovery.k8s.io,customresourcedefinitions.apiextensions.k8s.io,services,endpoints,configmaps,poddisruptionbudgets \
--storage-location=$storage_location \
--wait
echo "Taking a backup of the RDS database"
aws rds create-db-cluster-snapshot \
--db-cluster-snapshot-identifier $name-snapshot \
--db-cluster-identifier RDS_DB_NAME \
--profile AWS_PROFILE_NAME \
--region AWS_REGION
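The script above starts the RDS snapshot but does not wait for it to finish. If later steps depend on the snapshot, a waiter can be added at the end of the script. The sketch below wraps the AWS CLI waiter for cluster snapshots. The function name `wait_for_cluster_snapshot` is an assumption, and the arguments in the commented usage are the same placeholders used in the script.

```shell
# Block until an RDS cluster snapshot reaches the "available" state.
# Usage: wait_for_cluster_snapshot <snapshot-identifier> <aws-profile> <aws-region>
wait_for_cluster_snapshot() {
  aws rds wait db-cluster-snapshot-available \
    --db-cluster-snapshot-identifier "$1" \
    --profile "$2" \
    --region "$3"
}

# Example (not executed here):
#   wait_for_cluster_snapshot "$name-snapshot" AWS_PROFILE_NAME AWS_REGION
```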