Introduction

Knowing how to back up etcd is crucial for OpenShift operators, because etcd holds the state and configuration of the entire cluster. With a recent etcd backup in hand, you can restore the cluster's data after a disaster, recover quickly, and keep downtime to a minimum.

Procedure

Process Explanation

The backup runs every night as a Kubernetes CronJob (at 23:56 with the schedule used in the manifest below). The job uses oc debug to start a debug pod on each control-plane server and runs the backup script there. Backing up all control-plane servers in the same run gives a consistent point in time for all running etcd pods.

NOTE: The backup files are written to the control-plane servers themselves. You must take snapshots of those servers or, if they are physical machines, make sure the backup files are synced to a remote location.

Apply the backup YAML files

Let’s create the project for the etcd backup:

$ oc new-project etcd-backup

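If the project inherits a default node selector (for example, one that pins pods to worker nodes), clear the annotation so that the debug pods can run on the control-plane servers:

$ oc annotate namespace etcd-backup openshift.io/node-selector= --overwrite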
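To confirm that the annotation really is empty, you can read it back from the namespace. The snippet below is a minimal sketch: show_node_selector is a hypothetical helper name, and it assumes the project is called etcd-backup as created above.

```shell
# Sketch only: show_node_selector is a hypothetical helper, not part of oc.
show_node_selector() {
  ns="${1:-etcd-backup}"
  # Dots inside the annotation key must be escaped in the jsonpath expression.
  selector="$(oc get namespace "$ns" \
    -o jsonpath='{.metadata.annotations.openshift\.io/node-selector}')"
  if [ -z "$selector" ]; then
    echo "namespace $ns: node selector is empty"
  else
    echo "namespace $ns: node selector is '$selector'"
  fi
}
```

Running show_node_selector etcd-backup after the annotate command should report an empty selector.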

Create the Service Account

01_sa-etcd-backup.yaml

---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: openshift-backup
  namespace: etcd-backup
  labels:
    app: openshift-backup

Apply the ServiceAccount:

$ oc apply -f 01_sa-etcd-backup.yaml

Create the ClusterRole

02_clusterrole.yaml

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-etcd-backup
rules:
- apiGroups: [""]
  resources:
     - "nodes"
  verbs: ["get", "list"]
- apiGroups: [""]
  resources:
     - "pods"
     - "pods/log"
  verbs: ["get", "list", "create", "delete", "watch"]

Apply the ClusterRole:

$ oc apply -f 02_clusterrole.yaml

Create the ClusterRoleBinding

03_clusterrolebinding.yaml

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: openshift-backup
  labels:
    app: openshift-backup
subjects:
  - kind: ServiceAccount
    name: openshift-backup
    namespace: etcd-backup
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-etcd-backup

Apply the ClusterRoleBinding:

$ oc apply -f 03_clusterrolebinding.yaml
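Before moving on, it can help to ask the API server whether the service account actually received the permissions from the ClusterRole. The following is a rough sketch using oc auth can-i with impersonation. check_backup_rbac is a hypothetical helper name, the list of checks is only a sample of the rules above, and your own user needs permission to impersonate service accounts.

```shell
# Sketch only: check_backup_rbac is a hypothetical helper, not part of oc.
SA="system:serviceaccount:etcd-backup:openshift-backup"

check_backup_rbac() {
  rc=0
  for check in "list nodes" "get pods" "create pods" "delete pods"; do
    # $check is left unquoted on purpose so it splits into VERB and RESOURCE.
    answer="$(oc auth can-i $check --as="$SA" -n etcd-backup 2>/dev/null)"
    if [ "$answer" = "yes" ]; then
      echo "ok:      $check"
    else
      echo "missing: $check"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling check_backup_rbac prints one line per permission and returns a non-zero exit code if any of them is missing.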

Grant the privileged SCC to the service account

To let the service account run the backup script at the host level, grant it the privileged SCC:

$ oc adm policy add-scc-to-user privileged -z openshift-backup -n etcd-backup

Create the CronJob

04_cronjob.yml

---
kind: CronJob
apiVersion: batch/v1
metadata:
  name: openshift-backup
  namespace: etcd-backup
  labels:
    app: openshift-backup
spec:
  schedule: "56 23 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      labels:
        app: openshift-backup
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            app: openshift-backup
        spec:
          containers:
            - name: backup
              image: "registry.redhat.io/openshift4/ose-cli"
              command:
                - "/bin/bash"
                - "-c"
                - oc get no -l node-role.kubernetes.io/master --no-headers -o name | xargs -I {} --  oc debug {} -- bash -c 'chroot /host sudo -E /usr/local/bin/cluster-backup.sh /home/core/backup/ && chroot /host sudo -E find /home/core/backup/ -type f -mmin +"1" -delete'
          restartPolicy: "Never"
          terminationGracePeriodSeconds: 30
          activeDeadlineSeconds: 500
          dnsPolicy: "ClusterFirst"
          serviceAccountName: "openshift-backup"
          serviceAccount: "openshift-backup"

Apply the CronJob:

$ oc apply -f 04_cronjob.yml
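The one-liner inside the CronJob is dense, so here is the same logic written out as a loop, purely as a reading aid. It is a sketch, not a replacement for the manifest: the path, the node label and the one-minute retention are copied from the command above, while node_command and backup_control_plane are hypothetical function names.

```shell
# Sketch only: mirrors the CronJob command; function names are hypothetical.
BACKUP_DIR="/home/core/backup/"
RETENTION_MINUTES="1"   # value copied from the manifest (-mmin +"1")

# Build the command that runs on each node inside the debug pod:
# first the backup script, then the cleanup of older files.
node_command() {
  printf '%s && %s' \
    "chroot /host sudo -E /usr/local/bin/cluster-backup.sh ${BACKUP_DIR}" \
    "chroot /host sudo -E find ${BACKUP_DIR} -type f -mmin +${RETENTION_MINUTES} -delete"
}

# Run the node command on every control-plane node, one node at a time.
backup_control_plane() {
  failed=0
  for node in $(oc get nodes -l node-role.kubernetes.io/master --no-headers -o name); do
    echo "backing up ${node}"
    oc debug "${node}" -- bash -c "$(node_command)" || failed=1
  done
  return "${failed}"
}
```

Unlike the xargs pipeline, the loop prints which node it is working on and returns a non-zero exit code if any node fails, which makes a failed run easier to trace in the pod log.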

Test

To verify the CronJob without waiting for the nightly run, we can create a one-off job from it with the create job command. The job runs the backup pod once and, if everything works, the pod finishes with a Completed status.

Create the job with the following command:

$ oc create job backup --from=cronjob/openshift-backup

Verify that the pod is not in a failed state. It should be Running first and then Completed:

$ oc get pods -n etcd-backup
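Instead of polling oc get pods by hand, you can let oc wait block until the job finishes and then print its log. This is a minimal sketch that assumes the job is named backup as in the create job command above. verify_backup_job is a hypothetical helper name, and the 600 second timeout is an arbitrary value chosen to be longer than the activeDeadlineSeconds in the manifest.

```shell
# Sketch only: verify_backup_job is a hypothetical helper, not part of oc.
verify_backup_job() {
  job="${1:-backup}"
  if oc wait --for=condition=complete "job/${job}" -n etcd-backup --timeout=600s; then
    # Print the output of the backup script from the job's pod.
    oc logs "job/${job}" -n etcd-backup
    echo "job ${job} completed"
  else
    echo "job ${job} did not complete in time" >&2
    # Show the pods created by the job so the failure can be inspected.
    oc get pods -n etcd-backup -l "job-name=${job}"
    return 1
  fi
}
```

Note that oc wait only watches for the complete condition, so a job that fails outright shows up here as a timeout rather than as an immediate error.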

Summary

At Octopus Computer Solutions, we prioritize the resilience and security of your Kubernetes deployments. Through our open-source-driven approach, we treat regular OpenShift etcd backups as an essential part of our comprehensive backup solutions. Our expertise helps keep your critical data protected, offering peace of mind and reinforcing the value we bring to your Kubernetes operations.

We have additional OpenShift-related posts; feel free to read and share them.

References

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/html-single/backup_and_restore/index#backup-etcd