Hello to everyone in the Kubernetes world. In this article and the next, I will explain how to implement Kasten (K10 by Veeam) on a TKGS (Tanzu Kubernetes Grid Service, part of vSphere with Tanzu) guest cluster and actually make it work.
First of all, I would like to thank Michael Courcy from Kasten, Clarence Pouthier, and their colleagues.
(Names will be added in the future when available.)
Background Story:
Our story began when we didn’t have a working DR backup for our workload cluster, and Murphy was just waiting for the right moment to strike.
My colleagues and I realized we needed a holistic tool for our on-prem and cloud k8s clusters, mostly for the on-prem ones.
That’s how I arrived at Kasten by Veeam. But we did face a few challenges on our way to making it happen.
That is why I’m writing this article: to save other people time.
TKGS High-Level Architecture Overview
Before going any further, you must (and when I say must, I mean must) first understand how TKGS works.
vCenter → vSphere Cluster → Supervisor Cluster Bootstrap and Manager → vSphere Namespace → TKG guest Cluster → VirtualMachine (k8s cluster VMs)
The diagram above gives you a rough idea of how it is designed, which will help you later.
You can see that we have a management cluster (the Supervisor) that runs the guest clusters underneath it, inside vSphere Namespaces.
Important note! Most of the guest cluster components are managed by the Supervisor. This means your CSI, API server, cluster and node lifecycle, flavors, special attributes, plugins, storage, volumes, etc. are managed by it.
Another diagram shows how the backup of PVCs/data moves inside the cluster.
First, simple PVC provisioning at a high level:
With the default backup solution that ships with TKGS, the diagram below shows the whole Velero architecture flow.
TKGS and Backups
In the Kasten documentation, you will see that backing up the Supervisor Cluster is not supported.
See the vSphere section of Storage Integration (K10 7.0.4 documentation, kasten.io, at the time of writing):
https://docs.kasten.io/latest/install/storage.html#vsphere-profile
TKGS comes with a built-in DataProtection service, which installs Velero and the objects Velero depends on for you on the TKG guest cluster.
TKGS provides a DataProtection CRD for this, so that it can be managed through their control UI.
In my experience, it does not work well, because the Supervisor exposes a paravirtualized CSI, which adds another hop (another point along the way) that handles the backup/snapshot.
The backup/snapshot goes to the Supervisor first and only then to the datastore.
This causes PVC backups to break, and sometimes regular non-PVC backups as well, depending on what you are backing up.
Finally Getting Started
Prerequisites
- TKGS guest cluster running
- A Supervisor Cluster running above it, obviously
- Velero vSphere CLI installed on the bastion/jump box server
- A vSphere service account preconfigured with the permissions Kasten requires
- The required permissions are documented in the vsphere-profile section of Storage Integration (K10 7.0.4 documentation, kasten.io)
- vCenter root/admin access
- Port 902 open between the guest cluster network and the ESXi hosts
- Velero vSphere Plugin installed on the Supervisor
- Good Mood and Good music in the background to listen to 🙂
My environment details (at the time of writing):
- vCenter version: 8.0.2
- ESXi hosts version: 8.0.2
- Velero vSphere CLI version: 1.6
- Supervisor Cluster Version: 1.26.5
- TKC — Guest Cluster Version: 1.26.5
Starting with Supervisor version 1.27.x (vCenter version 8.0.3+), the TKGS Supervisor comes with the Velero plugin already installed.
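As a rough illustration of that version rule, here is a minimal Python sketch that decides whether the manual plugin installation described next is still needed. The function names are hypothetical, and the 1.27 threshold is simply the one stated above; check it against your own environment.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.26.5' or 'v1.27.x' into a comparable tuple of integers."""
    parts = []
    for piece in version.lstrip("v").split("."):
        if not piece.isdigit():
            break  # stop at wildcards or suffixes such as 'x' or '3+vmware'
        parts.append(int(piece))
    return tuple(parts)


def needs_manual_plugin_install(supervisor_version: str) -> bool:
    """True when the Supervisor is older than 1.27 (threshold taken from this article)."""
    return parse_version(supervisor_version)[:2] < (1, 27)


if __name__ == "__main__":
    # My Supervisor is 1.26.5, so the plugin has to be installed by hand.
    print(needs_manual_plugin_install("1.26.5"))
```

Only the major and minor numbers are compared, so patch levels and vendor suffixes do not affect the result.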
- Installing the Velero vSphere Plugin on the Supervisor Cluster
- Workload Management → Services tab → Add New Service
Click on the link to download the service YAML manifest for the plugin.
In the end, this registers the service as a vSphere Pod operator on the Supervisor Cluster.
Also download the CLI tool (a tar.gz file) for later use. Extract it on the bastion server, move the binary to your /usr/bin directory, and give it the right permissions so it can be executed.
Then click Upload -> choose the right manifest -> click Finish.
You should now see the new service, “Velero vSphere Operator”.
Click on Manage Services and add the Supervisor Cluster to install the plugin.
Click on Finish and it will install the plugin on your Supervisor Cluster.
You will see a new namespace created for the operator.
Go back to inventory -> vSphere Cluster -> Namespaces -> svc-velero-vsphere-xxxx
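The CLI download step above (extract the tar.gz, move the binary to /usr/bin, set permissions) can be scripted. This is only a sketch: `install_cli` is a hypothetical helper, and the archive and binary names in the usage comment are assumptions, so use whatever the download actually contains. Writing to /usr/bin normally requires root.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_cli(archive: Path, binary_name: str, dest_dir: Path = Path("/usr/bin")) -> Path:
    """Extract `archive`, copy `binary_name` into `dest_dir`, and make it executable."""
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(tmp)  # only extract archives you trust
        # The binary may sit in a subdirectory of the archive, so search for it.
        matches = [p for p in Path(tmp).rglob(binary_name) if p.is_file()]
        if not matches:
            raise FileNotFoundError(f"{binary_name} not found in {archive}")
        target = dest_dir / binary_name
        shutil.copy2(matches[0], target)
        target.chmod(0o755)  # rwxr-xr-x
        return target


# Example call (file and binary names are assumptions, adjust to your download):
# install_cli(Path("velero-vsphere-linux.tar.gz"), "velero-vsphere")
```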
Before moving ahead, make sure you have a vSphere service account that uses a vSphere role with the right permissions.
See the Kasten documentation for this: the vsphere-profile section of Storage Integration (K10 7.0.4 documentation, kasten.io) https://docs.kasten.io/latest/install/storage.html#vsphere-profile
My configured permissions are shown below (ignore migration under cryptographic operations; I ran a few trials).
OK, looks like we are all set with prerequisites.
- I’ve already checked that port 902 is open between the guest cluster and the ESXi hosts
- Velero vSphere CLI is already installed on my bastion/jump box server
- Velero vSphere Plugin is installed on the Supervisor Cluster
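If you want to verify the port 902 item yourself, a plain TCP connection test is enough for a first check. The sketch below uses only the Python standard library; the host names in the usage comment are hypothetical placeholders. Run it from inside the guest cluster network (for example from a node or a debug pod), otherwise it tests the wrong path.

```python
import socket


def is_port_open(host: str, port: int = 902, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def check_hosts(hosts: list[str], port: int = 902) -> dict[str, bool]:
    """Map each host to whether the given TCP port is reachable."""
    return {host: is_port_open(host, port) for host in hosts}


# Example call (placeholder host names, replace with your ESXi hosts):
# print(check_hosts(["esxi-01.example.local", "esxi-02.example.local"]))
```

A `False` result only tells you the TCP connection failed; it does not say whether a firewall, routing, or the host itself is the cause.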
I’ll see you in the second part → Kasten TKGS Backup Implementation Deep Dive Explained - PART 2