Kubernetes Backup and Recovery with HPE and Kasten K10
Welcome to the screencast where we will talk about getting started with Kasten K10 by Veeam using the HPE CSI Driver for Kubernetes. My name is Michael Mattsson. I'm a tech marketing engineer with Hewlett Packard Enterprise. Kasten is a leader in Kubernetes backup and disaster recovery. Kasten K10, a data management platform purpose-built for Kubernetes, provides enterprise operations teams an easy-to-use, scalable, and secure system for backup and restore, disaster recovery, and application mobility with operational simplicity. Kasten is an independent Kubernetes business unit within Veeam. Before we jump into the demo, there are a few key concepts I want us to go over. The HPE CSI Driver for Kubernetes is a multi-platform, multi-vendor driver. You gain access to these storage systems from Kubernetes through their REST APIs and data paths. The driver itself is installed in Kubernetes. It gives you access to a number of storage-related objects on top of Kubernetes, such as storage classes, persistent volume claims, persistent volumes, and a few snapshot-related objects.
A common pattern on Kubernetes is to isolate applications in namespaces. That's where all your secrets, config maps, workload controllers, and of course your persistent volume claims reside. Kasten K10 runs in its own namespace and is deployed on Kubernetes like any other application. It is also capable of discovering other namespaces and applications and all the objects that pertain to each application. With that information, Kasten K10 can perform data management operations: volume snapshots of your persistent volume claims along with all the other objects that pertain to the application itself. That is how you perform backups, snapshots, recovery, and disaster recovery of your applications running on Kubernetes. To set the context further, I want to clarify a few things. We have the HPE CSI Driver for Kubernetes installed and configured. We have our default storage class, which is using a supported container storage provider. What we want to go through is how to prepare the cluster to deploy Kasten K10 and all the necessary minutiae to perform a manual snapshot and recovery scenario. So let's get started.
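As a quick sanity check, the starting point described above can be verified from the command line. The `hpe-storage` namespace is an assumption (it's the common install target for the HPE CSI driver); adjust to match your installation:

```shell
# Verify the HPE CSI driver pods are healthy (namespace is an
# assumption; the driver commonly installs into "hpe-storage").
kubectl get pods -n hpe-storage

# Confirm a default storage class is set (look for "(default)").
kubectl get storageclass
```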
First, we need to visit scod.hpedev.io, where all the documentation for the HPE CSI Driver for Kubernetes lives. We want to go into the "Using" section and enable CSI snapshots. There's a stanza here that explains how to install the external snapshotter: check out a specific branch, deploy the necessary CRDs for the volume snapshots, volume snapshot classes, and volume snapshot contents, and deploy the snapshot controller. There we go. Further down, there is a provisioning section that explains how to use CSI snapshots. So, we need to create a volume snapshot class that we can hand to Kasten K10 to perform snapshots of volumes provisioned from the HPE standard storage class.
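The snapshotter install and the volume snapshot class described above look roughly like this sketch. The external-snapshotter branch and the class name `hpe-snapshot` are assumptions; follow the exact stanza on scod.hpedev.io for your driver version:

```shell
# Install the volume snapshot CRDs and the snapshot controller
# from the external-snapshotter project (branch is an assumption).
git clone https://github.com/kubernetes-csi/external-snapshotter
cd external-snapshotter
git checkout release-6.2
kubectl kustomize client/config/crd | kubectl create -f -
kubectl -n kube-system kustomize deploy/kubernetes/snapshot-controller | kubectl create -f -

# Create a VolumeSnapshotClass backed by the HPE CSI driver.
kubectl create -f- <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: hpe-snapshot
driver: csi.hpe.com
deletionPolicy: Delete
EOF
```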
There we go. Next, we need to go to docs.kasten.io and run through some of the prerequisites; we will also be installing directly from here. Under the "Installing" tab, you will find the storage integration section. There's a CSI tab; the HPE CSI driver is a standard CSI driver, so you don't need any special sauce to get it running. What you want to look for here is the annotation: this particular annotation is what you need to apply to your volume snapshot class to allow Kasten K10 to perform snapshots and recovery.
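Applying that annotation is a one-liner. The volume snapshot class name `hpe-snapshot` is an assumption; use whatever name you gave the class when you created it:

```shell
# Mark the snapshot class so K10 will use it for snapshots.
kubectl annotate volumesnapshotclass hpe-snapshot \
    k10.kasten.io/is-snapshot-class=true
```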
There we go. kubectl at the ready. Next, we need to go through the install requirements. There are a few prerequisites here. We want to add the Helm repo for Kasten; it lives in its own Helm repo. We also want to perform a Helm repo update. Happy Helming! Next, we want to create a namespace where we will install Kasten. Namespace created. Next, we go to installing K10 on Kubernetes. We are using standard upstream open-source Kubernetes, no special distribution, so we run the Helm command again and install into our newly created kasten-io namespace.
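The steps above boil down to a handful of commands, along the lines of the Kasten documentation:

```shell
# Add the Kasten Helm repository and refresh the index.
helm repo add kasten https://charts.kasten.io/
helm repo update

# Create the namespace and install K10 into it.
kubectl create namespace kasten-io
helm install k10 kasten/k10 --namespace=kasten-io
```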
We can also watch the pods coming up for K10, and there are a number of them. All right, that looks good enough. Next, we want to access the dashboard, but first we need to set up a port-forward so I can access the dashboard from my desktop browser. Do some kubectl-ing. There we go: forwarding. We can go back to the browser, open the link in a new tab, and we should see the Kasten K10 dashboard. You need to fill in your company name and an email address and hit accept.
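The watch and the port-forward can be sketched like this; the local port 8080 is an arbitrary choice:

```shell
# Watch the K10 pods come up.
kubectl get pods -n kasten-io --watch

# Forward the dashboard gateway service to the desktop; the
# dashboard is then reachable at http://127.0.0.1:8080/k10/#/
kubectl -n kasten-io port-forward service/gateway 8080:8000
```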
There are a few welcome screens here, but essentially Kasten K10 is now ready to perform snapshots, backups, and recoveries. But we need something to snapshot, so I'm going to install MariaDB. This is the standard Helm chart from the Bitnami chart repository. I'm also going to use a custom values file that simply sets an admin password that is not a random string; I set it to "admin". I do not recommend that in production. Back in the K10 interface, we can already see that the MariaDB application has been discovered. We can see the details here, all the artifacts that pertain to MariaDB: we have our persistent volume claim, we can see that it is a stateful set, the service we can use to connect to the database, and a few other details.
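The MariaDB install looks roughly like this. Passing the password with `--set` instead of a values file is a simplification, and the `auth.rootPassword` key assumes a recent Bitnami chart; and again, a fixed password like this is for demo purposes only:

```shell
# Add the Bitnami repo and install MariaDB into its own namespace.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install mariadb bitnami/mariadb \
    --namespace mariadb --create-namespace \
    --set auth.rootPassword=admin   # demo only, never in production
```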
We also need some data to look at, right? So I'm going to load a database that is basically an employee test data set; it's available on GitHub. I also set up a port-forward in the background so I can access the database from my desktop. We can see that I can connect to the database; this is a standard MariaDB instance. We can start loading the employees database, all the rows and records. It also contains a test, so you can check the data set for consistency: you have the expected record counts and CRCs, and we can see immediately that everything matches. We're up and running. The database is now installed, and this is our known good state. We want to go back to Kasten K10 and perform a snapshot of the state the application is currently in. That will bundle up all the artifacts, including taking a snapshot with the HPE CSI driver on the backend.
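Loading and verifying the employees data set might look like this sketch. It assumes the well-known `datacharmer/test_db` repository (the transcript only says "available on GitHub") and the root password from the previous step:

```shell
# Expose the database locally in the background.
kubectl -n mariadb port-forward svc/mariadb 3306:3306 &

# Fetch the sample data set and load it (the .sql files reference
# dump files relative to the repository root, hence the cd).
git clone https://github.com/datacharmer/test_db
cd test_db
mysql -h 127.0.0.1 -u root -padmin < employees.sql

# Verify record counts and CRCs against the expected values.
mysql -h 127.0.0.1 -u root -padmin -t < test_employees_md5.sql
```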
This usually takes just a couple of seconds. There we go. It took 15 seconds to create a crash-consistent state of that particular application. We can also see in the MariaDB namespace that we have a volume snapshot on our persistent volume claim that K10 just created for us. I now want to commit a common user error, a common DBA error: you log into the database with instructions to delete one department, you do a DELETE FROM departments without qualifying it further, and all of a sudden you've blown away every department in your database. So when you go back out and rerun the test of the data set, you will see that records are missing and the CRCs don't match. Right?
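The fat-fingered delete and the failing verification can be reproduced like this, assuming the same port-forward and credentials as before:

```shell
# The unqualified DELETE wipes every row in the table.
mysql -h 127.0.0.1 -u root -padmin employees \
    -e 'DELETE FROM departments;'

# Re-running the verification now reports missing records
# and mismatched CRCs.
mysql -h 127.0.0.1 -u root -padmin -t < test_employees_md5.sql
```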
So now we have a DBA on fire, and we want to get the database back to our known good state. We have the snapshot that we created manually, so we just want to hit that restore button as quickly as we can to restore to the previous state. We pick the manual restore point that we just created. We could qualify further what we want to restore, but we want to restore all the artifacts and get the application back to the previous known state as quickly as possible.
So there we go. Hit restore. We are really, really, really sure. There we go. Heading back to the dashboard, we can see that the job has already started. This operation has been sped up in the demo just to showcase how long it takes, which is around a minute. What happens in the background is that all the objects that need to be recreated get torn down, a new PVC is created from the volume snapshot that we created with the HPE CSI driver, and the application should be up and running in a minute or so. Yeah, we can see that the job took one minute and 12 seconds. Everything is restored and up and running.
Right. So we should be able to head back and run the test again. We can see that we have lost the connection, so I'm just reconnecting in the background, setting up the port-forward again, and running the test script again. We can immediately see that the record counts match, the CRCs match, and we're back to our known good state with just a few clicks.