Multi-Cluster Global Policies with Kasten K10 and Razee
Today, we are seeing a large number of customers either already adopting, or expressing interest in, multiple Kubernetes clusters in every one of their environments (production, staging, dev, etc.). These clusters are often split across application, security, or team boundaries, and the architecture is further complicated by the fact that they are frequently deployed across multiple availability zones, regions, clouds, and on-premises data centers. Our observations are also backed up by a recent CNCF survey showing that the vast majority of Kubernetes users have deployed multiple production clusters.
Here, at Kasten, we are focused on making critical Day 2 operations such as backup, disaster recovery, and mobility for all your cloud-native applications dead simple. While our K10 data management platform already works across all the above environments, we are also actively exploring how to make it easy to propagate things like global policies, mobility profiles, and more across all these clusters.
Given that our requirements are not unique, we started looking at a number of interesting community projects in the multi-cluster resource management space that are out there today. This blog post describes the use of Razee, one such multi-cluster project being driven by IBM, with K10, the market leader for application backup, disaster recovery, and mobility for Kubernetes. We will use Razee to distribute a global policy that, in this example use case, will protect all Helm-deployed applications running in your Kubernetes clusters.
I. Installing Razee
As documented on its GitHub page, “Razee is an open-source project that was developed by IBM to automate and manage the deployment of Kubernetes resources across clusters, environments, and cloud providers.” While the scale here isn’t the tens of thousands of Kubernetes clusters that IBM deploys, the tagline for the project does fit our use case.
To get started, I use the command visible on the Razee dashboard to deploy the Razee agent into my cluster:
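The exact command is generated per-cluster by the Razee dashboard, so the one below is only a representative sketch: it follows the general pattern of applying a generated manifest URL that embeds your organization key. Both the host and the org key shown here are placeholders; copy the real command from your own dashboard.

```shell
# Placeholder values: substitute the command shown on your Razee dashboard,
# which embeds your RazeeDash URL and organization key.
$ kubectl apply -f "https://<your-razeedash-host>/api/install/cluster?orgKey=<your-org-key>"
```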
II. Installing K10, An Enterprise-Grade Data Management Platform
After installing Razee, I added the free and fully featured edition of K10, our data management platform that is deeply integrated with Kubernetes. If you aren't already familiar with it, K10 provides an easy-to-use and secure system for backup/restore and mobility of your entire Kubernetes application. This includes use cases such as:
Backup/restore for your entire application stack to make it easy to “reset” your application to a known good state
Cloning your application to a different namespace for debugging
Installing K10 is quite simple and you should be up and running in 5 minutes or less! It is usually a one-line Helm command or a single-click install in cloud marketplaces. Install documentation for your environment can be found here but, for my cluster, I deployed K10 using the following commands:
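The commands I used are not reproduced here, but the standard Helm-based install follows this pattern (chart repository and release names per the K10 install documentation; your environment may need additional flags for storage integration or authentication):

```shell
# Add the Kasten Helm chart repository and install K10 into its own namespace
$ helm repo add kasten https://charts.kasten.io/
$ kubectl create namespace kasten-io
$ helm install k10 kasten/k10 --namespace=kasten-io
```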
Next, I created a K10 backup policy that:
Is defined as a Kubernetes-native resource (a CustomResource)
Runs on an hourly schedule and protects all applications deployed via Helm, using a wildcard selector that detects Helm applications via the label heritage: Helm
Has a flexible retention scheme for backups and snapshots to both manage costs and meet compliance requirements. In particular, it uses a GFS (Grandfather-Father-Son) retention scheme that lets you decide on the total number of backups stored and the rolloff from one tier to the next (e.g., one hourly backup rolls off into a daily backup once a day). This way, you can keep hourly granularity for fast restores but still retain the ability to go back days and weeks without keeping every backup around.
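A policy matching the description above might look like the following sketch. The field names follow K10's Policy CustomResource as I understand it, and the policy name, namespace, and retention counts are illustrative; consult the K10 API documentation for the authoritative schema.

```yaml
# Illustrative K10 policy: hourly backups of all Helm-deployed apps,
# with GFS-style retention across hourly/daily/weekly/monthly tiers.
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: helm-app-backup      # illustrative name
  namespace: kasten-io
spec:
  frequency: "@hourly"
  retention:
    hourly: 24
    daily: 7
    weekly: 4
    monthly: 12
  selector:
    matchLabels:
      heritage: Helm          # wildcard match on Helm-deployed applications
  actions:
    - action: backup
```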
III. Integrating K10 and Razee
Finally, to get Razee in my cluster to start pulling in the above policy (and all subsequent edits to the policy), I created a file with the following content:
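The original file contents are not preserved here, but a Razee RemoteResource pointing at a hosted policy file generally looks like this sketch. The apiVersion reflects the Razee RemoteResource CRD at the time of writing, and the URL is a placeholder for wherever you host the policy YAML:

```yaml
# Illustrative RemoteResource: Razee will poll the URL below and apply
# (and keep in sync) whatever Kubernetes resources it finds there.
apiVersion: deploy.razee.io/v1alpha2
kind: RemoteResource
metadata:
  name: k10-global-policies   # illustrative name
  namespace: razeedeploy
spec:
  requests:
    - options:
        # Placeholder: point this at the location of your K10 policy YAML
        url: https://raw.githubusercontent.com/<your-org>/<your-repo>/master/k10-policy.yaml
```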
The above Kubernetes resource is a Razee RemoteResource. It is used to automatically deploy Kubernetes resources that are stored at the URL specified within it (multiple URLs and S3-based remote resources are also possible). I applied the RemoteResource to my cluster using:
$ kubectl apply -f distribute-k10-policies.yaml
For the demo recording below, I had preinstalled MySQL via Helm 3 before I installed either Razee or K10.
$ kubectl create namespace mysql
$ helm install mysql stable/mysql --namespace mysql
As you can see in the video below, the K10 policy defined above is pulled into the cluster by Razee, K10 automatically discovers it, and all Helm-deployed applications then start being protected.
And, all of this happens in less than 45 seconds!
IV. Next Steps, Related Projects, and More...
Exploring integrations with projects like Razee is still in its initial stages for us, but a number of things stand out, such as Razee's scalable pull-based model and ease of use. There are several other Razee features I want to dig deeper into, including how templating works, using RemoteResources to bootstrap other RemoteResources, and whether there is a secure way to handle secret distribution.
Finally, one should note that Razee shares goals with other projects. While there are a number of projects in the multi-cluster networking space, the biggest related effort for configuration is the Kubernetes community-driven kubefed project. Even though kubefed is on its second incarnation, progress seems to have stalled in the alpha stage. I am not close enough to the project to say why, but it appears to have taken on too much at once, conflating cross-cluster application deployment with multi-cluster management in a single solution. We also ran into the kubed project, but it didn't fit our customers' requirements as it only synchronizes configuration and secrets.
Overall, given the increasingly common multi-cluster deployment patterns seen with Kubernetes, and our goals of delivering simplicity, ease of use, and reduced operational burden for DevOps teams, the ability to seamlessly handle multi-cluster operations via a unified control plane in our K10 product is very important to us. The configuration and resource distribution demonstrated in this article is just one aspect of multi-cluster management. We have very ambitious plans, so stay tuned, reach out if you would like early previews, and feel free to send us input on what you are looking for!
Niraj Tolia is the General Manager and President of Kasten (acquired by Veeam), which he founded to solve the problem of Kubernetes backup and disaster recovery. With a strong technical background in distributed systems, storage, and data management, he has held multiple leadership roles in the past, including Senior Director of Engineering for Dell EMC's CloudBoost group and VP of Engineering and Chief Architect at Maginatics (acquired by EMC). Dr. Tolia received his PhD, MS, and BS in Computer Engineering from Carnegie Mellon University.