In the early days of Kubernetes, many of us were told to use Kubernetes primarily to run stateless workloads, even though Kubernetes came with a collection of APIs and constructs to support stateful workloads. Back then, the assumption was that stateful workloads were better served by managed services.
Today, users are running and scheduling stateful workloads directly on Kubernetes. What’s changed?
- The emergence of the operator pattern, which eases the automation and maintenance of data services.
- The continuous growth and improvement of the Container Storage Interface (CSI).
- Greater experience and expertise with cloud native stateful workloads. As a community, we are now more confident in our ability to manage them.
However, backing up stateful workloads still poses many data protection challenges, and different strategies have emerged to address them. Storage-centric snapshots provided by the underlying file or block storage are limited to crash-level consistency. Data service hooks can be added to freeze and unfreeze the data service layer during the snapshot process. Data service-centric approaches depend on the database-specific utilities that are available, such as mysqldump, pg_dump and so on. An application-centric approach exercises all of these techniques in a coordinated manner.
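To make the hook-based strategy concrete, here is a minimal Python sketch of a snapshot wrapped in freeze and unfreeze hooks. The names `quiesced` and `consistent_snapshot` are hypothetical and the three callables are stand-ins. A real setup would call the data service's own quiesce commands and the storage provider's snapshot API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def quiesced(freeze: Callable[[], None], unfreeze: Callable[[], None]) -> Iterator[None]:
    """Freeze the data service, and always unfreeze it afterwards."""
    freeze()
    try:
        yield
    finally:
        # Runs even if the snapshot fails, so the service is never left frozen.
        unfreeze()


def consistent_snapshot(
    freeze: Callable[[], None],
    unfreeze: Callable[[], None],
    take_snapshot: Callable[[], str],
) -> str:
    """Take a storage snapshot while the data service is quiesced."""
    with quiesced(freeze, unfreeze):
        return take_snapshot()


# Stand-in hooks that only record the order in which they were called.
events: List[str] = []


def fake_snapshot() -> str:
    events.append("snapshot")
    return "snap-001"  # hypothetical snapshot identifier


snapshot_id = consistent_snapshot(
    freeze=lambda: events.append("freeze"),
    unfreeze=lambda: events.append("unfreeze"),
    take_snapshot=fake_snapshot,
)
```

Putting the unfreeze call in a `finally` block is the important design choice here: a failed snapshot should not leave the data service blocked for writes.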
When taking an application-centric approach, it’s important to remember that an application may cross different domains, and that throughout the application lifecycle, workloads will scale up and down. There are also different types of backup targets, including object storage and vendor targets. In other words, there are many moving parts, and the data protection workflow can become very complex.
To address this complexity, Kasten by Veeam developed Kanister, an open-source project that enables you to back up and restore application data on Kubernetes. You can deploy Kanister as a Helm application into your Kubernetes cluster to streamline data protection workflows.
How Does Kanister Work?
Kanister abstracts away the tedious details of building data protection workflows behind a set of cohesive custom resource definition APIs. Implemented as a Kubernetes controller, Kanister works well with existing Kubernetes resources and is extensible via custom Blueprints and Kanister functions. Essentially, Kanister enables you to take application-level copies of Kubernetes data and move them wherever they need to go.
Kanister has API specifications for Blueprint, ActionSet and Profile:
- Blueprints consist of instructions that tell the Controller how to perform actions on a specific application.
- ActionSets instruct the Controller to run an action with a set of runtime parameters, with outputs captured in the status sub-resources.
- Profiles capture information about a location to store data operation artifacts.
It also provides a collection of built-in atomic functions to simplify your data protection workflows. These functions encapsulate common data protection operations that accept different input arguments to reflect your use cases. Their output values and logs are retained by the Controller for analysis and debugging purposes.
The Controller is deployed as part of the Kubernetes cluster, and the Blueprints focus on the application and the data services within it. The list of Blueprints to choose from has grown over the last several months, with community members contributing to Kanister.
During a backup operation, an ActionSet is used to trigger the Blueprint.
The Controller receives the instructions to execute the Blueprint and exports the artifacts to remote storage.
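The relationship between the three resources and the Controller can be sketched in a few lines of Python. This is a simplified illustrative model, not Kanister's actual Go implementation or CRD schema. The class fields, the `DumpAndUpload` function, the bucket URL and the use of `str.format` for parameter substitution are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Phase:
    name: str
    func: str                                   # name of a registered function
    args: Dict[str, str] = field(default_factory=dict)


@dataclass
class Blueprint:
    name: str
    actions: Dict[str, List[Phase]]             # action name -> ordered phases


@dataclass
class Profile:
    name: str
    location: str                               # where artifacts are stored


@dataclass
class ActionSet:
    action: str
    blueprint: str
    profile: str
    params: Dict[str, str]                      # runtime parameters
    status: Dict[str, Dict[str, str]] = field(default_factory=dict)


FunctionImpl = Callable[[Dict[str, str], Profile], Dict[str, str]]


class Controller:
    def __init__(self, functions: Dict[str, FunctionImpl]) -> None:
        self.functions = functions
        self.blueprints: Dict[str, Blueprint] = {}
        self.profiles: Dict[str, Profile] = {}

    def run(self, action_set: ActionSet) -> ActionSet:
        """Execute the requested action and record each phase's output."""
        blueprint = self.blueprints[action_set.blueprint]
        profile = self.profiles[action_set.profile]
        for phase in blueprint.actions[action_set.action]:
            # Substitute runtime parameters into the phase arguments.
            args = {k: v.format(**action_set.params) for k, v in phase.args.items()}
            action_set.status[phase.name] = self.functions[phase.func](args, profile)
        return action_set


def dump_and_upload(args: Dict[str, str], profile: Profile) -> Dict[str, str]:
    # Stand-in for a real dump and upload; only computes the artifact path.
    return {"artifact": f"{profile.location}/{args['database']}.dump"}


controller = Controller({"DumpAndUpload": dump_and_upload})
controller.blueprints["demo-db"] = Blueprint(
    "demo-db", {"backup": [Phase("dump", "DumpAndUpload", {"database": "{db}"})]}
)
controller.profiles["demo-store"] = Profile("demo-store", "s3://example-bucket/backups")

result = controller.run(ActionSet("backup", "demo-db", "demo-store", {"db": "orders"}))
print(result.status)
```

In this model the Blueprint holds the application-specific steps, the Profile holds the storage location, and the ActionSet supplies only the runtime parameters and collects the outputs in its status. That split is what lets one Blueprint be reused across many backup runs.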
This process can be used not only to back up and recover applications, but also for testing and cloning, or for migrating data to another namespace, all without having to write scripts.
See Kanister in Action
In a recent webinar, Software Engineer Ivan Sim provides a demo of Kanister and shows how to create and restore crash-consistent EBS snapshots using Kanister’s CSI functions. He also demonstrates how Argo workflows can be used to run multiple snapshot operations in parallel.
Listen to the full webinar and watch the demo here.
Want to give Kanister a try? Download it today at github.com/kanisterio/kanister. Your feedback and contributions are always welcome!
New to Kasten? Try it for free.