Kasten and VMware at KubeCon NA 2021


Gaurav Rishi: 

All right, thank you very much for joining us. My name is Gaurav Rishi, I'm the VP of product here at Kasten by Veeam. But I'm really happy to have Dave Smith Uchida. He's from VMware and this is a really exciting talk from him about what Tanzu is and what data protection in the Tanzu ecosystem means. And I'm also really excited to talk about the collaboration that Kasten by Veeam and VMware are going to be doing in the context of open source and Astrolabe and Kanister. So hopefully by the end of this talk, you'll know what those mean and how we are working together. So thank you. And Dave, onto you.

Dave Smith Uchida:    

Great. Thanks, Gaurav. Yeah, so I'm Dave Smith Uchida, I'm the Tanzu Data Protection Architect. I'm also the Velero Architect at VMware. So Tanzu is kind of a big umbrella for all things Kubernetes at VMware. And that includes both our Tanzu Kubernetes Grid, which are our Kubernetes distributions, but also other things like Tanzu Data Services, which are databases and other data services running in Kubernetes and supplied from us. We have things like Tanzu Mission Control, which is our multi-cloud Kubernetes controller and manager system. So Tanzu covers our Kubernetes ecosystem and it's more than just our Tanzu Kubernetes Grid distributions.

So what are we trying to achieve in terms of Tanzu? It's really that data protection should just work. We want this to be something that you don't have to worry about. And part of what we're doing with Tanzu is trying to offer you a unified experience across different clouds. So it's not just Kubernetes running on top of vSphere. Tanzu's also Kubernetes running on AWS, running in Azure, but we want to give you a simplified and consistent experience across all these different clouds. And we want to extend that to data protection so that you can get a consistent data protection experience, not just in how you run it, but also in making sure that everything works right, so that no matter which data protection vendor you choose, you're going to get a basic level of protection that will work every day.

And we want to do this in an open way so that not only will Tanzu benefit from it, but also other Kubernetes distributions. And this is key both for us, because Tanzu Mission Control (TMC) handles non-Tanzu Kubernetes, and also for the data protection partners who need to be able to protect everything everywhere. So in terms of Tanzu, what are the components that we're trying to protect? One thing is developer apps that are developed using Tanzu application services. This is a mechanism we have for building applications, deploying them and running them.

And so if you are a developer and you're building a new app, we want you to be able to protect it. Same thing for data that's been stored in Tanzu Data Services. If you're running your own workload, if you just made up a Kubernetes thing and you're running it in one of our TKG clusters, we want to be able to protect that. And if you purchased an application from a third party that's running, we want that to work as well. And then we have our infrastructure, things like the management cluster for TKG, or the Supervisor cluster on vSphere. We want those to all be protected as well.

So in putting together this strategy, we looked at it from a number of personas. We have a CTO, a backup admin, an app developer, and a Tanzu/Kubernetes admin. We'll go into that a little bit, what each of these are. So as a CTO, you put your CTO hat on, and you're looking at the big picture and you don't want to be worried about data protection. You want to say that this is something I'm going to get, and I don't have to make decisions about which technologies I deploy or which applications I deploy because my data protection will or will not work with those things. So we want to have consistency of data protection, because our experience has been that it's usually the last thing that people figure out. They get all the POCs and everything done, and they go, "Oh yeah, by the way, let's do data protection." And by the time you get to that point, you've actually made a lot of decisions. So we don't want it to be like, "Oh, well no, that's not going to work," or, "You have to do it this way."

And then as a CTO, you have to worry about things like compliance. Are you satisfying all of your regulatory requirements, HIPAA or Sarbanes-Oxley? And those are things where you may have a data protection vendor that does that very well for you, and you'd like to be able to use that with Tanzu.

We'd like to be able to move applications between clouds. So that, for example, if you had something running in AWS and you have a failure there, you can bring it up in Azure, or you can bring it up on-prem, or vice versa.

And then, for somebody who's been running an IT infrastructure for a while, we already have a lot of stuff that works for non-Kubernetes environments. We want to be able to continue to get the same level. There's an expectation that data recovery and disaster recovery are going to work in your Kubernetes environment.

So then the backup admin is often a persona that we don't think about from the Kubernetes point of view. And backup admins may already be there, they're responsible for protecting, backing up data already and they want to move on to take in these new Kubernetes workloads. And so we want them to be able to do this with the same enterprise data protection features. We want them to be able to use the solutions that they've already chosen so that they don't have to come up with new workflows, new ways of protecting things for Kubernetes.

And then they need to be able to back up everything because that's usually the policy, to back everything up. And they also want to be able to define policies where you say, "Yeah, no matter what you as a developer or a DevOps team are doing, this is going to be the minimum standard for you."

Now, if you're actually developing apps or running apps, you may be an in-house developer or you may be somebody who develops turnkey software. What you'd like to be able to do, with your Kubernetes app, is specify how data protection is going to work with it so that you can give an answer that isn't a 15-step README of how to back things up. You'd like to be able to build this into the application, and have it automatically get picked up and automatically used.

We'd like to be able to control the service interruption, so if the application needs to get quiesced, we'd like to use the right mechanism to quiesce it. Sometimes applications may have mechanisms that are nonstop, and sometimes they require downtime, but we need to be able to trigger those properly. And many Kubernetes apps are composed of other apps: you have a Postgres database, you have a message queue. Well, you don't want to have to define how to back up Postgres again, how to back up the message queue again. You'd like to have this be something where you trigger your app's backup and it can trigger the automatic backup of these other components and hold them together as a unit.

And then just as an actual usage, you'd like to be able as a developer to say, "Hey, I can back up my cluster while I'm working. I can go ahead, I can do a restore without contacting the backup admin." So the usage of it, you want to be self-service.

And then as a Kubernetes admin, you want to be able to have this data protection across all of your Kubernetes apps and components. You don't want to have to worry about the details of a particular solution. You may be running in multiple clouds. There may be a solution that works for you very well on-prem that's different from the solution you're using in, say, AWS, and different from a solution you're using in Azure. What you'd like to be able to do is give the Kubernetes admin the same basic control so they can say things like "back up cluster" and not have to worry about what the infrastructure and the data protection are doing in these different environments.

So we have a strategy. We think that the strategy covers all of this. We're covering every product, we're working with multiple data protection vendors, including Kasten. We're developing the ways to let app developers define how to back up their apps. We have self-service in the mix, and then we really want to be able to provide enterprise-grade protection to our Kubernetes users.

So this is what Tanzu data protection looks like right now. So we have Velero. I'm the project lead and architect for Velero, which is an open-source tool. And Velero is part of our strategy as the "batteries included" data protection option so that when you get Tanzu, Tanzu Community Edition, or TKG, or what have you, out of the box you have a data protection solution built into it. It's not a be-all and end-all data protection. It's not an enterprise-grade product. It's very much Kubernetes-centric. It doesn't handle anything except Kubernetes. We intend to keep developing this and we intend to keep this as kind of our cutting edge of Kubernetes, to keep up with the latest Kubernetes, and also to keep up with any new features we're coming out with in Tanzu. But then we want to be able to share that infrastructure with other people. We also have, as I mentioned earlier, Tanzu Mission Control, which is our multi-cluster control plane. Right now that has a data protection feature built into it, but it will only run with Velero at the moment. So that's something we want to change.

Now we have some other data protection vendors that have embedded Velero inside and that's helped them get up to speed on Kubernetes. And we have good friends like Kasten, who've been very early in this whole journey and we've worked together very closely on building this out, for example, getting everything working on Kubernetes on vSphere. And we want to ensure that we can continue on with that without forcing them to, for example, wrap Velero. So right now, if we add a Velero feature, that doesn't really apply everywhere else. If we build some feature and we get it protected by Velero, it doesn't apply everywhere else. So our current architecture basically backs up two things. It backs up the Kubernetes metadata. It also backs up the volume data. So Velero, in our case, is the Kubernetes serializer, and we push things out, and then we can snapshot and back up volumes, but nothing else really at the moment. And Velero isn't designed to be an embedded solution. It's a backup application in its own right, so embedding it inside other products doesn't work very well.

I mean it works, but it's not ideal for anybody. And it holds us back in terms of being able to make changes and innovate in the product. So we want to set up something that can be embedded. Also, protecting complex applications is hard. So right now we have the ability in Velero to run, say, execution hooks, but this is pretty primitive. It doesn't let you just have something work out of the box. And if we develop additional plugins, we develop more complex plugins… If we do that for Velero, then how do we share that with our other partners who aren't embedding Velero? And we want to get this basic data protection functionality everywhere.

So the next architecture that we're building out right now defines a couple of new API layers. So one is kind of a Southbound API. We've taken to referring to Velero as being a backup orchestrator, where the orchestrator can call Southbound and say, "Hey, snapshot this protected entity." So a protected entity might be something like a volume, and it may sit on top of CSI snapshotting or go directly to things. The protected entity includes data transports so it can get the data out of whatever we're snapshotting. But it doesn't have to be a volume, it might be a Postgres database, or Harbor, or a container registry. And so these are Kubernetes clusters. And then Northbound, so that we can have kind of a common Kubernetes data protection surface, is this Kubernetes data protection API, which we're currently in the process of figuring out what we need in it. And we'll be defining that with our partners and with Kubernetes, with the community. And even in this architecture, adopting these two APIs is optional. You can adopt neither one and still work with Velero... not Velero, Tanzu and Kubernetes. But we think we want to offer some strong advantages to say, "Hey, this is something everybody wants to be part of."
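As a rough illustration of the Southbound idea described above, the following Python sketch shows an orchestrator that treats a volume and a database the same way: it asks each one for a snapshot and then pulls the data out through a transport call. This is only a sketch under stated assumptions. The names `ProtectedEntity`, `VolumeEntity`, `DatabaseEntity`, `snapshot`, `read_snapshot`, and `orchestrate_backup` are invented for illustration and are not the Astrolabe or Velero API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Protocol


class ProtectedEntity(Protocol):
    """Hypothetical Southbound contract: anything that can be snapshotted."""

    name: str

    def snapshot(self) -> str:
        """Capture a point-in-time copy and return an opaque snapshot ID."""
        ...

    def read_snapshot(self, snapshot_id: str) -> bytes:
        """Data transport: return the bytes captured by that snapshot."""
        ...


@dataclass
class VolumeEntity:
    """Stand-in for a volume; a real one might sit on top of CSI snapshots."""

    name: str
    blocks: bytes
    snapshots: dict = field(default_factory=dict)

    def snapshot(self) -> str:
        snapshot_id = f"vol-{uuid.uuid4().hex[:8]}"
        # bytes are immutable, so storing the reference preserves this state
        self.snapshots[snapshot_id] = self.blocks
        return snapshot_id

    def read_snapshot(self, snapshot_id: str) -> bytes:
        return self.snapshots[snapshot_id]


@dataclass
class DatabaseEntity:
    """Stand-in for a database that uses its own dump mechanism."""

    name: str
    rows: list
    snapshots: dict = field(default_factory=dict)

    def snapshot(self) -> str:
        snapshot_id = f"db-{uuid.uuid4().hex[:8]}"
        # Serialize the rows now, so later changes do not affect the snapshot
        self.snapshots[snapshot_id] = "\n".join(self.rows).encode()
        return snapshot_id

    def read_snapshot(self, snapshot_id: str) -> bytes:
        return self.snapshots[snapshot_id]


def orchestrate_backup(entities: list[ProtectedEntity]) -> dict[str, bytes]:
    """The orchestrator calls the same two methods on every entity."""
    backup = {}
    for entity in entities:
        snapshot_id = entity.snapshot()
        backup[entity.name] = entity.read_snapshot(snapshot_id)
    return backup


if __name__ == "__main__":
    volume = VolumeEntity("pv-data", b"raw blocks")
    database = DatabaseEntity("orders-db", ["row 1", "row 2"])
    print(orchestrate_backup([volume, database]))
```

The point of the sketch is that the orchestrator never needs to know whether it is talking to a volume or a database; each entity decides how its own snapshot is taken.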

So the next generation architecture starts to look more like this. So up at the top, we have things like Tanzu Mission Control, custom-built customer automation, our Tanzu command line. And these are all things that we expect to sit on top of this data protection API so that we can do things like "back up cluster." And now, with this, if we can get our data protection providers to implement this API, now I can say, "Hey, Tanzu Mission Control. I'm in this installation. We're installing Kasten. Let's go ahead and integrate that and let Tanzu Mission Control do things like, 'Yeah, we're going to apply policies across our entire fleet.'" Some of them may be running Velero only, some of them may be running Kasten, some may be running something else, but Tanzu Mission Control can now work across all of these.

Same with the customer automation. If you build in things that say, "Hey, back this stuff up every night," you can build that. And for basic stuff, you set it on top of this data protection API, and you should be able to swap in your provider. So, part of our strategy with Velero is that it's this batteries-included, training-wheels way to get you going. And we want people to be able to have a smooth transition as they need more features, more scale, so that if they say, "Okay, well, Velero is great, but I've outgrown it. Now I want to move on to something that's bigger like Kasten. I need to have some integration with the rest of my data center data protection, like Veeam offers," we'd like for them not to have to go back and redo everything and relearn everything. They just want to have this seamless upgrade path.
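To make the "swap in your provider" idea concrete, here is a minimal Python sketch of automation written against one common call, with interchangeable backends behind it. Everything here is hypothetical: `register_provider`, `backup_cluster`, and the two stand-in backends only borrow the product names as labels and do not call or represent any real Velero or Kasten interface.

```python
from typing import Callable

# Registry of hypothetical backup providers, keyed by name.
PROVIDERS: dict[str, Callable[[str], str]] = {}


def register_provider(name: str):
    """Decorator that registers a backend under the given provider name."""

    def decorator(backend: Callable[[str], str]) -> Callable[[str], str]:
        PROVIDERS[name] = backend
        return backend

    return decorator


@register_provider("velero")
def velero_stand_in(cluster: str) -> str:
    # Placeholder only; a real integration would call the provider here.
    return f"{cluster}: backed up by velero stand-in"


@register_provider("kasten")
def kasten_stand_in(cluster: str) -> str:
    # Placeholder only; a real integration would call the provider here.
    return f"{cluster}: backed up by kasten stand-in"


def backup_cluster(cluster: str, provider: str) -> str:
    """The one call that customer automation is written against."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown data protection provider: {provider}")
    return PROVIDERS[provider](cluster)


if __name__ == "__main__":
    # A fleet where different clusters run different providers.
    fleet = {"dev-cluster": "velero", "prod-cluster": "kasten"}
    for cluster, provider in fleet.items():
        print(backup_cluster(cluster, provider))
```

Moving a cluster from one provider to another changes only the entry in `fleet`; the nightly automation that calls `backup_cluster` stays the same.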

The downward-facing API is Astrolabe, and that's a project we've been working on for a while. We actually use it inside of Velero and the Velero plugin for vSphere. And we're looking to increase what it can do, and start fitting more things into the framework so that now we don't just back up and snapshot volumes, but we also snapshot and back up databases that we might provide through data services, and applications that developers have built, without the user having to define things like execution hooks. And many people are using Velero for its Kubernetes serialization. So really it's the part where it backs up a Kubernetes cluster. And we want to break that out and make that something that's callable by itself without having to wrap Velero anymore, and then make this infrastructure available, make it something that we can then rely on in any Kubernetes cluster or install very easily.

So now, with Kasten, we're starting to look at how do we integrate, from an open source point of view, Kasten's technology and Astrolabe and Velero, and this is what is motivating us. So we've got more entities that need protection. These things are going across multiple Kubernetes clusters. Applications are really what you want to back up, but defining the application has become much harder. It's something where in the old vSphere days, we would basically say VM equals application, and you put all your stuff in the VM, you snapshot it, you back it up, and you're good. But with modern applications, you can't even define it as being a single Kubernetes namespace or a single Kubernetes cluster. It gets bigger. So we need an extensible way to define applications and workflows for data protection.

So we're looking at how we combine these two pieces of technology. So we're building out Astrolabe, which is kind of a general purpose framework for data protection snapshotting. And then we have Kanister, which is a way to describe applications and describe how to back up things in a configurable, no-code way. And earlier I talked about things like protected entities. So protected entities are just things that we can snapshot. Some of these may be very heavyweight, or at least the backing to them may be very heavyweight. If we do a volume snapshot, there's a huge amount of machinery underneath that actually does the snapshot. In the same way, say we want to trigger the application-specific backup of a database, this may have a lot of very heavyweight stuff. But a lot of times what you want is you just want to be able to describe the application, but the sequencing is complex.

The way that you need to set it up is complex. And this is where Kanister really fits in. And so we are working together now to make Kanister fit in as a protected entity, so you can say, "Yeah, I've defined this thing with Kanister, and now I can make it a protected entity. And I can snapshot this through the Astrolabe APIs." And now we can use Kanister with Velero. We can use Kanister with non-Velero applications, if they're using the Astrolabe APIs. And we can also expose things to Kanister, where we can expose things like databases and so forth, other protected entities, into Kanister so that Kanister can say, "Yes, my application consists of my YAMLs. There's a Postgres database in here. There's a Redis database in here, and I've got a file system." So I don't want to write a big, huge piece of Go code to put all this together. We can build that in Kanister. If the Postgres database is exposing itself as a protected entity and Redis is exposing itself as a protected entity, then we can go ahead and glue it together using Kanister, but then expose it as one thing outside.

So, yeah, so that's where we're trying to go. We're just starting on this, but we're very, very excited about doing this and it's really going to be an open-source-driven and community-driven effort so that we'll be doing this as part of our open source projects and making this available to everybody. And then we can start pulling all this stuff together. And I think it's going to be very, very exciting. So if we dive down into the technical flow, this is kind of how we're thinking about it.

So in terms of Astrolabe, Astrolabe thinks of things as protected entities, and there's a tree or a graph of protected entities, so that we can say something like a cluster has these protected entities inside of it. So when we say to the cluster, "Hey, snapshot yourself," it's going to walk across its protected entities. And it's going to say, "Oh. Hey. Here we have a Postgres database. Let's call the snapshot API. Here we have a persistent volume. Let's call the snapshot API on it. Here we have something that's defined by a Kanister blueprint. Let's call the protected entity," which will then trigger an action set and trigger the Kanister controller to execute the Kanister logic, and then possibly it can call existing Kanister workflows. It may also call back into Astrolabe and come back in again. And we'll be able to not only trigger everything from the top, we can also describe the whole tree. And then when it's time to move the data, we can look at the tree and we can say, "Hey, let's move all these things over."
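The tree walk described above can be sketched in a few lines of Python. This is a simplified illustration, not Astrolabe code: the `Entity` class, the `kind` strings, and the choice to snapshot a parent before its children are all assumptions made for the example, and the Kanister-defined entity is just another node here rather than something that triggers a real controller.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical node in a protected-entity tree."""

    name: str
    kind: str  # e.g. "cluster", "postgres", "volume", "kanister-blueprint"
    children: list["Entity"] = field(default_factory=list)


def snapshot_tree(entity: Entity) -> list[str]:
    """Snapshot an entity, then each child depth first; return the call order."""
    calls = [f"snapshot {entity.kind}/{entity.name}"]
    for child in entity.children:
        calls.extend(snapshot_tree(child))
    return calls


def describe_tree(entity: Entity, depth: int = 0) -> str:
    """Render the whole tree as indented text, one entity per line."""
    lines = ["  " * depth + f"{entity.kind}/{entity.name}"]
    for child in entity.children:
        lines.append(describe_tree(child, depth + 1))
    return "\n".join(lines)


if __name__ == "__main__":
    cluster = Entity(
        "prod",
        "cluster",
        [
            Entity("orders-db", "postgres"),
            Entity("uploads", "volume"),
            # An application described elsewhere, with its own child entity.
            Entity("shop-app", "kanister-blueprint", [Entity("cache", "volume")]),
        ],
    )
    print(describe_tree(cluster))
    for call in snapshot_tree(cluster):
        print(call)
```

Triggering from the top (`snapshot_tree`) and describing the structure (`describe_tree`) are kept as separate functions, mirroring the two uses of the tree mentioned in the talk.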

So between these two, we feel that our data protection strategy is going to fulfill all of the needs we outlined earlier. So covering everything: that's part of Astrolabe, making all that available, making that work. Being able to use different data protection vendors: this is something where both the Kubernetes data protection API that we're proposing and the Astrolabe API are things that give us this ability to have a least common denominator across all the data protection vendors. And actually having Kanister being available everywhere, this is going to be a big plus to making a lot of these things a lot easier to do. Astrolabe, again, is a way for our app developers to define how to back up their apps, and Kanister is a big part of this. Self-service: with the Kubernetes data protection API, you should be able to write CRs and say, "Back up my cluster, restore my cluster," and just be able to use those directly. Migration: Astrolabe tries to give us kind of a common cross-cloud and cross-app mechanism to move things around.

And then really being able to bring in all of our third-party vendors and have them working at the same level, or at least at the same basic level with everything, allows us to really say we've got enterprise-grade data protection and it doesn't matter who your data protection vendor is. You want this? Yes, it's going to work. Is it faster because you bought X, or Y, or Z? Maybe. Do you have better compliance monitoring? Maybe. These are all areas where we expect there to be a lot of competition and room for innovation. But the simple, basic thing of whether you are able to back up your application should work with every vendor. So if we have any questions, happy to take them now, and these are the open source projects on GitHub that I've been talking about.

Gaurav Rishi: 

Thanks. Thank you, Dave. That was actually quite a good talk. You started wide, and then you went straight deep in. But just to maybe summarize and test my own understanding out here, I mean, Tanzu is a broad portfolio. It has build, run, and operate components out there. And to me it sounded like the strategy, especially with the CTOs that you talked about, is to keep it an open platform, which is not only open but is actually cross cloud. Am I summarizing that right, at least at the top level?

Dave Smith Uchida:    

Yeah. That's very true. I mean, Tanzu is, we're a huge contributor to open source at this point, and we're big believers in openness. And we're aligning ourselves with the upstream Kubernetes distributions, but at the same time, people want a vendor. They want somebody to say, "Hey, I'm going to make sure this works for you. When you have a problem, call us." We're not going to point fingers at the other guys. So, that's our goal, I think, with Tanzu.

Gaurav Rishi: 

All right. And so now just drilling down into the data protection strategy that you talked about. Like you said, things like Velero got here with the batteries included running start, but was not built for being embedded inside. And that's led us to the next generation of APIs that you talked about to open it up. Can you maybe, at a broad level, talk about the kind of benefits the community can expect and the customers can expect as Astrolabe and the data protection APIs come into being?

Dave Smith Uchida:    

Sure. So very much what we want to see is, well, better backups. And we can do this by being able to trigger application-specific mechanisms. So that's part of the protected entity idea, that it's a simplified API for data protection. But right now we're doing everything by, say, snapshotting volumes. And that works, but is that the best way to back up, say, a Cassandra? Probably not. But right now you've got the choice of either, say, backing up all your volumes or running a Cassandra backup separately. So we'd like to unify all these things together because our applications are now lots of different pieces. Cassandra isn't an application other than as a piece of code. Nobody runs Cassandra and says, "Oh yeah, my customers want Cassandra." Well, maybe some people do, but not very many. Most people are like, "Well, I want to build an application that provides some service to people. Cassandra's a component of that, but I've got other components that need to be pulled together."

So I think that's where we'll see, that's what Astrolabe is trying to do, is let us build out a better strategy for backing up these complex apps. And then the data protection APIs, the goal here is that as a Kubernetes user, you can be assured of the same basic level of commands wherever you go, so that you can say “backup cluster” and this will happen no matter which cloud you're in and no matter which data protection vendor you're using.

Gaurav Rishi: 

Right. Well, that's cool. Just to switch gears and talk a little bit about our partnership here. So of course Veeam and VMware, I think their partnership is something that is almost a textbook chapter about how well it's done. And I look at Kubernetes as a new front where we extend the partnership, not only because of the early running start Kasten had, but now, I think, with Kasten being a part of Veeam, and VMware coming together, it sounds like there's a lot of room ahead. So maybe with that sort of lens on, if you can talk a little bit about the kind of partnership that you expect us to develop both on the open-source Astrolabe plus Kanister side, and then maybe looking a little ahead. I'm asking you to look a little bit at the crystal ball. So apologies for that.

Dave Smith Uchida:    

Yeah. There's a lot of room for us to do things together. So from the VMware side, we don't really view Velero as competing with our data protection vendors. But what we do need to be able to do is be out in front, and when somebody asks us, "Hey, can you protect this?" we'll at least be able to say, "Yeah, we do, with Velero." Ideally, we've got everybody moving at the same pace. We've been working with Kasten pre-Veeam and with Veeam, and I think that bringing the two together is a big strength and it's our strategy to work very closely with our data protection partners. Now, traditionally, we've done this in kind of a closed-source way, and the old vSphere way is like, "Hey, we have these APIs," and we'll do all these things in a proprietary manner.

And it's a big shift for us to start looking at this in an open-source way, a community-driven way. And I think that's really healthy. And I think it's really different because vSphere is not necessarily a community. I mean, there's definitely a community of people who use it, but you don't get the same ability to give input. You can't show up and say, "Hey, I've got some code that I want you to put in and it does this feature." So I think that's a big thing. And being able to collaborate at the code level, we can actually share code between companies as open source, because there are strong advantages, I think, to Kasten in having people adopt Kanister, and there are strong advantages for everybody in adopting Kanister. And there are strong advantages for us in getting this level playing field for everybody.

Gaurav Rishi: 

No, I think well said. Look, I think all of us here are quite excited about Astrolabe and Kanister coming together, but also about the larger strategic partnership, like you alluded to. And so I think I see a long runway ahead for both of us, and it's going to be a nice smooth flight, and thank you for coming over to the KubeCon booth. And it's almost good to be back. Things are getting almost back to normal, despite us being safe with these masks, which we still need. But thanks. Thanks, Dave. This was excellent.

Dave Smith Uchida:    

Yeah. Thank you so much. And thank you so much for having me. So that's the Tanzu data protection strategy.

Gaurav Rishi: 

All right. That's perfect.