Kasten K10 by Veeam is the market’s leading cloud-native backup and recovery solution for Kubernetes, providing enterprise operations teams an easy-to-use, scalable and secure system for backup/restore, disaster recovery, and application mobility. While Kasten K10 wasn’t designed to integrate natively with third-party monitoring solutions through standard monitoring mechanisms (SNMP, email, or external in-guest agents), it can integrate with them by exposing Prometheus metrics.
One such solution is Zabbix, an enterprise-class open-source monitoring tool used to monitor networks and applications. In this blog post, we’ll demonstrate how to expose the Kasten K10 metrics that Prometheus collects in a Kubernetes environment and how to consume them in Zabbix.
Two Options for Exposing Prometheus Metrics
There are two ways to expose metrics from Prometheus:
- The easiest option is to expose Prometheus as a service and access the metrics through the Kasten K10 catalog service (catalog-svc).
- The second option is Federation. Federation is used to either set up scalable Prometheus monitoring or to pull related metrics from one service's Prometheus instance into another. Using the second method involves scraping data from Kasten K10 Prometheus monitoring.
What Are the Prerequisites?
You will need Zabbix 4.2 or higher to begin. Additionally, the following requirements must be met:
Using Prometheus as a Service
For this option, you will need to expose at least one Kasten Service (catalog-svc) to the Zabbix host for the purpose of gathering metrics. The Zabbix host will access the exposed service to gather the required metrics from the Prometheus instance.
Using Federation
To leverage federation, you must have a Prometheus instance installed, and it should be able to communicate with the Kasten K10 dashboard. It should also be reachable from Zabbix using the Zabbix Admin credentials.
(Note: Zabbix version 5.2.16 does not support using Prometheus as a Service.)
How to Expose Kasten K10’s Prometheus Metrics
In this section, we’ll explain how to get started using either option described above: using Prometheus as a Service or Federation. Then we’ll explain how to proceed with the Zabbix integration.
Exposing Catalog Services on a Kubernetes Cluster
To begin, check to make sure that the Kubernetes service “catalog-svc” is active and present.
Here are a couple of examples which we used in our lab (vanilla Kubernetes cluster, only one server, master untainted):
- Exposing the service as NodePort (via node IP):
kubectl expose svc catalog-svc --name catalog-zabbix --type=NodePort -n kasten-io
The server with the Zabbix agent must be able to reach the target machine on the ports listed by the following command:
kubectl get svc -n kasten-io | grep <exposed_svc>
- Exposing the service as LoadBalancer:
kubectl expose svc catalog-svc --name catalog-zabbix --type=LoadBalancer -n kasten-io
If possible, open the exposed service in a browser on the machine running the Zabbix agent and confirm that the metrics of interest are displayed.
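Where no browser is available on that machine, the same check can be scripted. The following is a minimal sketch rather than part of the original setup: the URL is a hypothetical placeholder for the exposed catalog-zabbix service, and the /metrics path is an assumption that should be verified against your deployment.

```python
from urllib.request import urlopen

# Hypothetical placeholder: substitute the address and port reported by
# `kubectl get svc`. The /metrics path is an assumption, not from the article.
METRICS_URL = "http://<node-ip>:<node-port>/metrics"


def fetch_metrics_text(url, timeout=10):
    """Download the raw Prometheus exposition text from the exposed service."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(text):
    """Return the sorted, de-duplicated metric names found in exposition text."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or at whitespace (value).
        names.add(line.split("{", 1)[0].split()[0])
    return sorted(names)


# Example use, once METRICS_URL points at a reachable service:
#   print(metric_names(fetch_metrics_text(METRICS_URL)))
```

If the printed list is empty or the request fails, the Zabbix host most likely cannot reach the exposed port.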
If all of the prerequisites are ready on the Kubernetes side, you can move on to configuring Zabbix (see below).
Setting up Prometheus Federation
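First, set up the Prometheus instance. Then configure a job that scrapes data from Kasten K10. The fragment below is abridged; a federation job also needs the federate metrics path and match[] parameters described in the Prometheus documentation:
- job_name: k10
  static_configs:
    - targets:
      - '<FQDN for Kasten K10 Prometheus endpoint>'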
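For context, a federating Prometheus pulls series from another instance's /federate endpoint and selects them with one or more match[] URL parameters. The sketch below only builds such a request URL so the mechanism is visible. The base URL and the selector are hypothetical and need to be replaced with values from your own Kasten K10 deployment.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical values for illustration only.
url = federate_url(
    "http://k10-prometheus.example.internal",
    ['{__name__=~"jobs.*"}'],
)
print(url)
```

Opening the resulting URL with curl or a browser is a quick way to see exactly which series the scrape job would receive.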
Next, check that data is scraped correctly from Kasten K10. From the Prometheus dashboard, run a query (for example, jobs_completed) and check that the data is displayed correctly.
Then, from the Status dropdown, choose the Targets dashboard and confirm that the k10 job is listed as a healthy target.
If the Prometheus instance is able to pull data correctly, you can proceed with configuring Zabbix.
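The query check can also be run against the Prometheus HTTP API instead of the dashboard. This is a minimal sketch that assumes the query returns an instant vector, as jobs_completed would. The base URL passed to run_query is a placeholder you supply, and the function names are ours.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, expression):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query = urlencode({"query": expression})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def sample_values(payload):
    """Extract numeric sample values from a decoded instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each series carries "value": [<unix timestamp>, "<value as string>"].
    return [float(series["value"][1]) for series in payload["data"]["result"]]


def run_query(base_url, expression, timeout=10):
    """Run an instant query and return the value of each returned series."""
    with urlopen(query_url(base_url, expression), timeout=timeout) as response:
        return sample_values(json.load(response))


# Example use, with your own Prometheus address:
#   print(run_query("http://<prometheus-host>:9090", "jobs_completed"))
```

An empty list means the query succeeded but no series matched, which usually indicates that the scrape job is not collecting Kasten K10 data yet.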
Depending on what version of Zabbix you have installed, there may be some mismatches in the commands provided below. If you run into any problems, consult the Zabbix documentation to find alternatives.
For the parent item that collects the raw metrics, set the Type of Information field to “Text.” Then click “Update.”
Next, create a new Item for the actual metric you want to monitor:
The new item must have the following characteristics:
- Type: Dependent Item
- Key: catalog-backup-failed
- Master-item: <the parent item just created>
- Type of information: Numeric (unsigned)
Next, click on the “Preprocessing” tab and enter the following information:
- In Row 1, select “Prometheus to JSON” (requires Zabbix 4.2 or later).
- In Row 2, select “JSONPath” and enter $..value.sum() to extract the value.
To avoid errors when the array is empty, select the “Custom on fail” option. Then choose “Set value to” and enter 0 as the value. Then click Save.
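To make the two preprocessing rows and the fallback concrete, here is a rough Python emulation of what they compute. It is a sketch, not Zabbix's implementation: the sample text is invented, and the metric name catalog_actions_count with its status and type labels is an assumption used for illustration.

```python
import re

# Invented sample of Prometheus exposition text. The metric and label names
# are assumptions for illustration, not values taken from the article.
SAMPLE = """\
# HELP catalog_actions_count Number of actions
# TYPE catalog_actions_count gauge
catalog_actions_count{status="failed",type="backup"} 2
catalog_actions_count{status="failed",type="backup",liveness="retired"} 1
catalog_actions_count{status="complete",type="backup"} 40
"""

LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def prometheus_to_rows(text, name, wanted_labels):
    """Stand-in for the "Prometheus to JSON" row: keep the matching samples."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None or match.group("name") != name:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if all(labels.get(k) == v for k, v in wanted_labels.items()):
            rows.append({"name": name, "labels": labels, "value": match.group("value")})
    return rows


def sum_values(rows):
    """Stand-in for $..value.sum() plus "Custom on fail: set value to 0"."""
    if not rows:
        return 0  # nothing matched, so fall back to 0 instead of failing
    return int(sum(float(row["value"]) for row in rows))


failed = prometheus_to_rows(
    SAMPLE, "catalog_actions_count", {"status": "failed", "type": "backup"}
)
print(sum_values(failed))
```

The fallback matters because a cluster with no failed backups may expose no matching series at all, and without it the item would become unsupported rather than report 0.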
Now you can create the remaining items you want to monitor in the same way. For any other monitoring inputs, reference Kasten K10’s documentation.
Other values of interest can be obtained from the Grafana dashboard within Kasten K10. By clicking on “Edit” under each value (e.g. Backup Failed), you will find the Prometheus query used to obtain that value. The goal is to reproduce that query with the “Prometheus to JSON” and “JSONPath” preprocessing steps.
Note: Because the value is a counter that only increases, you can monitor it for any growth in the number of failed jobs.
More on Monitoring
Monitor the "Latest data" to see if new values are present. If the fields are blank after several minutes, proceed to troubleshooting.
In this section, we’ll cover how to set up Triggers. (For more information, refer to the Zabbix documentation on triggers.)
Below is the interface for setting up Triggers:
First, enter a descriptive name for the trigger. Under “Problem Expression,” compare the current value with the value from X hours earlier. For a failed-backup Trigger, the difference must not exceed 0. The examples below use a four-hour window:
(Zabbix version 5.4)
last(/<zabbix-host>/catalog-backup-failed) - first(/<zabbix-host>/catalog-backup-failed,4h) > 0
(Zabbix version 5.2)
last(/<zabbix-host>/catalog-backup-failed) - last(/<zabbix-host>/catalog-backup-failed,#1:now-4h) > 0
Note: The “Add” button can help you build the expression.
For “Recovery Expression,” compare the value of the previous X hours with the current value in order to automatically resolve an incident if the value returns to normal:
(Zabbix version 5.4)
last(/<zabbix-host>/catalog-backup-failed) - first(/<zabbix-host>/catalog-backup-failed,4h) <= 0
(Zabbix version 5.2)
last(/<zabbix-host>/catalog-backup-failed) - last(/<zabbix-host>/catalog-backup-failed,#1:now-4h) <= 0
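The logic behind the problem and recovery expressions can be illustrated outside Zabbix. The sketch below is a simplified model using the first/last form of the expressions, not Zabbix's evaluator: it takes the oldest and newest samples inside the window, and it treats an empty window as OK, which is our own assumption.

```python
from datetime import datetime, timedelta


def trigger_state(samples, now, window=timedelta(hours=4)):
    """Return "PROBLEM" if the counter grew within the window, else "OK".

    `samples` is a list of (timestamp, value) pairs sorted oldest first.
    """
    recent = [value for stamp, value in samples if now - window <= stamp <= now]
    if not recent:
        return "OK"  # simplification: no data in the window is treated as OK
    growth = recent[-1] - recent[0]  # newest value minus oldest value in window
    return "PROBLEM" if growth > 0 else "OK"


now = datetime(2024, 1, 1, 12, 0)
history = [
    (now - timedelta(hours=6), 1),
    (now - timedelta(hours=3), 1),
    (now - timedelta(minutes=5), 3),  # two more failures in the last 4 hours
]
print(trigger_state(history, now))  # prints PROBLEM
```

Because the item is a counter, the state returns to OK on its own once four hours pass without a new failure, which is what the recovery expression encodes.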
If you want to be able to close a problem manually, check the box next to “Allow Manual Close.”
Once you’ve completed these fields, click “Save.”
Troubleshooting Item Problems
Should you run into problems configuring Items, click on “Configuration” and select “Hosts.”
Click on “Items” and check if there are any exclamation points displayed. If there are, and the problems are reported with the JSONPath section, follow these steps:
- Change the item’s Type of Information to Text.
- Leave only the option “Prometheus to JSON” under Preprocessing, and remove the row “JSONPath.”
- Click the “Latest Data” tab to analyze the data.
- Copy the output and paste it into https://jsonpath.herokuapp.com/.
- Test some JSONPaths to determine the correct one. Once you have found a working JSONPath, add it back to the item and test the configuration again.
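If you prefer not to paste monitoring output into an external site, the same inspection can be approximated locally. The sketch below mimics only the recursive-descent part of a path such as $..value. It is not a full JSONPath engine, and the sample JSON is an invented approximation of the “Prometheus to JSON” output.

```python
import json


def descend(node, key):
    """Yield every value stored under `key` at any depth, like $..key."""
    if isinstance(node, dict):
        for name, child in node.items():
            if name == key:
                yield child
            yield from descend(child, key)
    elif isinstance(node, list):
        for item in node:
            yield from descend(item, key)


# Invented sample; replace it with the text copied from "Latest Data".
raw = """
[
  {"name": "catalog_actions_count", "value": "2",
   "labels": {"status": "failed", "type": "backup"}},
  {"name": "catalog_actions_count", "value": "1",
   "labels": {"status": "failed", "type": "backup"}}
]
"""

values = list(descend(json.loads(raw), "value"))
print(values)
print(sum(float(v) for v in values))
```

An empty list from descend means the key is not present in the output at all, which points to the “Prometheus to JSON” pattern rather than the JSONPath as the source of the problem.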
Thanks to Kasten K10’s flexibility, it’s easy to retrieve and visualize key metrics in your Kubernetes environment, so you can optimize operations and ensure the availability of your Kubernetes data and applications. To learn more about configuring Zabbix for Kasten K10, download the whitepaper.
Don’t have Kasten K10 installed? Try it for free today.