Integrating Prometheus, InfluxDB and Grafana

Posted on Wed 29 September 2021 in Kubernetes

I've got a Kubernetes cluster prepared to be integrated with Prometheus, i.e., all relevant information is exposed at /metrics and scraped by a Prometheus instance. I want to store this information long term, and I've decided that InfluxDB is the best option for that. In addition, I want to create dashboards using Grafana.

This would require moving the information from Prometheus to InfluxDB. According to this, for InfluxDB v1, this would be accomplished by using remote writes directly in Prometheus. However, InfluxDB v2 doesn't support this way of working. Instead, Telegraf has to be inserted in the middle. Therefore, the whole pipeline would be Prometheus 🠮 Telegraf 🠮 InfluxDB 🠮 Grafana.

I'm using EKS to deploy the cluster, but these instructions should work with any other Kubernetes cluster. I'll only assume that kubectl is already configured to work with your cluster. We are going to deploy several services, so you may need additional nodes in your cluster.

Helm

We are going to use Helm to install all the components, so first we install Helm with these commands:

    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3

    chmod 700 get_helm.sh

    ./get_helm.sh
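
Before going on, it can be handy to confirm that both tools are reachable from the shell. This is just a minimal sketch, not part of Helm's instructions, and require_cmds is a helper name I made up for illustration:

```shell
# Report which of the given command-line tools are missing from PATH.
# require_cmds is a hypothetical helper, not part of Helm or kubectl.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_cmds kubectl helm && helm version --short
```

If the function returns a non-zero status, the names of the missing tools are printed to standard error.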

InfluxDB

To deploy InfluxDB using Helm, we have to add the InfluxData Helm repo and then install the chart with these commands:

helm repo add influxdata https://helm.influxdata.com/

helm upgrade --install my-influxdb influxdata/influxdb2

Run this command, per the instructions, to obtain the password and save it in your password manager:

echo $(kubectl get secret my-influxdb-influxdb2-auth -o "jsonpath={.data['admin-password']}" --namespace default | base64 --decode)

Now, we are going to forward the port of the InfluxDB web console so that it can be accessed from your machine. Run this command:

kubectl port-forward service/my-influxdb-influxdb2 8087:80 &> forward-influx.txt &

If you are using Visual Studio Code as I am, you should now forward port 8087 there as well.

Open http://localhost:8087 in a web browser and you should see the InfluxDB console. You can log in with username admin and the InfluxDB password obtained above.
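
If the page doesn't load, it may simply be that the port-forward isn't ready yet. As a rough sketch, the forwarded port can be polled from the shell before opening the browser. InfluxDB v2 exposes a /health endpoint; wait_for_url is a helper name of my own, and the number of attempts is arbitrary:

```shell
# Poll a URL until it answers with a successful HTTP status, or give up
# after a number of attempts. wait_for_url is a hypothetical helper.
wait_for_url() {
  url=$1
  attempts=${2:-10}
  n=0
  while [ "$n" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -sS hides progress but keeps errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    n=$((n + 1))
    sleep 1
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}

# Usage (assumes the port-forward above is running):
#   wait_for_url http://localhost:8087/health 15
```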

We are going to use the InfluxDB web console to create a token that will later allow us to write data from Telegraf and read it from Grafana. In the console, go to Data | Tokens | Generate Token | Read/Write Token. Select read and write permissions for the default bucket, give the token a name and save it. Next, click on the token to display it. Write it down, as we will need it later.

In case you need to check the logs of InfluxDB, you can do it with this command:

kubectl logs -f --namespace default $(kubectl get pods --namespace default -l app.kubernetes.io/name=influxdb2 -o jsonpath='{ .items[0].metadata.name }')

Telegraf

Install Telegraf with this command:

helm upgrade --install my-telegraf influxdata/telegraf

Note that, at least at the time of writing, the instructions printed by Helm for running an interactive shell and obtaining the logs don't work because the labels are wrong. These are the correct commands:

kubectl exec -i -t --namespace default $(kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf -o jsonpath='{.items[0].metadata.name}') -- /bin/sh

kubectl logs -f --namespace default $(kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf -o jsonpath='{ .items[0].metadata.name }')

You should see some errors in the log because the output configuration of Telegraf is wrong. We are going to fix it next.

We are going to change Telegraf's configuration by editing its ConfigMap with this command:

kubectl edit configmap/my-telegraf

Remove all this part:

    [[outputs.influxdb]]
      database = "telegraf"
      urls = [
        "http://influxdb.monitoring.svc:8086"
      ]

And write this in its place:

    [[inputs.http_listener_v2]]
    service_address = ":1234"
    path = "/receive"
    data_format = "prometheusremotewrite"

    [[outputs.influxdb_v2]]
    urls = ["http://my-influxdb-influxdb2:80"]
    token = "$INFLUX_TOKEN"
    organization = "influxdata"
    bucket = "default"

You have to replace $INFLUX_TOKEN with the token that you obtained before from the InfluxDB web console.

This prepares Telegraf to receive the information on port 1234 at the path /receive and forward the data to InfluxDB.

In addition, increase the value of metric_buffer_limit to something like 50000.
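
If you prefer not to paste the token by hand in the editor, the $INFLUX_TOKEN placeholder can also be filled in from the shell. The following is only a sketch under my own assumptions: fill_influx_token is a made-up helper, and it replaces the placeholder literally in whatever text it receives on standard input, so tokens containing characters such as / or & are not mangled:

```shell
# Replace every literal occurrence of $INFLUX_TOKEN in stdin with the token
# passed as the first argument. fill_influx_token is a hypothetical helper.
fill_influx_token() {
  INFLUX_TOKEN_VALUE="$1" awk '
    BEGIN { needle = "$INFLUX_TOKEN"; value = ENVIRON["INFLUX_TOKEN_VALUE"] }
    {
      line = $0
      out = ""
      # index() does a plain substring search, so no regex escaping is needed
      while ((i = index(line, needle)) > 0) {
        out = out substr(line, 1, i - 1) value
        line = substr(line, i + length(needle))
      }
      print out line
    }
  '
}

# Usage (assumes the ConfigMap still contains the placeholder and that
# TOKEN holds the token created in the InfluxDB console):
#   kubectl get configmap my-telegraf -o yaml \
#     | fill_influx_token "$TOKEN" \
#     | kubectl replace -f -
```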

To make Telegraf use this new configuration, save it and run this command, which kills the pod and makes Kubernetes create a new one with the new configuration:

kubectl delete pods -l app.kubernetes.io/name=telegraf
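
An alternative to deleting the pod is to ask Kubernetes for a rolling restart and wait until it finishes. This sketch assumes that the chart created a Deployment called my-telegraf (I'm inferring the name from the release name); restart_telegraf and the KUBECTL override are my own additions, and the override lets you preview the commands with KUBECTL=echo:

```shell
# Restart Telegraf and wait for the new pod to become ready.
# The Deployment name my-telegraf is an assumption based on the release name.
# Set KUBECTL=echo to print the commands instead of running them.
restart_telegraf() {
  kubectl_bin=${KUBECTL:-kubectl}
  "$kubectl_bin" rollout restart deployment/my-telegraf &&
    "$kubectl_bin" rollout status deployment/my-telegraf --timeout=120s
}

# Usage:
#   restart_telegraf
```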

Check Telegraf's logs to see that there is no problem:

kubectl logs -f --namespace default $(kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf -o jsonpath='{ .items[0].metadata.name }')

In addition, we have to expose port 1234 in Telegraf's service so that Telegraf can be reached at http://my-telegraf:1234, because by default the service only exposes port 8125. Change the configuration of Telegraf's service with this command:

kubectl edit svc/my-telegraf

Add this in the ports section:

    - name: http-listener
      port: 1234
      protocol: TCP
      targetPort: 1234

Now, you should see with this command that Telegraf's service is also listening on port 1234:

kubectl get services
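
The same port can be added without opening an editor by sending a JSON patch to the service. This is a sketch of that approach, not what I actually ran; telegraf_port_patch is a hypothetical helper that only builds the patch document, and the port name http-listener is the one used above:

```shell
# Print a JSON patch that appends a TCP port entry to a Service's port list.
# telegraf_port_patch is a hypothetical helper; it does not call kubectl.
telegraf_port_patch() {
  port=${1:-1234}
  printf '[{"op":"add","path":"/spec/ports/-","value":{"name":"http-listener","port":%s,"protocol":"TCP","targetPort":%s}}]\n' "$port" "$port"
}

# Usage:
#   kubectl patch svc my-telegraf --type=json -p "$(telegraf_port_patch 1234)"
```

The path /spec/ports/- means "append to the end of the ports list", so the existing 8125 entry is left untouched.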

Prometheus

If you don't have Prometheus installed in your system yet, you can do it with these commands:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm install my-prometheus prometheus-community/prometheus

Edit Prometheus's configuration with this command:

kubectl edit cm/my-prometheus-server

Add these lines at the same level as the global section, i.e., inside the prometheus.yml section, to make Prometheus write the data to Telegraf:

    remote_write:
    - url: "http://my-telegraf:1234/receive"

Prometheus should read the new configuration automatically once you save it. Run this command to see the Prometheus log:
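
To double-check that the edit landed where it should, the stored configuration can be inspected from the shell. The helper below is a rough sketch of my own (has_remote_write is a made-up name): it reads the prometheus.yml text from standard input and succeeds only if a top-level remote_write block mentions the given URL:

```shell
# Succeed if the prometheus.yml text on stdin has a top-level remote_write
# block that contains the given URL. has_remote_write is a hypothetical helper.
has_remote_write() {
  url=$1
  awk -v want="$url" '
    /^remote_write:/ { in_block = 1; next }
    # Any other top-level key (not indented, not a list item or comment)
    # ends the remote_write block.
    /^[^ \t#-]/      { in_block = 0 }
    in_block && index($0, want) > 0 { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   kubectl get cm my-prometheus-server -o jsonpath='{.data.prometheus\.yml}' \
#     | has_remote_write http://my-telegraf:1234/receive && echo "remote_write is set"
```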

kubectl logs -f --namespace default $(kubectl get pods --namespace default -l app=prometheus,component=server -o jsonpath='{ .items[0].metadata.name }') -c prometheus-server

If everything is OK, you should see a prometheus_remote_write measurement with metrics about the cluster in the Explore section of the InfluxDB web console. If you want to plot the CPU utilization, for example, you can select the Script editor and use this query:

    import "experimental/aggregate"
    from(bucket: "default")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r["_measurement"] == "prometheus_remote_write")
      |> filter(fn: (r) => r["cpu"] == "total")
      |> filter(fn: (r) => r["_field"] == "container_cpu_usage_seconds_total")
      |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
      |> aggregate.rate(every: 1m, unit: 1s)
      |> yield(name: "mean")

Submit it and you should see some data.
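
The same check can be done from the shell through the InfluxDB v2 HTTP query endpoint. Keep in mind that the v.* variables are filled in by the web console, so outside of it they have to be replaced with literal values. This is a sketch under several assumptions: the port-forward to localhost:8087 is still running, the token is in an INFLUX_TOKEN environment variable, and cpu_rate_query and run_flux are helper names I made up:

```shell
# Print a variant of the CPU query with literal values in place of the
# v.* variables. cpu_rate_query is a hypothetical helper.
cpu_rate_query() {
  start=${1:--15m}   # how far back to look, defaults to the last 15 minutes
  window=${2:-1m}    # aggregation window, defaults to 1 minute
  cat <<EOF
import "experimental/aggregate"
from(bucket: "default")
  |> range(start: ${start})
  |> filter(fn: (r) => r["_measurement"] == "prometheus_remote_write")
  |> filter(fn: (r) => r["cpu"] == "total")
  |> filter(fn: (r) => r["_field"] == "container_cpu_usage_seconds_total")
  |> aggregateWindow(every: ${window}, fn: mean, createEmpty: false)
  |> aggregate.rate(every: 1m, unit: 1s)
EOF
}

# Send the Flux script on stdin to InfluxDB; the result comes back as CSV.
# Assumes the local port-forward and a token in INFLUX_TOKEN.
run_flux() {
  curl -sS -X POST "http://localhost:8087/api/v2/query?org=influxdata" \
    -H "Authorization: Token ${INFLUX_TOKEN}" \
    -H "Content-Type: application/vnd.flux" \
    -H "Accept: application/csv" \
    --data-binary @-
}

# Usage:
#   cpu_rate_query -1h 5m | run_flux
```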

Grafana

Install Grafana with these commands:

helm repo add grafana https://grafana.github.io/helm-charts

helm install my-grafana grafana/grafana --set sidecar.datasources.enabled=true --set sidecar.dashboards.enabled=true --set sidecar.datasources.label=grafana_datasource --set sidecar.dashboards.label=grafana_dashboard

Follow the instructions to get Grafana's password:

kubectl get secret --namespace default my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

In addition, forward port 3000 of Grafana so that you can access its web console:

export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=my-grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 3000 &> forward-grafana.txt &

Notice that I've changed the forward command so that it redirects its output. Forward the port in Visual Studio Code as well if you are using it, and open http://localhost:3000 to access the Grafana console.

Log in with username admin and the password obtained above.

Now, we are going to add InfluxDB as a data source for Grafana. In Grafana's web console, go to Data sources | Add your first data source and select InfluxDB. Select Flux as the Query Language to use the new syntax. Enter this information:

  • URL: http://my-influxdb-influxdb2
  • Uncheck the toggle in Basic Auth.
  • Organization: influxdata
  • Token: the one obtained from InfluxDB at the beginning. It's the same one used to allow Telegraf to write to InfluxDB.

Click on Save and test.
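
Since Grafana was installed with the data source sidecar enabled and the label grafana_datasource, the same data source could probably also be provisioned from a ConfigMap instead of the web console. I haven't used this route here, so take it as a sketch with several assumptions: the ConfigMap name influxdb-datasource, the label value "1" and the helper name grafana_datasource_manifest are mine, and the token ends up in plain text in the ConfigMap:

```shell
# Print a ConfigMap that the Grafana data source sidecar may pick up.
# The name, the label value "1" and this helper are assumptions.
grafana_datasource_manifest() {
  token=$1
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: influxdb-datasource
  namespace: default
  labels:
    grafana_datasource: "1"
data:
  influxdb.yaml: |
    apiVersion: 1
    datasources:
      - name: InfluxDB
        type: influxdb
        access: proxy
        url: http://my-influxdb-influxdb2
        jsonData:
          version: Flux
          organization: influxdata
          defaultBucket: default
        secureJsonData:
          token: "${token}"
EOF
}

# Usage (INFLUX_TOKEN holds the token created in the InfluxDB console):
#   grafana_datasource_manifest "$INFLUX_TOKEN" | kubectl apply -f -
```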

Add a new Dashboard and a new panel. You can add the query given above to check that the connection between all elements works:

    import "experimental/aggregate"
    from(bucket: "default")
      |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
      |> filter(fn: (r) => r["_measurement"] == "prometheus_remote_write")
      |> filter(fn: (r) => r["cpu"] == "total")
      |> filter(fn: (r) => r["_field"] == "container_cpu_usage_seconds_total")
      |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
      |> aggregate.rate(every: 1m, unit: 1s)
      |> yield(name: "mean")