A Kubernetes cluster is a complex entity composed of multiple namespaces, containers, pods, services, and deployments that exist in a continuously changing state. Let’s explore how we can use Prometheus and Grafana for monitoring and logging Kubernetes clusters.

Introduction

 

To keep Kubernetes (also called K8s) clusters performing efficiently in production, cluster administrators need real-time visibility into a wide range of metrics covering issues such as memory or storage shortages, node and network health, application errors, and more. Logging and monitoring are typically used for this, but modern DevOps practices call for a more integrated approach embodied in the concept of observability: the ability to infer a system's internal state from its external outputs.

Observability combines monitoring, alerting, and logging with metrics visualization and analysis that together allow administrators to get instant insights into the real-time performance of their clusters and applications and then to take timely and informed action. A properly implemented observability system provides DevOps with granular insights that can be used to debug and heal complex systems. 

In this tutorial, we’ll discuss how to implement the observability of Kubernetes clusters and applications using Prometheus, Elasticsearch, Metricbeat, and Grafana. By the end of this tutorial, you’ll know how to ship your cluster and application metrics from Prometheus to Elasticsearch and observe them using powerful Grafana dashboards. 

Before diving into the tutorial, let’s briefly discuss the components of the stack we are going to use.

Elasticsearch

Elasticsearch is a distributed database and search engine built on the Lucene full-text search library. Elasticsearch (also called ES) lets you deploy Lucene indexes in highly available, distributed clusters and run various aggregation and analysis tasks on the data stored in them.

Prometheus

Prometheus is an open source monitoring and alerting tool with built-in support for containers and Kubernetes. It can be configured to scrape metrics from Kubernetes nodes, containers, pods, services and user applications running in Kubernetes. To allow Prometheus to scrape metrics from applications, developers just need to expose metrics via HTTP at the /metrics endpoint. Prometheus has a powerful PromQL query language and integrations with major databases and metrics collector agents including Elasticsearch and Metricbeat.
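For a rough idea of what such an endpoint returns, here is a small, hypothetical sample of the Prometheus text exposition format that an instrumented application might serve at /metrics (the HTTP metric names are made up for illustration):

# HELP http_requests_total Total number of HTTP requests handled by the application.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="500"} 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.47

Prometheus scrapes this plain-text output on a schedule and stores each sample in its time series database.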

Metricbeat

Metricbeat is a lightweight data shipper that can periodically collect metrics from the system, applications, and services running in various environments (cloud or on-premises). It can be configured to work with a variety of metric sources like Prometheus to send metrics directly to Elasticsearch.

Grafana

Grafana is a powerful data visualization and analytics tool that also supports alerting and notifications and has integrations with major time-series databases (Prometheus, InfluxDB), Elasticsearch, SQL databases, and cloud monitoring services, etc. It ships with great metrics aggregations and powerful dashboards to make it easy to enable observability for your clusters and applications. 

Fans of the Elastic Stack might be wondering why we aren't using Kibana for visualization. That's a fair question, and we are certainly fans of Kibana. Grafana's team, though, has committed to keeping the tool an Apache 2.0 licensed open source package, whereas Elastic's team has moved in the opposite direction. So, for this tutorial, we'll use the tool with the more flexible license.

Tutorial 

To enable observability for our Kubernetes cluster and applications, we’ll need to create the following workflow:

  • Prometheus is configured to scrape metrics from the Kubernetes API server, kubelet, kube-state-metrics, cAdvisor, and other Kubernetes internal components to get data about cluster health, nodes, pods, endpoints, etc. Prometheus can also be configured to scrape metrics from user applications running in Kubernetes via the /metrics endpoint.
  • Metricbeat's Prometheus module is used to collect the metrics scraped by Prometheus from the Prometheus /federate endpoint and then ship them to Elasticsearch.
  • Grafana has Prometheus as a default data source. By installing Prometheus and Grafana using Prometheus Operator, we also get powerful metric dashboards and Prometheus cluster alerts provided by the kubernetes-mixin project. 
  • Grafana supports Elasticsearch as a data source so we can also create visualizations and dashboards for data stored in Elasticsearch.  

To reproduce the examples in this tutorial, you'll need a running Kubernetes cluster, kubectl configured to access it, and Helm v3.3.0 (the version we use to install the Prometheus Operator).

Deploying Elasticsearch


If you already have Elasticsearch deployed in your environment, you can skip this section. 

For this tutorial, we deploy Elasticsearch on Kubernetes itself. To do this, we'll need an Elasticsearch Deployment manifest and a ClusterIP Service that exposes Elasticsearch to Metricbeat and Grafana. We use the following single-node ES Deployment spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
  namespace: monitoring 
  labels:
    app: es
spec:
  replicas: 1
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      initContainers:
      - name: set-vm-max-map-count
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ['sysctl', '-w', 'vm.max_map_count=262144']
        securityContext:
          privileged: true
      - name: volume-mount-hack
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sh", "-c", "chown -R 1000:100 /usr/share/elasticsearch/data"]
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.7.2
        imagePullPolicy: IfNotPresent
        env:
        - name: ES_JAVA_OPTS
          value: -Xms1024m -Xmx1024m
        ports:
        - containerPort: 9200
        resources:
          limits:
            memory: "2147483648"
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      volumes:
      - name: data
        emptyDir: {}

The Deployment uses an emptyDir volume, which exists only as long as the Pod it belongs to, so the stored data is not persistent. If you need persistence, make sure to use local PersistentVolumes or the persistent volumes of a supported cloud provider.
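As a minimal sketch of what that change could look like, you could create a PersistentVolumeClaim (the claim name and storage size below are illustrative and assume your cluster has a default StorageClass):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-data
  namespace: monitoring
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

and then reference it from the Deployment instead of the emptyDir volume:

      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: elasticsearch-data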

We’ll also need to create a ClusterIP service for the ES deployment to expose it to the rest of the cluster. 

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: monitoring
  labels:
    app: es
spec:
  selector:
    app: es
  type: ClusterIP
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
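
Assuming you saved the two manifests as elasticsearch-deployment.yaml and elasticsearch-service.yaml (the file names are arbitrary), you can create the namespace and apply them like this:

kubectl create namespace monitoring
kubectl apply -f elasticsearch-deployment.yaml
kubectl apply -f elasticsearch-service.yaml
kubectl get pods -n monitoring -l app=es

The last command should show the elasticsearch Pod in the Running state after a minute or so.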

Now that Elasticsearch is deployed to the cluster, let's create the Prometheus and Grafana deployments.

Deploying Prometheus and Grafana

We are going to deploy Prometheus with the CoreOS Prometheus Operator via the Helm package manager. The Prometheus Operator ships with all the configuration and defaults required for deploying Prometheus in Kubernetes. With the Prometheus Operator, we get the basic scrape configurations for retrieving Kubernetes-native metrics, a Grafana installation with preconfigured Prometheus dashboards, an Alertmanager deployment, and many other useful tools right out of the box. Once Prometheus is installed via the Operator, we can easily manage its lifecycle in the Kubernetes-native way.

To install the Prometheus Operator, we use Helm v3.3.0 and the Helm chart from the community-curated stable repository. We'll need to do the following to cleanly install Prometheus using Helm:

Add Helm stable repository if it doesn’t exist:

helm repo add stable https://kubernetes-charts.storage.googleapis.com/

Helm v3.3.0 does not require the Tiller server, so we can install the Prometheus Operator right away:

helm install prom-operator stable/prometheus-operator --namespace=monitoring

Once the Operator is installed, you should have the following pods running in the “monitoring” namespace:

$ kubectl get pods -n monitoring

NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prom-operator-prometheus-o-alertmanager-0   2/2     Running   2          5h30m
elasticsearch-6f4c587475-sdpc2                           1/1     Running   1          22h
prom-operator-grafana-684965779-2fjhr                    2/2     Running   2          5h31m
prom-operator-kube-state-metrics-756d8d6849-n949p        1/1     Running   1          5h31m
prom-operator-prometheus-node-exporter-66z2m             1/1     Running   1          5h31m
prom-operator-prometheus-o-operator-847cb84c6d-mqsnv     2/2     Running   2          5h31m
prometheus-prom-operator-prometheus-o-prometheus-0       3/3     Running   4          5h30m

After Prometheus is installed, we can easily access it in the browser using port-forwarding:

kubectl port-forward -n monitoring prometheus-prom-operator-prometheus-o-prometheus-0 9090

Now we can access the Prometheus dashboard at localhost:9090. Inside the dashboard, we find a lot of targets automatically configured by the Prometheus Operator. By default, Prometheus is configured to scrape metrics from the kubelet (K8s nodes), kube-scheduler, kube-state-metrics, and other important internal components. The set of targets is comprehensive enough to enable full observability of your Kubernetes cluster.
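
The Operator also makes it easy to add your own applications as scrape targets. Below is a minimal sketch of a ServiceMonitor, assuming your application is exposed by a Service labeled app: my-app with a port named http that serves metrics at /metrics, and that the Operator was installed under the Helm release name prom-operator (the chart's default serviceMonitorSelector picks up ServiceMonitors labeled with the release name):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prom-operator   # must match the Operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    any: true
  endpoints:
  - port: http
    path: /metrics
    interval: 30s

Check the serviceMonitorSelector of your own installation before relying on the release label; the names above are placeholders.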

Also, the Prometheus Operator automatically configures Alertmanager with important system alerts that help administrators react instantly to issues in the cluster. For example, in the image below we see node alerts that fire when a Kubernetes node is out of memory, inactive, or experiencing network connection issues.

We can also manage alerts (for example, silence those that are firing) by accessing the Alertmanager UI:

kubectl port-forward -n monitoring alertmanager-prom-operator-prometheus-o-alertmanager-0  9093

In addition, the Prometheus Operator automatically installed Grafana with preconfigured Prometheus dashboards and visualizations. To access Grafana, we again forward its port to localhost:

kubectl port-forward -n monitoring prom-operator-grafana-684965779-2fjhr 3000

The default login and password for Grafana are “admin” and “prom-operator.” Inside Grafana you'll find over 30 preconfigured dashboards with great visualizations of Kubernetes cluster compute resources, networking, kubelet and kube-proxy statistics, and many other useful metrics. The dashboards shipped with the Prometheus Operator come from the kubernetes-mixin project.

For example, in the image above you can see the dashboard displaying metrics related to CPU resource utilization in the Kubernetes cluster. 

Storing Metrics

Prometheus includes a local on-disk time series database, but here we want to decouple long-term metrics storage from the Prometheus server running inside the cluster. Elasticsearch is a natural choice for storing and analyzing metrics data because it offers powerful analytics and metrics aggregations, index lifecycle management tools, and high availability and scalability for metrics data out of the box.

We can easily ship Prometheus metrics to Elasticsearch using Metricbeat's Prometheus module. As we've already mentioned, Metricbeat is a lightweight shipper developed by Elastic that integrates easily with the ELK stack. Metricbeat also has a standalone Kubernetes module that can be used as an alternative to Prometheus.

First, let's configure Metricbeat's Prometheus module to collect all metrics scraped by Prometheus. To do this, we'll use Prometheus federation: the /federate endpoint exposes the full set of metrics held by a Prometheus server so they can be copied to a remote data store, which also gives us a simple disaster-recovery path for the metrics.
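
To see what the federation endpoint returns, you can query it directly while the Prometheus port-forward from the previous section is still running (the match[] selector below is the same one we'll use in the Metricbeat config):

curl -G 'http://localhost:9090/federate' --data-urlencode 'match[]={__name__!=""}' | head

This prints the first lines of the full metric set currently held by the Prometheus server, in the Prometheus text exposition format.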

We'll need two ConfigMaps for the Metricbeat deployment. The first one is the general configuration, where we specify the Elasticsearch endpoint (replace the hosts IP below with your Elasticsearch service's cluster IP or DNS name) and ES credentials if applicable, as well as ES index and shard settings.

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-config
  namespace: monitoring 
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
    # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
    # Reload module configs as they change:
      reload.enabled: false
    processors:
    - add_cloud_metadata:
    output.elasticsearch:
      hosts: ["10.104.231.221:9200"] 
    setup.dashboards.enabled: true
    setup.template.settings:
      index.number_of_shards: 5
      index.number_of_replicas: 1
      index.number_of_routing_shards: 30

The second ConfigMap sets up the Prometheus module:

apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-modules
  namespace: monitoring
  labels:
    k8s-app: metricbeat
data:
  kubernetes.yml: |-
    - module: prometheus
      period: 10s
      hosts: ["10.108.222.135:8080"]
      metrics_path: '/federate'
      query:
        'match[]': '{__name__!=""}'

This configuration points at the /federate endpoint exposed by the Prometheus server, which lets us aggregate all of the Prometheus metric sets and ship them to Elasticsearch. The match[] query used in this config selects every metric collected by the Prometheus server.

Also, take note of the hosts property above. Here, you'll need to specify the cluster IP or DNS name of the Prometheus service in your Kubernetes cluster (with the Prometheus Operator, the Prometheus service listens on port 9090 by default) so that Metricbeat can connect to the Prometheus server.

After deploying both ConfigMaps with kubectl, let's create the Metricbeat DaemonSet. We use the following basic configuration:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: monitoring
  labels:
    k8s-app: metricbeat
spec:
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.7.2
        args: [
          "-c", "/etc/metricbeat.yml",
          "-e",
          "-system.hostfs=/hostfs",
        ]
        securityContext:
          runAsUser: 0
        volumeMounts:
        - name: config
          mountPath: /etc/metricbeat.yml
          readOnly: true
          subPath: metricbeat.yml
        - name: modules
          mountPath: /usr/share/metricbeat/modules.d
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: metricbeat-config
      - name: modules
        configMap:
          defaultMode: 0600
          name: metricbeat-modules
      - name: data
        emptyDir: {}

 

Once the DaemonSet is running, Metricbeat connects to the Prometheus /federate endpoint, collects and processes the Prometheus metrics, and sends them to the Metricbeat index in Elasticsearch.
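
To verify that metrics are arriving, you can port-forward the Elasticsearch service and list the Metricbeat indices (the exact index names will include your Metricbeat version and the current date):

kubectl port-forward -n monitoring svc/elasticsearch 9200
curl 'http://localhost:9200/_cat/indices/metricbeat-*?v'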

Data ingested from Prometheus can then be accessed in any data visualization and observability tool that supports Elasticsearch, such as Kibana or Grafana. And because the metrics are persisted in Elasticsearch, even if the Prometheus data were lost for some reason, we could add Elasticsearch as a data source in Grafana and keep working with the collected metrics.

To add Elasticsearch as a new data source, select the “Data Source” tab in the Grafana navigation menu:

From there, select and configure the Elasticsearch data source:

Here you should enter the Elasticsearch endpoint, credentials (if needed), and the index you want to query (the Metricbeat index in our case). After the data source is created, you can select ES as a source of metrics and build visualizations and dashboards similar to the ones provided by the kubernetes-mixin project.
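
For reference, with the setup from this tutorial the data source settings would typically look something like this (adjust the URL to your own Elasticsearch service and match the version to your ES deployment):

URL:              http://elasticsearch.monitoring.svc:9200
Index name:       metricbeat-*
Time field name:  @timestamp
Version:          6.x

Metricbeat documents carry an @timestamp field, which is what Grafana uses to build time series from the index.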

Conclusion

The cloud native ecosystem offers powerful tools that make it easy to enable observability for your Kubernetes clusters and applications without writing a single line of code. For example, Prometheus Operator can automatically configure Prometheus to collect metrics from key Kubernetes internal components and access them in a clearly defined format that is understandable by many other applications.

Similarly, one can easily ship metrics from Prometheus to Elasticsearch using Metricbeat’s Prometheus module and display and analyze them in Grafana.

Powerful integrations between different applications in the cloud native landscape mean that you can customize your monitoring and logging pipeline in a way that suits your DevOps team's needs and the application stack you are using.

Store and Analyze your Logs and Metrics with Qbox Hosted Elasticsearch Clusters

Qbox hosted Elasticsearch provides powerful monitoring and logging features right out of the box. When deploying ES with Qbox, you also get a strong security configuration, automatic backups, and professional 24/7 support from a team of qualified CKA and CKAD engineers.

Not yet enjoying the benefits of hosted ELK Stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment with Qbox.

Contact Us

The tech support team at Qbox.io has earned the reputation of being second to none in our industry. We are here for you, at any time, from anywhere. Just provide us with your name, your company’s name, and your contact info — and let us know how we can help.