Prometheus is a popular metric reporting framework. It utilizes a pull-based collection system where exporters expose endpoints and a centralized collection agent periodically requests metric values. This scheme has proven itself through prolonged testing and large-scale deployments.

However, while there are certainly “best practices” for defining metrics, the exporter API does nothing to safeguard you from inefficiencies. In our applications, this manifested as extremely high memory pressure on metric exporters using the Prometheus golang client. Without a serviceable understanding of the underlying implementation it is difficult to understand the cause and, more importantly, how to avoid defining inefficient metrics.

Storing Metric Values in Bounded Memory

Prometheus supports a variety of metric definitions:

  • Counter: Track strictly increasing values like request count.
  • Gauge: Enable increasing and decreasing to track things like the number of active requests.
  • Histogram: Define buckets, each of which is a Counter, to track distributions of values like request latencies.
  • Summary: Maintain samples of observations to compute quantiles over them.

In the exporter, each of these metric definitions is designed to fit in bounded memory. The metric storage requirements do not scale with additional observations. Instead, the exporter maintains a bounded set of values for each metric, and the collector annotates the reported metric values with various metadata on collection, for example the timestamp of the observation.

Each Counter and Gauge metric is stored as a single 8-byte value. In the former this value must be strictly increasing, while the latter supports decrements as well.
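
To make this concrete, below is a minimal sketch of defining and updating these two metric types with the golang client; the metric names are hypothetical and chosen purely for illustration.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
    // Counter: a single strictly increasing value
    requests := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "requests_total", // hypothetical metric name
    })
    requests.Inc()

    // Gauge: a single value that supports both increments and decrements
    active := prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "active_requests", // hypothetical metric name
    })
    active.Inc()
    active.Dec()
}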

A Histogram is defined using bucket intervals. For each interval a single Counter metric is stored, so recording an observation in the histogram is as simple as determining which bucket the value falls into and incrementing the corresponding Counter.
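
As a minimal sketch, a Histogram definition with the golang client might look like the following; the metric name and bucket bounds are hypothetical.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
    // Histogram: a Counter is maintained for each bucket interval
    latency := prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "request_latency_seconds",             // hypothetical metric name
        Buckets: []float64{0.01, 0.05, 0.1, 0.5, 1, 5}, // hypothetical bucket bounds
    })

    // the observation increments the Counter of the bucket it falls into
    latency.Observe(0.42)
}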

A Summary metric requires a more complex data structure to provide quantile values. Each instance maintains a collection of time-bounded data samples with staggered time offsets that together span the “MaxAge” of the Summary. This emulates a sliding window without requiring timestamped metadata on each observation. The oldest active sample is used to compute and report the quantile values, and once it exceeds the “MaxAge” its data is purged and collection for that sample starts anew. As a result, this metric stores duplicate copies of each observation and therefore requires significantly more memory than the other metric types.
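
A minimal sketch of a Summary definition is shown below; the metric name, quantile objectives, and window settings are hypothetical. MaxAge bounds the sliding window described above, and AgeBuckets controls how many staggered samples cover that window.

package main

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    // Summary: quantiles are computed over time-bounded samples of observations
    latency := prometheus.NewSummary(prometheus.SummaryOpts{
        Name: "request_latency_seconds", // hypothetical metric name
        // quantile -> allowed error (hypothetical objectives)
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
        MaxAge:     10 * time.Minute, // length of the sliding window
        AgeBuckets: 5,                // number of staggered samples spanning MaxAge
    })

    // every observation is copied into the active samples
    latency.Observe(0.42)
}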

Effect of Metric Labels on Memory Utilization

Labels allow specific values to be tied to a metric observation. For example, a Counter metric tracking the number of gRPC requests could have a label for the gRPC method, namely “method”. Using this label, Prometheus can track observations on a per-method basis as well as in total.

As mentioned previously, Prometheus stores each metric using bounded memory. However, when using labels on metrics, each unique set of label values requires a separate “instance” of the metric internally. Using our gRPC example above, if we have method calls for “put”, “get”, and “delete” we will initialize a CounterVec metric with the “method” label and then call the GetMetricWith function with the method’s label value to retrieve the individual metric instance. Internally, each unique method requires a separate 8-byte value to track the count.
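
A minimal sketch of this gRPC example is shown below; the metric name is hypothetical, while the “method” label and its values come from the example above.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
    // one CounterVec definition, but one internal Counter per unique "method" value
    grpcRequests := prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "grpc_requests_total", // hypothetical metric name
        },
        []string{"method"},
    )

    for _, method := range []string{"put", "get", "delete"} {
        // each unique label value resolves to its own 8-byte counter internally
        metric, err := grpcRequests.GetMetricWith(prometheus.Labels{"method": method})
        if err != nil {
            panic(err)
        }

        metric.Inc()
    }
}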

Metrics may be defined with multiple labels. However, as a general rule of thumb, memory utilization scales with the number of possible unique label value combinations. This is because, as previously described, each unique set of label values requires a separate metric instance internally. This can become problematic in scenarios where:

  • Labels are high-cardinality or unbounded. Labels with many potential values mean the set of unique label value combinations can be very large.
  • A metric is defined with a large number of labels. If each label has 5 potential values, then the number of unique combinations scales exponentially with the number of labels as 5^n: just 3 such labels already yield 125 metric instances, and 10 yield nearly 10 million.

This memory utilization issue is especially important when using a Histogram or Summary metric as these have significantly higher base memory requirements than the Counter or Gauge metrics.

Benchmarking Memory Utilization

To loosely profile the memory utilization of Prometheus exporters over a variety of metrics, we put together a small program that simply creates a labeled metric vector and records observations with a varying number of unique label values. Specifically, we are testing our hypotheses that:

  1. Metrics without labels, or a small number of unique label values, are relatively cheap in terms of memory utilization.
  2. Memory utilization scales linearly with the number of unique label values.

The boilerplate code we use to benchmark is provided below. This example profiles the Counter metric specifically; the other experiments only require updating the metric initialization and observation functions. From a high-level view this code (1) initializes a new metric vector defined with 3 labels, namely “foo”, “bar”, and “baz”; (2) records a collection of observations, each with a unique set of label values, by setting the “baz” label to an incrementing value; and (3) dumps the heap profile.

package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/pprof"

    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    labels := []string{"foo", "bar", "baz"}
    labelValues := map[string]string{
        "foo": "foo",
        "bar": "bar",
    }

    // create and register metric vector
    metricVec := prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "test",
        },
        labels,
    )

    if err := prometheus.Register(metricVec); err != nil {
        panic(err)
    }

    // observe metrics
    for i := 0; i < 100000; i++ {
        labelValues["baz"] = fmt.Sprintf("baz-%d", i)

        metric, err := metricVec.GetMetricWith(labelValues)
        if err != nil {
            panic(err.Error())
        }

        metric.Inc()
    }

    // flush heap profile
    f, err := os.Create("heap.prof")
    if err != nil {
        panic(fmt.Sprintf("could not create memory profile: ", err))
    }

    defer f.Close()
    runtime.GC()

    if err := pprof.WriteHeapProfile(f); err != nil {
        panic(fmt.Sprintf("could not write memory profile: ", err))
    }
}
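
As noted above, swapping in another metric type only requires changing the metric initialization and observation calls. As a rough sketch, the Histogram variant could look like the following; the bucket bounds here (the client's DefBuckets) are our own assumption and not necessarily what produced the numbers reported below.

    // replaces the NewCounterVec initialization above
    metricVec := prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "test",
            Buckets: prometheus.DefBuckets, // assumed bucket bounds
        },
        labels,
    )

    // ... registration and the loop over label values remain unchanged ...

    // replaces the Inc call inside the observation loop
    metric, err := metricVec.GetMetricWith(labelValues)
    if err != nil {
        panic(err.Error())
    }

    metric.Observe(float64(i))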

Execution of this code produces a heap.prof file which can be parsed and queried using Go's pprof tool with the command go tool pprof heap.prof. An example of printing the top 5 nodes according to the heap profile is shown below:

hamersaw@ragnarok:~/development/benchmarks/prometheus-metrics$ go tool pprof heap.prof
File: yogi
Type: inuse_space
Time: Jul 7, 2022 at 3:01am (CDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 5
Showing nodes accounting for 49.52MB, 93.39% of 53.03MB total
Showing top 5 nodes out of 28
      flat  flat%   sum%        cum   cum%
      22MB 41.49% 41.49%       22MB 41.49%  github.com/prometheus/client_golang/prometheus.MakeLabelPairs
   12.50MB 23.58% 65.07%    34.50MB 65.07%  github.com/prometheus/client_golang/prometheus.NewCounterVec.func1
    7.52MB 14.17% 79.24%    45.02MB 84.90%  github.com/prometheus/client_golang/prometheus.(*metricMap).getOrCreateMetricWithLabels
    4.50MB  8.49% 87.74%     4.50MB  8.49%  runtime.allocm
       3MB  5.66% 93.39%        3MB  5.66%  github.com/prometheus/client_golang/prometheus.extractLabelValues (inline)

We ran this boilerplate on a variety of metric types and numbers of unique label values to test our hypotheses. We noticed that the overall heap usage showed a large variance, so to help present useful data our heap utilization values are averaged over 3 consecutive executions. As such, the numbers produced by additional executions of this experiment will differ, but should display the same trends. We present the most significant of our findings below:

First we compare the Counter and Gauge metrics with increasing numbers of unique label value sets. There are two things we think are important to note. First, at 25k unique label value sets the metrics take ~20MB; this shows the need to restrict the number of unique label values, because even a relatively low count (i.e. 25k) requires a non-negligible amount of memory. Second, it is easy to see a linear increase in heap usage. This aligns with our hypothesis, as each additional unique label value set requires a new instance of the metric internally and therefore increases memory utilization.

                unique label value sets
                25k         50k         75k         100k
Counter         21267.43kB  32551.41kB  47126.58kB  55.52MB
Gauge           18706.79kB  34598.02kB  44053.69kB  56.52MB

Next we look at the Histogram metric. In this experiment we compared the number of unique label value sets (rows) with the number of observations recorded for each individual set of label values (columns). We see the two most important observations in this experiment to be (1) memory utilization does not increase with the number of observations, since the Histogram metric uses a fixed Counter per bucket, and (2) memory usage increases linearly with the number of unique label value sets, for the same reason as the Counter and Gauge metrics. Without performing the experiment, we posit the Summary metric will show similar behavior.

                            observation count
unique label value sets     250         500         750         1000
25k                         28677.73kB  29971.42kB  28948.40kB  29973.05kB
50k                         57.29MB     59.79MB     62.29MB     60.29MB
75k                         76.53MB     80.53MB     77.53MB     82.53MB
100k                        97.03MB     95.03MB     99.03MB     99.03MB

14 day(s) offloaded in the 100DaysToOffload challenge.