Prometheus is a popular metric reporting framework. It utilizes a pull-based collection system where exporters expose endpoints and a centralized collection agent periodically requests metric values. This scheme has been validated by prolonged testing and large-scale deployments.
However, while there are certainly “best practices” for defining metrics, the exporter API does nothing to safeguard you from inefficiencies. In our applications, this has meant extremely high memory pressure on metric exporters using the Prometheus golang client. Without a working understanding of the underlying implementation, it is difficult to diagnose the cause and, more importantly, to avoid defining inefficient metrics in the first place.
Storing Metric Values in Bounded Memory
Prometheus supports a variety of metric definitions:
- Counter: Track strictly increasing values, like request count.
- Gauge: Enable increasing and decreasing to track things like the number of active requests.
- Histogram: Define buckets, each of which is a Counter, to track distributions of values like request latencies.
- Summary: Use observation samples to compute quantiles over observations.
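For reference, a minimal sketch of defining one of each metric type with the Go client is shown below; the metric names, bucket layout, and Summary objectives are illustrative choices rather than values from this post.

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Counter: a strictly increasing value
	requestCount := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "example_requests_total",
	})

	// Gauge: a value that can move up and down
	activeRequests := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "example_active_requests",
	})

	// Histogram: one Counter per bucket, tracking a distribution
	requestLatency := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "example_request_latency_seconds",
		Buckets: prometheus.DefBuckets,
	})

	// Summary: quantiles computed from observation samples
	latencyQuantiles := prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "example_request_latency_quantiles",
		Objectives: map[float64]float64{0.5: 0.05, 0.99: 0.001},
		MaxAge:     time.Minute,
	})

	prometheus.MustRegister(requestCount, activeRequests, requestLatency, latencyQuantiles)
}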
In the exporter, each of these metric definitions is designed to fit in bounded memory: the storage requirements do not scale with the number of observations. Instead, the exporter maintains a bounded set of values for each metric, and the collector annotates the reported metric values with metadata, such as the timestamp of the observation, at collection time.
The Counter and Gauge metrics are each stored as a single 8-byte value. In the former this value must be strictly increasing, while the latter supports decrements as well.
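Reusing the hypothetical counter and gauge names from the sketch above, the update calls reflect these semantics; again, this is illustrative usage, not code from the benchmark below.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	requestCount := prometheus.NewCounter(prometheus.CounterOpts{Name: "example_requests_total"})
	activeRequests := prometheus.NewGauge(prometheus.GaugeOpts{Name: "example_active_requests"})

	// a Counter only moves upward; Add panics if given a negative value
	requestCount.Inc()
	requestCount.Add(3)

	// a Gauge can move in either direction or be set directly
	activeRequests.Inc()
	activeRequests.Dec()
	activeRequests.Set(12)
}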
A Histogram is defined using bucket intervals. For each interval a single Counter metric is stored, so recording an observation in the histogram is as simple as determining which bucket the value falls into and incrementing the corresponding Counter.
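To make that concrete, here is a toy sketch of the structure just described: a sorted list of bucket upper bounds, one fixed-size counter per bucket, plus a running sum and count. This illustrates the idea only and is not the client_golang implementation.

package main

import (
	"fmt"
	"sort"
)

// toyHistogram stores one counter per bucket upper bound, plus a running
// sum and count of all observations (a simplified stand-in for the real
// histogram, which also tracks an implicit +Inf bucket).
type toyHistogram struct {
	upperBounds []float64 // sorted bucket upper bounds, e.g. latency thresholds
	counts      []uint64  // one fixed-size counter per bucket
	sum         float64
	count       uint64
}

func (h *toyHistogram) observe(v float64) {
	// find the first bucket whose upper bound is >= v and increment it;
	// values above every bound only affect sum and count
	if i := sort.SearchFloat64s(h.upperBounds, v); i < len(h.counts) {
		h.counts[i]++
	}
	h.sum += v
	h.count++
}

func main() {
	h := &toyHistogram{
		upperBounds: []float64{0.1, 0.5, 1, 5},
		counts:      make([]uint64, 4),
	}
	for _, latency := range []float64{0.05, 0.3, 0.3, 2.4, 9.9} {
		h.observe(latency)
	}
	fmt.Println(h.counts) // bucket counts: [1 2 0 1]
}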
A Summary metric requires a more complex data structure to provide quantile values. Each instance maintains a collection of time-bounded sample streams with staggered time offsets that together span the “MaxAge” of the Summary. This emulates a sliding window without requiring timestamped metadata on each observation: the oldest active sample stream is used to compute and report the quantile values, and once it exceeds the “MaxAge” its data is purged and collection starts anew. As a result, this metric stores duplicates of each observation and therefore requires significantly more memory than the other metrics.
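The window behavior is configured when the Summary is defined. The sketch below uses hypothetical names and objectives, with MaxAge bounding the window and AgeBuckets controlling how many staggered sample streams cover it.

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	latency := prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "example_request_latency_seconds",
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
		MaxAge:     10 * time.Minute, // observations older than this are dropped
		AgeBuckets: 5,                // number of staggered sample streams spanning MaxAge
	})
	prometheus.MustRegister(latency)

	// every observation is copied into the active sample streams
	latency.Observe(0.042)
}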
Effect of Metric Labels on Memory Utilization
Labels allow specific values to be tied to a metric observation. For example, a Counter metric tracking the number of gRPC requests could have a label for the gRPC method, namely “method”. Using this label, Prometheus can track observations on a per-method basis as well as in total.

As mentioned previously, Prometheus stores each metric in bounded memory. However, when using labels, each unique set of label values requires a separate “instance” of the metric internally. Using our gRPC example above, if we have method calls for “put”, “get”, and “delete”, we initialize a CounterVec metric with the “method” label and then call the GetMetricWith function with the label value for the method to retrieve the individual metric instance. Internally, each unique method requires a separate 8-byte value to track its count.
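A sketch of that gRPC example might look like the following; the metric name is made up, but the label handling mirrors the description above.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// one Counter family partitioned by the "method" label
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "grpc_requests_total"},
		[]string{"method"},
	)
	prometheus.MustRegister(requests)

	for _, method := range []string{"put", "get", "delete", "get"} {
		// each unique "method" value is backed by its own Counter instance
		counter, err := requests.GetMetricWith(prometheus.Labels{"method": method})
		if err != nil {
			panic(err)
		}
		counter.Inc()
	}
}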
Metrics may be defined with multiple labels. However, as a general rule of thumb, memory utilization scales with the number of possible unique label value combinations. This is because, as previously described, each unique set of label values requires a separate metric instance internally. This can become problematic in scenarios where:
- Labels have high cardinality or are unbounded. Using labels with many potential values means the set of unique label value combinations can be very large.
- A metric is defined with a large number of labels. If each label has 5 potential values, then the number of unique combinations scales exponentially with the number of labels as 5^n, which gets very large very fast (see the sketch below).
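As a rough illustration of that growth, the hypothetical snippet below registers a CounterVec with 3 labels of 5 values each and confirms, via the registry's Gather output, that 5^3 = 125 child metrics are created.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	reg := prometheus.NewRegistry()
	vec := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "combinations_total"},
		[]string{"a", "b", "c"},
	)
	reg.MustRegister(vec)

	values := []string{"v1", "v2", "v3", "v4", "v5"}
	for _, a := range values {
		for _, b := range values {
			for _, c := range values {
				// each unique (a, b, c) combination creates a new child metric
				vec.WithLabelValues(a, b, c).Inc()
			}
		}
	}

	families, err := reg.Gather()
	if err != nil {
		panic(err)
	}
	// a single metric family holding 5^3 = 125 child metrics
	fmt.Printf("child metrics: %d\n", len(families[0].Metric))
}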
This memory utilization issue is especially important when using a Histogram or Summary metric, as these have significantly higher base memory requirements than the Counter or Gauge metrics.
Benchmarking Memory Utilization
To loosely profile the memory utilization of Prometheus exporters across a variety of metrics, we put together a small program that creates a labeled metric vector and records observations with a varying number of unique label values. Specifically, we are testing our hypotheses that:
- Metrics without labels, or with a small number of unique label values, are relatively cheap in terms of memory utilization.
- Memory utilization scales linearly with the number of unique label values.
The boilerplate code we use to benchmark is provided below. This example profiles the Counter metric specifically; the other experiments only require updating the metric initialization and observation functions (a sketch of the Histogram variant follows the profile output further down). From a high-level view this code (1) initializes a new metric vector defined with 3 labels, namely “foo”, “bar”, and “baz”; (2) records a collection of observations, each with a unique set of label values, by setting the “baz” label to an incrementing value; and (3) dumps the heap profile.
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	labels := []string{"foo", "bar", "baz"}
	labelValues := map[string]string{
		"foo": "foo",
		"bar": "bar",
	}

	// create and register metric vector
	metricVec := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "test",
		},
		labels,
	)
	if err := prometheus.Register(metricVec); err != nil {
		panic(err)
	}

	// observe metrics
	for i := 0; i < 100000; i++ {
		labelValues["baz"] = fmt.Sprintf("baz-%d", i)
		metric, err := metricVec.GetMetricWith(labelValues)
		if err != nil {
			panic(err.Error())
		}
		metric.Inc()
	}

	// flush heap profile
	f, err := os.Create("heap.prof")
	if err != nil {
		panic(fmt.Sprintf("could not create memory profile: %v", err))
	}
	defer f.Close()
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		panic(fmt.Sprintf("could not write memory profile: %v", err))
	}
}
Execution of this code produces a heap.prof file which can be parsed and queried using Go's pprof tool with the command go tool pprof heap.prof. An example of printing the top 5 nodes according to the heap profile is shown below:
hamersaw@ragnarok:~/development/benchmarks/prometheus-metrics$ go tool pprof heap.prof
File: yogi
Type: inuse_space
Time: Jul 7, 2022 at 3:01am (CDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 5
Showing nodes accounting for 49.52MB, 93.39% of 53.03MB total
Showing top 5 nodes out of 28
flat flat% sum% cum cum%
22MB 41.49% 41.49% 22MB 41.49% github.com/prometheus/client_golang/prometheus.MakeLabelPairs
12.50MB 23.58% 65.07% 34.50MB 65.07% github.com/prometheus/client_golang/prometheus.NewCounterVec.func1
7.52MB 14.17% 79.24% 45.02MB 84.90% github.com/prometheus/client_golang/prometheus.(*metricMap).getOrCreateMetricWithLabels
4.50MB 8.49% 87.74% 4.50MB 8.49% runtime.allocm
3MB 5.66% 93.39% 3MB 5.66% github.com/prometheus/client_golang/prometheus.extractLabelValues (inline)
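For the Histogram runs, only the metric construction and the observation call change. One possible variant is sketched below; the use of the client's default buckets here is an illustrative choice, and the heap-profiling boilerplate is identical to the Counter version above, so it is omitted.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	labels := []string{"foo", "bar", "baz"}
	labelValues := prometheus.Labels{"foo": "foo", "bar": "bar"}

	// Histogram variant: construction and observation are the only changes
	metricVec := prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "test",
			Buckets: prometheus.DefBuckets, // illustrative bucket layout
		},
		labels,
	)
	if err := prometheus.Register(metricVec); err != nil {
		panic(err)
	}

	for i := 0; i < 100000; i++ {
		labelValues["baz"] = fmt.Sprintf("baz-%d", i)
		metric, err := metricVec.GetMetricWith(labelValues)
		if err != nil {
			panic(err.Error())
		}
		metric.Observe(float64(i))
	}

	// heap profile flushing is unchanged from the Counter benchmark
}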
We ran this boilerplate on a variety of metric types and numbers of unique label values to test our hypotheses. We noticed that the overall heap usage showed a large variance, so the heap utilization values we report are averages over 3 consecutive executions. Repeating the experiment will produce different numbers, but they should display the same trends. We present the most significant of our findings below:
First we compare the Counter and Gauge metrics with increasing numbers of unique label values. There are two things we think are important to note. First, at 25k unique label value sets the metrics already take roughly 20MB; this shows the need to restrict the number of unique label values, because even a relatively low count (i.e. 25k) requires a non-negligible amount of memory. Second, the linear increase in heap usage is easy to see. This aligns with our hypothesis, as each additional unique label value set requires a new instance of the metric internally and therefore increases memory utilization.
unique label value sets    25k           50k           75k           100k
Counter                    21267.43kB    32551.41kB    47126.58kB    55.52MB
Gauge                      18706.79kB    34598.02kB    44053.69kB    56.52MB
Next we look at the Histogram metric. In this experiment we compared the number of unique label value sets (table rows) with the number of observations recorded for each set of label values (table columns). We see the two most important observations to be: (1) memory utilization does not increase with the number of observations, because the Histogram metric uses a Counter for each bucket and additional observations therefore require no additional storage; and (2) memory usage increases linearly with the number of unique label value sets, for the same reason as the Counter and Gauge metrics. Without performing the experiment, we posit the Summary metric will show similar behavior.
                           observation count
unique label value sets    250           500           750           1000
25k                        28677.73kB    29971.42kB    28948.40kB    29973.05kB
50k                        57.29MB       59.79MB       62.29MB       60.29MB
75k                        76.53MB       80.53MB       77.53MB       82.53MB
100k                       97.03MB       95.03MB       99.03MB       99.03MB
14 day(s) offloaded in the 100DaysToOffload challenge.