Describing Prometheus Query Language

Prometheus provides a query language, PromQL, that allows you to select and aggregate time-series data.

You can filter a metric to include only the time series whose labels match certain key/value pairs. For example, modify the previous query to show only metrics for the worker02 node by using the expression instance:node_cpu_utilisation:rate1m{instance="worker02"}.
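To make the effect of a label matcher concrete, the following Python sketch applies an equality matcher to a small hand-made list of samples. It only illustrates the selection logic; the sample values, the worker01 node, and the select helper are invented for this example and are not part of Prometheus.

```python
# Toy samples for one metric: each has a label set and a value.
# All values, and the worker01 node, are made up for illustration.
samples = [
    {"labels": {"instance": "worker01"}, "value": 0.12},
    {"labels": {"instance": "worker02"}, "value": 0.34},
]


def select(samples, **matchers):
    """Keep samples whose labels equal every key/value pair in matchers."""
    return [
        s for s in samples
        if all(s["labels"].get(key) == value for key, value in matchers.items())
    ]


# Rough equivalent of the {instance="worker02"} matcher in the text.
worker02 = select(samples, instance="worker02")
print(worker02)
```

Only the entry labeled with worker02 remains in the result, which is what the matcher in braces does to the series returned by a query.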

Prometheus Query Language provides several operators to compute new time-series metrics.

PromQL contains arithmetic operators, including addition (+), subtraction (-), multiplication (*), and division (/). It also contains comparison operators, including equality (==), greater than (>), and less than (<).

PromQL also provides built-in functions and aggregation operators that you can include in expressions, including:

  • sum() - totals the values of all sample entries at a given time.
  • rate() - computes the per-second average rate of increase of a time series over a given time range.
  • count() - counts the number of sample entries at a given time.
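The following Python sketch mimics what these three functions compute, using made-up numbers. It is a simplification under stated assumptions: the simple_rate helper is hypothetical, and it ignores the counter resets and window-boundary extrapolation that Prometheus handles in rate().

```python
# Values of one metric across three series at a single instant (made up).
values_now = [2.0, 4.0, 6.0]

total = sum(values_now)    # like sum(): adds the values of all entries
entries = len(values_now)  # like count(): number of entries

# One counter sampled over a 60-second window as (timestamp_seconds, value).
window = [(0, 100.0), (30, 130.0), (60, 160.0)]


def simple_rate(points):
    """Per-second average increase between the first and last sample.

    Simplification: ignores counter resets and the extrapolation to the
    window boundaries that Prometheus applies.
    """
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


per_second = simple_rate(window)
print(total, entries, per_second)
```

In this example the counter grows by 60 over 60 seconds, so the simplified rate is one unit per second.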

Every Prometheus alert contains a Prometheus Query Language expression. Consider the KubeCPUOvercommit alert, available in Monitoring → Alerting → Alerting Rules.

The KubeCPUOvercommit alert expression compares two ratios using the sum function, the division operator, and the greater-than operator:

  • The left ratio is the total number of CPU requests divided by the total number of CPU cores.
  • The right ratio is the count of entries for the CPU cores metric minus 1, divided by the count of entries for the same CPU cores metric.

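Each entry in the CPU cores metric corresponds to a node. Therefore, the right ratio approximates the fraction of cluster CPU capacity that remains if a single node fails, assuming that the nodes are of similar size. When the left ratio is greater than the right ratio, the total CPU requests would no longer fit on the remaining nodes. If the expression evaluates as true for five minutes, then the alert starts firing with a warning severity.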
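The comparison described above can be sketched in Python as follows. This is an illustration of the arithmetic only, not the alert rule itself: the cpu_overcommitted function, the three-node cluster, and the request values are hypothetical, and the five-minute duration is not modeled.

```python
def cpu_overcommitted(cpu_requests, node_cores):
    """Return True when total requests exceed capacity minus one node.

    cpu_requests: CPU cores requested by workloads (hypothetical values).
    node_cores:   one entry per node, giving that node's CPU cores.
    """
    left = sum(cpu_requests) / sum(node_cores)        # requests / total cores
    right = (len(node_cores) - 1) / len(node_cores)   # (nodes - 1) / nodes
    return left > right


node_cores = [4, 4, 4]  # three hypothetical nodes, 12 cores in total

fits = cpu_overcommitted([3, 3, 1], node_cores)      # 7/12 compared with 2/3
too_much = cpu_overcommitted([4, 4, 1], node_cores)  # 9/12 compared with 2/3
print(fits, too_much)
```

With three equal nodes, the threshold is two thirds of the cluster capacity: requests totaling 7 cores stay below it, while requests totaling 9 cores exceed it and would satisfy the alert condition.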

Firing alerts can notify administrators of potential cluster problems that might require further investigation.