PromQL Reference

Q: What is PromQL and what is it used for?

PromQL (Prometheus Query Language) is the functional query language for Prometheus, a popular open-source monitoring and alerting system. It is used to select and aggregate time-series metrics in real time, write alerting rules (e.g., fire an alert if error rate exceeds 5%), and build Grafana dashboard panels visualizing system health, latency, throughput, and resource usage.

Q: What is the difference between rate() and irate() in PromQL?

rate(v[t]) computes the per-second average rate of increase over the specified time window, which smooths out short spikes. irate(v[t]) computes the instantaneous per-second rate from only the last two samples in the range, so it reacts sharply to sudden changes and ignores everything earlier in the window. Both are meant for counters. Use rate() for alerting and trend analysis; use irate() for dashboards where you want to see rapid fluctuations.
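To make the difference concrete, here is a minimal Python sketch of the two calculations. It is a simplified model, not Prometheus's implementation: the function names are hypothetical, and it ignores counter resets and the window-edge extrapolation that the real rate() performs.

```python
# Simplified model of rate() vs irate() over one range of counter samples.
# Assumption: no counter resets and no extrapolation to the window edges.

def simple_rate(samples):
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)

# (timestamp_seconds, counter_value), scraped every 15 s, with a burst at the end
samples = [(0, 100), (15, 115), (30, 130), (45, 145), (60, 460)]

print(simple_rate(samples))   # 6.0  (burst averaged over the whole minute)
print(simple_irate(samples))  # 21.0 (burst seen at full strength)
```

The same burst produces a modest value from the averaged calculation and a much larger one from the last-two-samples calculation, which is why the former suits alert thresholds and the latter suits zoomed-in graphs.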

Q: When should I use rate() vs increase() vs delta()?

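Use rate() for counters (monotonically increasing metrics like http_requests_total) when you want per-second throughput. Use increase() for counters when you want the total increase over a time window; it is equivalent to rate() multiplied by the window length in seconds, and because the result is extrapolated to cover the full window it can be a non-integer even when the counter only moves in whole steps. Use delta() for gauges (metrics that go up and down, like temperature or memory) when you want the change over a time window. Never use delta() on counters: it does not account for counter resets, so a restart shows up as a large negative change.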
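The counter/gauge distinction can be sketched in a few lines of Python. This is an illustrative model with hypothetical function names; it treats any drop in a counter as a reset to zero and leaves out the extrapolation that the real functions apply.

```python
# Simplified model: increase() on a counter vs delta() on a gauge.
# Assumption: a counter that decreases has restarted from zero.

def counter_increase(values):
    """Total increase of a counter, treating any drop as a reset to zero."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total

def gauge_delta(values):
    """Last value minus first value; may be negative."""
    return values[-1] - values[0]

counter = [100, 130, 160, 20, 50]   # process restarted after the third sample
gauge = [71.5, 72.0, 70.5, 69.0]    # e.g. a temperature reading

print(counter_increase(counter))  # 110.0 (30 + 30 + 20 + 30)
print(gauge_delta(gauge))         # -2.5
print(gauge_delta(counter))       # -50, the misleading result of a plain delta on a counter
```

The last line shows the failure mode: a plain first-to-last difference reports that the counter went down, while the reset-aware calculation recovers the real number of events.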

Q: How do I calculate p99 latency with histogram_quantile?

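Use `histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))`. The 0.99 is the quantile (99th percentile). The input must be the _bucket series of a histogram metric, which carry the le (upper bound) label that the function needs. Always wrap the buckets with rate() over a time window when calculating quantiles from live data, so the result reflects recent requests rather than everything since process start. To aggregate across instances, sum the rates while keeping le, for example `histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`. Change 0.99 to 0.95 for p95 or 0.50 for the median.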
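It helps to see why the result is an estimate rather than an exact percentile. The sketch below models the idea in Python: find the bucket that contains the target rank, then interpolate linearly inside it. The function name and bucket values are made up for illustration, and the real function handles more edge cases than this.

```python
import math

# Simplified model of quantile estimation from cumulative histogram buckets.
# Assumptions: buckets are sorted by upper bound, end with +Inf, the lowest
# bucket starts at 0, and 0 < q <= 1.

def simple_histogram_quantile(q, buckets):
    """buckets: list of (upper_bound, cumulative_count)."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan

# Per-second bucket rates, as rate(..._bucket[5m]) would supply them
buckets = [(0.1, 20), (0.5, 80), (1.0, 100), (float("inf"), 100)]

print(simple_histogram_quantile(0.50, buckets))  # about 0.3 seconds
print(simple_histogram_quantile(0.99, buckets))  # about 0.975 seconds
```

Because values are assumed to be spread evenly inside each bucket, the accuracy of a p99 depends on how finely the buckets are spaced around the latencies you care about.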

Q: How do I sum metrics across all instances but keep certain labels?

Use sum() with by() to keep specific labels: `sum(rate(http_requests_total[5m])) by (method, status)`. This sums across all instances, pods, and other dimensions while preserving the method and status labels. Use without() to drop specific labels while keeping all others: `sum(rate(http_requests_total[5m])) without (instance, pod)`.
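The grouping behavior of by() and without() can be modeled as building a group key from a subset of each series' labels. The following Python sketch uses hypothetical helper names and made-up series to show that both forms give the same result when the only other label is instance.

```python
from collections import defaultdict

# Simplified model of sum by (...) and sum without (...).
# Each series is a (labels_dict, value) pair.

def sum_by(series, keep):
    """Sum values, grouping on only the labels named in `keep`."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep))
        out[key] += value
    return dict(out)

def sum_without(series, drop):
    """Sum values, grouping on every label except those named in `drop`."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        out[key] += value
    return dict(out)

series = [
    ({"method": "GET", "status": "200", "instance": "a"}, 3.0),
    ({"method": "GET", "status": "200", "instance": "b"}, 2.0),
    ({"method": "POST", "status": "500", "instance": "a"}, 0.5),
]

by_result = sum_by(series, {"method", "status"})
without_result = sum_without(series, {"instance"})
print(by_result)
print(by_result == without_result)  # True for this data
```

The two forms diverge as soon as a new label appears on the series: by() silently aggregates it away, while without() keeps it as a separate dimension.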

Q: What is a range vector in PromQL and why is it required for rate()?

A range vector selects samples over a time window rather than a single point in time. It is written as metric[5m] (last 5 minutes of samples). Functions like rate(), irate(), increase(), and delta() require range vectors because they need multiple data points to compute a rate or delta. Instant vector selectors without [t] return a single sample per time series.
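As a rough illustration of the two selector types, the Python sketch below picks samples for a single series. The function names are hypothetical; the 5-minute default lookback matches Prometheus's default, but the exact treatment of samples that land on a window boundary is glossed over here.

```python
# Simplified model of instant vs range selection for one time series.
# samples: list of (timestamp_seconds, value), sorted by timestamp.

def instant_select(samples, eval_time, lookback=300):
    """Most recent sample at or before eval_time, if it is recent enough."""
    in_window = [s for s in samples if eval_time - lookback < s[0] <= eval_time]
    return in_window[-1] if in_window else None

def range_select(samples, eval_time, window):
    """All samples inside the window that ends at eval_time."""
    return [s for s in samples if eval_time - window < s[0] <= eval_time]

samples = [(30, 1.0), (90, 2.0), (150, 3.0), (210, 4.0), (270, 5.0), (330, 6.0)]

print(instant_select(samples, 340))     # (330, 6.0): one sample
print(range_select(samples, 340, 120))  # [(270, 5.0), (330, 6.0)]: a list of samples
print(instant_select(samples, 1000))    # None: the series has gone stale
```

A function such as rate() needs the list form because a single sample carries no information about change over time.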

Q: How do I compare current metrics with data from one hour ago?

Use the offset modifier: `rate(http_requests_total[5m] offset 1h)` returns the rate from 1 hour ago. You can subtract it from the current rate to compute the change: `rate(http_requests_total[5m]) - rate(http_requests_total[5m] offset 1h)`. The @ modifier lets you query at a specific Unix timestamp: `metric @ 1609459200`.
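Conceptually, offset moves the end of the selection window back in time before the function is applied. The Python sketch below models that with a hypothetical rate_at helper and synthetic data in which traffic doubles after the first hour; it omits extrapolation and counter-reset handling.

```python
# Simplified model of `rate(x[5m])` vs `rate(x[5m] offset 1h)`.
# Assumption: no counter resets, no extrapolation.

def rate_at(samples, eval_time, window, offset=0):
    """Average per-second increase over the window ending at eval_time - offset."""
    end = eval_time - offset
    points = [(t, v) for t, v in samples if end - window < t <= end]
    if len(points) < 2:
        return None
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)

# Counter scraped every 60 s for two hours: 1 req/s in hour one, 2 req/s in hour two
samples = []
value = 0
for t in range(0, 7201, 60):
    samples.append((t, value))
    value += 60 if t < 3600 else 120

now = 7200
current = rate_at(samples, now, 300)
hour_ago = rate_at(samples, now, 300, offset=3600)
print(current)             # 2.0
print(hour_ago)            # 1.0
print(current - hour_ago)  # 1.0 more requests per second than an hour ago
```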

Q: How do I predict when disk space will run out using PromQL?

Use predict_linear(): `predict_linear(node_filesystem_free_bytes[1h], 4*3600)` predicts the free bytes 4 hours into the future using simple linear regression over the last 1 hour of data. A common alerting rule fires when `predict_linear(disk_free[6h], 24*3600) < 0` (with disk_free standing in for your free-space gauge), meaning the disk is predicted to fill up within 24 hours.
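The forecasting step is ordinary least-squares regression followed by evaluating the fitted line at a future time. Here is a minimal Python sketch of that idea; the function name and the sample data are hypothetical, and it assumes at least two samples with distinct timestamps.

```python
# Simplified model of linear-regression forecasting for a gauge.
# samples: list of (timestamp_seconds, value).

def simple_predict_linear(samples, eval_time, seconds_ahead):
    """Fit value = slope * t + intercept, then evaluate at eval_time + seconds_ahead."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope * (eval_time + seconds_ahead) + intercept

# One hour of free-space samples, shrinking by 10 bytes per second
samples = [(t, 1_000_000 - 10 * t) for t in range(0, 3601, 60)]

in_4h = simple_predict_linear(samples, 3600, 4 * 3600)
in_24h = simple_predict_linear(samples, 3600, 24 * 3600)
in_30h = simple_predict_linear(samples, 3600, 30 * 3600)
print(in_4h)       # 820000.0 bytes expected to remain
print(in_24h < 0)  # False: a "< 0 within 24h" alert would not fire yet
print(in_30h < 0)  # True: the fitted line crosses zero before 30 hours
```

Since the forecast is a straight line through recent history, a longer input range gives a steadier slope, while a short range reacts faster but can be thrown off by a temporary burst of writes or deletes.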


About PromQL Reference

The PromQL Reference is a structured, searchable guide covering the essential syntax and functions of Prometheus Query Language (PromQL), the query language used by Prometheus for time-series monitoring and by Grafana for visualization. It is organized into six categories: Selectors (metric name selection; label matching with =, !=, =~, and !~; metric name regex matching via __name__), Operators (arithmetic +, -, *, /, %, ^; comparison ==, !=, >, <, >=, <=; logical and, or, unless; vector matching modifiers on() and ignoring(); group_left and group_right for many-to-one and one-to-many joins), Functions (rate, irate, increase, histogram_quantile, delta, deriv, predict_linear, math functions, label_replace), Aggregation (sum, avg, count, min/max, topk, quantile), Range Vectors ([5m] notation, offset, @ timestamp, subqueries), and Labels (by(), without(), label_join()).

SREs, platform engineers, and DevOps teams use this reference when writing Prometheus alerting rules, building Grafana dashboards, setting up SLO monitoring, and diagnosing infrastructure performance issues. PromQL's functional query model differs from SQL and requires understanding concepts like instant vectors, range vectors, and the distinction between counters (rate/increase) and gauges (delta/deriv), which this reference clarifies with concrete examples.

Each entry shows the exact PromQL syntax with a realistic query example using common Prometheus metrics such as `http_requests_total`, `node_memory_MemAvailable_bytes`, and `http_request_duration_seconds_bucket`. The reference covers advanced use cases including histogram quantile calculation for latency percentiles (p99, p95), predict_linear for disk space forecasting, subqueries for applying range functions to the result of another expression, and time offset queries for comparing current vs. historical data.

Key Features

Label selector syntax: exact match {label="value"}, regex =~, negative != and !~, metric name regex with {__name__=~"pattern"}
Arithmetic operators (+, -, *, /, %, ^) for metric math like computing used memory from total minus available
Comparison operators (==, !=, >, <, >=, <=) for filtering and alert threshold expressions
Logical operators (and, or, unless), vector matching with on()/ignoring(), and group_left/group_right for many-to-one and one-to-many joins on arithmetic and comparison operators
Rate functions: rate() for per-second counter rate, irate() for instant rate, increase() for total increase over a range
Histogram quantile: histogram_quantile(0.99, rate(duration_bucket[5m])) for p99 latency calculation
Aggregation operators: sum(), avg(), count(), min(), max(), topk(), quantile() with by() and without() label grouping
Range vector features: [5m] window notation, offset 1h for time shifting, @ timestamp, and subqueries for nested evaluation
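The four label matchers in the first item above can be modeled in a few lines of Python. This sketch uses a hypothetical matches helper and relies on two documented PromQL behaviors: regex matchers are fully anchored, and a label that is absent behaves like a label with an empty value. Python's re module stands in for the RE2 syntax Prometheus uses, so uncommon patterns may differ.

```python
import re

# Simplified model of PromQL label matchers.
# matchers: list of (label_name, operator, value).

def matches(labels, matchers):
    """Return True if the series labels satisfy every matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # missing label is treated as empty
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # anchored at both ends
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True

labels = {"__name__": "http_requests_total", "job": "api", "status": "503"}

print(matches(labels, [("status", "=~", "5..")]))   # True
print(matches(labels, [("status", "=~", "5")]))     # False: "5" must match the whole value
print(matches(labels, [("job", "!=", "api")]))      # False
print(matches(labels, [("__name__", "=~", "http_.*"), ("status", "!~", "2..")]))  # True
```

The second call is the common surprise: a pattern that would find a substring in most regex tools selects nothing here unless it is widened to something like `5.*`.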
