Aggregate Processor
Description
With the Aggregate Processor you can aggregate metrics and events for specified fields, and then evaluate those aggregations using defined conditions to send alerts when those conditions are met.
Use
Evaluate Metric or Log event fields using any aggregation strategy such as Sum, Average, Min, or Max and trigger alerts based on specified conditions.
Tumbling windows are a series of fixed-size, non-overlapping, contiguous time intervals. For example, with a five-minute tumbling window, elements with timestamp values [0:00:00-0:05:00) are in the first window, and elements with timestamp values [0:05:00-0:10:00) are in the second window.
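The half-open interval notation above can be illustrated with a minimal sketch. The helper `tumblingWindowFor` is hypothetical and not part of the Aggregate Processor, and the sketch assumes windows are aligned to multiples of the window size, which may differ from how the Processor aligns them internally.

```javascript
// Illustrative only: maps a timestamp to the tumbling window that contains it.
// `tumblingWindowFor` is a hypothetical helper, not a Mezmo API.
function tumblingWindowFor(timestampMs, windowSizeMs) {
  // Assumption: windows start at multiples of the window size.
  // Each window is half-open: [start, end)
  const start = Math.floor(timestampMs / windowSizeMs) * windowSizeMs;
  return { start, end: start + windowSizeMs };
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// An event at 0:04:59 belongs to the first window, [0:00:00-0:05:00)
const first = tumblingWindowFor(4 * 60 * 1000 + 59 * 1000, FIVE_MINUTES_MS);

// An event at exactly 0:05:00 belongs to the second window, [0:05:00-0:10:00)
const second = tumblingWindowFor(5 * 60 * 1000, FIVE_MINUTES_MS);
```

Because the end of each interval is exclusive, every event belongs to exactly one tumbling window.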
A sliding window has a fixed time length and moves forward, or “slides”, at a time interval smaller than the window’s length. For example, a sliding window can be five minutes long and slide every minute, so each window captures five minutes of data. The slide length is not user-configurable; the system automatically calculates an appropriate slide based on the window size.
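To show how sliding windows overlap, here is a minimal sketch that lists every window containing a given timestamp. The helper `slidingWindowsFor` is hypothetical, and the one-minute slide is taken from the example above purely for illustration, since the actual slide is chosen by the system.

```javascript
// Illustrative only: lists the sliding windows that contain a timestamp.
// `slidingWindowsFor` is a hypothetical helper, not a Mezmo API.
function slidingWindowsFor(timestampMs, windowSizeMs, slideMs) {
  const windows = [];
  // Assumption: window starts are aligned to multiples of the slide.
  // Begin with the latest window that starts at or before the timestamp.
  const lastStart = Math.floor(timestampMs / slideMs) * slideMs;
  // Walk backwards while the window [start, start + size) still covers it.
  for (let start = lastStart; start > timestampMs - windowSizeMs; start -= slideMs) {
    windows.push({ start, end: start + windowSizeMs });
  }
  return windows;
}

const ONE_MINUTE_MS = 60 * 1000;

// An event at 0:07:00 with a five-minute window and an assumed one-minute slide
const eventTime = 7 * ONE_MINUTE_MS;
const containing = slidingWindowsFor(eventTime, 5 * ONE_MINUTE_MS, ONE_MINUTE_MS);
```

Unlike a tumbling window, a single event here is counted in several overlapping windows: five of them when the window is five times longer than the slide.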
Each Processor input to the Aggregate Processor is a single thread. Inputs from three or more Processors can result in slower processing times.
Option | Description | Example |
---|---|---|
Group By | Select one or more field names. The processor aggregates the data based on a unique set of field values. Uses the Name, Namespace, and Tag fields for the grouping. | .app or .tags.cluster |
Evaluate | Choose the evaluation method for the fields. Note that these evaluation methods only apply to | |
Window Type | Tumbling or Sliding | Sliding |
Interval (Tumbling) | Interval in seconds. Range: 1 minute to 25 hours (60 to 90000 seconds). | 1800 |
Interval (Sliding) | | 1800 |
Minimum Duration (Sliding) | | 180 |
Condition | The conditions that trigger an alert based on the aggregated/evaluated value. Two types of alerting are supported. Threshold Alert: compares the aggregated value to a specified threshold value using a comparison operator. Change Alert: sets conditions based on how much the aggregated value changed compared to the prior evaluation. The change can be measured as a percent change or an absolute value change, each with its own set of operators. | .value.value <greater_or_equal_to> 90 |
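The two alert types in the Condition row can be sketched as plain comparisons. This is only an illustration of the arithmetic under stated assumptions: the function names `thresholdBreached`, `percentChange`, and `valueChange` are hypothetical, only the greater-or-equal comparison from the table's example is shown, and the Processor's own evaluation may differ in detail.

```javascript
// Illustrative only: hypothetical helpers, not Mezmo APIs.

// Threshold Alert: mirrors the example `.value.value <greater_or_equal_to> 90`
function thresholdBreached(aggregatedValue, threshold) {
  return aggregatedValue >= threshold;
}

// Change Alert (percent): change relative to the prior evaluation.
// Returns null when the prior value is 0, because the percentage is undefined.
function percentChange(previous, current) {
  if (previous === 0) return null;
  return ((current - previous) / Math.abs(previous)) * 100;
}

// Change Alert (absolute): difference from the prior evaluation.
function valueChange(previous, current) {
  return current - previous;
}

const breached = thresholdBreached(92, 90);  // 92 >= 90
const pct = percentChange(80, 100);          // (100 - 80) / 80 * 100
const delta = valueChange(80, 100);          // 100 - 80
```

A change alert needs a prior evaluation to compare against, so in this sketch the first evaluation of a series has nothing to report.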
Custom Option
If the event isn't an OTEL metrics event (for example, the metric value is not in the path .value.value), you can aggregate the value with custom aggregation logic based on Mezmo's JavaScript framework. The topic for the Script Execution Processor provides more details about Mezmo’s JavaScript framework.
For example, if you are looking to sum the error_count property of all log events, you would use this script:
function aggregateEvent(accum, event, metadata, annotations) {
  // Add the current event's error_count to the running total held in accum
  accum.error_count = accum.error_count + event.error_count
  return accum
}
With a Custom aggregation strategy, it is important to note that the initial value of the accum object is the first event in the window. Your script will only be executed for subsequent events in the window. Each time the script is executed within the window, it will be called with the previous value of accum and the current event. When the window elapses, the value of accum will be emitted as the aggregated event.
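The calling sequence described above can be simulated locally with a small harness. This is a sketch under assumptions: `runWindow` is a hypothetical stand-in for the Processor's window handling, and the empty objects passed as `metadata` and `annotations` are placeholders, since their real contents are not described here.

```javascript
// Illustrative only: `runWindow` is a hypothetical stand-in for the Processor.
function runWindow(events, aggregateEvent) {
  if (events.length === 0) return undefined;
  // The first event in the window becomes the initial value of accum
  // (copied here so the input array is not mutated).
  let accum = { ...events[0] };
  // The script runs only for the second and later events.
  for (const event of events.slice(1)) {
    accum = aggregateEvent(accum, event, {}, {}); // placeholder metadata/annotations
  }
  // When the window elapses, accum is emitted as the aggregated event.
  return accum;
}

// The error_count summing script
function aggregateEvent(accum, event, metadata, annotations) {
  accum.error_count = accum.error_count + event.error_count;
  return accum;
}

const emitted = runWindow(
  [{ error_count: 2 }, { error_count: 3 }, { error_count: 5 }],
  aggregateEvent
);
// The script ran twice; emitted.error_count is 2 + 3 + 5

const single = runWindow([{ error_count: 4 }], aggregateEvent);
// With one event the script never runs; the event is emitted unchanged
```

The single-event case is worth keeping in mind: any field your script adds to accum will be absent when the window contains only one event.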
For example, if you are looking to aggregate a count of events into a new field:
function aggregateEvent(accum, event, metadata, annotations) {
  // The first time this script is executed will be on the second
  // event in the window, with `accum` representing the first event.
  //
  // Initialize a new field on `accum`, setting it to
  // 1 to represent the fact that 1 event is already present
  // in the buffer
  if (!accum.event_count) {
    accum.event_count = 1
  }
  // Now that we've accounted for the accum event and initialized
  // the new field with a value, we can add 1 to the current count.
  accum.event_count = accum.event_count + 1
  return accum
}
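The same initialization pattern extends to other custom aggregations, such as a running average. The sketch below is an assumption-laden illustration: the `latency_ms` field and the derived `latency_sum` and `latency_avg` fields are hypothetical names, and the short loop at the end only imitates how the Processor would call the script.

```javascript
// Illustrative only: `latency_ms`, `latency_sum`, and `latency_avg`
// are hypothetical field names chosen for this example.
function aggregateEvent(accum, event, metadata, annotations) {
  // On the first call, accum is the first event in the window,
  // so seed the count and sum from it.
  if (!accum.event_count) {
    accum.event_count = 1;
    accum.latency_sum = accum.latency_ms;
  }
  // Fold in the current event and refresh the running average.
  accum.event_count = accum.event_count + 1;
  accum.latency_sum = accum.latency_sum + event.latency_ms;
  accum.latency_avg = accum.latency_sum / accum.event_count;
  return accum;
}

// Imitate one window of three events: the first event is the initial accum.
const windowEvents = [{ latency_ms: 10 }, { latency_ms: 20 }, { latency_ms: 30 }];
let result = { ...windowEvents[0] };
for (const event of windowEvents.slice(1)) {
  result = aggregateEvent(result, event, {}, {});
}
```

Keeping the sum and count on accum, rather than averaging the averages, means the final value weights every event in the window equally.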
Metadata Fields
The Aggregate Processor adds these metadata fields when an event is emitted.
Metadata Field | Description |
---|---|
.metadata.aggregate.flush_timestamp | The time when the Processor emitted the aggregation event. This could be due to the following:
|
.metadata.aggregate.start_timestamp | Aggregation window start time |
.metadata.aggregate.end_timestamp | Aggregation window end time |
.metadata.aggregate.event_count | Number of events aggregated |
Detecting Alert vs Aggregation Output
You can use these fields to determine whether an event was emitted because of a threshold breach or as a normal aggregation event.
An alert is triggered if
Normal Aggregation Event if