Edge Delta Top-k Node
Overview
A top-k query returns a fixed-length list of pattern matches, ranked from most to least frequent. A top-k query is generally more resource efficient than queries that first collect every match for a pattern, including the lowest ranking, before ranking them. Because a top-k query is concerned only with the highest-ranking matches, it stops retrieving matches once the top matches have been found.
A top-k processor searches for matches for a given Golang regex pattern and creates a leader board string for each interval with at most k entries. It is configured with a lower limit for match counts, a data collection interval, and a separator character used to join the values of the named groups into a record key. These parameters are set in the agent YAML, in a node of type top_k under nodes.
Example Configuration
In the following example, logs matching the pattern are selected and the values of the named groups are combined to form a key for the records being counted.
Only the top 3 items are picked for reporting, as configured with the k parameter. The metrics are reported every 30 seconds and then removed locally. The lower_limit parameter means that only records with at least 2 occurrences are included in the leader board, so the list might contain fewer entries than the k of 3. A comma separator is configured to join the named group values into a record key.
nodes:
  - name: top_api_requests
    type: top_k
    pattern: (?P<method>\w+) (?P<path>.+) HTTP\/\d.0
    k: 3
    interval: 30s
    lower_limit: 2
    separator: ","
This pattern would create the following named groups from the log:
- method
- path
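The behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the node's implementation: Python's re module stands in for Go's regexp package (both accept the `(?P<name>...)` syntax used here), the sample log lines are invented, and the function name `top_k` and the `(key, count)` output shape are hypothetical rather than Edge Delta's actual leader board format.

```python
import re
from collections import Counter

# Pattern from the example configuration. Python's re is a stand-in for Go's
# regexp here; both accept the (?P<name>...) named-group syntax.
PATTERN = re.compile(r"(?P<method>\w+) (?P<path>.+) HTTP\/\d.0")

# Named groups in the order they appear in the pattern: ["method", "path"].
GROUP_NAMES = sorted(PATTERN.groupindex, key=PATTERN.groupindex.get)


def top_k(lines, k=3, lower_limit=2, separator=","):
    """Return at most k (key, count) pairs whose count is >= lower_limit."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # lines that do not match the pattern are not counted
        # Join the named group values into a single record key.
        key = separator.join(match.group(name) for name in GROUP_NAMES)
        counts[key] += 1
    # Rank by frequency, drop keys below the lower limit, keep the first k.
    ranked = [(key, n) for key, n in counts.most_common() if n >= lower_limit]
    return ranked[:k]


# Invented log lines standing in for one 30-second interval.
SAMPLE = [
    "GET /api/users HTTP/1.0",
    "POST /api/login HTTP/1.0",
    "GET /api/users HTTP/1.0",
    "GET /health HTTP/1.0",
    "POST /api/login HTTP/2.0",
    "GET /api/users HTTP/2.0",
    "this line does not match",
]

if __name__ == "__main__":
    for key, count in top_k(SAMPLE):
        print(key, count)
```

With this sample, `GET,/api/users` is counted 3 times and `POST,/api/login` twice, while `GET,/health` occurs only once and falls below the lower_limit of 2. The resulting list therefore has two entries even though k is 3.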
Required Parameters
name
A descriptive name for the node. This is the name that will appear in Visual Pipelines, and you can reference this node in the YAML using the name. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space followed by the string. It is a required parameter for all nodes.
nodes:
  - name: <node name>
    type: <node type>
type: top_k
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
  - name: <node name>
    type: <node type>
k
The k parameter specifies how many entries to include in the top matches leader board. It is an integer. The k parameter is required for a top_k node.
nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
lower_limit
The lower_limit parameter specifies the minimum count a record must reach to be included in a top_k leader board. Matching records that don't reach this threshold are not reported in the metrics. It is defined as an integer. A lower_limit is required for a top_k node.
nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
pattern
The pattern parameter specifies a Golang regex pattern that the top_k processor will look for. A pattern is a string and is required for a top_k node. See Regex Testing for details on writing effective regex patterns.
nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
Optional Parameters
exclude_capture_names
The exclude_capture_names parameter determines whether the capture names are excluded from the top_k list. It is specified as a Boolean, the default is true, and it is optional. In the following example, false would return pw:abc123 while true would return abc123.
nodes:
  - name: <node name>
    type: top_k
    pattern: (?P<pw>\w+)
    exclude_capture_names: true|false
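The effect of this flag on the record key can be sketched as follows. This is an illustration only: Python's re module stands in for Go's regexp, the helper name `record_key` is hypothetical, and the way multiple `name:value` pairs are joined by the separator is an assumption extrapolated from the single-group example above.

```python
import re

# Pattern from the example above; Python's re stands in for Go's regexp.
PATTERN = re.compile(r"(?P<pw>\w+)")

# Named groups in pattern order.
GROUP_NAMES = sorted(PATTERN.groupindex, key=PATTERN.groupindex.get)


def record_key(line, exclude_capture_names=True, separator=","):
    """Build the record key for one log line, or return None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    if exclude_capture_names:
        # Values only, for example "abc123".
        parts = [match.group(name) for name in GROUP_NAMES]
    else:
        # name:value pairs, for example "pw:abc123".
        parts = [f"{name}:{match.group(name)}" for name in GROUP_NAMES]
    return separator.join(parts)


if __name__ == "__main__":
    print(record_key("abc123", exclude_capture_names=True))
    print(record_key("abc123", exclude_capture_names=False))
```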
group_by
The group_by parameter defines how to aggregate log items into buckets, based on their properties. Each entry should be an expression (CEL or Go template). When group_by is not set, metrics are grouped by their source. It is specified as a list and is optional.
nodes:
  - name: top_api_requests
    type: top_k
    pattern: (?P<ip>\d+\.\d+\.\d+\.\d+) - \w+ \[.*\] "(?P<method>\w+) (?P<path>.+) HTTP\/\d.0" (?P<code>.+) \d+
    k: 10
    interval: 30s
    lower_limit: 5
    separator: ","
    group_by:
      - item["resource"]["src_type"]
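One way to picture bucketing is a separate counter per group value, each ranked on its own. The sketch below assumes exactly that, which the text above does not confirm in detail. The item layout mirrors the `item["resource"]["src_type"]` expression, while the `key` field (a record key already extracted by the pattern), the function name `top_k_by_group`, and the sample items are hypothetical.

```python
from collections import Counter, defaultdict


def top_k_by_group(items, k=10, lower_limit=5):
    """Rank record keys separately inside each src_type bucket."""
    buckets = defaultdict(Counter)
    for item in items:
        # Python equivalent of the group_by expression in the example.
        group = item["resource"]["src_type"]
        buckets[group][item["key"]] += 1
    return {
        group: [(key, n) for key, n in counts.most_common() if n >= lower_limit][:k]
        for group, counts in buckets.items()
    }


# Invented items; "key" is a hypothetical field holding the record key.
ITEMS = [
    {"resource": {"src_type": "file"}, "key": "GET,/a"},
    {"resource": {"src_type": "file"}, "key": "GET,/a"},
    {"resource": {"src_type": "k8s"}, "key": "GET,/a"},
    {"resource": {"src_type": "k8s"}, "key": "POST,/b"},
    {"resource": {"src_type": "k8s"}, "key": "POST,/b"},
]

if __name__ == "__main__":
    # A lower limit of 2 is used so the small sample produces output.
    print(top_k_by_group(ITEMS, k=10, lower_limit=2))
```

In this sketch the same key, `GET,/a`, is counted independently in the `file` and `k8s` buckets, so it clears the limit in one bucket and not in the other.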
interval
The interval parameter specifies the reporting interval for the statistics that a top_k node will generate. The node will collect values for the duration of the interval before reporting the top matches. The default is 1 minute. It is specified in the Golang duration format. It is an optional parameter for a top_k node.
nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
    interval: <duration>
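For readers unfamiliar with the Golang duration format, a value is a sequence of numbers each followed by a unit suffix, such as 30s, 1m, or 1m30s. The sketch below converts a simple subset of that format to seconds. It is a rough illustration, not Go's time.ParseDuration: it handles only unsigned values with the units ns, us, ms, s, m, and h, and the function name `duration_seconds` is hypothetical.

```python
import re

# Seconds per unit for the subset of Go duration units handled here.
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}

# One number followed by one unit; two-letter units are tried before s, m, h.
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def duration_seconds(text):
    """Convert a simple Go-style duration string such as '1m30s' to seconds."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        token = TOKEN.match(text, pos)
        if token is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(token.group(1)) * UNITS[token.group(2)]
        pos = token.end()
    return total


if __name__ == "__main__":
    for value in ("30s", "1m", "1m30s"):
        print(value, duration_seconds(value))
```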
separator
The separator parameter defines the character used to separate the values of the named groups specified in the top_k pattern. This creates the record key. It is specified as a string. The default is a comma (,). A separator is optional for a top_k node.
nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
    separator: "<separator character>"