Edge Delta Top-k Node

Return a list of ranked pattern matches.

Overview

A Top-k query returns a list of pattern matches, capped at a set length and ranked by how often each match occurs. It is generally more resource efficient than a query that first retrieves every match for a pattern, including the lowest ranking ones, and only then ranks them. A Top-k query is concerned only with the highest ranking matches, so it can stop retrieving matches once the top matches have been identified.

A Top-k processor searches for matches of a given Golang regex pattern and creates a leader board string for each interval with at most k entries. It takes a lower limit for match counts, a data collection interval, and a separator character used to join the named group values into a record key. These parameters are configured in the agent YAML on a node of type top_k in the nodes section.
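
The behavior described above can be sketched in a few lines of Python. This is an illustration only, not the agent's implementation: the function name top_k, its arguments, the sample log lines, and the tie-breaking rule (alphabetical by key) are all assumptions. Python's re module stands in for Golang regex here because both accept the (?P<name>...) named group syntax.

```python
import re
from collections import Counter


def top_k(lines, pattern, k, lower_limit, separator=","):
    """Return up to k (record_key, count) pairs for one interval, most frequent first."""
    regex = re.compile(pattern)
    # Named groups in the order they appear in the pattern.
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    counts = Counter()
    for line in lines:
        match = regex.search(line)
        if match is None:
            continue  # logs that do not match are not counted
        # Join the named group values into a single record key.
        key = separator.join(match.group(name) for name in names)
        counts[key] += 1
    # Drop keys below the lower limit, then keep the k most frequent.
    eligible = [(key, n) for key, n in counts.items() if n >= lower_limit]
    eligible.sort(key=lambda entry: (-entry[1], entry[0]))  # assumed tie-break
    return eligible[:k]


# Hypothetical log lines collected during one interval.
lines = [
    "GET /login HTTP/1.0",
    "GET /login HTTP/1.0",
    "GET /login HTTP/1.0",
    "POST /cart HTTP/1.0",
    "POST /cart HTTP/1.0",
    "GET /health HTTP/1.0",
]
print(top_k(lines, r"(?P<method>\w+) (?P<path>.+) HTTP\/\d.0", k=3, lower_limit=2))
# [('GET,/login', 3), ('POST,/cart', 2)]
```

The result has two entries even though k is 3, because the third key occurs only once and falls below the lower limit of 2.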

Example Configuration

In the following example, logs that match the pattern are selected, and the values of the named groups are combined to form the key for each record being counted.

Only the top 3 items are reported, as set by the k parameter. Metrics are reported every 30 seconds and then removed locally. The lower_limit parameter means that only records with at least 2 occurrences are included in the leader board, so the list might contain fewer than 3 entries. A comma separator is configured to join the named group values into a record key.

nodes:
  - name: top_api_requests
    type: top_k
    pattern: (?P<method>\w+) (?P<path>.+) HTTP\/\d.0
    k: 3
    interval: 30s
    lower_limit: 2
    separator: ","

This pattern would create the following named groups from the log:

  • method
  • path
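
To make the named groups concrete, the short Python sketch below applies the example pattern to a single log line. The log line is hypothetical, and Python's re module is used only as a stand-in for Golang regex, since both accept this named group syntax.

```python
import re

# Pattern from the example configuration above.
PATTERN = r"(?P<method>\w+) (?P<path>.+) HTTP\/\d.0"

# Hypothetical log line, chosen for illustration.
line = "GET /api/v1/users HTTP/1.0"

match = re.search(PATTERN, line)
groups = match.groupdict()  # {'method': 'GET', 'path': '/api/v1/users'}

# Join the group values with the configured separator to form the record key.
record_key = ",".join([groups["method"], groups["path"]])
print(record_key)  # GET,/api/v1/users
```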

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space, followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: top_k

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

k

The k parameter specifies how many entries to include in the top matches leader board. It is an integer. The k parameter is a required element for a top_k node.

nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>

lower_limit

The lower_limit parameter specifies the minimum count a record needs to be included in a top_k leader board. Matching records that don’t reach this threshold are not reported in the metrics. It is defined as an integer. A lower_limit is required for a top_k node.

nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>

pattern

The pattern parameter specifies a Golang regex pattern that the top_k processor will look for. A pattern is a string and is required for a top_k node. See Regex Testing for details on writing effective regex patterns.

nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>

Optional Parameters

exclude_capture_names

The exclude_capture_names parameter determines whether the capture names are excluded from the top_k list. It is a Boolean, it is optional, and the default is true. In the following example, false would return pw:abc123 while true would return abc123.

nodes:
  - name: <node name>
    type: top_k
    pattern: (?P<pw>\w+)
    exclude_capture_names: true|false
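
The effect of this setting can be illustrated with a small Python sketch. The helper name format_entry is hypothetical, and the name:value layout for patterns with more than one named group is an assumption extrapolated from the single-group example above.

```python
import re


def format_entry(match, exclude_capture_names=True, separator=","):
    """Build a leader board key from a regex match (illustrative only)."""
    parts = []
    for name, value in match.groupdict().items():
        # Prefix each value with its capture name unless names are excluded.
        parts.append(value if exclude_capture_names else f"{name}:{value}")
    return separator.join(parts)


match = re.search(r"(?P<pw>\w+)", "abc123")
print(format_entry(match, exclude_capture_names=True))   # abc123
print(format_entry(match, exclude_capture_names=False))  # pw:abc123
```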

group_by

The group_by parameter defines how to aggregate log items into buckets, based on their properties. Each entry should be an expression (CEL or Go template). When group_by is not set, metrics are grouped by their source. It is specified as a list and is optional.

nodes:
  - name: top_api_requests
    type: top_k
    pattern: (?P<ip>\d+\.\d+\.\d+\.\d+) - \w+ \[.*\] "(?P<method>\w+) (?P<path>.+) HTTP\/\d.0" (?P<code>.+) \d+
    k: 10
    interval: 30s
    lower_limit: 5
    separator: ","
    group_by:
      - item["resource"]["src_type"]
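
As a rough illustration of grouping, the Python sketch below keeps a separate leader board for each group value. It is not the agent's implementation: the function name, the "body" field that holds the log text, and the src_type values are assumptions, and a Python callable stands in for the group_by expression.

```python
import re
from collections import Counter, defaultdict


def top_k_by_group(items, pattern, k, lower_limit, group_key, separator=","):
    """Return {group: [(record_key, count), ...]} with one leader board per group."""
    regex = re.compile(pattern)
    names = sorted(regex.groupindex, key=regex.groupindex.get)
    buckets = defaultdict(Counter)
    for item in items:
        match = regex.search(item["body"])  # "body" is an assumed field name
        if match is None:
            continue
        key = separator.join(match.group(name) for name in names)
        buckets[group_key(item)][key] += 1
    boards = {}
    for group, counts in buckets.items():
        ranked = [(key, n) for key, n in counts.items() if n >= lower_limit]
        ranked.sort(key=lambda entry: (-entry[1], entry[0]))
        boards[group] = ranked[:k]
    return boards


# Hypothetical log items from two source types.
items = [
    {"resource": {"src_type": "k8s"}, "body": "GET /a HTTP/1.0"},
    {"resource": {"src_type": "k8s"}, "body": "GET /a HTTP/1.0"},
    {"resource": {"src_type": "k8s"}, "body": "POST /b HTTP/1.0"},
    {"resource": {"src_type": "file"}, "body": "GET /a HTTP/1.0"},
    {"resource": {"src_type": "file"}, "body": "POST /b HTTP/1.0"},
    {"resource": {"src_type": "file"}, "body": "POST /b HTTP/1.0"},
]
boards = top_k_by_group(
    items,
    r"(?P<method>\w+) (?P<path>.+) HTTP\/\d.0",
    k=10,
    lower_limit=2,
    group_key=lambda item: item["resource"]["src_type"],
)
print(boards)
```

Each group is counted and filtered independently, so a key that reaches the lower limit in one group can be absent from another.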

interval

The interval parameter specifies the reporting interval for the statistics that a top_k node will generate. The node will collect values for the duration of the interval before reporting the top matches. The default is 1 minute. It is specified in the Golang duration format. It is an optional parameter for a top_k node.

nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
    interval: <duration>
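
One way to picture the interval is as a series of fixed windows, each with its own counts that are reported and then discarded. The Python sketch below assumes timestamps in seconds and windows aligned to multiples of the interval. Both are illustrative choices, as are the function name and the sample events.

```python
from collections import Counter, defaultdict


def report_per_interval(events, interval_seconds, k, lower_limit):
    """events: iterable of (timestamp_seconds, record_key) pairs.

    Returns {window_start: [(record_key, count), ...]}, one report per window.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Assign each event to the window that contains its timestamp.
        window_start = int(timestamp // interval_seconds) * interval_seconds
        windows[window_start][key] += 1
    reports = {}
    for start in sorted(windows):
        ranked = [(key, n) for key, n in windows[start].most_common() if n >= lower_limit]
        reports[start] = ranked[:k]
    return reports


# Hypothetical events spanning two 30 second windows.
events = [(0, "a"), (10, "a"), (29, "b"), (30, "a"), (45, "a"), (59, "a")]
reports = report_per_interval(events, interval_seconds=30, k=3, lower_limit=2)
print(reports)  # {0: [('a', 2)], 30: [('a', 3)]}
```

Counts do not carry over between windows: key "a" is counted twice in the first window and three times in the second, not five times overall.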

separator

The separator parameter defines the character used to join the values of the named groups specified in the top_k pattern. The joined values form the record key. It is specified as a string. The default is a comma (,). A separator is optional for a top_k node.

nodes:
  - name: <node name>
    type: top_k
    pattern: <regex pattern>
    k: <list size>
    lower_limit: <minimum count>
    separator: "<separator character>"