Edge Delta Top-k Processor

Return a list of ranked pattern matches.

Processor Recap

You can configure a processor to convert incoming raw log data into metrics. Once configured, the processor populates the Anomalies and Insights pages as well as the Metrics view. Edge Delta has a number of processor types, one of which is the Top-k processor.

Top-k Processors

A Top-k query returns a fixed-length list of pattern matches, ranked from the top by how frequently they occur. A Top-k query is generally more resource efficient than a query that first retrieves every match for a pattern, including the lowest ranking ones, and then ranks them. This is because a Top-k query is concerned only with the highest ranking matches, so it stops retrieving matches once the top matches have been discovered.
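As a rough illustration of what a Top-k result looks like (this is not Edge Delta's implementation, which is not described here), the following Python sketch counts how often each match occurs and keeps only the k most frequent. The top_k helper and the sample values are hypothetical. The sketch counts every match before ranking, so it shows the shape of the result, not the resource savings.

```python
from collections import Counter

def top_k(matches, k):
    """Return up to k (match, count) pairs, most frequent first."""
    counts = Counter(matches)
    # most_common(k) returns the k highest counts in descending order.
    return counts.most_common(k)

# Hypothetical matches observed during one interval.
observed = ["GET /a", "GET /b", "GET /a", "POST /c", "GET /a", "GET /b"]
result = top_k(observed, 2)
print(result)  # [('GET /a', 3), ('GET /b', 2)]
```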

A Top-k processor searches for matches of a given Golang regex pattern and creates a leader board string for each interval with up to k entries. It takes a lower limit for matches, a data collection interval, and a separator character used to join the values of the named groups into a record key. These parameters are configured in the top_ks section of the agent yaml.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>
      <operational_parameter>: <parameter_value>

Top-k Processor Example

In the following example, logs that match the pattern are selected and the values of the named groups are combined to form a key for the records being counted. For example, this log matches the configured pattern: "12.195.88.88 - joe [08/Aug/2020:05:57:49 +0000] "GET /optimize/engage HTTP/1.0" 200 19092"

Only the top 10 items are picked for reporting, as configured with the k parameter. The metrics are reported every 30 seconds and then removed locally. The lower_limit parameter means that only records with at least 5 occurrences are included in the leader board, so the list might contain fewer than the k of 10 entries. A comma separator is configured to combine the named group values into a record key. The processor also references a filter that is configured in the filters section of the yaml, so it applies only as allowed by the logic defined in the exclude_sensitive filter.

processors:
  top_ks:
    - name: top-api-requests
      pattern: '(?P<ip>\d+\.\d+\.\d+\.\d+) - \w+ \[.*\] "(?P<method>\w+) (?P<path>.+) HTTP\/\d\.0" (?P<code>.+) \d+'
      k: 10
      interval: 30s
      lower_limit: 5
      separator: ","
      filters:
        - exclude_sensitive

This pattern would create the following named groups from the log:

  • ip
  • method
  • path
  • code

Given the example log, the comma separator would create a record key like this: "12.195.88.88,GET,/optimize/engage,200". If the record was seen 5 times in the last interval and it was one of the top k items, then the following metric would be reported: "12.195.88.88,GET,/optimize/engage,200=5". Records are ordered by their count in descending order.
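The key construction described above can be sketched in Python, whose re module accepts the same (?P<name>...) named-group syntax as Golang regex. This is an illustration only: the record_key helper is hypothetical, and joining the group values in the order they appear in the pattern is an assumption based on the record key shown in this example.

```python
import re

# Pattern from the example (the dot in the HTTP version is escaped here).
PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) - \w+ \[.*\] '
    r'"(?P<method>\w+) (?P<path>.+) HTTP\/\d\.0" (?P<code>.+) \d+'
)

def record_key(line, separator=","):
    """Join the named group values, in pattern order, into a record key."""
    match = PATTERN.search(line)
    if match is None:
        return None  # the log does not match, so it is not counted
    return separator.join(match.groups())

log = ('12.195.88.88 - joe [08/Aug/2020:05:57:49 +0000] '
       '"GET /optimize/engage HTTP/1.0" 200 19092')
key = record_key(log)
print(key)         # 12.195.88.88,GET,/optimize/engage,200
print(f"{key}=5")  # the metric string if this key were seen 5 times
```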

Required Parameters

name

The name parameter specifies a name for a Top-k processor. You use this name elsewhere, for example to refer to a specific processor in a workflow. Names must be unique within the processors section. The name is a yaml list element, so it begins with a - and a space, followed by the string. A name is required for a Top-k processor.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>

See the example implementation in a Top-k processor.

pattern

The pattern parameter specifies a Golang regex pattern that the Top-k processor will look for. It is a string that should be wrapped in quotes to handle escapes. A pattern is required for a Top-k processor.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>

See the example implementation in a Top-k processor.

k

The k parameter specifies how many entries to include in the top matches leader board. It is an integer. The k parameter is a required element for a Top-k processor.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>

See the example implementation in a Top-k processor.

interval

The interval parameter specifies the reporting interval for the statistics that a Top-k processor will generate. A processor will collect values for the duration of the interval before reporting the top matches. The default is 1 minute. It is specified in the Golang duration format. It is a required parameter for a Top-k processor.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>

See the example implementation in a Top-k processor.
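For readers unfamiliar with the Golang duration format, a value is written as one or more number-and-unit pairs, such as 30s, 1m, or 1h30m, using the units ns, us (or µs), ms, s, m, and h. The Python sketch below is a simplified, hypothetical parser that covers these common cases only. It is not how the agent reads the value, and it ignores signs and forms such as ".5s" that Go also accepts.

```python
import re
from datetime import timedelta

# Unit suffixes used by the Go duration format, expressed in seconds.
UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
         "s": 1.0, "m": 60.0, "h": 3600.0}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")

def parse_duration(text):
    """Parse a Go-style duration such as '30s', '1m' or '1h30m'."""
    pos, seconds = 0, 0.0
    while pos < len(text):
        token = TOKEN.match(text, pos)
        if token is None:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(token.group(1)) * UNITS[token.group(2)]
        pos = token.end()
    if pos == 0:
        raise ValueError("empty duration")
    return timedelta(seconds=seconds)

print(parse_duration("30s"))    # 0:00:30
print(parse_duration("1h30m"))  # 1:30:00
```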

lower_limit

The lower_limit parameter specifies the minimum count a record needs to be included in a Top-k leader board. Matching records that don’t reach this threshold are not reported in the metrics. It is defined as an integer. A lower_limit is required for a Top-k processor.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>

See the example implementation in a Top-k processor.
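To show how lower_limit and k interact, the following Python sketch ranks record keys by count, drops those below the threshold, and keeps at most k entries. It is an illustration under assumptions, not the agent's code: the leaderboard helper and the sample keys are hypothetical, and the "at least" comparison follows the description of the earlier example.

```python
from collections import Counter

def leaderboard(record_keys, k, lower_limit):
    """Rank keys by count, drop those below lower_limit, keep at most k."""
    counts = Counter(record_keys)
    ranked = counts.most_common()  # descending by count
    kept = [(key, n) for key, n in ranked if n >= lower_limit]
    return kept[:k]

# Hypothetical record keys seen in one interval.
seen = ["a"] * 7 + ["b"] * 5 + ["c"] * 4 + ["d"] * 1
board = leaderboard(seen, k=10, lower_limit=5)
print([f"{key}={n}" for key, n in board])  # ['a=7', 'b=5']
```

Only two of the four keys reach the threshold, so the leader board is shorter than k even though k is 10.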

separator

The separator parameter defines the character used to join the values of the named groups specified in the Top-k pattern. The joined values form the record key. It is specified as a single-character string. The default is a comma (,). A separator is optional for a Top-k processor.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>

See the example implementation in a Top-k processor.

Optional Parameters

filters

The filters parameter refers to a defined filter that has been configured in the filters: section of the agent yaml. The filter contains logic that defines where in the log to apply the processor. All other data is ignored by the processor. You can use a filter to prevent the processor from processing portions of a log that contain sensitive data. Filters are a yaml list element so they begin with a - and a space. They are defined with a string that matches a filter name.

processors:
  top_ks:
    - name: <processor_name>
      pattern: <regex_pattern>
      k: <leaderboard_size>
      interval: <duration>
      lower_limit: <lower_frequency_threshold>
      separator: <separator_character>
      filters:
        - <filter_reference>

See the example implementation in a Top-k processor.