Edge Delta Sample Processor

Filter incoming items using probabilistic sampling based on specified criteria.

Overview

The Sample Processor node filters Logs and Traces using consistent probabilistic sampling. It lets a specified percentage of items pass through, with the sampling decision derived from configurable fields, and provides additional options for adjusting the sampling criteria.

Note: The Sample Processor node applies sampling to Logs and Traces only, and passes through all other data types.
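
To make "consistent probabilistic sampling" concrete, here is a minimal Python sketch of the general technique: hash a key into a bucket from 0 to 99 and keep the item when the bucket falls below the percentage. It is not Edge Delta's implementation. The hash function, the `keep` and `sample` helpers, and the `type` and `key` item fields are all assumptions made for the example.

```python
import hashlib

def keep(key: str, percentage: int) -> bool:
    """Deterministically decide whether an item with this key passes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percentage

def sample(item: dict, percentage: int) -> bool:
    # Hypothetical item shape: only logs and traces are sampled,
    # every other data type passes through untouched.
    if item.get("type") not in ("log", "trace"):
        return True
    return keep(item["key"], percentage)
```

Because the decision depends only on the key, the same key always gets the same outcome, and any key kept at 10 percent is also kept at every higher percentage.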

Example Configuration

nodes:
- name: sampler
  type: sample
  percentage: 10
  field_paths:
    - item["attributes"]["foo"]
  pass_through_on_failure: true
  priority_field: item["attributes"]["priority"]
  timestamp_granularity: "1s"

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node elsewhere in the YAML. It must be unique across all nodes. Because each node is a YAML list element, the entry begins with a hyphen and a space, followed by the name. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: sample

The type parameter specifies the type of node being configured. It is a string chosen from a closed list of node types; for this node, the value is sample. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

percentage

This parameter specifies the percentage of items that are allowed to pass through the node; the remaining items are filtered out. It is a required integer parameter.

nodes:
- name: sampler
  type: sample
  percentage: 10
  pass_through_on_failure: true

pass_through_on_failure

This boolean parameter determines whether an item passes through when an error occurs while its sampling decision is being evaluated. It is required and defaults to true.

nodes:
- name: sampler
  type: sample
  percentage: 10
  pass_through_on_failure: true
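
The effect of this flag can be pictured with a short Python sketch. It is an illustration under assumptions, not the node's actual code: `evaluate` and the `decide` callback are hypothetical names, and the real node may treat different error types differently.

```python
def evaluate(item, percentage, decide, pass_through_on_failure=True):
    """Return True if the item should pass through the node."""
    try:
        # decide() is a hypothetical sampling function that may raise,
        # for example when a field it needs is missing from the item.
        return decide(item, percentage)
    except Exception:
        # On any evaluation error, fall back to the configured behavior.
        return pass_through_on_failure
```

With the flag set to true, an item that cannot be evaluated is kept rather than lost; with false, it is dropped.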

Optional Parameters

field_paths

A list of paths to the fields used to make the sampling decision. If not specified, traces are sampled by trace ID, and logs are sampled by timestamp, service name, and body.

nodes:
- name: sampler
  type: sample
  percentage: 10
  pass_through_on_failure: true
  field_paths:
    - item["attributes"]["foo"]
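
One way to picture field-based sampling is that the values at the configured paths form the key that is hashed, so items sharing those values share a sampling decision. The Python sketch below shows that idea only. The list-of-keys path representation, the separator, the hash, and the `lookup`, `sampling_key`, and `keep` helpers are assumptions for illustration and do not reflect how the node parses paths such as item["attributes"]["foo"].

```python
import hashlib

def lookup(item, path):
    """Follow a list of keys into a nested dict, e.g. ["attributes", "foo"]."""
    value = item
    for part in path:
        value = value[part]
    return value

def sampling_key(item, field_paths):
    # Join the selected values with a separator unlikely to appear in data.
    return "\x1f".join(str(lookup(item, path)) for path in field_paths)

def keep(item, field_paths, percentage):
    key = sampling_key(item, field_paths)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Bucket in [0, 100); the item passes when the bucket is below the percentage.
    return int.from_bytes(digest[:8], "big") % 100 < percentage
```

In this sketch, two logs with the same attributes.foo value are either both kept or both filtered out, regardless of their other fields.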

priority_field

Specifies a field that, when present with a value, overrides the default sampling percentage for that item. This parameter is optional.

nodes:
- name: sampler
  type: sample
  percentage: 10
  pass_through_on_failure: true
  priority_field: item["attributes"]["priority"]

timestamp_granularity

This optional duration parameter sets the granularity of timestamps when sampling by timestamp. The minimum allowed granularity is 1 millisecond.

nodes:
- name: sampler
  type: sample
  percentage: 10
  pass_through_on_failure: true
  timestamp_granularity: "1s"
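
A plausible reading of timestamp granularity is that timestamps are rounded down to the start of their granularity window before they contribute to the sampling key. The Python sketch below illustrates that reading only. The `truncate` and `log_key` helpers and the use of millisecond integers are assumptions, not documented behavior.

```python
def truncate(timestamp_ms: int, granularity_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its window."""
    if granularity_ms < 1:
        raise ValueError("granularity must be at least 1 millisecond")
    return timestamp_ms - timestamp_ms % granularity_ms

def log_key(timestamp_ms: int, service: str, body: str, granularity_ms: int):
    # Hypothetical default key for logs: truncated timestamp, service, body.
    return (truncate(timestamp_ms, granularity_ms), service, body)
```

Under this assumption, with a granularity of "1s" (1000 ms), identical log lines from the same service within the same second produce the same key, while a finer granularity separates them.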