CEL Custom Macros

Use CEL Custom Macros to reference log fields.

CEL Macro Overview

There are several Common Expression Language (CEL) custom macros you can use to reference fields, for example in the Enrichment node’s field mappings parameter. Custom macros are defined as extensions to CEL. As with any CEL expression, a reference to a field that doesn’t exist returns an error. When used in a transformation or mapper, these CEL expressions are handled on a best-effort basis: an expression that results in an error is replaced with an empty string (""). The inputs to these functions are CEL field path expressions referring to fields of the given type.
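
For example, in the following log_transform sketch (the node name and field paths are illustrative), the value expression references a field that may not exist; if attributes.region is missing from the incoming log, the expression results in an error and the region field is set to an empty string:

  - name: transform
    type: log_transform
    transformations:
      - field_path: region
        operation: upsert
        value: item["attributes"]["region"]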

See Designing Pipelines and Testing Edge Delta Nodes.

CEL Macros

convert_timestamp(input location string, input format string, output format string)

This macro is used to convert timestamps. There are three options:

  • convert between one datetime format and another
  • convert a datetime stamp to a Unix format
  • convert a Unix format to a datetime stamp

You specify the field location of the timestamp, the current format, and the desired format:

  • input location: Specify the location of the timestamp field using the field path or a regex_capture CEL macro.
  • input format: Provide an example of the current timestamp format, chosen from the list below. If the format does not match the incoming log’s timestamp format, the processor will fail.
  • output format: Provide an example of the desired timestamp format. Enter an example in one of the following formats, or copy the format you require from this list:
    • “Unix Second”
    • “Unix Milli”
    • “Unix Nano”
    • “2006-01-02”
    • “2006-01-02T15:04:05Z”
    • “2006-01-02T15:04:05”
    • “2006-01-02T15:04:05.000Z”
    • “2006-01-02T15:04:05.000000Z”
    • “2006-01-02T15:04:05.000000000Z”
    • time.RFC1123
    • time.RFC1123Z
    • time.RFC3339
    • time.RFC3339Nano
    • “01/02/06”
    • “15:04”
    • “01/02/2006 15:04”
    • “January 2, 2006”
    • “15:04:05”
    • “January 2, 2006 15:04:05”
    • “January 2, 2006 15:04:05.000”
    • “January 2, 2006 15:04:05.000000”
    • “January 2, 2006 15:04:05.000000000”
    • “Mon, Jan 2, 2006 3:04 PM”
    • “2 January 2006 15:04”
    • “2 Jan 2006 15:04”

Example:

convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999Z", "Unix Milli")
convert_timestamp(item["attributes"]["timestamp"], "2024-01-02T15:03:06.000Z", "Unix Milli")

Both examples produce the same configuration because the two timestamp examples are in the same format, even though they show different datetimes.
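
As a fuller sketch (the node name and the attributes.timestamp_ms destination field are illustrative), the macro can be used in a log_transform node to write the converted timestamp to a new field; for instance, an input value of 2024-01-02T15:03:06.000Z would become 1704207786000:

  - name: transform
    type: log_transform
    transformations:
      - field_path: attributes.timestamp_ms
        operation: upsert
        value: convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.000Z", "Unix Milli")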

first_non_empty(listOfStrs []string)

  • Input: []string
  • Output: string

This macro returns the first non-empty string from the input parameters.

Note: hardcoded fallback values can’t contain commas. In addition, the first_non_empty function can’t be nested within other CEL macros. However, you can apply the first_non_empty function, upsert the result into the data item, and then apply any other CEL macro to that new field (see the sketch at the end of this section).

  - name: transform
    type: log_transform
    transformations:
      - field_path: cluster
        operation: upsert
        value: first_non_empty([env("UNDEFINED_CLUSTER"), env("CLUSTER"), "default-cluster"])

In this example, the macro first checks if env("UNDEFINED_CLUSTER") has a non-empty value. If it does, that value will be used for the cluster field. If env("UNDEFINED_CLUSTER") is empty or not set, the macro will check env("CLUSTER"). If env("CLUSTER") has a non-empty value, that value will be used. Finally, if both environment variables are empty or not set, the hardcoded fallback "default-cluster" will be used as the value for the cluster field.
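
As a sketch of the workaround described above (the node name, the is_prod_cluster field, and the regex are illustrative, and this assumes the transformations in a node are applied in order), you can upsert the first_non_empty result first and then apply another CEL macro to the new field in a later transformation:

  - name: transform
    type: log_transform
    transformations:
      - field_path: cluster
        operation: upsert
        value: first_non_empty([env("CLUSTER"), "default-cluster"])
      - field_path: is_prod_cluster
        operation: upsert
        value: regex_match(item["cluster"], "^prod-")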

regex_match(input string, regex string)

  • Input: string, string
  • Output: bool

Returns whether or not the input string matches the regex string.

  - name: example_router
    type: route
    paths:
      - path: "pre_elastic"
        condition: regex_match(item["body"], "(?i)ERROR")

In this example, the input string, item["body"], refers to the body field of the log being processed. The regex string is the pattern that the input string is tested against. The regex pattern (?i)ERROR searches for the word “ERROR” in a case-insensitive manner. If the regex match returns true, meaning the word “ERROR” is present in the body of the log, the pre_elastic path will be used.

regex_capture(input string, regexWithCaptureGroups string)

  • Input: string, string
  • Output: map[string]string

Returns one or more parts of the input string using regex capture groups. The keys of the returned map are the capture group names, and the values are the substrings each capture group matched.

  - name: transform
    type: log_transform
    transformations:
      - field_path: pod_id
        operation: upsert
        value: regex_capture(item["resource"]["ed.filepath"], "/var/logs/(?P<id>(.+))/.*")["id"]

In this example,

  • The input string (the string from which you want to extract data) item["resource"]["ed.filepath"] represents a nested field within a log that contains a file path.

  • The regexWithCaptureGroups string (the regex pattern containing one or more named capture groups), /var/logs/(?P<id>(.+))/.*, has a named capture group id, which matches the characters after /var/logs/ and before the next slash /. Within the named capture group (?P<id>(.+)), ?P<id> defines the name of the group as id, and the (.+) part captures one or more characters. The .+ is greedy, meaning it matches as much text as possible while still allowing the trailing /.* to match.

  • Output: The output is a map where each key is a capture group name and its value is the substring that the capture group matched. Assuming item["resource"]["ed.filepath"] contains something like /var/logs/kubernetes_pod123/other_data, the regex_capture function matches kubernetes_pod123 for the capture group id, and the result is a map with a single key-value pair: {"id": "kubernetes_pod123"}. After the function is applied, the data item includes a new field, pod_id, populated with the pod identifier kubernetes_pod123 extracted from the file path. The ["id"] at the end of the value expression accesses the value associated with the id key in the resulting map.

env(envVarKey string)

  • Input: string
  • Output: string

Returns the value of the given environment variable. If the environment variable doesn’t exist, an empty string is returned.

  - name: datadog_mapping_node
    type: datadog_mapper
    dd_level: env("LEVEL")

In this example, the value for dd_level is obtained by calling the env macro with “LEVEL”, which retrieves the value of the LEVEL environment variable. For example, if the environment variable LEVEL is set to info, this configuration would result in setting dd_level to info.

from_k8s(podID string, podAttributeName string)

  • Input: string, string
  • Output: string

Returns the attribute value of a Kubernetes pod given a pod ID. If the pod ID is not found, an error is returned. This macro is particularly useful when you want to annotate log data with contextual Kubernetes information, such as the deployment name or namespace, which can be vital for understanding the source and context of logs when analyzing them or investigating issues in your Kubernetes environment.

  - name: transform
    type: log_transform
    transformations:
      - field_path: deployment_name
        operation: upsert
        value: from_k8s(item["pod_id"], "k8s.deployment.name")
      - field_path: namespace
        operation: upsert
        value: from_k8s(item["pod_id"], "k8s.namespace.name")     

In this example:

  • podID string: This is the identifier for the Kubernetes pod from which attributes are to be retrieved: item["pod_id"].
  • podAttributeName string: This is the name of the attribute you want to retrieve from the Kubernetes pod. For example, k8s.deployment.name or k8s.namespace.name would retrieve the deployment name or the namespace name of the specified pod, respectively.
  • Output: The output is the value of the requested Kubernetes pod attribute as a string.

json(jsonStr string)

  • Input: string
  • Output: map[string]any

The json macro parses a JSON-formatted string and converts it into a map, where the map’s keys are strings and the values can be of any type (e.g., string, number, boolean, nested map). If the JSON string can’t be parsed successfully (for example, because the JSON is malformed), the macro returns an error.

  - name: resource_transform
    type: resource_transform
    target_source_type: k8s 
    source_field_overrides:
      - field: k8s.container.name
        expression: json(item["body"]).kubernetes.container.name

In this example:

  • field: k8s.container.name: This specifies the field within the source data that should be overridden. In this case, it’s the name of the Kubernetes container.
  • expression: json(item["body"]).kubernetes.container.name: This is the transformation logic used to override the specified field. The expression invokes the json macro to parse the JSON string from item["body"]; the value associated with the key path kubernetes.container.name is then accessed within the resulting map.

Suppose the JSON object in item["body"] has a structure like the following example (pretty-printed for readability):

{
    "kubernetes": {
        "container": {
            "name": "example-container"
        }
    }
}

The json macro will parse this JSON string into a map, and the expression will extract the value example-container for the kubernetes.container.name key path. The extracted value will then be used to set or override the field k8s.container.name in the log.

ec2_metadata(keyStr string)

  • Input: string
  • Output: string

Returns the value of the given key from the EC2 metadata service. If the key is not found, an error is returned.

  - name: transform
    type: log_transform
    transformations:
      - field_path: instance_id
        operation: upsert
        value: ec2_metadata("instance-id")

This configuration allows the log being processed to be enriched with the EC2 instance’s ID, which can be crucial for tracking, organizing, or analyzing data based on where it originated within your AWS infrastructure. In this example:

  • field_path: instance_id: This designates the name of the new field that will be added to the data item.
  • value: ec2_metadata("instance-id"): The value for the new instance_id field is obtained from the ec2_metadata macro, which fetches the value of the instance-id key.

gcp_metadata(keyStr string)

  • Input: string
  • Output: string

Returns the value of the given key from the GCP metadata service. If the key is not found, an error is returned.

  - name: transform
    type: log_transform
    transformations:
      - field_path: instance_name
        operation: upsert
        value: gcp_metadata("instance.name")

In this example:

  • field_path: instance_name: Here, a new field called instance_name is specified to be added or updated with the value from the GCP metadata service.
  • value: gcp_metadata("instance.name"): This sets the value of the instance_name field by retrieving the instance name with the "instance.name" key.

merge(firstMap map[string]any, secondMap map[string]any)

  • Input: map[string]any, map[string]any
  • Output: map[string]any

Takes two maps and merges them. If either map is empty, the other is returned. If both maps are empty, an empty map is returned. If the same field exists in both maps, the second map takes precedence.

  - name: transform
    type: log_transform
    transformations:
      - field_path: attributes.tags
        operation: upsert
        value: merge(item["attributes"]["tags"], item["attributes"]["faas"]["tags"])

This example merges two maps (item["attributes"]["tags"] and item["attributes"]["faas"]["tags"]) into a single map stored in the attributes.tags field.
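
For illustration, assuming the two tag maps contain the following hypothetical values, the duplicate env key resolves to the second map’s value:

  item["attributes"]["tags"]:          {"env": "dev", "team": "payments"}
  item["attributes"]["faas"]["tags"]:  {"env": "prod", "runtime": "nodejs"}

  merged attributes.tags:              {"env": "prod", "team": "payments", "runtime": "nodejs"}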