Use CEL Custom Macros

Use CEL Custom Macros to reference log fields.

CEL Macro Overview

There are several Common Expression Language (CEL) custom macros you can use to reference fields, for example in the Enrichment node’s field mappings parameter. Custom macros are defined as extensions to CEL. As with any general CEL expression, a reference to a field that doesn’t exist returns an error. When used in a transformation or mapper, these CEL expressions are handled on a best-effort basis: an expression that results in an error is replaced with an empty string (""). The inputs to these functions are CEL field path expressions that refer to fields of the given type.
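The best-effort handling described above can be pictured with a small Python analogue. This is only an illustrative sketch: the evaluate_best_effort helper and the sample item are hypothetical and are not part of the product, and the real evaluator may treat other kinds of errors the same way.

```python
from typing import Any, Callable, Dict


def evaluate_best_effort(expression: Callable[[Dict[str, Any]], Any],
                         item: Dict[str, Any]) -> Any:
    """Evaluate an expression against a log item; fall back to "" on error."""
    try:
        return expression(item)
    except (KeyError, TypeError, ValueError):
        # A missing field or a failed conversion is replaced with an empty string.
        return ""


# Hypothetical log item used only for this illustration.
item = {"attributes": {"status": "200"}}

# The field exists, so its value is returned.
present = evaluate_best_effort(lambda i: i["attributes"]["status"], item)

# The field does not exist, so the error becomes an empty string.
missing = evaluate_best_effort(lambda i: i["attributes"]["latency"], item)
```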

Consider computational cost when using CEL macros. The macros that involve network calls (e.g., to Kubernetes or EC2/GCP metadata services) or complex regex operations tend to be the most resource-intensive. However, the actual impact will depend on factors like the volume of logs, the environment in which the macros are run, and the complexity of the data being processed.

For more on computational cost, see

CEL Macros

Convert Timestamps

convert_timestamp(input location string, input format string, output format string)

This macro converts timestamps. There are three options:

  • convert one datetime format to another datetime format
  • convert a datetime stamp to a Unix format
  • convert a Unix format to a datetime stamp

You specify the field location of the timestamp, the current format, and the desired format:

  • input location: Specify the location of the timestamp field using the field path or a regex_capture CEL macro.
  • input format: Provide an example of the current timestamp format, using one of the formats listed under output format below. If the format does not match the incoming log’s timestamp format, the processor will fail.

Note: If the input timestamp does not specify a timezone, it is assumed to be UTC.

  • output format: Provide an example of the desired format for the timestamp. You can enter an example in one of the following formats, or copy the format you require from this list:
    • "Unix Second"
    • "Unix Milli"
    • "Unix Nano"
    • "2006-01-02"
    • "2006-01-02T15:04:05Z"
    • "2006-01-02T15:04:05"
    • "2006-01-02T15:04:05.000Z"
    • "2006-01-02T15:04:05.000000Z"
    • "2006-01-02T15:04:05.000000000Z"
    • time.RFC1123
    • time.RFC1123Z
    • time.RFC3339
    • time.RFC3339Nano
    • "01/02/06"
    • "15:04"
    • "01/02/2006 15:04"
    • "January 2, 2006"
    • "15:04:05"
    • "January 2, 2006 15:04:05"
    • "January 2, 2006 15:04:05.000"
    • "January 2, 2006 15:04:05.000000"
    • "January 2, 2006 15:04:05.000000000"
    • "Mon, Jan 2, 2006 3:04 PM"
    • "2 January 2006 15:04"
    • "2 Jan 2006 15:04"

Example:

convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999Z", "Unix Milli")
convert_timestamp(item["attributes"]["timestamp"], "2024-01-02T15:03:06.000Z", "Unix Milli")

Both examples produce the same configuration because the two example timestamps use the same format, even though they show different datetimes.

See Manage Log Timestamps.
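To make the datetime-to-Unix conversion concrete, here is a rough Python analogue of converting a millisecond-precision UTC timestamp to "Unix Milli". It is a sketch only: the to_unix_milli helper is hypothetical, and Python's strptime uses directive codes rather than the example-based formats the macro accepts. The UTC fallback mirrors the note above about inputs without a timezone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_milli(stamp: str, fmt: str) -> int:
    """Parse a timestamp string and return it as Unix milliseconds."""
    parsed = datetime.strptime(stamp, fmt)
    if parsed.tzinfo is None:
        # No timezone in the input: treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Dividing timedeltas with // yields an int and avoids float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


# The trailing "Z" is matched literally by the format string.
millis = to_unix_milli("2024-01-02T15:03:06.000Z", "%Y-%m-%dT%H:%M:%S.%fZ")
```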

Computational Cost

Medium to high. Depending on the required conversion, it can be computationally intensive.

Return First Non-empty String

first_non_empty(listOfStrs []string)
  • Input: []string
  • Output: string

This macro returns the first non-empty string from the input parameters.

Note: Hardcoded fallback values cannot contain commas. In addition, the first_non_empty function can’t be nested within other CEL macros. However, you can apply the first_non_empty function, upsert the result into the data item, and then apply any other CEL macros to that new field.

  - name: transform
    type: log_transform
    transformations:
      - field_path: cluster
        operation: upsert
        value: first_non_empty([env("UNDEFINED_CLUSTER"), env("CLUSTER"), "default-cluster"])

In this example, the macro first checks if env("UNDEFINED_CLUSTER") has a non-empty value. If it does, that value will be used for the cluster field. If env("UNDEFINED_CLUSTER") is empty or not set, the macro will check env("CLUSTER"). If env("CLUSTER") has a non-empty value, that value will be used. Finally, if both environment variables are empty or not set, the hardcoded fallback "default-cluster" will be used as the value for the cluster field.
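The fallback chain in this example can be sketched in Python. The env and first_non_empty functions below are hypothetical stand-ins written for illustration, not the product's implementation, and the prod-east value is made up. Returning an empty string when every candidate is empty is an assumption.

```python
import os
from typing import List


def env(key: str) -> str:
    """Return the environment variable's value, or "" if it is not set."""
    return os.environ.get(key, "")


def first_non_empty(candidates: List[str]) -> str:
    """Return the first candidate that is not an empty string."""
    for candidate in candidates:
        if candidate != "":
            return candidate
    # Assumption: with no non-empty candidate, fall back to an empty string.
    return ""


os.environ.pop("UNDEFINED_CLUSTER", None)  # simulate an unset variable
os.environ["CLUSTER"] = "prod-east"        # hypothetical value

cluster = first_non_empty(
    [env("UNDEFINED_CLUSTER"), env("CLUSTER"), "default-cluster"]
)
```

With CLUSTER set, the second candidate wins; if it were also unset, the hardcoded "default-cluster" would be returned.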

Computational Cost

Low. It performs basic iteration and string checks.

Determine Whether a Regex Matches

regex_match(input string, regex string) 
  • Input: string, string
  • Output: bool

This macro returns a Boolean value indicating whether or not the input string matches the regex string.

  - name: example_router
    type: route
    paths:
      - path: "pre_elastic"
        condition: regex_match(item["body"], "(?i)ERROR")

In this example, the input string, item["body"], refers to the "body" field of the log being processed. The regex string is the pattern that the input string is tested against. The regex pattern "(?i)ERROR" searches for the word "ERROR" in a case-insensitive manner. If this regex match returns true, meaning the word "ERROR" is present in the "body" of the log, then the route "pre_elastic" is used.

Make sure you properly escape the regex pattern as a string. See Regex as a String.
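As a rough illustration of the matching behavior described above, the following Python sketch searches for the pattern anywhere in the input. The regex_match function here is a hypothetical analogue and the log body is made up; Python's regex engine is not the one the product uses, although the (?i) flag behaves the same way in this case. The last two lines show the general idea of escaping a backslash inside an ordinary string literal; CEL's own escaping rules are covered in Regex as a String.

```python
import re


def regex_match(text: str, pattern: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Hypothetical log body; "error" is lowercase but (?i) ignores case.
body = "2024-01-02 15:03:06 error: connection refused"
routed = regex_match(body, "(?i)ERROR")

# In an ordinary string literal a regex backslash must be doubled.
escaped = "\\d{3}"
raw = r"\d{3}"
```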

Computational Cost

Medium. Involves regex operations which are more computationally intensive compared to type conversions or simple lookups.

Return Values using Regex Capture Groups

regex_capture(input string, regexWithCaptureGroups string) 
  • Input: string, string
  • Output: map[string]string

This macro returns one or more parts of the input string using regex capture groups. Each key in the returned map is a capture group name, and each value is the text that the capture group matched.

  - name: transform
    type: log_transform
    transformations:
      - field_path: pod_id
        operation: upsert
        value: regex_capture(item["resource"]["ed.filepath"], "/var/logs/(?P<id>(.+))/.*")["id"]

In this example,

  • The input string (the string from which you want to extract data) item["resource"]["ed.filepath"] represents a nested field within a log that contains a file path.

  • The regexWithCaptureGroups string (the regex pattern containing one or more named capture groups) /var/logs/(?P<id>(.+))/.* has a named capture group, id, which matches the characters after /var/logs/ and before a following slash /. Within the named capture group (?P<id>(.+)), the ?P<id> part defines the name of the group as id, and the (.+) part captures one or more characters. The .+ is greedy, meaning it matches as much text as possible, so if the path contains several more slashes, the capture extends up to the last one. To stop at the first slash instead, use [^/]+ in place of .+.

  • Output: The output is a map where each key is a capture group name, and its associated value is the substring from the input that the capture group matched. Assuming that item["resource"]["ed.filepath"] contains something like /var/logs/kubernetes_pod123/other_data, the regex_capture function would match kubernetes_pod123 for the capture group id, and the result would be a map with a single key-value pair: {"id": "kubernetes_pod123"}. After applying the function, the log or data item would include a new field, pod_id, populated with the pod identifier kubernetes_pod123 extracted from the file path. The transformation uses the named capture group id from the regex pattern to obtain and assign this value. Hence, the ["id"] at the end of the value field accesses the value associated with the id key in the resulting map.

Make sure you properly escape the regex pattern as a string. See Regex as a String.
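The capture behavior can be checked with a short Python sketch. The regex_capture function below is a hypothetical analogue, not the product's implementation, and returning an empty map when nothing matches is an assumption. It also shows how the greedy .+ behaves when the path has more than one directory level after /var/logs/, compared with the narrower [^/]+.

```python
import re
from typing import Dict


def regex_capture(text: str, pattern: str) -> Dict[str, str]:
    """Return the named capture groups of the first match as a map."""
    match = re.search(pattern, text)
    if match is None:
        # Assumption: a non-matching input yields an empty map.
        return {}
    return {name: value or "" for name, value in match.groupdict().items()}


GREEDY = r"/var/logs/(?P<id>(.+))/.*"
BOUNDED = r"/var/logs/(?P<id>[^/]+)/.*"

# One directory level after /var/logs/: the capture is the pod directory.
single = regex_capture("/var/logs/kubernetes_pod123/other_data", GREEDY)

# Two levels: the greedy .+ runs up to the last slash.
nested = regex_capture("/var/logs/kubernetes_pod123/app/other_data", GREEDY)

# [^/]+ stops at the first slash regardless of depth.
bounded = regex_capture("/var/logs/kubernetes_pod123/app/other_data", BOUNDED)
```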

Computational Cost

High. Regular expression parsing and extraction of multiple groups can be complex and resource-intensive.

Return Value of Environment Variables

env(envVarKey string) 
  • Input: string
  • Output: string

This macro returns the value of an environment variable. If the environment variable doesn’t exist, an empty string is returned.

  - name: datadog_mapping_node
    type: datadog_mapper
    dd_level: env("LEVEL")

In this example, the value for dd_level is obtained by calling the env macro with "LEVEL", which retrieves the value of the LEVEL environment variable. For example, if the environment variable LEVEL is set to info, this configuration would result in setting dd_level to info.

Computational Cost

Very low. It’s a straightforward lookup.

Annotate using Contextual Kubernetes Information

from_k8s(podID string, podAttributeName string) 
  • Input: string, string
  • Output: string

This macro annotates log data with contextual Kubernetes information, such as the deployment name or namespace, which can be vital for understanding the source and context of the logs when analyzing them or looking for issues in your Kubernetes environment. It returns the attribute value of a K8s pod given a pod ID. If the pod ID is not found, the macro returns an error.

  - name: transform
    type: log_transform
    transformations:
      - field_path: deployment_name
        operation: upsert
        value: from_k8s(item["pod_id"], "k8s.deployment.name")
      - field_path: namespace
        operation: upsert
        value: from_k8s(item["pod_id"], "k8s.namespace.name")     

In this example:

  • podID string: This is the identifier for the Kubernetes pod from which attributes are to be retrieved: item["pod_id"].
  • podAttributeName string: This is the name of the attribute within the Kubernetes pod you want to retrieve. For example, k8s.deployment.name or k8s.namespace.name would retrieve the deployment name and namespace name of the specified pod respectively.
  • Output: The output is the value of the requested Kubernetes pod attribute as a string.

Available Labels

The following Kubernetes labels are available via the from_k8s CEL macro:

k8s.container.name
k8s.node.name
k8s.namespace.name
k8s.pod.name
k8s.statefulset.name
k8s.daemonset.name
k8s.replicaset.name
k8s.job.name
k8s.cronjob.name
k8s.deployment.name
k8s.pod.labels.{labels} (where {labels} are the unique Pod labels in that cluster)

Computational Cost

High. Requires querying the Kubernetes API, which can be computationally expensive, especially under high load or in a large Kubernetes cluster.

Parse JSON String Into a Map

json(jsonStr string) 
  • Input: string
  • Output: map[string]any

This macro parses a JSON-formatted string and converts it into a map, where the map’s keys are strings and the values can be of any type (e.g., string, number, boolean, nested map, etc.). If the JSON string can’t be parsed successfully (because of an error in the JSON formatting, for example), the macro returns an error.

  - name: resource_transform
    type: resource_transform
    source_field_overrides:
      - field: k8s.container.name
        expression: json(item["body"]).kubernetes.container.name

In this example:

  • field: k8s.container.name: This specifies the field within the source data that should be overridden. In this case, it’s the name of the Kubernetes container.
  • expression: json(item["body"]).kubernetes.container.name: This is the transformation logic or expression used to override the specified field. The expression invokes the json macro to parse the JSON string from item["body"], and then accesses the value associated with the key path kubernetes.container.name within the resulting map.

Suppose the JSON object in item["body"] has a structure like the following example (formatted for readability):

{
    "kubernetes": {
        "container": {
            "name": "example-container"
        }
    }
}

The JSON macro will parse this JSON string into a map, and the expression will extract the value example-container for the kubernetes.container.name key path. The extracted value will then be used to set or override the field k8s.container.name in the log.
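The same parse-then-access sequence can be sketched in Python with the standard json module. This is an analogue for illustration only; the variable names are hypothetical, and the error type shown is Python's, not the macro's.

```python
import json

# The body field holds JSON as a single string, as in the example above.
body = '{"kubernetes": {"container": {"name": "example-container"}}}'

# Parse the string into a nested map, then walk the key path.
parsed = json.loads(body)
container_name = parsed["kubernetes"]["container"]["name"]

# A malformed JSON string cannot be parsed and raises an error.
try:
    json.loads('{"kubernetes": ')
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True
```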

Computational Cost

Medium. Parsing JSON can be computationally intensive depending on the complexity of the JSON structure.

Return EC2 Metadata

ec2_metadata(keyStr string) 
  • Input: string
  • Output: string

This macro returns the value of a given key from the EC2 metadata service. If the key is not found, an error is returned.

  - name: transform
    type: log_transform
    transformations:
      - field_path: instance_id
        operation: upsert
        value: ec2_metadata("instance-id")

This configuration allows the log being processed to be enriched with the EC2 instance’s ID, which can be crucial for tracking, organizing, or analyzing data based on where it originated within your AWS infrastructure. In this example:

  • field_path: instance_id: This designates the name of the new field that will be added to the data. Here, instance_id will be the name of the field added.
  • value: ec2_metadata("instance-id"): The value for the new instance_id field is obtained from the ec2_metadata macro, which fetches the value of the instance-id key.

You can specify the following values, which are mapped either to the key or, for the cluster name, to a regex pattern for finding the tag as follows:

CEL value                     Key or Regex
cluster-name                  kubernetes.io/cluster/(?P<cluster_name>.*)
ec2launchtemplate-id          aws:ec2launchtemplate:id
ec2launchtemplate-version     aws:ec2launchtemplate:version
inspector-enabled             AwsInspectorEnabled
cluster-autoscaler-enabled    k8s.io/cluster-autoscaler/enabled
autoscaling-groupName         aws:autoscaling:groupName
nodegroup-name                eks:nodegroup-name
ec2-fleet-id                  aws:ec2:fleet-id

For other metadata, you can use any category string, such as instance-id. See the list of categories here.
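The cluster-name entry differs from the others because it is a regex rather than a fixed key. The Python sketch below shows one plausible reading of how such a pattern could pick a cluster name out of a tag key. It is not the product's implementation: the cluster_name_from_tags helper, the sample tags, and the assumption that the pattern is applied to tag keys are all hypothetical.

```python
import re
from typing import Dict, Optional

# Pattern copied from the cluster-name entry above.
CLUSTER_TAG_PATTERN = re.compile(r"kubernetes.io/cluster/(?P<cluster_name>.*)")


def cluster_name_from_tags(tags: Dict[str, str]) -> Optional[str]:
    """Return the cluster name captured from the first matching tag key."""
    for key in tags:
        match = CLUSTER_TAG_PATTERN.match(key)
        if match:
            return match.group("cluster_name")
    # Assumption: no matching tag means no cluster name is available.
    return None


# Hypothetical instance tags used only for this illustration.
tags = {
    "Name": "worker-1",
    "kubernetes.io/cluster/demo-cluster": "owned",
}
cluster_name = cluster_name_from_tags(tags)
```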

Computational Cost

High. Involves network calls to metadata services which are inherently slower and resource-consuming.

Return GCP Metadata

gcp_metadata(key string)
  • Input: string
  • Output: string

This macro returns the value of a given key from the GCP metadata service. If the key is not found, an error is returned.

  - name: transform
    type: log_transform
    transformations:
      - field_path: instance_name
        operation: upsert
        value: gcp_metadata("instance.name")

In this example:

  • field_path: instance_name: Here, a new field called instance_name is specified to be added or updated with the value from the GCP metadata service.
  • value: gcp_metadata("instance.name"): This sets the value of the instance_name field by retrieving the instance name with the "instance.name" key.

You can specify the following values, which are mapped to the key as follows:

CEL value                  Key
id                         instance/id
name                       instance/name
tags                       instance/tags (a comma separated list)
zone                       instance/zone
hostname                   instance/hostname (also host name)
attributes.cluster-name    instance/attributes/cluster-name
You can also query pre-defined instance level metadata without the instance/ prefix.

Computational Cost

High. Involves network calls to metadata services which are inherently slower and resource-consuming.

Merge Two Maps

merge(firstMap map[string]any, secondMap map[string]any)
  • Input: map[string]any, map[string]string
  • Output: map[string]string

Takes two maps and merges them together. If either map is empty the other will be returned. If both maps are empty, an empty map will be returned. The second map takes precedence if a duplicate field exists.

  - name: transform
    type: log_transform
    transformations:
      - field_path: attributes.tags
        operation: upsert
        value: merge(item["attributes"]["tags"],item["attributes"]["faas"]["tags"])

This example merges two maps (item["attributes"]["tags"] and item["attributes"]["faas"]["tags"]) into a single map in the attributes.tags field.
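The precedence rule is easiest to see with a small example. The Python sketch below is an analogue of the merge semantics described above, not the product's implementation, and the tag names and values are hypothetical.

```python
from typing import Any, Dict


def merge(first: Dict[str, Any], second: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two maps; on duplicate keys the second map's value wins."""
    merged = dict(first)   # copy so neither input is modified
    merged.update(second)  # later values overwrite earlier ones
    return merged


tags = {"env": "prod", "team": "payments"}
faas_tags = {"team": "platform", "runtime": "python3.12"}

# "team" exists in both maps, so the value from faas_tags is kept.
combined = merge(tags, faas_tags)
```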

Computational Cost

Medium to high. The complexity depends on the size and structure of the maps.

Convert Strings to Integers

int(input string)
  • Input: string
  • Output: int

This macro converts a string to an integer. If the string is not a valid integer, this will result in a conversion error. Use it to convert string fields that are expected to be numeric to an integer format, such as when using a log to metric node.

nodes:
  - name: log_transform_example
    type: log_transform
    transformations:
    - field_path: attributes.new_status_code_int
      operation: upsert
      value: int(item["attributes"]["new_status_code"])

In this example, the macro converts the new_status_code attribute from a string to an integer and updates or inserts it as the new_status_code_int field.

Computational Cost

Low. This is a basic type conversion; the computational effort is minimal unless performed at large scale.

Convert Strings to Doubles

double(input string)
  • Input: string
  • Output: double

This macro converts a string to a double precision floating-point number. If the string is not a valid double, this will result in a conversion error. Use it to convert string fields that are expected to be numeric to a double format, such as when using a log to metric node.

nodes:
  - name: log_transform_example
    type: log_transform
    transformations:
    - field_path: attributes.new_latency_double
      operation: upsert
      value: double(item["attributes"]["new_latency"])

In this example, the macro converts the new_latency attribute from a string to a double and updates or inserts it as the new_latency_double field.
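The string-to-number conversions in this section and the previous one, including the conversion error on invalid input, can be illustrated with Python's built-in int and float. This is only an analogue: Python's parsing rules differ in some details (for example, it tolerates surrounding whitespace), and the sample attribute values are hypothetical.

```python
# Hypothetical attribute values that arrive as strings.
attributes = {"new_status_code": "503", "new_latency": "12.75"}

# Valid numeric strings convert cleanly.
status_code = int(attributes["new_status_code"])
latency = float(attributes["new_latency"])

# A string that is not a valid integer raises a conversion error.
try:
    int(attributes["new_latency"])
    conversion_failed = False
except ValueError:
    conversion_failed = True
```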

Computational Cost

Low. Similar to integer conversion but with slightly more computational overhead due to the floating-point operations.

Convert Values to JSON String

to_json(object)
  • Input: complex objects or maps
  • Output: JSON string

This macro converts serializable objects (e.g., maps, arrays) into a JSON string.

  - name: transform
    type: log_transform
    transformations:
      - field_path: attributes.serialized_event
        operation: upsert
        value: to_json(item["attributes"]["network_event"])

In this example, the to_json macro converts the network_event attribute from a complex object into a JSON-formatted string.
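Serialization is the inverse of the json macro's parsing. The Python sketch below uses the standard json module as an analogue; the network_event contents are hypothetical, and the exact spacing and key order of the product's output may differ from Python's.

```python
import json

# Hypothetical nested attribute value.
network_event = {
    "src_ip": "10.0.0.5",
    "dst_port": 443,
    "flags": ["SYN", "ACK"],
}

# Serialize the map to a single JSON string.
serialized = json.dumps(network_event)

# Parsing the string again recovers the original structure.
round_trip = json.loads(serialized)
```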

Computational Cost

Medium. Serialization of complex objects to JSON can be resource-consuming.

Apply Math Functions

You can apply basic math functions to number values exposed by a CEL macro. This is useful for number format conversion or other schema alignment transformations. For example:

nodes:
- name: log_transform
  type: log_transform
  transformations:
  - field_path: "attributes.updated_billed_duration_ms"
    operation: "upsert"
    value: item["attributes"]["faas"]["billed_duration_ms"] * 1000

Computational Cost

Medium. Involves arithmetic operations which are generally efficient but can vary based on the complexity of the expression.

See Also: