Use CEL Custom Macros
CEL Macro Overview
There are several Common Expression Language (CEL) custom macros you can use to reference fields, for example in the Enrichment node’s field mappings parameter. Custom macros are defined as extensions to CEL. As with any CEL expression, a reference to a field that doesn’t exist returns an error. When used in a transformation or mapper, these CEL expressions are handled on a best-effort basis: an expression that results in an error is replaced with an empty string (""). The inputs to these functions are CEL field path expressions referring to fields that have the given type.
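The best-effort behavior described above can be pictured with a small Python sketch. It is only an illustration, not the agent's implementation: the helper name best_effort and the use of a Python callable to stand in for a CEL expression are assumptions made for this example.

```python
def best_effort(expression, item):
    """Evaluate expression(item); fall back to "" if the field lookup fails.

    `expression` is a hypothetical stand-in for a compiled CEL expression.
    """
    try:
        return expression(item)
    except (KeyError, IndexError, TypeError):
        # A reference to a missing field is an error, which becomes "".
        return ""


item = {"attributes": {"level": "info"}}
print(best_effort(lambda i: i["attributes"]["level"], item))    # existing field
print(best_effort(lambda i: i["attributes"]["missing"], item))  # missing field
```

The point of the sketch is that a failed lookup does not stop processing; the field simply receives an empty value.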
Consider computational cost when using CEL macros. The macros that involve network calls (e.g., to Kubernetes or EC2/GCP metadata services) or complex regex operations tend to be the most resource-intensive. However, the actual impact will depend on factors like the volume of logs, the environment in which the macros are run, and the complexity of the data being processed.
For more on computational cost, see
CEL Macros
Convert Timestamps
convert_timestamp(input location string, input format string, output format string)
This macro converts timestamps. There are three options:
- convert a timestamp from one datetime format to another
- convert a datetime stamp to a Unix format
- convert a Unix format to a datetime stamp
You specify the field location of the timestamp, the current format, and the desired format:
- input location: Specify the location of the timestamp field using the field path or a regex_capture CEL macro.
- input format: Provide an example of the format of the current timestamp from the following list. If the format does not match the incoming log’s timestamp format, the processor will fail.
Note: If the input timestamp does not specify a timezone it is assumed to be UTC.
- output format: Provide an example of the desired format for the timestamp. You can enter an example in one of the following formats, or copy the format you require from this list:
- “Unix Second”
- “Unix Milli”
- “Unix Nano”
- “2006-01-02”
- “2006-01-02T15:04:05Z”
- “2006-01-02T15:04:05”
- “2006-01-02T15:04:05.000Z”
- “2006-01-02T15:04:05.000000Z”
- “2006-01-02T15:04:05.000000000Z”
- time.RFC1123
- time.RFC1123Z
- time.RFC3339
- time.RFC3339Nano
- “01/02/06”
- “15:04”
- “01/02/2006 15:04”
- “January 2, 2006”
- “15:04:05”
- “January 2, 2006 15:04:05”
- “January 2, 2006 15:04:05.000”
- “January 2, 2006 15:04:05.000000”
- “January 2, 2006 15:04:05.000000000”
- “Mon, Jan 2, 2006 3:04 PM”
- “2 January 2006 15:04”
- “2 Jan 2006 15:04”
Example:
convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999Z", "Unix Milli")
convert_timestamp(item["attributes"]["timestamp"], "2024-01-02T15:03:06.000Z", "Unix Milli")
Both these examples will create the same configuration because the timestamp examples are in the same format even though they show different datetimes.
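To make the datetime-to-Unix case concrete, here is a rough Python equivalent of the conversion in the example. It is a sketch under assumptions: the function name to_unix_milli is hypothetical, and Python's strptime directives are used in place of the example-style layouts that the macro accepts.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_milli(timestamp: str) -> int:
    """Convert a 'YYYY-MM-DDTHH:MM:SS.fffZ' string to Unix milliseconds."""
    # Raises ValueError when the input does not match the expected format,
    # comparable to the processor failing on a format mismatch.
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # No offset is parsed here, so the value is treated as UTC.
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division keeps the result exact (no float rounding).
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_unix_milli("2024-01-02T15:03:06.000Z"))  # 1704207786000
```

A timestamp in a different shape, such as 01/02/2024, raises an error in this sketch rather than being guessed at.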
Computational Cost
Medium to high. Depending on the required conversion, it can be computationally intensive.
Return First Non-empty String
first_non_empty(listOfStrs []string)
- Input: []string
- Output: string
This macro returns the first non-empty string from the input parameters.
Note: hardcoded fallback values cannot contain commas. In addition, the first_non_empty function can’t be nested within other CEL macros. However, you can apply the first_non_empty function, upsert the result into the data item, and then apply any other CEL macros to that new field.
- name: transform
  type: log_transform
  transformations:
    - field_path: cluster
      operation: upsert
      value: first_non_empty([env("UNDEFINED_CLUSTER"), env("CLUSTER"), "default-cluster"])
In this example, the macro first checks if env("UNDEFINED_CLUSTER") has a non-empty value. If it does, that value will be used for the cluster field. If env("UNDEFINED_CLUSTER") is empty or not set, the macro will check env("CLUSTER"). If env("CLUSTER") has a non-empty value, that value will be used. Finally, if both environment variables are empty or not set, the hardcoded fallback "default-cluster" will be used as the value for the cluster field.
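The fallback chain in this example can be sketched in Python as follows. This is an illustration only: resolve_cluster is a hypothetical helper, and returning an empty string when every candidate is empty is an assumption, since the behavior for that case is not stated here.

```python
import os


def first_non_empty(candidates):
    """Return the first non-empty string in candidates."""
    for value in candidates:
        if value:
            return value
    return ""  # assumption: nothing usable was found


def resolve_cluster(environ=os.environ):
    # Unset variables are read as "", matching what the env macro returns.
    return first_non_empty([
        environ.get("UNDEFINED_CLUSTER", ""),
        environ.get("CLUSTER", ""),
        "default-cluster",
    ])


print(resolve_cluster({"CLUSTER": "prod"}))  # prod
print(resolve_cluster({}))                   # default-cluster
```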
Computational Cost
Low. It performs basic iteration and string checks.
Determine Whether a Regex Matches
regex_match(input string, regex string)
- Input: string, string
- Output: bool
This macro returns a Boolean value indicating whether or not the input string matches the regex string.
- name: example_router
  type: route
  paths:
    - path: "pre_elastic"
      condition: regex_match(item["body"], "(?i)ERROR")
In this example, the input string, item["body"], refers to the “body” field of the log being processed. The regex string is the pattern that the input string is tested against. The regex pattern “(?i)ERROR” is used to search for the word “ERROR” in a case-insensitive manner. If this regex match returns true, meaning the word “ERROR” is present in the “body” of the log, then the route “pre_elastic” will be used.
Make sure you properly escape the regex pattern as a string. See Regex as a String.
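As a quick way to try a pattern before putting it in a route condition, the check can be approximated in Python. This sketch assumes the match is unanchored (the pattern may match anywhere in the input), which is consistent with the example above but is not confirmed here; Python's regex dialect also differs from the agent's in some details.

```python
import re


def regex_match(text: str, pattern: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


print(regex_match("2024-01-02 Error: disk full", "(?i)ERROR"))  # True
print(regex_match("2024-01-02 all good", "(?i)ERROR"))          # False
# Backslashes must be escaped when the pattern is written as a plain string.
print(regex_match("status 500", "\\d{3}"))                      # True
```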
Computational Cost
Medium. Involves regex operations which are more computationally intensive compared to type conversions or simple lookups.
Return Values using Regex Capture Groups
regex_capture(input string, regexWithCaptureGroups string)
- Input: string, string
- Output: map[string]string
This macro returns one or more parts of the input string using regex capture groups. Each key in the returned map is a capture group name, and each value is the text that the group matched.
- name: transform
  type: log_transform
  transformations:
    - field_path: pod_id
      operation: upsert
      value: regex_capture(item["resource"]["ed.filepath"], "/var/logs/(?P<id>(.+))/.*")["id"]
In this example:
- The input string (the string from which you want to extract data), item["resource"]["ed.filepath"], represents a nested field within a log that contains a file path.
- The regexWithCaptureGroups string (the regex pattern containing one or more named capture groups), /var/logs/(?P<id>(.+))/.*, has a named capture group, id, which matches the characters after /var/logs/ and before a following slash (/). Within the named capture group (?P<id>(.+)), the ?P<id> part defines the name of the group as id, and the (.+) part captures one or more characters. The .+ is greedy, meaning it matches as much text as possible: if the path contains several more slashes, the group extends up to the last one.
- Output: The output is a map where each key is a capture group name, and its associated value is the substring from the input that the capture group matched. Assuming that item["resource"]["ed.filepath"] contains something like /var/logs/kubernetes_pod123/other_data, the regex_capture function would match kubernetes_pod123 for the capture group id, and the result would be a map with a single key-value pair: {"id": "kubernetes_pod123"}. After applying the function, the log or data item would include a new field, pod_id, populated with the pod identifier kubernetes_pod123 extracted from the file path. The transformation uses the named capture group id from the regex pattern to obtain and assign this value. Hence, the ["id"] at the end of the value field accesses the value associated with the id key in the resulting map.
Make sure you properly escape the regex pattern as a string. See Regex as a String.
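The capture behavior can be checked with a short Python sketch, since Python accepts the same (?P<name>...) syntax for named groups. The helper below is an approximation, not the macro itself; returning an empty map when nothing matches is an assumption.

```python
import re

PATTERN = "/var/logs/(?P<id>(.+))/.*"


def regex_capture(text: str, pattern: str) -> dict:
    """Return a map of named capture groups to the text they matched."""
    match = re.search(pattern, text)
    # groupdict() contains only named groups, so the inner (.+) is omitted.
    return match.groupdict() if match else {}  # assumption: {} on no match


print(regex_capture("/var/logs/kubernetes_pod123/other_data", PATTERN))
# {'id': 'kubernetes_pod123'}

# With more than one remaining path segment, the greedy .+ runs to the last slash.
print(regex_capture("/var/logs/a/b/c", PATTERN)["id"])  # a/b
```

If only the first path segment is wanted, a narrower group such as [^/]+ in place of .+ is a common alternative.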
Computational Cost
High. Regular expression parsing and extraction of multiple groups can be complex and resource-intensive.
Return Value of Environment Variables
env(envVarKey string)
- Input: string
- Output: string
This macro returns the value of an environment variable. If the environment variable doesn’t exist, an empty string is returned.
- name: datadog_mapping_node
  type: datadog_mapper
  dd_level: env("LEVEL")
In this example, the value for dd_level is obtained by calling the env macro with “LEVEL”, which retrieves the value of the LEVEL environment variable. For example, if the environment variable LEVEL is set to info, this configuration would result in setting dd_level to info.
Computational Cost
Very low. It’s a straightforward lookup.
Annotate using Contextual Kubernetes Information
from_k8s(podID string, podAttributeName string)
- Input: string, string
- Output: string
This macro is used to annotate log data with contextual Kubernetes information, such as the deployment name or namespace, which can be vital for understanding the source and the context of the logs when analyzing them or looking for issues in your Kubernetes environment. It returns the attribute value of a K8s pod given a pod ID. If the pod ID is not found, the macro returns an error.
- name: transform
  type: log_transform
  transformations:
    - field_path: deployment_name
      operation: upsert
      value: from_k8s(item["pod_id"], "k8s.deployment.name")
    - field_path: namespace
      operation: upsert
      value: from_k8s(item["pod_id"], "k8s.namespace.name")
In this example:
- podID string: This is the identifier for the Kubernetes pod from which attributes are to be retrieved: item["pod_id"].
- podAttributeName string: This is the name of the attribute within the Kubernetes pod you want to retrieve. For example, k8s.deployment.name or k8s.namespace.name would retrieve the deployment name and namespace name of the specified pod, respectively.
- Output: The output is the value of the requested Kubernetes pod attribute as a string.
Available Labels
The following Kubernetes labels are available via the from_k8s CEL macro:
k8s.container.name
k8s.node.name
k8s.namespace.name
k8s.pod.name
k8s.statefulset.name
k8s.daemonset.name
k8s.replicaset.name
k8s.job.name
k8s.cronjob.name
k8s.deployment.name
k8s.pod.labels.{labels} (where {labels} are the unique Pod labels in that cluster)
Computational Cost
High. Requires querying the Kubernetes API which can be computationally expensive especially under high load or in a large Kubernetes cluster.
Parse JSON String Into a Map
json(jsonStr string)
- Input: string
- Output: map[string]any
The macro is used to parse a JSON-formatted string and convert it into a map, where the map’s keys are strings and the values can be of any type (e.g., string, number, boolean, nested map, etc.). If the JSON string can’t be parsed successfully—an error in the JSON formatting, for example—an error will be returned by the macro.
- name: resource_transform
  type: resource_transform
  source_field_overrides:
    - field: k8s.container.name
      expression: json(item["body"]).kubernetes.container.name
In this example:
- field: k8s.container.name: This specifies the field within the source data that should be overridden. In this case, it’s the name of the Kubernetes container.
- expression: json(item["body"]).kubernetes.container.name: This is the transformation logic or expression used to override the specified field. The expression invokes the json macro to parse the JSON string from item["body"]. Then the value associated with the key path kubernetes.container.name is accessed within the resulting map.
Suppose the JSON object in item["body"] has a structure like the following example (formatted for readability):
{
  "kubernetes": {
    "container": {
      "name": "example-container"
    }
  }
}
The JSON macro will parse this JSON string into a map, and the expression will extract the value example-container for the kubernetes.container.name key path. The extracted value will then be used to set or override the field k8s.container.name in the log.
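The parse-then-navigate step can be reproduced in Python to check what a given body would yield. This is a sketch under assumptions: container_name is a hypothetical helper, and Python's json module stands in for the macro's parser.

```python
import json


def container_name(body: str) -> str:
    """Parse a JSON log body and read kubernetes.container.name from it."""
    parsed = json.loads(body)  # raises json.JSONDecodeError on malformed JSON
    return parsed["kubernetes"]["container"]["name"]


body = '{"kubernetes": {"container": {"name": "example-container"}}}'
print(container_name(body))  # example-container
```

A body that is not valid JSON raises an error in this sketch, which corresponds to the macro returning an error when parsing fails.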
Computational Cost
Medium. Parsing JSON can be computationally intensive depending on the complexity of the JSON structure.
Return EC2 Metadata
ec2_metadata(keyStr string)
- Input: string
- Output: string
This macro returns the value of a given key from the EC2 metadata service. If the key is not found an error is returned.
- name: transform
  type: log_transform
  transformations:
    - field_path: instance_id
      operation: upsert
      value: ec2_metadata("instance-id")
This configuration enriches the log being processed with the EC2 instance’s ID, which can be crucial for tracking, organizing, or analyzing data based on where it originated within your AWS infrastructure. In this example:
- field_path: instance_id: This designates the name of the new field that will be added to the data.
- value: ec2_metadata("instance-id"): The value for the new instance_id field is obtained from the ec2_metadata macro, which fetches the value of the instance-id key.
You can specify the following values, which are mapped either to the key or, for the cluster name, to a regex pattern for finding the tag as follows:
CEL value | Key or Regex |
---|---|
cluster-name | kubernetes.io/cluster/(?P<cluster_name>.*) |
ec2launchtemplate-id | aws:ec2launchtemplate:id |
ec2launchtemplate-version | aws:ec2launchtemplate:version |
inspector-enabled | AwsInspectorEnabled |
cluster-autoscaler-enabled | k8s.io/cluster-autoscaler/enabled |
autoscaling-groupName | aws:autoscaling:groupName |
nodegroup-name | eks:nodegroup-name |
ec2-fleet-id | aws:ec2:fleet-id |
For other metadata, you can use any category string, such as instance-id. See the list of categories here.
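To illustrate how the cluster-name regex from the table could pull a name out of instance tags, here is a small Python sketch. It rests on assumptions: that the pattern is applied to tag keys (EKS-style tags look like kubernetes.io/cluster/<name>), and that a missing tag is treated as an error. The function name is hypothetical and no metadata service is called.

```python
import re

# Pattern copied from the table above; applying it to tag keys is an assumption.
CLUSTER_TAG = re.compile(r"kubernetes.io/cluster/(?P<cluster_name>.*)")


def cluster_name_from_tags(tag_keys):
    """Return the cluster name embedded in the first matching tag key."""
    for key in tag_keys:
        match = CLUSTER_TAG.fullmatch(key)
        if match:
            return match.group("cluster_name")
    # Mirrors the macro returning an error when the key is not found.
    raise KeyError("cluster-name")


print(cluster_name_from_tags(["Name", "kubernetes.io/cluster/prod-east"]))  # prod-east
```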
Computational Cost
High. Involves network calls to metadata services which are inherently slower and resource-consuming.
Return GCP Metadata
gcp_metadata(key)
- Input: string
- Output: string
This macro returns the value of a given key from the GCP metadata service. If the key is not found an error is returned.
- name: transform
  type: log_transform
  transformations:
    - field_path: instance_name
      operation: upsert
      value: gcp_metadata("instance.name")
In this example:
- field_path: instance_name: Here, a new field called instance_name is specified to be added or updated with the value from the GCP metadata service.
- value: gcp_metadata("instance.name"): This sets the value of the instance_name field by retrieving the instance name with the "instance.name" key.
You can specify the following values, which are mapped to the key as follows:
CEL value | Key |
---|---|
id | instance/id |
name | instance/name |
tags | instance/tags (a comma separated list) |
zone | instance/zone |
hostname | instance/hostname (also host name) |
attributes.cluster-name | instance/attributes/cluster-name |
You can also query pre-defined instance level metadata without the instance/ prefix.
Computational Cost
High. Involves network calls to metadata services which are inherently slower and resource-consuming.
Merge Two Maps
merge(firstMap map[string]any, secondMap map[string]any)
- Input: map[string]any, map[string]string
- Output: map[string]string
Takes two maps and merges them together. If either map is empty the other will be returned. If both maps are empty, an empty map will be returned. The second map takes precedence if a duplicate field exists.
- name: transform
  type: log_transform
  transformations:
    - field_path: attributes.tags
      operation: upsert
      value: merge(item["attributes"]["tags"],item["attributes"]["faas"]["tags"])
This example fuses two maps (item["attributes"]["tags"] and item["attributes"]["faas"]["tags"]) into a unified set in the attributes.tags field.
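The precedence rule is easy to see in a Python sketch of the same merge. The tag names and values below are made up for illustration, and the sketch only covers a shallow (top-level) merge.

```python
def merge(first: dict, second: dict) -> dict:
    """Shallow-merge two maps; keys in second override keys in first."""
    return {**first, **second}


tags = {"env": "staging", "team": "payments"}
faas_tags = {"env": "prod", "runtime": "python3.12"}
print(merge(tags, faas_tags))
# {'env': 'prod', 'team': 'payments', 'runtime': 'python3.12'}
```

Here env appears in both maps, so the value from the second map (prod) wins. If either map is empty, the result equals the other map.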
Computational Cost
Medium to high. The complexity depends on the size and structure of the maps.
Convert Strings to Integers
int(input string)
- Input: string
- Output: int
This macro converts a string to an integer. If the string is not a valid integer, this will result in a conversion error. Use it to convert string fields that are expected to be numeric to an integer format, such as when using a log to metric node.
nodes:
- name: log_transform_example
  type: log_transform
  transformations:
    - field_path: attributes.new_status_code_int
      operation: upsert
      value: int(item["attributes"]["new_status_code"])
In this example, the macro converts the new_status_code attribute from a string to an integer and updates or inserts it as the new_status_code_int field.
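A Python sketch shows which inputs convert cleanly and which produce a conversion error. Python's int() is used as a stand-in and may be more lenient than the macro in edge cases (for example, it tolerates surrounding whitespace), so treat this as an approximation.

```python
def to_int(value: str) -> int:
    """Convert a numeric string to an integer; raises ValueError otherwise."""
    return int(value)


print(to_int("200"))  # 200

for bad in ("12.5", "200 OK", ""):
    try:
        to_int(bad)
    except ValueError:
        print(f"conversion error for {bad!r}")
```

Values such as 12.5 are not valid integers; a field like that is a better fit for the double conversion.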
Computational Cost
Low. This is a basic type conversion, so the computational effort is minimal unless it is performed at a large scale.
Convert Strings to Doubles
double(input string)
- Input: string
- Output: double
This macro converts a string to a double precision floating-point number. If the string is not a valid double, this will result in a conversion error. Use it to convert string fields that are expected to be numeric to a double format, such as when using a log to metric node.
nodes:
- name: log_transform_example
  type: log_transform
  transformations:
    - field_path: attributes.new_latency_double
      operation: upsert
      value: double(item["attributes"]["new_latency"])
In this example, the macro converts the new_latency attribute from a string to a double and updates or inserts it as the new_latency_double field.
Computational Cost
Low. Similar to integer conversion but with slightly more computational overhead due to the floating-point operations.
Convert Values to JSON String
to_json(object)
- Input: complex objects or maps
- Output: JSON string
This macro converts serializable objects (e.g., maps, arrays) into a JSON string.
- name: transform
  type: log_transform
  transformations:
    - field_path: attributes.serialized_event
      operation: upsert
      value: to_json(item["attributes"]["network_event"])
In this example, the to_json macro converts the network_event attribute from a complex object into a JSON-formatted string.
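For a sense of what the serialized field might contain, here is a Python sketch of the same idea. The contents of network_event are invented for illustration, and the exact whitespace and key order produced by the macro are not specified here, so the compact form below is an assumption.

```python
import json


def to_json(value) -> str:
    """Serialize a map or list to a compact JSON string."""
    return json.dumps(value, separators=(",", ":"))


network_event = {"src_ip": "10.0.0.1", "ports": [80, 443]}  # hypothetical data
print(to_json(network_event))
# {"src_ip":"10.0.0.1","ports":[80,443]}
```

Parsing the resulting string with the json macro would give back an equivalent map.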
Computational Cost
Medium. Serialization of complex objects to JSON can be resource-consuming.
Apply Math Functions
You can apply basic math functions to number values exposed by a CEL macro. This is useful for number format conversion or other schema alignment transformations. For example:
nodes:
- name: log_transform
  type: log_transform
  transformations:
    - field_path: "attributes.updated_billed_duration_ms"
      operation: "upsert"
      value: item["attributes"]["faas"]["billed_duration_ms"] * 1000
Computational Cost
Medium. Involves arithmetic operations which are generally efficient but can vary based on the complexity of the expression.