OTTL Editor Functions in Edge Delta
OTTL Editor Functions
Editors are OTTL functions that transform telemetry data. They modify the underlying telemetry data in place rather than returning a new value.
append
This function appends values to a specified target. However, the resulting field is always of type pcommon.Slice. Converting scalar values into arrays, or appending values of different types to a common slice, can lead to inconsistencies and potential issues in processing or data integrity.
Instead of append, consider using more controlled functions such as set, which let you add or modify values without changing the underlying structure of existing data fields. These functions typically preserve key-value pairs, which is more predictable and less prone to causing unintentional schema changes.
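The type change described above can be illustrated with a small Python model. This is only a sketch of the described behavior, not Edge Delta's implementation; the helper name ottl_like_append is hypothetical, and a Python list stands in for pcommon.Slice.

```python
from typing import Any, List


def ottl_like_append(current: Any, value: Any) -> List[Any]:
    """Hypothetical model: the result is always a list, whatever the input was."""
    if current is None:
        return [value]
    if isinstance(current, list):
        return current + [value]
    # A scalar is wrapped in a list before the new value is added.
    return [current, value]


attributes = {"tag": "loggen"}
attributes["tag"] = ottl_like_append(attributes["tag"], 42)
print(attributes)  # {'tag': ['loggen', 42]}
```

The field that started as a string is now a mixed-type list, which is the kind of schema drift that downstream consumers may not expect.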
delete_key
This function removes a specified key and its associated value from a target field or object within a log entry. It is useful for cleaning up logs by eliminating unnecessary or sensitive information. To delete a batch of keys in a single operation, either use delete_matching_keys with a regex pattern or, if no regex pattern covers them all, use the Edge Delta extension function edx_delete_keys.
Syntax: delete_key(target, key)
- Target: The target refers to the field containing the key you want to delete. It is typically a parent container, such as a JSON object or an associative array, which holds multiple key-value pairs.
- Key: The key is the specific identifier within the target that you wish to delete. It refers to the name of the field entry that you intend to remove, along with its associated value.
Input
{
"_type": "log",
"body": "...",
"resource": {
"container.id": "123456789",
"container.image.name": "docker.io/edgedelta/loggen:latest",
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.source.type": "kubernetes_input",
"ed.tag": "loggen",
"host.ip": "172.18.0.5",
"host.name": "loggencluster-worker2",
"k8s.container.name": "loggen",
"k8s.deployment.name": "loggen",
"k8s.namespace.name": "loggenlogs",
"k8s.node.name": "loggencluster-worker2",
"k8s.pod.name": "loggen-7cc748d75-xh8lq",
"k8s.pod.uid": "123456789",
"k8s.replicaset.name": "loggen-7cc748d75",
"service.name": "loggen",
"src_type": "K8s"
},
"timestamp": 1733369799254
}
Statement
delete_key(resource, "service.name")
Output
{
"_type": "log",
"body": "...",
"resource": {
"container.id": "123456789",
"container.image.name": "docker.io/edgedelta/loggen:latest",
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.source.type": "kubernetes_input",
"ed.tag": "loggen",
"host.ip": "172.18.0.5",
"host.name": "loggencluster-worker2",
"k8s.container.name": "loggen",
"k8s.deployment.name": "loggen",
"k8s.namespace.name": "loggenlogs",
"k8s.node.name": "loggencluster-worker2",
"k8s.pod.name": "loggen-7cc748d75-xh8lq",
"k8s.pod.uid": "123456789",
"k8s.replicaset.name": "loggen-7cc748d75",
"src_type": "K8s"
},
"timestamp": 1733369799254
}
The log entry is modified to exclude the service.name field from resource, leaving the remaining entries in the resource object.
delete_matching_keys
This function removes keys from a specified target that match a given pattern. It’s useful for eliminating multiple entries based on pattern matching, helping to clean up or censor log entries by dynamically selecting keys. If a batch of keys needs to be deleted in a single operation and no single regex pattern covers them all, use the Edge Delta extension function edx_delete_keys or edx_delete_matching_keys.
Syntax: delete_matching_keys(target, pattern)
- Target: The target refers to the field or object within the log entry from which you wish to remove keys. It typically points to a parent container, such as a JSON object or associative array, which holds multiple key-value pairs.
- Pattern: The pattern is a regular expression that specifies the keys to be removed. Keys that match this pattern will be deleted along with their associated values.
Input
{
"_type": "log",
"body": "...",
"resource": {
"container.id": "123456789",
"container.image.name": "docker.io/edgedelta/loggen:latest",
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.source.type": "kubernetes_input",
"ed.tag": "loggen",
"host.ip": "172.18.0.5",
"host.name": "loggencluster-worker2",
"k8s.container.name": "loggen",
"k8s.deployment.name": "loggen",
"k8s.namespace.name": "loggenlogs",
"k8s.node.name": "loggencluster-worker2",
"k8s.pod.name": "loggen-7cc748d75-xh8lq",
"k8s.pod.uid": "123456789",
"k8s.replicaset.name": "loggen-7cc748d75",
"service.name": "loggen",
"src_type": "K8s"
},
"timestamp": 1733375091978
}
Statement
delete_matching_keys(resource, ".*\\.name$")
See Understand Escaping Characters.
Output
{
"_type": "log",
"body": "...",
"resource": {
"container.id": "123456789",
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
"ed.org.id": "987654321",
"ed.source.type": "kubernetes_input",
"ed.tag": "loggen",
"host.ip": "172.18.0.5",
"k8s.pod.uid": "123456789",
"src_type": "K8s"
},
"timestamp": 1733375091978
}
The log entry is edited to remove any keys in the resource object matching the pattern .*\\.name$, while preserving other fields within the resource.
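The two delete functions can be modeled in a few lines of Python. This is an illustrative sketch only: plain dicts stand in for OTTL maps, and it assumes the pattern is matched anywhere in the key (unanchored) unless the regex itself contains anchors. The resource below is a shortened version of the example input.

```python
import re
from typing import Any, Dict


def delete_key(target: Dict[str, Any], key: str) -> None:
    """Remove one key in place; a missing key is ignored."""
    target.pop(key, None)


def delete_matching_keys(target: Dict[str, Any], pattern: str) -> None:
    """Remove, in place, every key that the regex matches."""
    regex = re.compile(pattern)
    # Collect the keys first so the dict is not mutated while iterating.
    for key in [k for k in target if regex.search(k)]:
        del target[key]


resource = {
    "container.id": "123456789",
    "host.ip": "172.18.0.5",
    "host.name": "loggencluster-worker2",
    "k8s.pod.name": "loggen-7cc748d75-xh8lq",
    "service.name": "loggen",
    "src_type": "K8s",
}
delete_key(resource, "service.name")
delete_matching_keys(resource, r".*\.name$")
print(sorted(resource))  # ['container.id', 'host.ip', 'src_type']
```

In Python a raw string needs only a single backslash before the dot; the doubled backslash in the OTTL statement comes from string escaping in the statement itself.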
flatten
This function is used to convert nested structures within a log entry into a flat format, typically by turning nested paths into a single level of key-value pairs. It’s particularly useful for simplifying data access and storage when dealing with complex nested structures.
Syntax: flatten(target)
- Target: The target is the field or object within the log entry that you wish to flatten. It often involves nested JSON objects or arrays which are to be transformed into a simpler structure.
Imagine the agent ingests this log message:
{"eventVersion": "1.08", "userIdentity": {"type": "AssumedRole", "invokedBy": "lambda.amazonaws.com"}, "eventTime": "2024-12-05T05:10:57.227003Z", "eventSource": "ec2.amazonaws.com", "eventName": "ListStacks", "awsRegion": "us-west-2", "sourceIPAddress": "211.46.216.146", "userAgent": "ec2.amazonaws.com", "requestParameters": {}, "responseElements": {"credentials": {"accessKeyId": "A1B2C3D4E5F6G7H8I9J0", "expiration": "2024-12-05T05:10:57.227053Z", "sessionToken": "123456789876"}, "assumedRoleUser": {"assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck", "arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck"}}, "requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst", "eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst", "readOnly": "true", "resources": [{"accountId": 123456789012, "type": "AWS::IAM::Role", "ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole"}], "eventType": "AwsApiCall", "managementEvent": "true", "recipientAccountId": 123456789012, "sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f", "eventCategory": "Management"}
Bear in mind the log is escaped when ingested. See Understand Escaping Characters.
In this example, assume the following OTTL statements have been executed on the log:
set(attributes["decoded_body"], Decode(body, "utf-8"))
set(attributes["parsed_body"], ParseJSON(attributes["decoded_body"]))
To start, the body was decoded from a byte array. See Working with the body for more information about decoding the body.
Next, the decoded JSON string was parsed into nested key-value pairs. Now the log is ready to be flattened.
Input
{
"_type": "log",
"attributes": {
"decoded_body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
"parsed_body": {
"awsRegion": "us-west-2",
"eventCategory": "Management",
"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
"eventName": "ListStacks",
"eventSource": "ec2.amazonaws.com",
"eventTime": "2024-12-05T05:10:57.227003Z",
"eventType": "AwsApiCall",
"eventVersion": "1.08",
"managementEvent": "true",
"readOnly": "true",
"recipientAccountId": 123456789012,
"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
"requestParameters": {},
"resources": [
{
"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
"accountId": 123456789012,
"type": "AWS::IAM::Role"
}
],
"responseElements": {
"assumedRoleUser": {
"arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
"assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck"
},
"credentials": {
"accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
"expiration": "2024-12-05T05:10:57.227053Z",
"sessionToken": "123456789876"
}
},
"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
"sourceIPAddress": "211.46.216.146",
"userAgent": "ec2.amazonaws.com",
"userIdentity": {
"invokedBy": "lambda.amazonaws.com",
"type": "AssumedRole"
}
}
},
"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
"resource": {...},
"timestamp": 1733376154621
}
Statement
flatten(attributes["parsed_body"])
Output
{
"_type": "log",
"attributes": {
"decoded_body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
"parsed_body": {
"awsRegion": "us-west-2",
"eventCategory": "Management",
"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
"eventName": "ListStacks",
"eventSource": "ec2.amazonaws.com",
"eventTime": "2024-12-05T05:10:57.227003Z",
"eventType": "AwsApiCall",
"eventVersion": "1.08",
"managementEvent": "true",
"readOnly": "true",
"recipientAccountId": 123456789012,
"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
"resources.0": {
"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
"accountId": 123456789012,
"type": "AWS::IAM::Role"
},
"responseElements.assumedRoleUser.arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
"responseElements.assumedRoleUser.assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck",
"responseElements.credentials.accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
"responseElements.credentials.expiration": "2024-12-05T05:10:57.227053Z",
"responseElements.credentials.sessionToken": "123456789876",
"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
"sourceIPAddress": "211.46.216.146",
"userAgent": "ec2.amazonaws.com",
"userIdentity.invokedBy": "lambda.amazonaws.com",
"userIdentity.type": "AssumedRole"
}
},
"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
"resource": {...},
"timestamp": 1733376208046
}
In this example, the flatten function transforms the nested JSON structure in the parsed_body attribute of the log into a single level of key-value pairs. The output demonstrates how each nested level is converted to a flat format by prepending the parent keys as prefixes to their child keys. This transformation reduces the complexity of accessing deeply nested data. For instance, nested elements such as userIdentity, which contains subfields like type and invokedBy, are transformed to userIdentity.type and userIdentity.invokedBy respectively in the flattened output.
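The key-prefixing idea can be sketched in Python as follows. This is a simplified model rather than the agent's implementation: it only descends into nested maps, leaves list values untouched, and returns a new dict instead of editing the target in place.

```python
from typing import Any, Dict


def flatten(target: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Return a single-level dict whose keys are dot-joined paths."""
    flat: Dict[str, Any] = {}
    for key, value in target.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps; an empty map contributes no keys.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


parsed_body = {
    "eventName": "ListStacks",
    "requestParameters": {},
    "userIdentity": {"type": "AssumedRole", "invokedBy": "lambda.amazonaws.com"},
    "responseElements": {"credentials": {"sessionToken": "123456789876"}},
}
for key, value in flatten(parsed_body).items():
    print(key, "=", value)
```

As in the example output above, the empty requestParameters map disappears because it has no leaf values to carry forward.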
keep_keys
This function retains specified keys within a target field or object in a log entry. It provides a precise method for filtering log data by keeping only the entries that match the specified keys. Unlike keep_matching_keys, this function requires an explicit list of keys to retain. It is best when you know the exact keys you want to keep ahead of time. The keep_matching_keys function uses a pattern or regular expression to determine which keys to retain, making it more flexible and powerful for dynamic or large datasets where the exact keys might not be known.
Syntax: keep_keys(target, keys)
- Target: The target is the field or object within the log entry you wish to filter. It typically points to a JSON object or associative array with multiple key-value pairs.
- Keys: The keys parameter is an array of specific keys you want to retain in the target. Only these keys will be kept, and all others will be removed.
Input
{
"_type": "log",
"attributes": {
"decoded_body": "...",
"parsed_body": {
"awsRegion": "us-west-2",
"eventCategory": "Management",
"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
"eventName": "ListStacks",
"eventSource": "ec2.amazonaws.com",
"eventTime": "2024-12-05T05:10:57.227003Z",
"eventType": "AwsApiCall",
"eventVersion": "1.08",
"managementEvent": "true",
"readOnly": "true",
"recipientAccountId": 123456789012,
"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
"resources.0": {
"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
"accountId": 123456789012,
"type": "AWS::IAM::Role"
},
"responseElements.assumedRoleUser.arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
"responseElements.assumedRoleUser.assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck",
"responseElements.credentials.accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
"responseElements.credentials.expiration": "2024-12-05T05:10:57.227053Z",
"responseElements.credentials.sessionToken": "123456789876",
"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
"sourceIPAddress": "211.46.216.146",
"userAgent": "ec2.amazonaws.com",
"userIdentity.invokedBy": "lambda.amazonaws.com",
"userIdentity.type": "AssumedRole"
}
},
"body": "...",
"resource": {...},
"timestamp": 1733377758772
}
Statement
keep_keys(attributes["parsed_body"], keys=["eventCategory", "eventName"])
Output
{
"_type": "log",
"attributes": {
"decoded_body": "...",
"parsed_body": {
"eventCategory": "Management",
"eventName": "ListStacks"
}
},
"body": "...",
"resource": {...},
"timestamp": 1733377774908
}
The log entry is updated to retain only the specified eventCategory and eventName within the attributes["parsed_body"] object.
keep_matching_keys
This function is used to retain only the keys from a specified target that match a given pattern. It helps streamline data by keeping desired entries and removing non-matching ones.
Syntax: keep_matching_keys(target, pattern)
- Target: The target refers to the field or object within the log entry where you want to retain keys. It usually specifies a parent container, like a JSON object or associative array, which holds multiple key-value pairs.
- Pattern: The pattern is a regular expression designating which keys to keep within the target. Only keys that match this pattern will be retained along with their associated values.
Input
{
"_type": "log",
"attributes": {
"parsed_body": {
"awsRegion": "us-west-2",
"eventCategory": "Management",
"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
"eventName": "ListStacks",
"eventSource": "ec2.amazonaws.com",
"eventTime": "2024-12-05T05:10:57.227003Z",
"eventType": "AwsApiCall",
"eventVersion": "1.08",
"managementEvent": "true",
"readOnly": "true",
"recipientAccountId": 123456789012,
"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
"resources.0": {
"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
"accountId": 123456789012,
"type": "AWS::IAM::Role"
},
"responseElements.assumedRoleUser.arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
"responseElements.assumedRoleUser.assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck",
"responseElements.credentials.accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
"responseElements.credentials.expiration": "2024-12-05T05:10:57.227053Z",
"responseElements.credentials.sessionToken": "123456789876",
"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
"sourceIPAddress": "211.46.216.146",
"userAgent": "ec2.amazonaws.com",
"userIdentity.invokedBy": "lambda.amazonaws.com",
"userIdentity.type": "AssumedRole"
}
},
"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
"resource": {...},
"timestamp": 1733378455697
}
Statement
keep_matching_keys(attributes["parsed_body"], "^event.*")
Output
{
"_type": "log",
"attributes": {
"parsed_body": {
"eventCategory": "Management",
"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
"eventName": "ListStacks",
"eventSource": "ec2.amazonaws.com",
"eventTime": "2024-12-05T05:10:57.227003Z",
"eventType": "AwsApiCall",
"eventVersion": "1.08"
}
},
"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
"resource": {...},
"timestamp": 1733378478884
}
The log entry retains only the keys within the attributes["parsed_body"] object that match the pattern ^event.*, for example, keeping the eventCategory and eventID keys, while removing all others.
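The difference between keep_keys and keep_matching_keys comes down to how the set of surviving keys is chosen. The following Python sketch models both on a plain dict; it is an illustration under the assumption of unanchored regex matching, not the agent's code.

```python
import re
from typing import Any, Dict, Iterable


def keep_keys(target: Dict[str, Any], keys: Iterable[str]) -> None:
    """Keep only the explicitly listed keys (in place)."""
    wanted = set(keys)
    for key in [k for k in target if k not in wanted]:
        del target[key]


def keep_matching_keys(target: Dict[str, Any], pattern: str) -> None:
    """Keep only the keys that the regex matches (in place)."""
    regex = re.compile(pattern)
    for key in [k for k in target if not regex.search(k)]:
        del target[key]


parsed_body = {
    "awsRegion": "us-west-2",
    "eventCategory": "Management",
    "eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
    "eventName": "ListStacks",
    "readOnly": "true",
}

by_list = dict(parsed_body)
keep_keys(by_list, ["eventCategory", "eventName"])
print(sorted(by_list))  # ['eventCategory', 'eventName']

by_pattern = dict(parsed_body)
keep_matching_keys(by_pattern, r"^event.*")
print(sorted(by_pattern))  # ['eventCategory', 'eventID', 'eventName']
```

The explicit list is the safer choice when the schema is fixed; the pattern form picks up new event* keys automatically if they appear later.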
limit
The limit function reduces the number of elements in a pcommon.Map to be no greater than the specified limit. It ensures key-value pairs are restricted to a specific count while maintaining essential, prioritized keys.
Syntax: limit(target, limit, priority_keys[])
- Target: The target is a path expression to a pcommon.Map type field within the log entry to be limited.
- Limit: A non-negative integer specifying the maximum number of items to retain in the map.
- Priority Keys: A list of strings indicating attribute keys that should not be dropped when limiting. These keys ensure that critical data is preserved.
Input
{
"_type": "log",
"attributes": {...},
"body": "...",
"resource": {
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.source.type": "memory_input",
"ed.tag": "loggen",
"host.ip": "10.0.0.1",
"host.name": "ED_TEST",
"service.name": "ed-tester",
"src_type": null
},
"timestamp": 1733378647488
}
Statement
limit(resource, 5, ["ed.org.id", "ed.tag", "ed.conf.id"])
Output
{
"_type": "log",
"attributes": {...},
"body": "...",
"resource": {
"ed.conf.id": "123456789",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.tag": "loggen",
"host.name": "ED_TEST"
},
"timestamp": 1733378762172
}
The resource field is limited to 5 entries, retaining the specified priority keys ed.org.id, ed.tag, and ed.conf.id while randomly selecting two additional entries until the limit is reached.
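A rough Python model of this behavior is shown below. It is a sketch under stated assumptions: a dict stands in for pcommon.Map, the non-priority survivors are picked with random.shuffle, and rejecting more priority keys than the limit allows is an assumption made for the sketch rather than documented Edge Delta behavior.

```python
import random
from typing import Any, Dict, Sequence


def limit(target: Dict[str, Any], max_items: int, priority_keys: Sequence[str] = ()) -> None:
    """Trim target in place to at most max_items entries, keeping priority keys."""
    if max_items < 0:
        raise ValueError("limit must be non-negative")
    if len(priority_keys) > max_items:
        # Assumption for this sketch: the priority keys must fit within the limit.
        raise ValueError("more priority keys than the limit allows")
    if len(target) <= max_items:
        return
    keep = [k for k in priority_keys if k in target]
    others = [k for k in target if k not in keep]
    random.shuffle(others)  # which non-priority keys survive is arbitrary
    survivors = set(keep + others[: max_items - len(keep)])
    for key in [k for k in target if k not in survivors]:
        del target[key]


resource = {
    "ed.conf.id": "123456789",
    "ed.domain": "pipeline",
    "ed.org.id": "987654321",
    "ed.source.name": "Kubernetes Source",
    "ed.source.type": "memory_input",
    "ed.tag": "loggen",
    "host.ip": "10.0.0.1",
    "host.name": "ED_TEST",
    "service.name": "ed-tester",
    "src_type": None,
}
limit(resource, 5, ["ed.org.id", "ed.tag", "ed.conf.id"])
print(len(resource), all(k in resource for k in ("ed.org.id", "ed.tag", "ed.conf.id")))
```

Because the two non-priority survivors are arbitrary, any key that must always be present belongs in the priority list.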
merge_maps
The merge_maps function combines key-value pairs from a source map into a target map. Using strategies such as insert, upsert, or update, this function allows for flexible handling of key conflicts and integration scenarios.
Syntax: merge_maps(target, source, strategy)
- Target: The target is the map where new entries will be merged or updated.
- Source: The source is another map containing entries to be merged into the target.
- Strategy: The merging strategy defines how key conflicts are handled:
  - insert: Adds entries only if they do not exist in the target.
  - upsert: Adds entries from the source and updates existing matching keys.
  - update: Only updates existing keys in the target if they appear in the source.
Input
{
"_type": "log",
"attributes": {
"parsed_body": {
"ed.tag": "log-gen",
"eventCategory": "Management",
"eventName": "ListStacks"
}
},
"body": "...",
"resource": {
"ed.conf.id": "123456789",
"ed.org.id": "987654321",
"ed.source.type": "memory_input",
"ed.tag": "loggen",
"service.name": "ed-tester"
},
"timestamp": 1733380292966
}
Statement
merge_maps(resource, attributes["parsed_body"], "upsert")
Output
{
"_type": "log",
"attributes": {
"parsed_body": {
"ed.tag": "log-gen",
"eventCategory": "Management",
"eventName": "ListStacks"
}
},
"body": "...",
"resource": {
"ed.conf.id": "123456789",
"ed.org.id": "987654321",
"ed.source.type": "memory_input",
"ed.tag": "log-gen",
"eventCategory": "Management",
"eventName": "ListStacks",
"service.name": "ed-tester"
},
"timestamp": 1733380313647
}
The function merges attributes["parsed_body"] into resource using the upsert strategy, updating keys from the source or adding them if they don’t exist. For example, ed.tag in resource was updated with the value from attributes["parsed_body"].
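The three strategies differ only in which source keys are allowed to write into the target. A compact Python model, using plain dicts and intended purely as an illustration, makes the distinction concrete:

```python
from typing import Any, Dict


def merge_maps(target: Dict[str, Any], source: Dict[str, Any], strategy: str) -> None:
    """Merge source into target in place according to the strategy."""
    if strategy not in ("insert", "update", "upsert"):
        raise ValueError(f"unknown strategy: {strategy}")
    for key, value in source.items():
        exists = key in target
        if strategy == "insert" and not exists:
            target[key] = value  # only new keys are added
        elif strategy == "update" and exists:
            target[key] = value  # only existing keys are overwritten
        elif strategy == "upsert":
            target[key] = value  # add or overwrite


source = {"ed.tag": "log-gen", "eventName": "ListStacks"}
results = {}
for strategy in ("insert", "update", "upsert"):
    target = {"ed.tag": "loggen", "service.name": "ed-tester"}
    merge_maps(target, source, strategy)
    results[strategy] = target
    print(strategy, target)
```

In every strategy, keys that exist only in the target (service.name here) are left alone; merging never removes entries.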
replace_match
The replace_match function replaces an entire string when it matches a specified glob pattern. It simplifies transforming telemetry data by matching exact pattern formats and substituting them with desired strings.
Syntax: replace_match(target, pattern, replacement, function, ReplacementFormat)
- Target: A path expression to a telemetry field that needs to be checked against the pattern.
- Pattern: A glob (filepath match) string that defines the criteria for a match.
- Replacement: The string or path expression to a string telemetry field that will replace any match.
- Function: (Optional) A converter function applied to the replacement string, allowing customization such as hashing.
- ReplacementFormat: (Optional) A format string for the replacement, which must contain exactly one %s placeholder for its content.
Input
{
"_type": "log",
"body": "...",
"resource": {
"ed.conf.id": "123456789",
"ed.org.id": "987654321",
"ed.tag": "ed-dev-alb-logs-v3",
"host.ip": "10.151.135.237",
"host.name": "default-deployment-5c69f64d9-78wvm",
"messaging.system": "s3_sqs",
"service.name": "s3-sqs-s3_input",
"src_type": "s3_sqs"
},
"timestamp": 1730511053177
}
Statement
replace_match(resource["host.name"], "default-deployment-*", "anonymized-host")
Output
{
"_type": "log",
"body": "...",
"resource": {
"ed.conf.id": "123456789",
"ed.org.id": "987654321",
"ed.tag": "ed-dev-alb-logs-v3",
"host.ip": "10.151.135.237",
"host.name": "anonymized-host",
"messaging.system": "s3_sqs",
"service.name": "s3-sqs-s3_input",
"src_type": "s3_sqs"
},
"timestamp": 1730511053177
}
In this example, the host.name in the resource object matches the pattern "default-deployment-*" and is replaced with "anonymized-host", effectively anonymizing the host’s identity within logs.
replace_all_matches
The replace_all_matches function replaces any matching string value within a map type field with a specified replacement string. It is particularly useful for anonymizing or reformatting structured data within log entries.
Syntax: replace_all_matches(target, pattern, replacement, Function, replacementFormat)
- Target: The target is a path expression to a map type field, indicating where the replacement should occur.
- Pattern: The pattern is a glob string used to identify string values for replacement. A value is replaced only when the entire string matches.
- Replacement: The replacement is the string or path expression that will replace each match found.
- Function: (Optional) A converter function applied to the replacement string, such as a hash function.
- ReplacementFormat: (Optional) A format string that must contain exactly one %s specifier for formatted replacements.
Input
{
"_type": "log",
"attributes": {
"action": "/user/1234/list/5678",
"details": "User 1234 performed an operation on list 5678. (/user/1234/list/5678)",
"url": "/user/1234/list/5678"
},
"body": "...",
"resource": {...},
"timestamp": 1733437963210
}
Statement
replace_all_matches(attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")
Output
{
"_type": "log",
"attributes": {
"action": "/user/{userId}/list/{listId}",
"details": "User 1234 performed an operation on list 5678. (/user/1234/list/5678)",
"url": "/user/{userId}/list/{listId}"
},
"body": "...",
"resource": {...},
"timestamp": 1733437980591
}
The function replaces user and list IDs in the url field and the action field using generalized placeholders, without altering other fields like details.
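The whole-string glob matching used by replace_match and replace_all_matches can be approximated in Python with fnmatch. This is a sketch only: Python's glob dialect may differ in detail from the one the agent uses, and the optional function and format arguments are not modeled.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def replace_match(value: Any, pattern: str, replacement: str) -> Any:
    """Return the replacement if the whole string matches the glob, else the value."""
    if isinstance(value, str) and fnmatchcase(value, pattern):
        return replacement
    return value


def replace_all_matches(target: Dict[str, Any], pattern: str, replacement: str) -> None:
    """Apply replace_match to every value of the map in place."""
    for key, value in target.items():
        target[key] = replace_match(value, pattern, replacement)


attributes = {
    "action": "/user/1234/list/5678",
    "details": "User 1234 performed an operation on list 5678. (/user/1234/list/5678)",
    "url": "/user/1234/list/5678",
}
replace_all_matches(attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")
print(attributes["url"])      # /user/{userId}/list/{listId}
print(attributes["details"])  # unchanged: the glob must match the entire string
```

The details value contains the path as a substring but does not match as a whole, which is why it survives untouched; substring replacement is the job of the regex-based functions.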
replace_pattern
The replace_pattern function replaces the parts of a single string field that match a regex pattern.
Syntax: replace_pattern(target, regex, replacement, function, replacementFormat)
- Target: A path expression pointing to a telemetry field that is subject to pattern matching and substitution.
- Regex: A string denoting the regular expression pattern for finding matching substrings.
- Replacement: The string or path expression to a string telemetry field that will replace matching segments.
- Function: (Optional) A converter function applied to the replacement string, such as hashing.
- ReplacementFormat: (Optional) A format string for the replacement, which must contain exactly one %s format specifier.
Input
{
"_type": "log",
"attributes": {
"decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1"
},
"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"resource": {...},
"timestamp": 1733438810208
}
Statement
replace_pattern(attributes["decoded_body"], "192\\.168\\.1\\.\\d+", "192.168.1.xxx")
See Understand Escaping Characters.
Output
{
"_type": "log",
"attributes": {
"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
"ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1"
},
"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"resource": {...},
"timestamp": 1733438843817
}
In this example, the decoded_body field contains an IP address that is matched against the pattern "192\.168\.1\.\d+" and replaced with "192.168.1.xxx". The ip_addresses field remains unaffected.
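To experiment with the pattern outside a pipeline, the same substitution can be approximated in Python. This is a rough sketch rather than the OTTL engine: the helper below is hypothetical, and Python's re module is assumed to behave like the pipeline's regex engine for this simple pattern. A Python raw string needs only single backslashes, whereas the OTTL statement doubles them.

```python
import re


def replace_pattern(record, field, regex, replacement):
    """Replace regex matches in one string field; other fields are left alone."""
    value = record.get(field)
    if isinstance(value, str):
        record[field] = re.sub(regex, replacement, value)
    return record


attributes = {
    "decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
    "ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1",
}

# Only decoded_body is targeted, so ip_addresses keeps its original value.
replace_pattern(attributes, "decoded_body", r"192\.168\.1\.\d+", "192.168.1.xxx")
```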
replace_all_patterns
The replace_all_patterns function replaces the parts of all keys or string values within a map that match a regex pattern with a new string.
Syntax: replace_all_patterns(target, mode, regex, replacement, function, replacementFormat)
- Target: A path expression to a map type field that contains the data to be transformed.
- Mode: Specifies whether replacements are applied to the map's keys or values. Acceptable options are key or value.
- Regex: The regular expression pattern that identifies what portions of the target should be replaced.
- Replacement: The new string that replaces matched segments. It can reference matched groups using a specific syntax.
- Function: (Optional) A converter function that processes the replacement string.
- ReplacementFormat: (Optional) A formatting pattern for replacements that includes a %s placeholder for the main replacement content.
Input
{
"_type": "log",
"attributes": {
"decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1"
},
"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"resource": {...},
"timestamp": 1733438978059
}
Statement
replace_all_patterns(attributes, "value", "192\\.168\\.1\\.\\d+", "192.168.1.xxx")
See Understand Escaping Characters.
Output
{
"_type": "log",
"attributes": {
"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
"ip_addresses": "192.168.1.xxx, 10.10.10.10, 172.16.0.1"
},
"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"resource": {...},
"timestamp": 1733439002570
}
In this example, the attributes object contains fields decoded_body and ip_addresses whose values are matched against the pattern "192\.168\.1\.\d+", and the matches are replaced with "192.168.1.xxx", effectively anonymizing that segment in all string values of the map.
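The difference between the key and value modes can be sketched in Python. The helper below is a hypothetical approximation, not the Edge Delta implementation. It assumes that non-string values are passed through unchanged in value mode, and it returns a new dictionary instead of modifying the map in place.

```python
import re


def replace_all_patterns(target, mode, regex, replacement):
    """Apply a regex replacement to every key or every string value of a map."""
    if mode not in ("key", "value"):
        raise ValueError("mode must be 'key' or 'value'")
    result = {}
    for key, value in target.items():
        if mode == "key":
            # Rename the key; the value is carried over untouched.
            result[re.sub(regex, replacement, key)] = value
        elif isinstance(value, str):
            result[key] = re.sub(regex, replacement, value)
        else:
            # Assumption: non-string values are not modified.
            result[key] = value
    return result


attributes = {
    "decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
    "ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1",
}
masked = replace_all_patterns(attributes, "value", r"192\.168\.1\.\d+", "192.168.1.xxx")
renamed = replace_all_patterns({"http.method": "GET"}, "key", r"^http\.", "web.")
```

In value mode both attributes are rewritten because each contains a matching address. In key mode only the key names change.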
set
This function is used to explicitly set a telemetry field to a specified value, providing flexibility in updating or assigning values within the telemetry data structure.
Syntax: set(target, value)
- Target: A path expression indicating the telemetry field where the value will be set.
- Value: The value to be assigned to the target. This can be of any data type. If the value resolves to nil (for example, if it references an unset map value), no action will occur.
Input
{
"_type": "log",
"attributes": {
"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
"ip_addresses": "192.168.1.xxx, 10.10.10.10, 172.16.0.1"
},
"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"resource": {
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.source.type": "memory_input",
"ed.tag": "loggen",
"host.ip": "10.0.0.1",
"host.name": "ED_TEST",
"service.name": "ed-tester",
"src_type": null
},
"timestamp": 1733439280081
}
Statement
set(attributes["host"], resource["host.ip"])
set(attributes["notes"], "comment")
Output
{
"_type": "log",
"attributes": {
"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
"host": "10.0.0.1",
"ip_addresses": "192.168.1.xxx, 10.10.10.10, 172.16.0.1",
"notes": "comment"
},
"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
"resource": {
"ed.conf.id": "123456789",
"ed.domain": "pipeline",
"ed.org.id": "987654321",
"ed.source.name": "Kubernetes Source",
"ed.source.type": "memory_input",
"ed.tag": "loggen",
"host.ip": "10.0.0.1",
"host.name": "ED_TEST",
"service.name": "ed-tester",
"src_type": null
},
"timestamp": 1733439335943
}
In this example, the host field within the attributes object is set to the value from host.ip in resource. In addition, a second statement adds a static string value to notes in attributes.
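The nil rule described for the Value parameter is easy to miss, so here is a minimal Python sketch of it. The helper set_field is hypothetical, with None standing in for nil, and the cloud.region key is an invented example of a resource field that is not set.

```python
def set_field(target, key, value):
    """Assign value to target[key] unless the value resolves to None (nil)."""
    if value is None:
        # Mirrors the documented behavior: a nil value means no action occurs.
        return target
    target[key] = value
    return target


resource = {"host.ip": "10.0.0.1", "host.name": "ED_TEST"}
attributes = {}

set_field(attributes, "host", resource.get("host.ip"))
set_field(attributes, "notes", "comment")
# cloud.region is not present in resource, so this call leaves attributes as-is.
set_field(attributes, "region", resource.get("cloud.region"))
```

After these calls, attributes holds host and notes only. The third call adds nothing because its value is unset.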
truncate_all
This function is used to truncate all string values within a specified map so that none exceed a given character limit. This function helps manage value length in telemetry data to ensure consistency and compliance with size constraints.
Syntax: truncate_all(target, limit)
- Target: A path expression pointing to a map type field whose string values need to be truncated.
- Limit: An integer representing the maximum number of characters allowed for each string value. Non-string values within the map are unaffected.
Input
{
"_type": "log",
"attributes": {
"manifest": "{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"name\":\"nginx-deployment\",\"labels\":{\"app\":\"nginx\"}},\"spec\":{\"replicas\":3,\"selector\":{\"matchLabels\":{\"app\":\"nginx\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"nginx\"}},\"spec\":{\"containers\":[{\"name\":\"nginx\",\"image\":\"nginx:1.14.2\",\"ports\":[{\"containerPort\":80}]}]}}}"
},
"body": "Deployment issue",
"resource": {...},
"timestamp": 1733439976983
}
Statement
truncate_all(attributes, 80)
Output
{
"_type": "log",
"attributes": {
"manifest": "{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"name\":\"nginx-deployment"
},
"body": "Deployment issue",
"resource": {...},
"timestamp": 1733440045272
}
The manifest string value within attributes, which exceeds 80 characters, is truncated to exactly 80 characters.
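The truncation rule can be sketched in Python as follows. The helper is a hypothetical illustration, not the pipeline's implementation. It measures length in Python string characters, which may differ from how the pipeline counts multi-byte text, and the sample values are invented.

```python
def truncate_all(target, limit):
    """Cut every string value in a map down to at most limit characters."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    for key, value in target.items():
        # Non-string values and strings within the limit are left untouched.
        if isinstance(value, str) and len(value) > limit:
            target[key] = value[:limit]
    return target


attributes = {
    "manifest": "x" * 200,  # placeholder for a long serialized manifest
    "kind": "Deployment",
    "replicas": 3,
}
truncate_all(attributes, 80)
```

Only manifest is shortened. The short kind string and the integer replicas value pass through unchanged.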