OTTL Editor Functions in Edge Delta

OTTL Editor Functions

Editors are OTTL functions that transform telemetry data by modifying the underlying fields in place.

append

This function appends one or more values to a specified target. The resulting field is always of type pcommon.Slice, so appending to a scalar field converts it into an array, and appending values of different types produces a mixed-type slice. Either outcome can cause inconsistencies in downstream processing or affect data integrity.

Instead of append, consider a more controlled function such as set, which adds or modifies a value without changing the structure of existing data fields. Keeping fields as plain key-value pairs is more predictable and less likely to cause unintentional schema changes.
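
To make the type change concrete, here is a minimal Python sketch that models the two behaviors on a plain dict. It is an illustration only, not the agent's implementation: append_value and set_value are hypothetical helper names, and the dict stands in for an attributes map.

```python
def append_value(record: dict, key: str, value) -> None:
    """Model of append: the target always ends up as a list."""
    current = record.get(key)
    if current is None:
        record[key] = [value]
    elif isinstance(current, list):
        current.append(value)
    else:
        # A scalar is promoted to a list, which changes the field's type.
        record[key] = [current, value]


def set_value(record: dict, key: str, value) -> None:
    """Model of set: the target is overwritten and stays a scalar."""
    record[key] = value


attrs = {"env": "prod"}
append_value(attrs, "env", "staging")
print(attrs)  # {'env': ['prod', 'staging']}
```

After the append call, any downstream logic that expects env to be a string has to handle a list instead, which is the schema drift described above.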

delete_key

This function is used to remove a specified key and its associated value from a target field or object within a log entry. It is particularly useful for cleaning up logs by eliminating unnecessary or sensitive information.

Syntax: delete_key(target, key)

  • Target: The target refers to the field or object within the log entry from which you want to remove the key. It is typically a parent container, such as a JSON object or an associative array, which holds multiple key-value pairs.
  • Key: The key is the specific identifier within the target that you wish to delete. It refers to the name of the field entry that you intend to remove, along with its associated value.

Input

{
  "_type": "log",
  "body": "...",
  "resource": {
    "container.id": "123456789",
    "container.image.name": "docker.io/edgedelta/loggen:latest",
    "ed.conf.id": "123456789",
    "ed.domain": "pipeline",
    "ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
    "ed.org.id": "987654321",
    "ed.source.name": "Kubernetes Source",
    "ed.source.type": "kubernetes_input",
    "ed.tag": "loggen",
    "host.ip": "172.18.0.5",
    "host.name": "loggencluster-worker2",
    "k8s.container.name": "loggen",
    "k8s.deployment.name": "loggen",
    "k8s.namespace.name": "loggenlogs",
    "k8s.node.name": "loggencluster-worker2",
    "k8s.pod.name": "loggen-7cc748d75-xh8lq",
    "k8s.pod.uid": "123456789",
    "k8s.replicaset.name": "loggen-7cc748d75",
    "service.name": "loggen",
    "src_type": "K8s"
  },
  "timestamp": 1733369799254
}

Statement

delete_key(resource, "service.name")

Output

{
  "_type": "log",
  "body": "...",
  "resource": {
    "container.id": "123456789",
    "container.image.name": "docker.io/edgedelta/loggen:latest",
    "ed.conf.id": "123456789",
    "ed.domain": "pipeline",
    "ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
    "ed.org.id": "987654321",
    "ed.source.name": "Kubernetes Source",
    "ed.source.type": "kubernetes_input",
    "ed.tag": "loggen",
    "host.ip": "172.18.0.5",
    "host.name": "loggencluster-worker2",
    "k8s.container.name": "loggen",
    "k8s.deployment.name": "loggen",
    "k8s.namespace.name": "loggenlogs",
    "k8s.node.name": "loggencluster-worker2",
    "k8s.pod.name": "loggen-7cc748d75-xh8lq",
    "k8s.pod.uid": "123456789",
    "k8s.replicaset.name": "loggen-7cc748d75",
    "src_type": "K8s"
  },
  "timestamp": 1733369799254
}

The log entry is modified to exclude the service.name field from resource, leaving the remaining entries in the resource object.
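
As a rough illustration of these semantics, the Python sketch below models delete_key on a plain dict. It is not how the agent implements the function; the dict stands in for the resource map, and the choice to ignore a missing key is an assumption of the sketch.

```python
def delete_key(target: dict, key: str) -> None:
    """Remove a single key and its value from the map, if present."""
    # Assumption for this sketch: a missing key is silently ignored.
    target.pop(key, None)


resource = {
    "host.name": "loggencluster-worker2",
    "service.name": "loggen",
    "src_type": "K8s",
}
delete_key(resource, "service.name")
print(sorted(resource))  # ['host.name', 'src_type']
```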

delete_matching_keys

This function is used to remove keys from a specified target that match a given pattern. It’s useful for eliminating multiple entries based on pattern matching, helping to clean up or censor log entries by dynamically selecting keys.

Syntax: delete_matching_keys(target, pattern)

  • Target: The target refers to the field or object within the log entry from which you wish to remove keys. It typically points to a parent container, such as a JSON object or associative array, which holds multiple key-value pairs.
  • Pattern: The pattern is a regular expression that specifies the keys to be removed. Keys that match this pattern will be deleted along with their associated values.

Input

{
  "_type": "log",
  "body": "...",
  "resource": {
    "container.id": "123456789",
    "container.image.name": "docker.io/edgedelta/loggen:latest",
    "ed.conf.id": "123456789",
    "ed.domain": "pipeline",
    "ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
    "ed.org.id": "987654321",
    "ed.source.name": "Kubernetes Source",
    "ed.source.type": "kubernetes_input",
    "ed.tag": "loggen",
    "host.ip": "172.18.0.5",
    "host.name": "loggencluster-worker2",
    "k8s.container.name": "loggen",
    "k8s.deployment.name": "loggen",
    "k8s.namespace.name": "loggenlogs",
    "k8s.node.name": "loggencluster-worker2",
    "k8s.pod.name": "loggen-7cc748d75-xh8lq",
    "k8s.pod.uid": "123456789",
    "k8s.replicaset.name": "loggen-7cc748d75",
    "service.name": "loggen",
    "src_type": "K8s"
  },
  "timestamp": 1733375091978
}

Statement

delete_matching_keys(resource, pattern=".*\\.name$")

See Understand Escaping Characters.

Output

{
  "_type": "log",
  "body": "...",
  "resource": {
    "container.id": "123456789",
    "ed.conf.id": "123456789",
    "ed.domain": "pipeline",
    "ed.filepath": "/var/log/pods/loggenlogs_loggen-123456789/loggen/0.log",
    "ed.org.id": "987654321",
    "ed.source.type": "kubernetes_input",
    "ed.tag": "loggen",
    "host.ip": "172.18.0.5",
    "k8s.pod.uid": "123456789",
    "src_type": "K8s"
  },
  "timestamp": 1733375091978
}

The log entry is edited to remove any keys in the resource object matching the pattern .*\\.name$, while preserving other fields within the resource.
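
The pattern-based deletion can be sketched in Python as follows. This is a simplified model on a plain dict, not the agent's code. It assumes the pattern is searched anywhere in the key unless anchored, and it uses a Python raw string, so the regex needs only a single backslash where the OTTL statement uses two.

```python
import re


def delete_matching_keys(target: dict, pattern: str) -> None:
    """Remove every key whose name matches the regular expression."""
    regex = re.compile(pattern)
    # Collect matches first so the dict is not mutated while iterating.
    for key in [k for k in target if regex.search(k)]:
        del target[key]


resource = {
    "host.ip": "172.18.0.5",
    "host.name": "loggencluster-worker2",
    "service.name": "loggen",
    "src_type": "K8s",
}
delete_matching_keys(resource, r".*\.name$")
print(sorted(resource))  # ['host.ip', 'src_type']
```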

flatten

This function is used to convert nested structures within a log entry into a flat format, typically by turning nested paths into a single level of key-value pairs. It’s particularly useful for simplifying data access and storage when dealing with complex nested structures.

Syntax: flatten(target)

  • Target: The target is the field or object within the log entry that you wish to flatten. It often involves nested JSON objects or arrays which are to be transformed into a simpler structure.

Imagine the agent ingests this log message:

{"eventVersion": "1.08", "userIdentity": {"type": "AssumedRole", "invokedBy": "lambda.amazonaws.com"}, "eventTime": "2024-12-05T05:10:57.227003Z", "eventSource": "ec2.amazonaws.com", "eventName": "ListStacks", "awsRegion": "us-west-2", "sourceIPAddress": "211.46.216.146", "userAgent": "ec2.amazonaws.com", "requestParameters": {}, "responseElements": {"credentials": {"accessKeyId": "A1B2C3D4E5F6G7H8I9J0", "expiration": "2024-12-05T05:10:57.227053Z", "sessionToken": "123456789876"}, "assumedRoleUser": {"assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck", "arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck"}}, "requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst", "eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst", "readOnly": "true", "resources": [{"accountId": 123456789012, "type": "AWS::IAM::Role", "ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole"}], "eventType": "AwsApiCall", "managementEvent": "true", "recipientAccountId": 123456789012, "sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f", "eventCategory": "Management"}

Bear in mind the log is escaped when ingested. See Understand Escaping Characters.

In this example, assume the following OTTL statements have been executed on the log:

set(attributes["decoded_body"], Decode(body, "utf-8"))
set(attributes["parsed_body"], ParseJSON(attributes["decoded_body"]))

To start, the body was decoded from a byte array. See Working with the body for more information about decoding the body.

Next the JSON object was parsed into nested key value pairs. Now the log is ready to be flattened.

Input

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
			"parsed_body": {
				"awsRegion": "us-west-2",
				"eventCategory": "Management",
				"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
				"eventName": "ListStacks",
				"eventSource": "ec2.amazonaws.com",
				"eventTime": "2024-12-05T05:10:57.227003Z",
				"eventType": "AwsApiCall",
				"eventVersion": "1.08",
				"managementEvent": "true",
				"readOnly": "true",
				"recipientAccountId": 123456789012,
				"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
				"requestParameters": {},
				"resources": [
					{
						"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
						"accountId": 123456789012,
						"type": "AWS::IAM::Role"
					}
				],
				"responseElements": {
					"assumedRoleUser": {
						"arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
						"assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck"
					},
					"credentials": {
						"accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
						"expiration": "2024-12-05T05:10:57.227053Z",
						"sessionToken": "123456789876"
					}
				},
				"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
				"sourceIPAddress": "211.46.216.146",
				"userAgent": "ec2.amazonaws.com",
				"userIdentity": {
					"invokedBy": "lambda.amazonaws.com",
					"type": "AssumedRole"
				}
			}
		},
		"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
		"resource": {...},
		"timestamp": 1733376154621
	}
]

Statement

flatten(attributes["parsed_body"])

Output

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
			"parsed_body": {
				"awsRegion": "us-west-2",
				"eventCategory": "Management",
				"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
				"eventName": "ListStacks",
				"eventSource": "ec2.amazonaws.com",
				"eventTime": "2024-12-05T05:10:57.227003Z",
				"eventType": "AwsApiCall",
				"eventVersion": "1.08",
				"managementEvent": "true",
				"readOnly": "true",
				"recipientAccountId": 123456789012,
				"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
				"resources.0": {
					"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
					"accountId": 123456789012,
					"type": "AWS::IAM::Role"
				},
				"responseElements.assumedRoleUser.arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
				"responseElements.assumedRoleUser.assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck",
				"responseElements.credentials.accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
				"responseElements.credentials.expiration": "2024-12-05T05:10:57.227053Z",
				"responseElements.credentials.sessionToken": "123456789876",
				"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
				"sourceIPAddress": "211.46.216.146",
				"userAgent": "ec2.amazonaws.com",
				"userIdentity.invokedBy": "lambda.amazonaws.com",
				"userIdentity.type": "AssumedRole"
			}
		},
		"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
		"resource": {...},
		"timestamp": 1733376208046
	}
]

In this example, the flatten function transforms the nested JSON structure in the parsed_body attribute into a single level of key-value pairs. Each nested level is flattened by prefixing the child key with its parent keys, joined by a dot, which makes deeply nested data easier to access. For instance, userIdentity, which contains the subfields type and invokedBy, becomes userIdentity.type and userIdentity.invokedBy in the flattened output. The empty requestParameters object has no child keys, so it does not appear in the output.
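
The key-prefixing rule can be sketched in Python as below. This is a simplified model written to reproduce the output shown in this example, not the agent's implementation. In particular, keeping list elements whole under an indexed key (as in resources.0) and dropping empty maps are assumptions taken from that output.

```python
def flatten(target: dict) -> None:
    """Flatten nested maps in place, joining parent and child keys with a dot."""
    flat = {}

    def walk(prefix: str, value) -> None:
        if isinstance(value, dict):
            # An empty map contributes no keys, so it disappears.
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        elif isinstance(value, list):
            # Assumption: each element is stored as-is under "<key>.<index>".
            for index, item in enumerate(value):
                flat[f"{prefix}.{index}"] = item
        else:
            flat[prefix] = value

    walk("", target)
    target.clear()
    target.update(flat)


parsed_body = {
    "awsRegion": "us-west-2",
    "requestParameters": {},
    "userIdentity": {"type": "AssumedRole", "invokedBy": "lambda.amazonaws.com"},
}
flatten(parsed_body)
print(sorted(parsed_body))
# ['awsRegion', 'userIdentity.invokedBy', 'userIdentity.type']
```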

keep_keys

This function retains only the specified keys within a target field or object in a log entry and removes all others. Because it requires an explicit list of keys, it is best suited to cases where you know ahead of time exactly which keys to keep. By contrast, keep_matching_keys uses a regular expression to decide which keys to retain, which makes it more flexible for dynamic or large datasets where the exact keys might not be known.

Syntax: keep_keys(target, keys)

  • Target: The target is the field or object within the log entry you wish to filter. It typically points to a JSON object or associative array with multiple key-value pairs.
  • Keys: The keys parameter is an array of specific keys you want to retain in the target. Only these keys will be kept, and all others will be removed.

Input

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "...",
			"parsed_body": {
				"awsRegion": "us-west-2",
				"eventCategory": "Management",
				"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
				"eventName": "ListStacks",
				"eventSource": "ec2.amazonaws.com",
				"eventTime": "2024-12-05T05:10:57.227003Z",
				"eventType": "AwsApiCall",
				"eventVersion": "1.08",
				"managementEvent": "true",
				"readOnly": "true",
				"recipientAccountId": 123456789012,
				"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
				"resources.0": {
					"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
					"accountId": 123456789012,
					"type": "AWS::IAM::Role"
				},
				"responseElements.assumedRoleUser.arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
				"responseElements.assumedRoleUser.assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck",
				"responseElements.credentials.accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
				"responseElements.credentials.expiration": "2024-12-05T05:10:57.227053Z",
				"responseElements.credentials.sessionToken": "123456789876",
				"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
				"sourceIPAddress": "211.46.216.146",
				"userAgent": "ec2.amazonaws.com",
				"userIdentity.invokedBy": "lambda.amazonaws.com",
				"userIdentity.type": "AssumedRole"
			}
		},
		"body": "...",
		"resource": {...},
		"timestamp": 1733377758772
	}
]

Statement

keep_keys(attributes["parsed_body"], keys=["eventCategory", "eventName"])

Output

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "...",
			"parsed_body": {
				"eventCategory": "Management",
				"eventName": "ListStacks"
			}
		},
		"body": "...",
		"resource": {...},
		"timestamp": 1733377774908
	}
]

The log entry is updated to retain only the specified eventCategory and eventName within the attributes["parsed_body"] object.
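
A minimal Python sketch of this behavior, using a plain dict in place of attributes["parsed_body"], might look like the following. It is an illustration only; ignoring listed keys that are absent from the target is an assumption of the sketch.

```python
def keep_keys(target: dict, keys: list) -> None:
    """Keep only the explicitly listed keys and drop everything else."""
    wanted = set(keys)
    # Collect the keys to drop first so the dict is not mutated while iterating.
    for key in [k for k in target if k not in wanted]:
        del target[key]


parsed_body = {
    "awsRegion": "us-west-2",
    "eventCategory": "Management",
    "eventName": "ListStacks",
    "userIdentity.type": "AssumedRole",
}
keep_keys(parsed_body, ["eventCategory", "eventName"])
print(parsed_body)  # {'eventCategory': 'Management', 'eventName': 'ListStacks'}
```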

keep_matching_keys

This function is used to retain only the keys from a specified target that match a given pattern. It helps streamline data by keeping desired entries and removing non-matching ones.

Syntax: keep_matching_keys(target, pattern)

  • Target: The target refers to the field or object within the log entry where you want to retain keys. It usually specifies a parent container, like a JSON object or associative array, which holds multiple key-value pairs.
  • Pattern: The pattern is a regular expression designating which keys to keep within the target. Only keys that match this pattern will be retained along with their associated values.

Input

[
	{
		"_type": "log",
		"attributes": {
			"parsed_body": {
				"awsRegion": "us-west-2",
				"eventCategory": "Management",
				"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
				"eventName": "ListStacks",
				"eventSource": "ec2.amazonaws.com",
				"eventTime": "2024-12-05T05:10:57.227003Z",
				"eventType": "AwsApiCall",
				"eventVersion": "1.08",
				"managementEvent": "true",
				"readOnly": "true",
				"recipientAccountId": 123456789012,
				"requestID": "abcd1234-efgh-5678-ijkl-9012mnopqrst",
				"resources.0": {
					"ARN": "arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole",
					"accountId": 123456789012,
					"type": "AWS::IAM::Role"
				},
				"responseElements.assumedRoleUser.arn": "arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck",
				"responseElements.assumedRoleUser.assumedRoleId": "A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck",
				"responseElements.credentials.accessKeyId": "A1B2C3D4E5F6G7H8I9J0",
				"responseElements.credentials.expiration": "2024-12-05T05:10:57.227053Z",
				"responseElements.credentials.sessionToken": "123456789876",
				"sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
				"sourceIPAddress": "211.46.216.146",
				"userAgent": "ec2.amazonaws.com",
				"userIdentity.invokedBy": "lambda.amazonaws.com",
				"userIdentity.type": "AssumedRole"
			}
		},
		"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
		"resource": {...},
		"timestamp": 1733378455697
	}
]

Statement

keep_matching_keys(attributes["parsed_body"], pattern="^event.*")

Output

[
	{
		"_type": "log",
		"attributes": {
			"parsed_body": {
				"eventCategory": "Management",
				"eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
				"eventName": "ListStacks",
				"eventSource": "ec2.amazonaws.com",
				"eventTime": "2024-12-05T05:10:57.227003Z",
				"eventType": "AwsApiCall",
				"eventVersion": "1.08"
			}
		},
		"body": "{\"eventVersion\": \"1.08\", \"userIdentity\": {\"type\": \"AssumedRole\", \"invokedBy\": \"lambda.amazonaws.com\"}, \"eventTime\": \"2024-12-05T05:10:57.227003Z\", \"eventSource\": \"ec2.amazonaws.com\", \"eventName\": \"ListStacks\", \"awsRegion\": \"us-west-2\", \"sourceIPAddress\": \"211.46.216.146\", \"userAgent\": \"ec2.amazonaws.com\", \"requestParameters\": {}, \"responseElements\": {\"credentials\": {\"accessKeyId\": \"A1B2C3D4E5F6G7H8I9J0\", \"expiration\": \"2024-12-05T05:10:57.227053Z\", \"sessionToken\": \"123456789876\"}, \"assumedRoleUser\": {\"assumedRoleId\": \"A1B2C3D4E5F6G7H8I9J0:AWSConfig-BucketConfigCheck\", \"arn\": \"arn:aws:iam::123456789012:role/ABCDEFGHIJKLM123456789/AWSConfig-BucketConfigCheck\"}}, \"requestID\": \"abcd1234-efgh-5678-ijkl-9012mnopqrst\", \"eventID\": \"mnop5678-abcd-1234-efgh-5678ijklqrst\", \"readOnly\": \"true\", \"resources\": [{\"accountId\": 123456789012, \"type\": \"AWS::IAM::Role\", \"ARN\": \"arn:aws:iam::123456789012:role/aws-controltower-ForwardSnsNotificationRole\"}], \"eventType\": \"AwsApiCall\", \"managementEvent\": \"true\", \"recipientAccountId\": 123456789012, \"sharedEventID\": \"01234567-89ab-cdef-edcb-a9876543210f\", \"eventCategory\": \"Management\"}",
		"resource": {...},
		"timestamp": 1733378478884
	}
]

The log entry retains only the keys within the attributes["parsed_body"] object that match the pattern ^event.*, for example, keeping the eventCategory and eventID keys, while removing all others.
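
The same idea with a pattern instead of an explicit list can be sketched in Python as follows. This is a simplified model on a plain dict, not the agent's code, and it assumes the regular expression is searched anywhere in the key unless anchored.

```python
import re


def keep_matching_keys(target: dict, pattern: str) -> None:
    """Keep only the keys whose names match the regular expression."""
    regex = re.compile(pattern)
    for key in [k for k in target if not regex.search(k)]:
        del target[key]


parsed_body = {
    "awsRegion": "us-west-2",
    "eventID": "mnop5678-abcd-1234-efgh-5678ijklqrst",
    "eventName": "ListStacks",
    "sharedEventID": "01234567-89ab-cdef-edcb-a9876543210f",
}
keep_matching_keys(parsed_body, r"^event.*")
print(sorted(parsed_body))  # ['eventID', 'eventName']
```

Note that sharedEventID is removed: the ^ anchor requires the key to start with event, and sharedEventID only contains the word in a different position and case.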

limit

The limit function reduces the number of elements in a pcommon.Map to be no greater than the specified limit. It ensures key-value pairs are restricted to a specific count while maintaining essential, prioritized keys.

Syntax: limit(target, limit, priority_keys[])

  • Target: The target is a path expression to a pcommon.Map type field within the log entry to be limited.
  • Limit: A non-negative integer specifying the maximum number of items to retain in the map.
  • Priority Keys: A list of strings indicating attribute keys that should not be dropped when limiting. These keys ensure that critical data is preserved.

Input

[
	{
		"_type": "log",
		"attributes": {...},
		"body": "...",
		"resource": {
			"ed.conf.id": "123456789",
			"ed.domain": "pipeline",
			"ed.org.id": "987654321",
			"ed.source.name": "Kubernetes Source",
			"ed.source.type": "memory_input",
			"ed.tag": "loggen",
			"host.ip": "10.0.0.1",
			"host.name": "ED_TEST",
			"service.name": "ed-tester",
			"src_type": null
		},
		"timestamp": 1733378647488
	}
]

Statement

limit(resource, 5, ["ed.org.id", "ed.tag", "ed.conf.id"])

Output

[
	{
		"_type": "log",
		"attributes": {...},
		"body": "...",
		"resource": {
			"ed.conf.id": "123456789",
			"ed.org.id": "987654321",
			"ed.source.name": "Kubernetes Source",
			"ed.tag": "loggen",
			"host.name": "ED_TEST"
		},
		"timestamp": 1733378762172
	}
]

The resource field is limited to 5 entries. The priority keys ed.org.id, ed.tag, and ed.conf.id are always retained, and the two remaining slots are filled by entries selected at random from the other keys.
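
The selection logic can be sketched in Python as below. This is a simplified, deterministic model rather than the agent's implementation: it fills the remaining slots in dict iteration order, whereas the real choice of non-priority entries should not be relied on. Rejecting a priority list longer than the limit is also an assumption of this sketch.

```python
def limit(target: dict, max_items: int, priority_keys: list) -> None:
    """Trim the map to at most max_items entries, keeping priority keys first."""
    if max_items < 0:
        raise ValueError("limit must be non-negative")
    if len(priority_keys) > max_items:
        # Assumption: more priority keys than slots is treated as an error.
        raise ValueError("more priority keys than the limit allows")
    if len(target) <= max_items:
        return  # Already within the limit, nothing to drop.

    keep = [k for k in priority_keys if k in target]
    for key in target:
        if len(keep) >= max_items:
            break
        if key not in keep:
            keep.append(key)

    for key in [k for k in target if k not in keep]:
        del target[key]


resource = {
    "ed.conf.id": "123456789",
    "ed.domain": "pipeline",
    "ed.org.id": "987654321",
    "ed.source.name": "Kubernetes Source",
    "ed.tag": "loggen",
    "host.ip": "10.0.0.1",
    "host.name": "ED_TEST",
}
limit(resource, 5, ["ed.org.id", "ed.tag", "ed.conf.id"])
print(len(resource))  # 5
```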

merge_maps

The merge_maps function combines key-value pairs from a source map into a target map. Using strategies such as insert, upsert, or update, this function allows for flexible handling of key conflicts and integration scenarios.

Syntax: merge_maps(target, source, strategy)

  • Target: The target is the map where new entries will be merged or updated.
  • Source: The source is another map containing entries to be merged into the target.
  • Strategy: The merging strategy defines how key conflicts are handled:
    • insert: Adds entries only if they do not exist in the target.
    • upsert: Adds entries from the source and updates existing matching keys.
    • update: Only updates existing keys in the target if they appear in the source.

Input

[
	{
		"_type": "log",
		"attributes": {
			"parsed_body": {
				"ed.tag": "log-gen",
				"eventCategory": "Management",
				"eventName": "ListStacks"
			}
		},
		"body": "...",
		"resource": {
			"ed.conf.id": "123456789",
			"ed.org.id": "987654321",
			"ed.source.type": "memory_input",
			"ed.tag": "loggen",
			"service.name": "ed-tester"
		},
		"timestamp": 1733380292966
	}
]

Statement

merge_maps(resource, attributes["parsed_body"], "upsert")

Output

[
	{
		"_type": "log",
		"attributes": {
			"parsed_body": {
				"ed.tag": "log-gen",
				"eventCategory": "Management",
				"eventName": "ListStacks"
			}
		},
		"body": "...",
		"resource": {
			"ed.conf.id": "123456789",
			"ed.org.id": "987654321",
			"ed.source.type": "memory_input",
			"ed.tag": "log-gen",
			"eventCategory": "Management",
			"eventName": "ListStacks",
			"service.name": "ed-tester"
		},
		"timestamp": 1733380313647
	}
]

The function merges attributes["parsed_body"] into resource using the upsert strategy: keys that exist only in the source are added, and keys that exist in both take the source value. For example, ed.tag in resource was updated from "loggen" to the value "log-gen" from attributes["parsed_body"].
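
The difference between the three strategies is easiest to see side by side. The Python sketch below models them on plain dicts. It is an illustration of the rules listed above, not the agent's implementation, and the sample values are shortened from the example.

```python
def merge_maps(target: dict, source: dict, strategy: str) -> None:
    """Merge source into target according to insert, update, or upsert."""
    if strategy not in ("insert", "update", "upsert"):
        raise ValueError(f"unknown strategy: {strategy}")
    for key, value in source.items():
        exists = key in target
        if strategy == "insert" and exists:
            continue  # insert never overwrites an existing key
        if strategy == "update" and not exists:
            continue  # update never adds a new key
        target[key] = value


source = {"ed.tag": "log-gen", "eventName": "ListStacks"}

inserted = {"ed.tag": "loggen"}
merge_maps(inserted, source, "insert")
print(inserted)  # {'ed.tag': 'loggen', 'eventName': 'ListStacks'}

updated = {"ed.tag": "loggen"}
merge_maps(updated, source, "update")
print(updated)  # {'ed.tag': 'log-gen'}

upserted = {"ed.tag": "loggen"}
merge_maps(upserted, source, "upsert")
print(upserted)  # {'ed.tag': 'log-gen', 'eventName': 'ListStacks'}
```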

replace_match

The replace_match function replaces an entire string value when it matches a specified glob pattern. A value that matches the pattern only in part is left unchanged.

Syntax: replace_match(target, pattern, replacement, function, replacementFormat)

  • Target: A path expression to the telemetry field that is checked against the pattern.
  • Pattern: A glob pattern string (filepath match syntax) that the entire field value must match.
  • Replacement: The string, or a path expression to a string telemetry field, that replaces the value on a match.
  • Function: (Optional) A converter function applied to the replacement string, allowing customization such as hashing.
  • ReplacementFormat: (Optional) A format string for the replacement that must contain exactly one %s placeholder.

Input

{
  "_type": "log",
  "body": "...",
  "resource": {
    "ed.conf.id": "123456789",
    "ed.org.id": "987654321",
    "ed.tag": "ed-dev-alb-logs-v3",
    "host.ip": "10.151.135.237",
    "host.name": "default-deployment-5c69f64d9-78wvm",
    "messaging.system": "s3_sqs",
    "service.name": "s3-sqs-s3_input",
    "src_type": "s3_sqs"
  },
  "timestamp": 1730511053177
}

Statement

replace_match(resource["host.name"], "default-deployment-*", "anonymized-host")

Output

{
  "_type": "log",
  "body": "...",
  "resource": {
    "ed.conf.id": "123456789",
    "ed.org.id": "987654321",
    "ed.tag": "ed-dev-alb-logs-v3",
    "host.ip": "10.151.135.237",
    "host.name": "anonymized-host",
    "messaging.system": "s3_sqs",
    "service.name": "s3-sqs-s3_input",
    "src_type": "s3_sqs"
  },
  "timestamp": 1730511053177
}

In this example, the host.name in the resource object matches the pattern "default-deployment-*" and is replaced with "anonymized-host", effectively anonymizing the host’s identity within logs.
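
A rough Python model of the whole-value glob match is shown below. It is a sketch under stated assumptions: Python's fnmatch module stands in for the filepath-style matcher, and the two can differ in edge cases (for example, whether * crosses a / character). The optional function and format arguments are left out.

```python
from fnmatch import fnmatchcase


def replace_match(target: dict, key: str, pattern: str, replacement: str) -> None:
    """Replace the whole value of one field if it matches the glob pattern."""
    value = target.get(key)
    # Only string values are considered, and the glob must cover the full value.
    if isinstance(value, str) and fnmatchcase(value, pattern):
        target[key] = replacement


resource = {
    "host.ip": "10.151.135.237",
    "host.name": "default-deployment-5c69f64d9-78wvm",
}
replace_match(resource, "host.name", "default-deployment-*", "anonymized-host")
print(resource["host.name"])  # anonymized-host
```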

replace_all_matches

The replace_all_matches function replaces every string value within a map type field that matches a glob pattern with a specified replacement string. It is particularly useful for anonymizing or reformatting structured data within log entries.

Syntax: replace_all_matches(target, pattern, replacement, function, replacementFormat)

  • Target: A path expression to a map type field, indicating where the replacement should occur.
  • Pattern: A glob pattern string used to identify the string values to replace.
  • Replacement: The string or path expression that will replace each matching value.
  • Function: (Optional) A converter function applied to the replacement string, such as a hash function.
  • ReplacementFormat: (Optional) A format string that must contain exactly one %s specifier for formatted replacements.

Input

[
	{
		"_type": "log",
		"attributes": {
			"action": "/user/1234/list/5678",
			"details": "User 1234 performed an operation on list 5678. (/user/1234/list/5678)",
			"url": "/user/1234/list/5678"
		},
		"body": "...",
		"resource": {...},
		"timestamp": 1733437963210
	}
]

Statement

replace_all_matches(attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")

Output

[
	{
		"_type": "log",
		"attributes": {
			"action": "/user/{userId}/list/{listId}",
			"details": "User 1234 performed an operation on list 5678. (/user/1234/list/5678)",
			"url": "/user/{userId}/list/{listId}"
		},
		"body": "...",
		"resource": {...},
		"timestamp": 1733437980591
	}
]

The function replaces the values of the url and action fields with the generalized placeholder path. The details field is unchanged because the glob pattern must match the entire value, and details contains the path only as part of a longer string.
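
The map-wide variant can be sketched in Python in the same way. As before, this is an illustration on a plain dict, with fnmatch assumed as a stand-in for the glob matcher and the optional arguments omitted.

```python
from fnmatch import fnmatchcase


def replace_all_matches(target: dict, pattern: str, replacement: str) -> None:
    """Replace every string value in the map that fully matches the glob."""
    for key, value in target.items():
        if isinstance(value, str) and fnmatchcase(value, pattern):
            target[key] = replacement


attributes = {
    "action": "/user/1234/list/5678",
    "details": "User 1234 performed an operation on list 5678. (/user/1234/list/5678)",
    "url": "/user/1234/list/5678",
}
replace_all_matches(attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")
print(attributes["url"])  # /user/{userId}/list/{listId}
```

The details value does not begin with /user/, so the glob does not match it as a whole and the value stays as it was.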

replace_pattern

The replace_pattern function replaces each part of a single string field that matches a regex pattern. Unlike replace_match, it changes only the matching substrings and leaves the rest of the value intact.

Syntax: replace_pattern(target, regex, replacement, function, replacementFormat)

  • Target: A path expression pointing to a telemetry field that is subject to pattern matching and substitution.
  • Regex: A string denoting the regular expression pattern for finding matching substrings.
  • Replacement: The string or path expression to a string telemetry field that will replace matching segments.
  • Function: (Optional) A transformation function that is applied to the replacement string, such as hashing.
  • ReplacementFormat: (Optional) A format string for the replacement. It must contain exactly one %s format specifier.
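To make the two optional parameters more concrete, the following Python sketch models one plausible reading of how they combine: the replacement is expanded for each match, passed through the function, and then inserted into the format string. The ordering, the helper sha256_hex, and the "ip-%s" format are assumptions for illustration only. Python's \1 group syntax also differs from the group syntax used in OTTL statements.

```python
import hashlib
import re


def sha256_hex(text: str) -> str:
    """Hypothetical stand-in for a hashing converter function."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def replace_pattern(value, regex, replacement, function=None, replacement_format="%s"):
    """Model: expand the replacement, apply the function, then the format."""

    def substitute(match):
        expanded = match.expand(replacement)  # resolve group references
        if function is not None:
            expanded = function(expanded)
        return replacement_format % expanded

    return re.sub(regex, substitute, value)


body = "User's IP address: 192.168.1.45; Action: Login attempt."
masked = replace_pattern(body, r"(192\.168\.1\.\d+)", r"\1", sha256_hex, "ip-%s")
print(masked)
```

In this model the matched address is replaced by "ip-" followed by its SHA-256 hex digest, so equal addresses still map to equal tokens without exposing the original value.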

Input

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
			"ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1"
		},
		"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
		"resource": {...},
		"timestamp": 1733438810208
	}
]

Statement

replace_pattern(attributes["decoded_body"], "192\\.168\\.1\\.\\d+", "192.168.1.xxx")

See Understand Escaping Characters.

Output

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
			"ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1"
		},
		"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
		"resource": {...},
		"timestamp": 1733438843817
	}
]

In this example, the decoded_body field contains an IP address that is matched against the pattern "192\.168\.1\.\d+" and replaced with "192.168.1.xxx". The ip_addresses field remains unaffected.
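The same substitution can be reproduced with Python's re module to check the pattern before using it in a statement. This is a sketch for experimentation only. It assumes that the regex behaves the same in both engines for this simple pattern.

```python
import re

# The statement's string "192\\.168\\.1\\.\\d+" reaches the regex engine as
# 192\.168\.1\.\d+ once the string escapes are resolved.
PATTERN = re.compile(r"192\.168\.1\.\d+")

attributes = {
    "decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
    "ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1",
}

# Only the targeted field is rewritten; other fields are not touched.
attributes["decoded_body"] = PATTERN.sub("192.168.1.xxx", attributes["decoded_body"])
print(attributes)
```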

replace_all_patterns

The replace_all_patterns function is used to substitute parts of all string values or keys within a map that conform to a specific regex pattern with a new string.

Syntax: replace_all_patterns(target, mode, regex, replacement, function, replacementFormat)

  • Target: The path expression to a map type field which contains the data to be transformed.
  • Mode: Specifies whether replacements are applied to the map’s key or value. Acceptable options are key or value.
  • Regex: The regular expression pattern that identifies what portions of the target should be replaced.
  • Replacement: The new string that replaces matched segments. It can reference groups captured by the regex.
  • Function: (Optional) A converter function that processes the replacement string.
  • ReplacementFormat: (Optional) A format string for the replacement. It must include a %s placeholder for the main replacement content.

Input

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
			"ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1"
		},
		"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
		"resource": {...},
		"timestamp": 1733438978059
	}
]

Statement

replace_all_patterns(attributes, "value", "192\\.168\\.1\\.\\d+", "192.168.1.xxx")

See Understand Escaping Characters.

Output

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
			"ip_addresses": "192.168.1.xxx, 10.10.10.10, 172.16.0.1"
		},
		"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
		"resource": {...},
		"timestamp": 1733439002570
	}
]

In this example, the attributes object contains fields decoded_body and ip_addresses whose values are matched against the pattern "192\.168\.1\.\d+", and the matches are replaced with "192.168.1.xxx", effectively anonymizing that segment in all string values of the map.
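The difference between the two modes can be sketched in Python as follows. This is an illustrative model rather than the actual implementation. The key-mode call and its "^decoded_" pattern are hypothetical and are not taken from the example above.

```python
import re


def replace_all_patterns(target: dict, mode: str, regex: str, replacement: str) -> dict:
    """Return a copy of target with the regex applied to keys or string values."""
    if mode not in ("key", "value"):
        raise ValueError("mode must be 'key' or 'value'")
    pattern = re.compile(regex)
    result = {}
    for key, value in target.items():
        if mode == "key":
            result[pattern.sub(replacement, key)] = value
        elif isinstance(value, str):
            result[key] = pattern.sub(replacement, value)
        else:
            result[key] = value  # non-string values pass through unchanged
    return result


attributes = {
    "decoded_body": "User's IP address: 192.168.1.45; Action: Login attempt.",
    "ip_addresses": "192.168.1.45, 10.10.10.10, 172.16.0.1",
}
by_value = replace_all_patterns(attributes, "value", r"192\.168\.1\.\d+", "192.168.1.xxx")
by_key = replace_all_patterns(attributes, "key", r"^decoded_", "raw_")
print(by_value)
print(sorted(by_key))
```

In value mode every string value is scanned, which is why both fields change in the output above. In key mode the values stay as they are and only the key names are rewritten.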

set

This function is used to explicitly set a telemetry field to a specified value, providing flexibility in updating or assigning values within the telemetry data structure.

Syntax: set(target, value)

  • Target: A path expression indicating the telemetry field where the value will be set.
  • Value: The value to be assigned to the target. This can be of any data type. If the value resolves to nil (for example, if it references an unset map value), no action will occur.

Input

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
			"ip_addresses": "192.168.1.xxx, 10.10.10.10, 172.16.0.1"
		},
		"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
		"resource": {
			"ed.conf.id": "123456789",
			"ed.domain": "pipeline",
			"ed.org.id": "987654321",
			"ed.source.name": "Kubernetes Source",
			"ed.source.type": "memory_input",
			"ed.tag": "loggen",
			"host.ip": "10.0.0.1",
			"host.name": "ED_TEST",
			"service.name": "ed-tester",
			"src_type": null
		},
		"timestamp": 1733439280081
	}
]

Statement

set(attributes["host"], resource["host.ip"])
set(attributes["notes"], "comment")

Output

[
	{
		"_type": "log",
		"attributes": {
			"decoded_body": "User's IP address: 192.168.1.xxx; Action: Login attempt.",
			"host": "10.0.0.1",
			"ip_addresses": "192.168.1.xxx, 10.10.10.10, 172.16.0.1",
			"notes": "comment"
		},
		"body": "User's IP address: 192.168.1.45; Action: Login attempt.",
		"resource": {
			"ed.conf.id": "123456789",
			"ed.domain": "pipeline",
			"ed.org.id": "987654321",
			"ed.source.name": "Kubernetes Source",
			"ed.source.type": "memory_input",
			"ed.tag": "loggen",
			"host.ip": "10.0.0.1",
			"host.name": "ED_TEST",
			"service.name": "ed-tester",
			"src_type": null
		},
		"timestamp": 1733439335943
	}
]

In this example, the host field within the attributes object is set to the value from host.ip in resource. In addition, a second statement adds a static string value to notes in attributes.
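The nil rule described for the value parameter can be modeled with a small Python sketch. The helper name set_field and the cloud.zone key are hypothetical, and None stands in for a value that resolves to nil.

```python
def set_field(target: dict, key: str, value) -> None:
    """Assign value to target[key]; do nothing when the value is nil (None)."""
    if value is None:
        return
    target[key] = value


resource = {"host.ip": "10.0.0.1"}
attributes = {}

set_field(attributes, "host", resource.get("host.ip"))     # copied from resource
set_field(attributes, "notes", "comment")                  # static string
set_field(attributes, "zone", resource.get("cloud.zone"))  # unset key, so skipped
print(attributes)
```

Under this model a statement that reads from an unset map value leaves the target untouched instead of creating an empty field.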

truncate_all

This function is used to truncate all string values within a specified map so that none exceed a given character limit. This function helps manage value length in telemetry data to ensure consistency and compliance with size constraints.

Syntax: truncate_all(target, limit)

  • Target: A path expression pointing to a map type field whose string values need to be truncated.
  • Limit: An integer representing the maximum number of characters allowed for each string value. Non-string values within the map are unaffected.

Input

[
	{
		"_type": "log",
		"attributes": {
			"manifest": "{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"name\":\"nginx-deployment\",\"labels\":{\"app\":\"nginx\"}},\"spec\":{\"replicas\":3,\"selector\":{\"matchLabels\":{\"app\":\"nginx\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"nginx\"}},\"spec\":{\"containers\":[{\"name\":\"nginx\",\"image\":\"nginx:1.14.2\",\"ports\":[{\"containerPort\":80}]}]}}}"
		},
		"body": "Deployment issue",
		"resource": {...},
		"timestamp": 1733439976983
	}
]

Statement

truncate_all(attributes, 80)

Output

[
	{
		"_type": "log",
		"attributes": {
			"manifest": "{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"name\":\"nginx-deployment"
		},
		"body": "Deployment issue",
		"resource": {...},
		"timestamp": 1733440045272
	}
]

The string value within attributes, which exceeds 80 characters, is truncated to exactly 80 characters.
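The truncation rule can be checked with a short Python model. This sketch assumes that the limit counts characters, as described above, and it uses a shortened manifest string plus a hypothetical replicas field to show that non-string values are left alone.

```python
def truncate_all(target: dict, limit: int) -> None:
    """Cut every string value in target down to at most limit characters."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    for key, value in target.items():
        # Non-string values and strings within the limit are left as they are.
        if isinstance(value, str) and len(value) > limit:
            target[key] = value[:limit]


attributes = {
    "manifest": '{"apiVersion":"apps/v1","kind":"Deployment","metadata":'
                '{"name":"nginx-deployment","labels":{"app":"nginx"}}}',
    "replicas": 3,
}
truncate_all(attributes, 80)
print(len(attributes["manifest"]), attributes["replicas"])
```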