Regex as a String
Overview
It is important to understand how special characters are escaped in the Edge Delta pipeline and how to deal with them when providing regex patterns in string format.
Escaping Characters
Consider the following error message:
The error code "ERR\57" occurred in Module A.
When ingested by the agent and escaped into JSON, it appears in the pipeline as follows:
{
"body": "The error code \"ERR\\57\" occurred in Module A."
}
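The following Python sketch shows the two forms side by side. It uses the standard json module to serialize the raw message and then decode it again. It assumes that the pipeline's JSON escaping behaves like standard JSON string encoding, which escapes double quotes and backslashes.

```python
import json

# The raw message as the agent receives it: one backslash between ERR and 57.
raw = 'The error code "ERR\\57" occurred in Module A.'

# Serialize to JSON, as in the pipeline view above.
encoded = json.dumps({"body": raw})
print(encoded)
# {"body": "The error code \"ERR\\57\" occurred in Module A."}

# Decoding restores the original string, which contains a single backslash.
decoded = json.loads(encoded)["body"]
assert decoded == raw
assert decoded.count("\\") == 1
```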
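Note that in the JSON representation, the double quotes and the backslash are each escaped with a backslash. This escaping belongs to the JSON view only. The regex is applied to the value itself, which contains a single backslash. For basic masking, you can use the following regex pattern, which escapes that backslash for the regex engine: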
(?P<error>"ERR\\\d+")
(?P<error>...) : Named capturing group syntax. It defines a group named error.
?P<error> : Specifies the name of this capturing group.
"ERR : Matches the literal text "ERR.
\\ : Matches the single backslash (\) in the error code.
\d+ : Matches one or more digits, the numeric portion of the error code.
" : Matches the closing double quote after the digits.
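One way to check the pattern is to run it against the decoded message. The following sketch uses Python's re module, which supports the same (?P<name>...) named-group syntax. It is only a stand-in for the pipeline's regex engine, so more advanced constructs may behave differently. The pattern is written as a Python raw string (r'...') so that it reaches the regex engine exactly as written above, with no extra escaping. The "ERR-MASKED" placeholder is made up for illustration.

```python
import re

# Decoded message: the backslash is a single character here.
body = 'The error code "ERR\\57" occurred in Module A.'

# Same pattern as above. The raw string adds no extra escaping layer.
pattern = re.compile(r'(?P<error>"ERR\\\d+")')

match = pattern.search(body)
print(match.group("error"))  # "ERR\57"

# Basic masking: replace the whole error code with a hypothetical placeholder.
masked = pattern.sub('"ERR-MASKED"', body)
print(masked)  # The error code "ERR-MASKED" occurred in Module A.
```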
Regex in CEL
CEL takes regex input as a string and requires extra escaping: once for the JSON layer and once for the regex engine.
When you embed a regex pattern in a CEL expression, the pattern is treated as a string literal. For example, the regex_capture macro takes a regex pattern as a string:
regex_capture(input string, regexWithCaptureGroups string)
Strings must be enclosed in double quotes. Inside the quoted string, the characters \ and " must each be escaped with a backslash. The regex pattern already contains backslashes of its own (as in \\ or \d), and each of those must be escaped again for the JSON string layer. The result is double escaping: every backslash in the regex pattern is written twice in the string.
To reiterate, the message
The error code "ERR\57" occurred in Module A.
appears in the pipeline as follows:
{
"body": "The error code \"ERR\\57\" occurred in Module A."
}
The plain regex pattern (?P<error>"ERR\\\d+"), written as a string, becomes:
regex_capture(item["body"], "(?P<error>\"ERR\\\\\\d+\")")["error"]
\"ERR : The opening double quote, escaped for the JSON string layer so that it does not end the string. The regex engine receives "ERR and matches it literally.
\\\\ : The single backslash after ERR. The regex engine needs \\ to match one literal backslash, and each of those two backslashes is escaped again for the string layer.
\\d+ : One or more digits. The backslash of the regex token \d+ is escaped for the string layer.
\" : The closing double quote of the error code, escaped for the string layer.
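The string layer follows a mechanical rule: every backslash is doubled, and every double quote gets a backslash. The following Python sketch applies that rule to the plain pattern and confirms that the result matches the string argument in the regex_capture example. The helper name to_string_literal is hypothetical, and it handles only backslashes and double quotes, not control characters. The sketch uses json.loads to read a literal back, on the assumption that the expression language parses \ and " escapes the same way JSON does. It also shows that a literal with only four backslashes produces a pattern that no longer matches.

```python
import json
import re

def to_string_literal(pattern: str) -> str:
    """Hypothetical helper: wrap a regex pattern in double quotes,
    escaping backslashes first and then double quotes."""
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

plain = r'(?P<error>"ERR\\\d+")'
literal = to_string_literal(plain)
print(literal)  # "(?P<error>\"ERR\\\\\\d+\")"

# Reading the literal back gives the plain pattern again.
assert json.loads(literal) == plain

body = 'The error code "ERR\\57" occurred in Module A.'
assert re.search(json.loads(literal), body)

# Under-escaped: only four backslashes. The string layer turns them into \\d+,
# which the regex engine reads as a literal backslash followed by one or more 'd'.
under_escaped = r'"(?P<error>\"ERR\\\\d+\")"'
assert re.search(json.loads(under_escaped), body) is None
```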
Note that the parentheses of the capture group and of the regex_capture() call are both properly closed, and that the capture name is appended as ["error"] to select the captured value.
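To make the final ["error"] lookup concrete, the following Python sketch imitates the CEL expression. It assumes that regex_capture returns a map from capture-group names to the captured text. The function below is a hypothetical stand-in written for illustration, not Edge Delta's implementation, and returning an empty map when nothing matches is also an assumption. The pattern argument is decoded with json.loads, which stands in for the expression language's string-literal parsing.

```python
import json
import re

def regex_capture(value: str, regex_with_capture_groups: str) -> dict:
    """Hypothetical stand-in for the CEL regex_capture macro:
    returns the named groups of the first match, or {} if there is none."""
    match = re.search(regex_with_capture_groups, value)
    return match.groupdict() if match else {}

# The log item as it appears in the pipeline (JSON view).
item = json.loads(r'{"body": "The error code \"ERR\\57\" occurred in Module A."}')

# The string argument from the CEL example, decoded as a string literal.
pattern = json.loads(r'"(?P<error>\"ERR\\\\\\d+\")"')

error = regex_capture(item["body"], pattern)["error"]
print(error)  # "ERR\57"
```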
Regex in OTTL
Like CEL, OTTL takes regex input as a string and requires extra escaping: once for the JSON layer and once for the regex engine. For example, consider the delete_matching_keys function.
delete_matching_keys(target, pattern)
Consider this log after ingestion (that is, after JSON escaping has been applied):
{
"_type": "log",
"body": "A message",
"resource": {
"\"ERR\\57\"": "strange.name",
"\"ERR\\58\"": "strange.name2"
},
"timestamp": 1730511053177
}
The plain regex pattern (without a named group) for the two resource keys, "ERR\\\d+", written as a string, becomes:
delete_matching_keys(resource, "\"ERR\\\\\\d+\"")
\"ERR : The opening double quote, escaped for the JSON string layer so that it does not end the string. The regex engine receives "ERR and matches it literally.
\\\\ : The single backslash after ERR. The regex engine needs \\ to match one literal backslash, and each of those two backslashes is escaped again for the string layer.
\\d+ : One or more digits. The backslash of the regex token \d+ is escaped for the string layer.
\" : The closing double quote of the error code, escaped for the string layer.
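The escaping can also be checked against the sample log. The following Python sketch is a hypothetical stand-in for OTTL's delete_matching_keys, not the collector's implementation. It assumes that a key is deleted when the pattern matches anywhere in it, so it uses re.search. The sketch decodes the log shown above, decodes the pattern string with json.loads (standing in for OTTL's string-literal parsing), and removes the matching resource keys.

```python
import json
import re

def delete_matching_keys(target: dict, pattern: str) -> None:
    """Hypothetical stand-in for OTTL delete_matching_keys:
    removes, in place, every key that the regex matches."""
    compiled = re.compile(pattern)
    for key in [k for k in target if compiled.search(k)]:
        del target[key]

log = json.loads(r'''{
  "_type": "log",
  "body": "A message",
  "resource": {
    "\"ERR\\57\"": "strange.name",
    "\"ERR\\58\"": "strange.name2"
  },
  "timestamp": 1730511053177
}''')

# The decoded keys contain quotes and a single backslash: "ERR\57" and "ERR\58".
print(list(log["resource"]))

# The OTTL string argument, decoded as a string literal.
pattern = json.loads(r'"\"ERR\\\\\\d+\""')
delete_matching_keys(log["resource"], pattern)
print(log["resource"])  # {}
```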