Regex as a String

Use Regex as a String in CEL Macros and OTTL statements.

Overview

Special characters are escaped as data moves through the Edge Delta pipeline. You need to understand that escaping, and account for it, when you provide a regex pattern in string format.

Escaping Characters

Consider the following error message: The error code "ERR\57" occurred in Module A.

When ingested by the agent and escaped into JSON, it appears in the pipeline as follows:

{
  "body": "The error code \"ERR\\57\" occurred in Module A."
}

Note how the double quotes and the backslash are each escaped with a backslash. For basic masking, you can use the following regex pattern that takes the escaping into account:

(?P<error>"ERR\\\d+")
  • (?P<error>...): This syntax is used for named capturing groups in regex. It defines a group named error.
  • P<error>: Specifies the name of this capturing group.
  • "ERR: Matches the literal "ERR.
  • \\: Matches the single backslash \ present in the error code.
  • \d+: Captures one or more digits, representing the numeric portion of the error code.
  • ": Matches the closing double quote after the numerals.
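The following is a minimal sketch of the escaping and the match described above. It uses Python's json and re modules as stand-ins for the agent's JSON encoding and the pipeline's regex engine, which is an assumption made for illustration only. The variable names are hypothetical.

```python
import json
import re

# The raw log line as written by the application: one backslash after ERR.
# (Python needs "\\" in a normal string literal to produce that one backslash.)
raw = 'The error code "ERR\\57" occurred in Module A.'

# JSON encoding escapes the double quotes and the backslash,
# which is the form shown in the "body" field above.
encoded = json.dumps({"body": raw})
print(encoded)

# The pattern from this section, as a raw string so Python adds no escaping.
# \\ matches one literal backslash, \d+ matches the digits.
pattern = re.compile(r'(?P<error>"ERR\\\d+")')

match = pattern.search(raw)
print(match.group("error"))  # the quoted error code, including its quotes
```

In this sketch the pattern is applied to the decoded field value, which contains a single backslash. The doubled backslash is only visible in the JSON representation.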

Regex in CEL

CEL takes regex input as a string and requires extra escaping: once for the JSON layer and once for the regex engine.

When embedding a regex pattern within a CEL expression, the pattern is treated as a string literal. For example, the regex_capture macro takes a regex pattern as a string input:

regex_capture(input string, regexWithCaptureGroups string) 

Strings must be enclosed in double quotes. Inside the quoted string, characters such as \ and " must themselves be escaped with a backslash \ so that they reach the regex engine intact. Because the regex pattern already uses a backslash to escape the backslash in the data, the result is double escaping.

To reiterate, the message The error code "ERR\57" occurred in Module A. appears in the pipeline as follows:

{
  "body": "The error code \"ERR\\57\" occurred in Module A."
}

The normal regex pattern (?P<error>"ERR\\\d+"), when enclosed in a string, becomes:

regex_capture(item["body"], "(?P<error>\"ERR\\\\\\d+\")")["error"]
  • \"ERR: The \" is an escaped double quote inside the string, so the regex engine receives "ERR and matches it literally.
  • \\\\: Represents the regex \\, which matches the single backslash after ERR. Each of the two backslashes in the regex is escaped once more for the string, giving four.
  • \\d+: Represents the regex \d+, which matches one or more digits. Its backslash is escaped once more for the string, giving two.
  • \": The escaped closing double quote of the error code, which the regex engine receives as ".

Note that the parentheses for the capture group and for regex_capture() are both closed, and that the name of the capture group is appended as ["error"] to select the captured value.
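The relationship between the normal pattern and its string form can be checked with a short sketch. It assumes that Python's double-quoted string literals handle \\ and \" the same way as the string literal in the expression above. The helper to_string_literal is hypothetical and not part of CEL or Edge Delta.

```python
import re

# The pattern as the regex engine should receive it (raw string: no escaping added).
regex_pattern = r'(?P<error>"ERR\\\d+")'

# The same pattern typed as a double-quoted string literal:
# every \ is doubled and every " becomes \".
string_literal = "(?P<error>\"ERR\\\\\\d+\")"

# After the literal is parsed, both spellings hold the same characters.
print(string_literal == regex_pattern)

def to_string_literal(pattern: str) -> str:
    """Hypothetical helper: escape a regex for use inside a double-quoted string."""
    # Double the backslashes first so the backslash added for \" is not doubled again.
    return '"' + pattern.replace("\\", "\\\\").replace('"', '\\"') + '"'

print(to_string_literal(regex_pattern))

# The parsed literal still captures the error code from the decoded body.
body = 'The error code "ERR\\57" occurred in Module A.'
captured = re.search(string_literal, body).group("error")
print(captured)
```

Writing the normal pattern first and escaping it mechanically in this way tends to be less error-prone than counting backslashes by hand.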

Regex in OTTL

Like CEL, OTTL takes regex input as a string and requires extra escaping: once for the JSON layer and once for the regex engine. For example, consider the delete_matching_keys function.

delete_matching_keys(target, pattern)

Consider this log after ingestion (that is, after JSON escaping has been added):

{
  "_type": "log",
  "body": "A message",
  "resource": {
    "\"ERR\\57\"": "strange.name",
    "\"ERR\\58\"": "strange.name2"
  },
  "timestamp": 1730511053177
}

The normal regex pattern (without a named group) for the two resource keys is "ERR\\\d+". When enclosed in a string, it becomes:

delete_matching_keys(resource, "\"ERR\\\\\\d+\"")
  • \"ERR: The \" is an escaped double quote inside the string, so the regex engine receives "ERR and matches it literally.
  • \\\\: Represents the regex \\, which matches the single backslash after ERR. Each of the two backslashes in the regex is escaped once more for the string, giving four.
  • \\d+: Represents the regex \d+, which matches one or more digits. Its backslash is escaped once more for the string, giving two.
  • \": The escaped closing double quote of the key, which the regex engine receives as ".
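As a rough illustration of the effect of this statement, the sketch below removes matching keys from a plain Python dictionary. The function delete_matching_keys_sketch is a hypothetical stand-in and not the OTTL implementation. It assumes that a key is removed when the pattern matches anywhere in it, and the host.name entry is an invented extra key added to show what survives.

```python
import re

# The resource map after JSON decoding: each strange key holds quotes and one backslash.
resource = {
    '"ERR\\57"': "strange.name",
    '"ERR\\58"': "strange.name2",
    "host.name": "example-host",  # hypothetical key, not in the original log
}

# The pattern exactly as typed inside the statement, as a double-quoted string literal.
pattern = "\"ERR\\\\\\d+\""

def delete_matching_keys_sketch(target: dict, pattern: str) -> None:
    """Illustrative stand-in: delete every key that the regex matches."""
    compiled = re.compile(pattern)
    # Collect the keys first so the dict is not modified while iterating over it.
    for key in [k for k in target if compiled.search(k)]:
        del target[key]

delete_matching_keys_sketch(resource, pattern)
print(resource)
```

Both keys that contain a quoted error code are removed, while the key that does not match the pattern is left in place.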