Edge Delta Mask Node

Mask values in logs using the Edge Delta Mask Node.

Overview

The Mask node obfuscates sensitive data in logs by replacing it with a specified set of characters, such as a series of asterisks or a custom string. Masking supports compliance with data protection regulations and addresses privacy concerns. Sensitive data is identified using a regex pattern. Several predefined patterns are available out of the box for common data types such as email addresses, bitcoin addresses, and different types of credit card numbers. You can also create a regex pattern with multiple capture groups and specify a different mask for each capture group.

  • incoming_data_types: log
  • outgoing_data_types: log

Note: the Mask node cannot process patterns that combine alternative regexes with a pipe character, such as IPv4 OR IPv6. To mask each regex pattern, use separate Mask nodes in series.
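The effect of chaining Mask nodes can be approximated outside the pipeline. The following is a minimal Python sketch, not Edge Delta's implementation: it applies two patterns one after the other, the way two Mask nodes in series would. Python's re module stands in for the Golang regex engine. The IPv4 pattern is the one used in the examples on this page; the IPv6 pattern is a simplified assumption that only matches full eight-group addresses, and mask_in_series is a hypothetical helper name.

```python
import re

# IPv4 pattern taken from the examples on this page.
IPV4 = r"\b(\d{1,3}\.){3}\d{1,3}\b"
# Simplified IPv6 pattern (assumption): full, uncompressed eight-group addresses only.
IPV6 = r"\b(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}\b"


def mask_in_series(body, patterns, mask="REDACTED"):
    """Apply each pattern in turn, like separate mask nodes connected in series."""
    for pattern in patterns:
        # A function replacement keeps the mask a literal, static string.
        body = re.sub(pattern, lambda _match: mask, body)
    return body


log = "client 192.168.1.5 via 2001:0db8:85a3:0000:0000:8a2e:0370:7334"
print(mask_in_series(log, [IPV4, IPV6]))
```

Each pass sees the output of the previous one, so the order of the nodes matters only if one mask could itself match a later pattern.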

For a detailed walkthrough, see the Mask Emails in Logs page.

Each transformation node is designed for enriching different sections of the data item:

Node | Attribute | Resource | Body | Timestamp | Any Field | Restrictions
Output Transform | Y | Y | Y | Y | Y | Outputs a Custom type data item that can't be ingested by the Edge Delta Archive node. The whole payload is flattened and sent as the event, with all other fields empty.
Log Transform | Y | N | N | Y | N | Can only ingest logs, and it outputs only logs.
Resource Transform | N | Y | N | N | N | Can only ingest logs, and it outputs only logs.
Mask | N | N | Y | N | N | Can only ingest logs, and it outputs only logs. New value can only be a static string.
Generic Transform | Y | Y | N | Y | N | Can only transform non-body fields.
OTTL Transform | Y | Y | Y | Y | Y | Can transform any field on any data type.

The body field is protected from dynamic enrichment until the end of the pipeline (Output Transform) to prevent schema changes from disabling pipeline functionality.

Example 1: Single Mask

In this example, a specific string containing an IP address is identified and the IPv4 address is masked with the word “REDACTED”.

nodes:
  - name: mask
    type: mask
    pattern: Received request from (\b(\d{1,3}\.){3}\d{1,3}\b)
    mask: REDACTED

Input Log

{
  "timestamp": "2023-04-05T14:22:45Z",
  "node_id": "node6",
  "log_level": "INFO",
  "message": "Received request from 192.168.1.5",
  "source_ip": "192.168.1.5",
  "event": "request_received",
  "service": "api-service",
  "protocol": "HTTP",
  "method": "GET",
  "endpoint": "/api/data",
  "status_code": 200
}

Output Log

{
  "timestamp":"2023-04-05T14:22:45Z",
  "node_id":"node6",
  "log_level":"INFO",
  "message":"Received request from REDACTED",
  "source_ip":"192.168.1.5",
  "event":"request_received",
  "service":"api-service",
  "protocol":"HTTP",
  "method":"GET",
  "endpoint":"/api/data",
  "status_code":200
}

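Note that the IP address in the source_ip field is not masked, because the pattern only matches an address that follows the text Received request from.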

Example 2: Greedy Mask

In this example, the Received request from string has been removed from the pattern, so all IPv4 addresses will be masked with the word REDACTED.

nodes:
  - name: mask
    type: mask
    pattern: (\b(\d{1,3}\.){3}\d{1,3}\b)
    mask: REDACTED

Input Log

{
  "timestamp": "2023-04-05T14:22:45Z",
  "node_id": "node6",
  "log_level": "INFO",
  "message": "Received request from 192.168.1.5",
  "source_ip": "192.168.1.5",
  "event": "request_received",
  "service": "api-service",
  "protocol": "HTTP",
  "method": "GET",
  "endpoint": "/api/data",
  "status_code": 200
}

Output Log

Because the pattern is no longer restricted by the surrounding text, the matching is greedy: both the IP address in the message field and the IP address in the source_ip field have been redacted.

{
  "timestamp":"2023-04-05T14:22:45Z",
  "node_id":"node6", 
  "log_level":"INFO",
  "message":"Received request from REDACTED",
  "source_ip":"REDACTED",
  "event":"request_received",
  "service":"api-service",
  "protocol":"HTTP",
  "method":"GET",
  "endpoint":"/api/data",
  "status_code":200
}

Note: the host IP address (the host.ip resource attribute) is not masked, because the Mask node only operates on the body field.
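The difference between the patterns in Example 1 and Example 2 can be checked before deploying either one. The following Python sketch is an approximation under stated assumptions: Python's re module stands in for the Golang regex engine, the JSON log is assumed to arrive as a single body string, and count_matches is a hypothetical helper. It counts how many places each pattern matches in a shortened version of the input log.

```python
import json
import re

# Patterns copied from Example 1 and Example 2.
SCOPED = r"Received request from (\b(\d{1,3}\.){3}\d{1,3}\b)"
UNSCOPED = r"(\b(\d{1,3}\.){3}\d{1,3}\b)"

# Shortened input log, serialized as one string (assumption about how the body arrives).
body = json.dumps({
    "message": "Received request from 192.168.1.5",
    "source_ip": "192.168.1.5",
    "status_code": 200,
})


def count_matches(pattern, text):
    """Return how many non-overlapping matches the pattern has in the text."""
    return sum(1 for _ in re.finditer(pattern, text))


print(count_matches(SCOPED, body))    # address preceded by the fixed text only
print(count_matches(UNSCOPED, body))  # every IPv4-shaped string in the body

# Applying the unscoped pattern replaces both occurrences.
masked, replaced = re.subn(UNSCOPED, "REDACTED", body)
print(replaced, masked)
```

Counting matches first is a cheap way to see whether a pattern is broader than intended before it starts redacting fields that downstream nodes may rely on.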

Example 3: Capture Groups

In this example, a regex pattern containing two capture groups is configured with different masks for each group.

nodes:
  - name: mask
    type: mask
    pattern: password=(?P<pw>\S+).*?ssn=(?P<ssn>\d{3}-\d{2}-\d{4})
    capture_group_masks:
      - capture_group: pw
        mask: '****'
      - capture_group: ssn
        mask: <REDACTED>

Input Log

12:34 [INFO] hello info - i am an info log - username:foobar, password=fancycat, service:billing, ssn=824-24-1932, environment:prod, latency=143ms

Output Log

{
  "_type": "log",
  "body": "12:34 [INFO] hello info - i am an info log - username:foobar, password=**** service:billing, ssn=<REDACTED>, environment:prod, latency=143ms",
  "resource": {
    "ed.conf.id": "12345678987654321",
    "ed.org.id": "98765432123456789",
    "ed.tag": "masktest",
    "host.ip": "10.0.0.1",
    "host.name": "ED_TEST",
    "service.name": "",
    "src_type": "memory_input"
  },
  "timestamp": 1722475675042
}
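The per-group masking in this example can be reproduced with a short Python sketch. It is an illustration only, not the node's implementation: Python's re module stands in for the Golang regex engine (both accept the (?P<name>...) syntax used here), and mask_groups is a hypothetical helper that replaces the span of each named capture group with its configured mask.

```python
import re

# Pattern and masks copied from the Example 3 configuration.
PATTERN = re.compile(r"password=(?P<pw>\S+).*?ssn=(?P<ssn>\d{3}-\d{2}-\d{4})")
GROUP_MASKS = {"pw": "****", "ssn": "<REDACTED>"}


def mask_groups(body):
    """Replace the span of each named capture group with its configured mask."""
    def replace(match):
        text = match.group(0)
        offset = match.start()
        # Work right to left so earlier spans keep their positions.
        for name in sorted(GROUP_MASKS, key=match.start, reverse=True):
            start, end = match.span(name)
            if start == -1:  # group did not take part in the match
                continue
            text = text[:start - offset] + GROUP_MASKS[name] + text[end - offset:]
        return text
    return PATTERN.sub(replace, body)


line = ("12:34 [INFO] hello info - i am an info log - username:foobar, "
        "password=fancycat, service:billing, ssn=824-24-1932, "
        "environment:prod, latency=143ms")
print(PATTERN.search(line).group("pw"))
print(mask_groups(line))
```

The sketch also explains a detail of the output log: \S+ matches every non-whitespace character, so the pw group captures the trailing comma after fancycat and the comma disappears from the masked body. A narrower class such as [^,\s]+ would leave the comma in place.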

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space, followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: mask

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: mask

pattern

The pattern parameter identifies the string within the body field that should be masked. It is specified as a Golang regex pattern string. Alternatively, you can select a pre-configured regex pattern with the predefined_pattern parameter. Either a pattern or a predefined_pattern is required. See Regex Testing for details on writing effective regex patterns.

nodes:
  - name: <node name>
    type: mask
    pattern: <regex pattern>

Optional Parameters

mask

The mask parameter defines the characters used to obfuscate the masked data. It is specified as a static string, and the default is ******. It is optional.

nodes:
  - name: <node name>
    type: mask
    pattern: <regex pattern>
    mask: <masking characters>

predefined_pattern

The predefined_pattern parameter identifies the values that should be masked using a pre-configured regex pattern. It is specified as a string. Either a pattern or a predefined_pattern is required. You can select one of the following predefined patterns:

  • credit_card
  • us_phone_dash

nodes:
  - name: <node name>
    type: mask
    predefined_pattern: credit_card|us_phone_dash

See Also

Mask Emails in Logs