Edge Delta Mask Node
Overview
The Mask node obfuscates sensitive data in logs by replacing it with a specified string, such as a series of asterisks or a custom value. Masking supports compliance with data protection regulations and addresses privacy concerns. Sensitive data is identified using a regex pattern. Predefined patterns are available out of the box for common data types such as email addresses, bitcoin addresses, and different types of credit card numbers. You can also create a regex pattern with multiple capture groups and specify a different mask for each capture group.
Note: The Mask node cannot process patterns that combine alternative regexes with a pipe character, such as IPv4 OR IPv6. To mask each alternative, use separate Mask nodes in series, one per regex pattern.
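Chaining Mask nodes in series amounts to running one substitution pass per pattern, each pass working on the output of the previous one. The following Python sketch only illustrates that idea and is not the node's implementation: Python's re module stands in for the Go regex engine, the function name apply_masks_in_series is hypothetical, and both IP patterns are simplified assumptions (the IPv6 pattern only matches fully expanded addresses).

```python
import re

# Simplified illustrative patterns (assumptions, not Edge Delta's predefined patterns).
IPV4 = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
# Matches only fully expanded IPv6 addresses (eight hex groups, no "::" shorthand).
IPV6 = r"\b(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}\b"


def apply_masks_in_series(body, steps):
    """Apply each (pattern, mask) pair in order, like Mask nodes chained in series."""
    for pattern, mask in steps:
        body = re.sub(pattern, mask, body)
    return body


masked = apply_masks_in_series(
    "client 10.0.0.7 via 2001:0db8:85a3:0000:0000:8a2e:0370:7334",
    [(IPV4, "REDACTED"), (IPV6, "REDACTED")],
)
print(masked)  # client REDACTED via REDACTED
```

Each pass handles exactly one pattern, which mirrors the one-pattern-per-node arrangement described in the note.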
For a detailed walkthrough, see the Mask Emails in Logs page.
Each transformation node is designed for enriching different sections of the data item:
Node | Attribute | Resource | Body | Timestamp | Any Field | Restrictions |
---|---|---|---|---|---|---|
Output Transform | Y | Y | Y | Y | Y | Outputs a Custom type data item that can't be ingested by the Edge Delta Archive node. The whole payload is flattened and sent as the event, with all other fields empty. |
Log Transform | Y | N | N | Y | N | Can only ingest logs, and it outputs only logs. |
Resource Transform | N | Y | N | N | N | Can only ingest logs, and it outputs only logs. |
Mask | N | N | Y | N | N | Can only ingest logs, and it outputs only logs. New value can only be a static string. |
The body field is protected from dynamic enrichment until the end of the pipeline (Output Transform) to prevent schema changes from disabling pipeline functionality.
Example 1: Single Mask
In this example, a string containing an IP address is identified and the IPv4 address is masked with the word “REDACTED”.
nodes:
- name: mask
  type: mask
  pattern: Received request from (\b(\d{1,3}\.){3}\d{1,3}\b)
  mask: REDACTED
Input Log
{
"timestamp": "2023-04-05T14:22:45Z",
"node_id": "node6",
"log_level": "INFO",
"message": "Received request from 192.168.1.5",
"source_ip": "192.168.1.5",
"event": "request_received",
"service": "api-service",
"protocol": "HTTP",
"method": "GET",
"endpoint": "/api/data",
"status_code": 200
}
Output Log
{
"timestamp":"2023-04-05T14:22:45Z",
"node_id":"node6",
"log_level":"INFO",
"message":"Received request from REDACTED",
"source_ip":"192.168.1.5",
"event":"request_received",
"service":"api-service",
"protocol":"HTTP",
"method":"GET",
"endpoint":"/api/data",
"status_code":200
}
Example 2: Greedy Mask
In this example, the Received request from string has been removed from the pattern, so all IPv4 addresses will be masked with the word REDACTED.
nodes:
- name: mask
  type: mask
  pattern: (\b(\d{1,3}\.){3}\d{1,3}\b)
  mask: REDACTED
Input Log
{
"timestamp": "2023-04-05T14:22:45Z",
"node_id": "node6",
"log_level": "INFO",
"message": "Received request from 192.168.1.5",
"source_ip": "192.168.1.5",
"event": "request_received",
"service": "api-service",
"protocol": "HTTP",
"method": "GET",
"endpoint": "/api/data",
"status_code": 200
}
Output Log
The pattern now matches every IPv4 address in the log, so the IP addresses in both the message field and the source_ip field have been redacted.
{
"timestamp":"2023-04-05T14:22:45Z",
"node_id":"node6",
"log_level":"INFO",
"message":"Received request from REDACTED",
"source_ip":"REDACTED",
"event":"request_received",
"service":"api-service",
"protocol":"HTTP",
"method":"GET",
"endpoint":"/api/data",
"status_code":200
}
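The difference between Example 1 and Example 2 can be reproduced with a short Python sketch. It is not the node's implementation: Python's re module stands in for the Go regex engine, the helper name mask_first_group is hypothetical, and the rule "replace only the text captured by the first group" is an assumption inferred from the Example 1 output, where the Received request from prefix survives. The log body is shortened to two fields for readability.

```python
import re


def mask_first_group(pattern, mask, body):
    """Replace the text captured by group 1 in every match, keeping the rest of the match."""
    def repl(m):
        start, end = m.span(1)
        if start == -1:  # group 1 did not take part in this match
            return m.group(0)
        offset = m.start()
        whole = m.group(0)
        return whole[: start - offset] + mask + whole[end - offset:]
    return re.sub(pattern, repl, body)


BODY = '{"message": "Received request from 192.168.1.5", "source_ip": "192.168.1.5"}'
ANCHORED = r"Received request from (\b(\d{1,3}\.){3}\d{1,3}\b)"   # Example 1
UNANCHORED = r"(\b(\d{1,3}\.){3}\d{1,3}\b)"                        # Example 2

print(mask_first_group(ANCHORED, "REDACTED", BODY))
print(mask_first_group(UNANCHORED, "REDACTED", BODY))
```

With the prefix in the pattern, only the address in the message field is replaced. Without it, both addresses match and both are replaced.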
Example 3: Capture Groups
In this example, a regex pattern containing two capture groups is configured with different masks for each group.
nodes:
- name: mask
  type: mask
  pattern: password=(?P<pw>\S+).*?ssn=(?P<ssn>\d{3}-\d{2}-\d{4})
  capture_group_masks:
  - capture_group: pw
    mask: '****'
  - capture_group: ssn
    mask: <REDACTED>
Input Log
12:34 [INFO] hello info - i am an info log - username:foobar, password=fancycat, service:billing, ssn=824-24-1932, environment:prod, latency=143ms
Output Log
{
"_type": "log",
"body": "12:34 [INFO] hello info - i am an info log - username:foobar, password=**** service:billing, ssn=<REDACTED>, environment:prod, latency=143ms",
"resource": {
"ed.conf.id": "12345678987654321",
"ed.org.id": "98765432123456789",
"ed.tag": "masktest",
"host.ip": "10.0.0.1",
"host.name": "ED_TEST",
"service.name": "",
"src_type": "memory_input"
},
"timestamp": 1722475675042
}
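The per-group masking in Example 3 can be sketched in Python as follows. This is an illustration under assumptions, not the node's implementation: Python's re module stands in for the Go regex engine (both accept the (?P<name>...) syntax used here), and the helper name mask_named_groups and the MASKS dictionary are hypothetical stand-ins for the capture_group_masks setting.

```python
import re

PATTERN = r"password=(?P<pw>\S+).*?ssn=(?P<ssn>\d{3}-\d{2}-\d{4})"
MASKS = {"pw": "****", "ssn": "<REDACTED>"}  # stand-in for capture_group_masks


def mask_named_groups(pattern, masks, body):
    """Replace the text captured by each named group with that group's mask."""
    def repl(m):
        whole, offset = m.group(0), m.start()
        # Work right to left so earlier spans stay valid after each replacement.
        spans = sorted(((m.span(name), mask) for name, mask in masks.items()), reverse=True)
        for (start, end), mask in spans:
            if start == -1:  # group did not take part in the match
                continue
            whole = whole[: start - offset] + mask + whole[end - offset:]
        return whole
    return re.sub(pattern, repl, body)


LOG = ("12:34 [INFO] hello info - i am an info log - username:foobar, "
       "password=fancycat, service:billing, ssn=824-24-1932, environment:prod, latency=143ms")
print(mask_named_groups(PATTERN, MASKS, LOG))
```

The sketch also shows why the comma after the password is missing from the output body: \S+ captures every non-whitespace character, so the pw group contains fancycat, including the trailing comma. A narrower group such as [^\s,]+ would leave the comma in place.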
Required Parameters
name
A descriptive name for the node. This is the name that appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space followed by the string. It is a required parameter for all nodes.
nodes:
- name: <node name>
  type: <node type>
type: mask
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
- name: <node name>
  type: mask
pattern
The pattern parameter identifies the string within the body field that should be masked. It is specified as a Golang regex pattern string. Alternatively, you can select a pre-configured regex pattern with predefined_pattern. Either a pattern or a predefined_pattern is required. See Regex Testing for details on writing effective regex patterns.
nodes:
- name: <node name>
  type: mask
  pattern: <regex pattern>
Optional Parameters
mask
The mask parameter defines the characters used to obfuscate the masked data. It is specified as a string and the default is ******. It is optional.
nodes:
- name: <node name>
  type: mask
  pattern: <regex pattern>
  mask: <masking characters>
predefined_pattern
The predefined_pattern parameter identifies the values that should be masked. It is specified as a string. A pattern or a predefined_pattern is required. You can select one of the following predefined patterns:
credit_card
us_phone_dash
nodes:
- name: <node name>
  type: mask
  predefined_pattern: credit_card|us_phone_dash