Designing Pipelines and Testing Edge Delta Nodes

Test node functions to design and build effective pipelines.

Overview

You can test a node and the preceding segment of the pipeline using your own data to ensure that it processes logs and metrics as expected.

Pipeline Design

Suppose services in your environment emit the following logs:

{"timestamp": "2024-03-20T14:31:43.621789Z", "logLevel": "ERROR", "serviceName": "node4_EmailService", "message": "Incorrect password, user failed to authenticate.", "clientIP": "192.168.1.139", "username": "raptor640", "event": "login_failed", "outcome": "failure"}
{"timestamp": "2024-03-20T14:31:13.144319Z", "logLevel": "INFO", "serviceName": "node4_InventoryService", "message": "The user has logged in successfully.", "clientIP": "192.168.1.102", "username": "zebra678", "event": "user_logged_in", "outcome": "success"}

Now suppose you have the following requirements for how these logs should be handled by the Edge Delta pipeline:

  1. Only logs containing the string node4 should be processed according to the remaining requirements; all other logs should be sent directly to the archive.
  2. A new field called nodeID should be added to the attributes for each log.
  3. The value of the new nodeID field should always be node4.
  4. A second new field called originalHostName should be added to the attributes for each log.
  5. The value of the new originalHostName field should be dynamically determined by using the host.name field’s value from the resource section for each log, which will be generated when the log is ingested by the agent.
  6. A third new field called outcome should be added to the attributes for each log.
  7. The value of the new outcome field should be dynamically determined by using the outcome field’s value from the log body, which is a JSON object.
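
Taken together, these requirements mean that each node4 log leaving the pipeline should carry an attributes section roughly like the following sketch (the originalHostName and outcome values shown are illustrative; they depend on the agent's host.name and on each log body):

attributes:
  nodeID: node4                    # requirements 2 and 3: fixed value
  originalHostName: <host.name>    # requirements 4 and 5: copied from the resource section
  outcome: failure                 # requirements 6 and 7: read from the JSON log body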

These requirements indicate the need for two Edge Delta nodes in series:

  1. A Route node to capture the node4 logs and send them on for processing, while routing all other logs to other destinations.
  2. A Log Transform node to apply the transformations to each matching log.

Node Configurations

Route

The Route node configuration needs to identify logs containing node4 in the body and send them to a particular named path. To match the string only in the log body, you can use the CEL macro regex_match as follows:

nodes:
- name: route
  type: route
  paths:
  - path: log_transform
    condition: regex_match(item["body"], "node4")
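
Logs that do not satisfy the condition are emitted on the Route node's unmatched path, which can be linked directly to an archive destination. A minimal sketch of that link, using the node names from the full pipeline shown later:

links:
- from: route
  path: unmatched
  to: ed_archive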

See Route Logs in a Branched Pipeline for a how-to.

Log Transform

The Log Transform node is configured as follows to perform three transformations:

  1. upsert node4 into attributes.nodeID.
  2. upsert the value of item["resource"]["host.name"] into attributes.originalHostName.
  3. upsert the value of json(item["body"]).outcome into attributes.outcome.

In this example, a file input node is the data source and the default nodes have been removed.

The YAML for this part of the pipeline should be similar to the following:

links:
- from: file_input
  to: route
- from: route
  path: log_transform
  to: log_transform_test
- from: route
  path: unmatched
  to: ed_archive
- from: log_transform_test
  to: ed_archive

nodes:
- name: ed_archive
  type: ed_archive_output
- name: ed_health
  type: ed_health_output
- name: file_input
  type: file_input
  path: /mnt/inputfile/logs/*.*
- name: route
  type: route
  paths:
  - path: log_transform
    condition: regex_match(item["body"], "node4")
- name: log_transform_test
  type: log_transform
  transformations:
  - field_path: attributes.nodeID
    operation: upsert
    value: '"node4"'
  - field_path: attributes.originalHostName
    operation: upsert
    value: item["resource"]["host.name"]
  - field_path: attributes.outcome
    operation: upsert
    value: json(item["body"]).outcome
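
Note that the value fields in the log_transform node are CEL expressions: a literal string such as node4 is wrapped in an extra pair of quotes inside the YAML value, while dynamic values are written as bare expressions that reference the data item. The same transformations are repeated below with explanatory comments added for illustration:

  transformations:
  - field_path: attributes.nodeID
    operation: upsert
    value: '"node4"'                       # CEL string literal, hence the nested quotes
  - field_path: attributes.originalHostName
    operation: upsert
    value: item["resource"]["host.name"]   # reference to the resource section of the data item
  - field_path: attributes.outcome
    operation: upsert
    value: json(item["body"]).outcome      # parse the JSON body, then read its outcome field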

Testing the Route and Transformations

You can test the Route and Transformations simultaneously.

  1. Enter Edit Mode, open the Log Transform node and click Test Node.

The Regex page opens and sample logs are inserted. This configuration doesn't contain regex, so there are no matches.

  2. Click Processor and paste representative logs that the services in your environment emit.

In this instance, the logs listed previously are pasted into the Paste Log Data field, along with an additional non-matching log that contains node3 instead of node4:

{"timestamp": "2024-03-20T14:31:43.621789Z", "logLevel": "ERROR", "serviceName": "node4_EmailService", "message": "Incorrect password, user failed to authenticate.", "clientIP": "192.168.1.139", "username": "raptor640", "event": "login_failed", "outcome": "failure"}
{"timestamp": "2024-03-20T14:31:13.144319Z", "logLevel": "INFO", "serviceName": "node4_InventoryService", "message": "The user has logged in successfully.", "clientIP": "192.168.1.102", "username": "zebra678", "event": "user_logged_in", "outcome": "success"}
{"timestamp": "2024-03-20T14:31:13.144319Z", "logLevel": "INFO", "serviceName": "node3_InventoryService", "message": "The user has logged in successfully.", "clientIP": "192.168.1.102", "username": "zebra678", "event": "user_logged_in", "outcome": "success"}

The Input node option should be set to Test full pipeline up to the file_input. This ensures that the routing logic in the Route node is also tested.

  3. Click Test Processor and examine the Incoming and Outgoing Data Items fields:

While three logs were entered into the test, only two are listed in the Incoming Data Items field. This is because the Route node does not send the node3 log to the Log Transform node, which validates the Route configuration.

You can expand the log fields in the Outgoing Data Items field:

{
  "_type": "log",
  "attributes": {
    "nodeID": "node4",
    "originalHostName": "ED_TEST",
    "outcome": "failure"
  },
  "body": "{\"timestamp\": \"2024-03-20T14:31:43.621789Z\", \"logLevel\": \"ERROR\", \"serviceName\": \"node4_EmailService\", \"message\": \"Incorrect password, user failed to authenticate.\", \"clientIP\": \"192.168.1.139\", \"username\": \"raptor640\", \"event\": \"login_failed\", \"outcome\": \"failure\"}",
  "resource": {
    "ed.conf.id": "123456789",
    "ed.filepath": "test/file/path",
    "ed.org.id": "987654321",
    "ed.tag": "testing pipeline",
    "host.ip": "10.0.0.1",
    "host.name": "ED_TEST",
    "src_type": "file_input"
  },
  "timestamp": 1710958144339
}
{
  "_type": "log",
  "attributes": {
    "nodeID": "node4",
    "originalHostName": "ED_TEST",
    "outcome": "success"
  },
  "body": "{\"timestamp\": \"2024-03-20T14:31:13.144319Z\", \"logLevel\": \"INFO\", \"serviceName\": \"node4_InventoryService\", \"message\": \"The user has logged in successfully.\", \"clientIP\": \"192.168.1.102\", \"username\": \"zebra678\", \"event\": \"user_logged_in\", \"outcome\": \"success\"}",
  "resource": {
    "ed.conf.id": "123456789",
    "ed.filepath": "test/file/path",
    "ed.org.id": "987654321",
    "ed.tag": "testing pipeline",
    "host.ip": "10.0.0.1",
    "host.name": "ED_TEST",
    "src_type": "file_input"
  },
  "timestamp": 1710958144341
}

There are three new attributes in each log, which validates the Log Transform configuration. Note that originalHostName is ED_TEST here because the test uses a synthetic resource section; on a live agent it would reflect the actual host.name value recorded at ingestion.

Testing CEL in Isolation

Optionally, you can validate the CEL expression used in the Log Transform configuration.

  1. Click CEL. You can explore the CEL library to discover methods to capture data.

  2. Select json(item["body"]).outcome and click Test CEL.

The CEL test results show only the values captured by the selected CEL macro. To view them in the context of the log, use the Processor tab.
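
For example, with the two matching sample logs used above, testing json(item["body"]).outcome would be expected to capture one value per log, similar to the following sketch (not the exact UI layout):

"failure"
"success"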