Design Effective Pipelines

Design and build effective pipelines by testing.

Know your Data

To design an effective data handling pipeline, you should have a good understanding of the data your workloads generate. It is important to understand its structure and content, as well as whether it is homogeneous, that is, of the same type and structure.

When logs are ingested into the pipeline, the entire log becomes the body and metadata is added to the log to build an OTEL data item.

Bear in mind that the OTEL source node attempts to use the incoming OTEL log fields.
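
For example, a raw log line read from a file might be wrapped into a data item shaped roughly like the sketch below. This is an illustrative shape only, not an exact schema; the field values are hypothetical, and ed.filepath is shown because the transformation examples later on this page reference it:

# Illustrative data item shape (hypothetical values, not an exact schema)
timestamp: 1728568536000
body: '10.0.0.7 - - [10/Oct/2024:13:55:36 +0000] "GET /api/health HTTP/1.1" 200 2'
resource:
  ed.filepath: /var/log/nginx/access.log
  host.name: web-01
attributes: {}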

See the documentation for more examples of data items. To understand how data is escaped, see Understand Escaping Characters.

Live Capture helps you design pipelines by showing actual data as it flows through the processor. See Live Capture.

Know your Requirements

To design effective log and metric pipelines, you must have a comprehensive understanding of your data handling requirements. These include business-driven factors such as cost efficiency and adherence to legal mandates; data-specific needs such as volume capacity and optimization of data throughput; information security; and maintainability.

Pipeline Conceptual Design

Create a rough or conceptual pipeline containing the nodes whose functions fulfil the requirements. Consider the sequence of nodes and opportunities for branching the pipeline into parallel paths. Develop a high-level understanding of what your data should look like as it progresses through the pipeline to meet your requirements. For example, the first node might mask a specific field, while the next might extract a field from the body and convert it into an attribute. A parallel path might be required to also generate metrics or trigger alerts against a threshold, as in the sketch below. Finally, consider the data format requirements of the destination.
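As a rough illustration, the conceptual design above could be sketched out as a set of nodes and links before any of them are configured in detail. The node names and types below are hypothetical placeholders for whichever Edge Delta nodes fulfil each function, not a working configuration:

# Conceptual sketch only - node names are hypothetical placeholders
nodes:
  - name: mask_sensitive_field      # mask a specific field in the log body
  - name: extract_status_attribute  # move a field from the body into an attribute
  - name: logs_to_metrics           # parallel path: generate metrics from the logs
  - name: error_rate_threshold      # trigger an alert when a metric crosses a threshold
  - name: archive_destination       # format and deliver data per the destination's requirements

links:
  - from: mask_sensitive_field
    to: extract_status_attribute
  - from: mask_sensitive_field
    to: logs_to_metrics             # the branch that creates the parallel path
  - from: logs_to_metrics
    to: error_rate_threshold
  - from: extract_status_attribute
    to: archive_destination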

Efficient Pipelines

Managing computational cost is vital to ensure the Fleet’s performance and overall cost-effectiveness within an edge computing environment. Try to use pipeline configurations that are computationally less expensive than alternatives that perform the same function. For example, consider this transformation configuration:

- field_path: item["attributes"]["pod_name"]
  operation: upsert
  value: from_k8s(regex_capture(item["resource"]["ed.filepath"], "/var/lib/kubelet/pods/(?P<id>(.+))/volumes.*")["id"], "k8s.pod.name")
- field_path: item["attributes"]["pod_namespace"]
  operation: upsert
  value: from_k8s(regex_capture(item["resource"]["ed.filepath"], "/var/lib/kubelet/pods/(?P<id>(.+))/volumes.*")["id"], "k8s.namespace.name")

In this configuration, the regex_capture macro is evaluated twice for every item: once for pod_name and once for pod_namespace.

Now consider this version:

- field_path: item["attributes"]["pod_id"]
  operation: upsert
  value: regex_capture(item["resource"]["ed.filepath"], "/var/lib/kubelet/pods/(?P<id>(.+))/volumes.*")["id"]
- field_path: item["attributes"]["pod_name"]
  operation: upsert
  value: from_k8s(item["attributes"]["pod_id"], "k8s.pod.name")
- field_path: item["attributes"]["pod_namespace"]
  operation: upsert
  value: from_k8s(item["attributes"]["pod_id"], "k8s.namespace.name")

This version is more efficient for several reasons:

• Fewer Regex Operations: Only one regex_capture call is made in the efficient configuration, as opposed to two in the inefficient configuration. Since regex operations can be costly, minimizing their usage can lead to considerable performance improvements.
• Reusing Extracted Data: The pod_id is extracted once and reused multiple times, which streamlines the data transformation process and reduces redundancy.
• Optimized API Calls: With fewer steps involved in data transformation, the API interactions, particularly with Kubernetes, become more efficient. This leads to faster processing times and lower latency.
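
If you do not want the intermediate pod_id attribute to reach the destination, you could drop it in a later transformation step once the lookups have run. This is a minimal sketch that assumes the transformation processor supports a delete operation on a field path; check the processor reference for the exact operation name:

# Assumes a delete operation is available; verify the operation name in the processor reference
- field_path: item["attributes"]["pod_id"]
  operation: delete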

See the CEL Macro page for the computational expense of each CEL macro, and the Designing Efficient Pipelines page for more guidance on building efficient pipelines.


Source Detection with Edge Delta

Automatically detect data sources when installing a Fleet.

Live Data Pipeline Design in Edge Delta

Build and test pipelines using your live data.

Test Regex in Edge Delta

Build and test regex patterns for use in your pipeline nodes.

Use Multiprocessors in Edge Delta

Build Multiprocessors to shape and manage live data.

Manage Log Timestamps with Edge Delta

Understand and manage timestamps using Edge Delta.

Use Lookup Tables in Edge Delta

Enrich logs dynamically using data in a lookup table on the edge with Edge Delta’s agent.

Designing Efficient Pipelines with Edge Delta

Build efficient pipelines focused on optimizing computational resources.

Edge Delta Packs

Packs in the Edge Delta Visual Pipeline.

Create an Edge Delta Test Bench

Create a test bench to experiment with different pipeline configurations.