Source Detection with Edge Delta
Automatically detect data sources when installing a Fleet.
To design an effective data handling pipeline, you need a good understanding of the data your workloads generate. It is important to understand its structure and content, as well as whether it is homogeneous, that is, of the same type and structure.
When logs are ingested into the pipeline, the entire log becomes the body, and metadata is added to build an OTEL data item. Bear in mind that the OTEL source node attempts to use the incoming OTEL log fields directly rather than treating the entire payload as the body.
See more examples of data items. To understand how data is escaped, see Understand Escaping Characters.
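As a rough illustration only (the exact schema is not reproduced here, and field names other than body, resource, and attributes are assumptions), an ingested log line might be represented like this:

body: '10.0.0.5 - GET /health 200'        # the raw log line, stored unparsed as the body
resource:
  ed.filepath: /var/log/app/access.log    # metadata added by the agent, such as the source file path
attributes: {}                            # empty until pipeline processors add fields
timestamp: 1718030400000                  # assumed millisecond timestamp field, for illustration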
Live Capture helps you design pipelines by showing actual data as it flows through the processor. See Live Capture.
To design effective log and metric pipelines, you must have a comprehensive understanding of the data handling requirements. These include business-driven factors such as cost-efficiency and adherence to legal mandates, data-specific needs such as volume capacity and optimization of data throughput, information security, and maintainability.
Create a rough or conceptual pipeline containing the nodes whose functions fulfil the requirements. Consider the sequence of nodes and opportunities for branching the pipeline into paths. Develop a high-level understanding of what your data should look like as it progresses through the pipeline to meet your requirements. For example, the first node might mask a specific field, while the next might extract a field from the body and convert it into an attribute, as sketched below. A parallel path might be required to also generate metrics or trigger alerts against a threshold. Also consider the data format requirements of each destination.
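As a minimal sketch of the "extract a field from the body and convert it into an attribute" step, using the same transformation syntax as the configurations later on this page, it might look like the following. The regex pattern, the status_code attribute name, and the assumption that the body is reachable as item["body"] are illustrative, not fields your data necessarily contains:

- field_path: item["attributes"]["status_code"]
  operation: upsert
  value: regex_capture(item["body"], "status=(?P<code>[0-9]+)")["code"]

A parallel path that generates metrics or triggers threshold alerts would branch off after a step like this rather than being folded into it.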
Managing computational cost is vital to ensure the Fleet’s performance and overall cost-effectiveness within an edge computing environment. Try to use pipeline configurations that are computationally less expensive than alternatives that perform the same function. For example, consider this transformation configuration:
- field_path: item["attributes"]["pod_name"]
  operation: upsert
  value: from_k8s(regex_capture(item["resource"]["ed.filepath"], "/var/lib/kubelet/pods/(?P<id>(.+))/volumes.*")["id"], "k8s.pod.name")
- field_path: item["attributes"]["pod_namespace"]
  operation: upsert
  value: from_k8s(regex_capture(item["resource"]["ed.filepath"], "/var/lib/kubelet/pods/(?P<id>(.+))/volumes.*")["id"], "k8s.namespace.name")
In this configuration, regex_capture is called twice.
Now consider this version:
- field_path: item["attributes"]["pod_id"]
  operation: upsert
  value: regex_capture(item["resource"]["ed.filepath"], "/var/lib/kubelet/pods/(?P<id>(.+))/volumes.*")["id"]
- field_path: item["attributes"]["pod_name"]
  operation: upsert
  value: from_k8s(item["attributes"]["pod_id"], "k8s.pod.name")
- field_path: item["attributes"]["pod_namespace"]
  operation: upsert
  value: from_k8s(item["attributes"]["pod_id"], "k8s.namespace.name")
Only one regex_capture call is made in the efficient configuration, as opposed to two in the inefficient configuration. Since regex operations can be costly, minimizing their usage can lead to considerable performance improvements. The pod_id attribute is extracted once and reused multiple times, which streamlines the data transformation process and reduces redundancy. See the CEL Macro page and the Designing Efficient Pipelines page for the computational expense of each CEL macro.