Agent v1.14.0

Health Data Upload Fix, Granular Stream Stats, CEL Function, and Health Data Debugging.

August 20, 2024

Critical Fix

  • Health Data Upload Fix: Resolved an issue with health data uploads causing throttling. Now, health and diagnostic data are buffered to ensure they are uploaded as a single file, significantly reducing upload frequency. All agents running version 1.13.0 must upgrade to 1.14.0.
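
The buffering behavior described above can be illustrated with a minimal Go sketch. It is not the agent's implementation: the healthBuffer type, its Add and Flush methods, and the upload callback are all hypothetical names chosen for illustration. The sketch only shows the general idea of collecting records in memory and handing them over in one upload rather than one upload per record.

```go
package main

import (
	"bytes"
	"fmt"
)

// healthBuffer is a hypothetical type (an assumption, not the agent's code).
// It accumulates health and diagnostic records in memory so they can be
// uploaded together as one payload instead of one upload per record.
type healthBuffer struct {
	buf     bytes.Buffer
	records int
}

// Add appends one newline-delimited record without triggering an upload.
func (h *healthBuffer) Add(record []byte) {
	h.buf.Write(record)
	h.buf.WriteByte('\n')
	h.records++
}

// Flush passes the whole buffer to upload as a single file and resets it.
// It does nothing when the buffer is empty.
func (h *healthBuffer) Flush(upload func(file []byte) error) error {
	if h.records == 0 {
		return nil
	}
	// Copy the bytes so the callback may keep them after the reset.
	file := append([]byte(nil), h.buf.Bytes()...)
	if err := upload(file); err != nil {
		return err // buffered data is kept so a later Flush can retry
	}
	h.buf.Reset()
	h.records = 0
	return nil
}

func main() {
	uploads := 0
	hb := &healthBuffer{}
	hb.Add([]byte(`{"type":"health"}`))
	hb.Add([]byte(`{"type":"diagnostic"}`))
	_ = hb.Flush(func(file []byte) error {
		uploads++
		fmt.Printf("upload %d: %d bytes\n", uploads, len(file))
		return nil
	})
}
```

In this sketch a failed upload leaves the buffer untouched, so the next flush retries the same data. Whether the agent retries in this way is not stated in these notes.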

New Features

  • Granular Stream Stats: Added support for granular stream statistics in S3 and Azure Blob Storage, with similar functionality extended to the ED Archive when metadata is enabled.
  • CEL Function: Introduced the to_json CEL macro.
  • Health Data Debugging: Allowed ingestion of health data by the Debug output node.
  • Datadog and Splunk Mapper Updates: Allowed ingestion of metrics by Datadog and Splunk mappers.
  • Cluster-Pattern Item Manipulation: Enabled Datadog and Splunk mappers as well as Output Transform nodes to ingest the cluster-pattern data type.

Enhancements

  • Agent handling of large data items: To improve agent performance, the agent splits any incoming message larger than 1 MB into individual messages. In addition, the Edge Delta archive does not ingest telemetry messages larger than 2 MB.
  • Improved Kubernetes CEL Function: Added the GetPod function to improve use of the from_k8s CEL macro. Introduced a Kubernetes API fetch step if the pod is not found in the cache.
  • OTEL Log Ingestion: Made OpenTelemetry (OTEL) log ingestion the default path and removed the old ingestion path in the v3 codebase.
  • Rename ed_archive_output to ed_logs_output: The ed_archive_output node type is being renamed to ed_logs_output. Both node types are supported to ensure backward compatibility.
  • Data Type Validation in OTLP Input: Introduced stricter string data type validation for OTLP input and changed the input field to a dropdown menu instead of a freeform text field.
  • Display Name Consistency: Updated the display name for the unescape JSON node to JSON Unescape.
  • The following advanced firewall rules are no longer required:
    • ed-agent-log.s3.us-west-2.amazonaws.com
    • ed-overflow-agent-log.s3.us-west-2.amazonaws.com
    • agent-pprof.s3.us-west-2.amazonaws.com
  • Kubernetes Pod Topology Spread Constraints: Introduced pod topology spread constraints to our Helm chart. This feature helps control how Pods are spread across your cluster among failure domains such as regions, zones, nodes, and other user-defined topology domains, improving operability with KaaS and overall K8s scheduling.
  • Cache Health Observability: Added health data to the pod listener component, allowing better observation of cache contents over time.
  • OTLP Traces: Added support for OTLP traces to the OTLP input node, selected with data_type: trace.
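
The cache-then-fetch behavior noted for the Kubernetes CEL function above follows a common lookup pattern, sketched here in Go. This is an illustration under assumptions, not the agent's code: the Pod struct, podCache type, newPodCache constructor, and podFetcher callback are hypothetical, and the GetPod signature shown is a guess at the general shape rather than the actual function. In a real agent the fetcher would call the Kubernetes API; here it is a plain function so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, trimmed-down pod record used only for this sketch.
type Pod struct {
	Namespace string
	Name      string
}

// podFetcher stands in for a Kubernetes API call (an assumption).
type podFetcher func(namespace, name string) (*Pod, error)

// podCache is a hypothetical cache; it is not safe for concurrent use.
type podCache struct {
	pods  map[string]*Pod
	fetch podFetcher
}

func newPodCache(fetch podFetcher) *podCache {
	return &podCache{pods: make(map[string]*Pod), fetch: fetch}
}

// GetPod returns the cached pod if present. On a cache miss it falls back
// to the fetcher and stores the result for later lookups.
func (c *podCache) GetPod(namespace, name string) (*Pod, error) {
	key := namespace + "/" + name
	if p, ok := c.pods[key]; ok {
		return p, nil
	}
	if c.fetch == nil {
		return nil, errors.New("pod not in cache and no fetcher configured")
	}
	p, err := c.fetch(namespace, name)
	if err != nil {
		return nil, fmt.Errorf("fetch pod %s: %w", key, err)
	}
	c.pods[key] = p
	return p, nil
}

func main() {
	cache := newPodCache(func(namespace, name string) (*Pod, error) {
		fmt.Println("cache miss, fetching", namespace+"/"+name)
		return &Pod{Namespace: namespace, Name: name}, nil
	})
	// The first call misses the cache and fetches; the second is served from the cache.
	cache.GetPod("default", "web-0")
	cache.GetPod("default", "web-0")
}
```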

Fixes

  • Docker Library Update: Updated the Docker library from version 24.0.9 to 26.1.5 to address critical CVEs including CVE-2024-41110.
  • Remove Config Content ID: Removed all references to the now-unsupported config content ID.
  • Stream Stats Calculation: Fixed potential divide-by-zero panics by adding length checks before performing average calculations on metadata.
  • Resource Flexibility: Made source attributes more flexible by removing mappings that prevented certain labels from propagating downstream when added by users.
  • Large Stack Trace Handling: Increased minimum seek size to handle large stack traces more effectively.
  • Nested Compound Nodes: Resolved issues with compound nodes (now called Packs) having the same name as their parent compound node, ensuring correct pipeline imports.
  • Node Creation for Rollup: Reduced memory consumption on rollup agents by limiting the creation of unnecessary components.
  • Sample Collection Time: Increased the default sample collection time from 1 minute to 15 minutes to ensure coverage for lower volume sources.
  • Compactor Service DNS Resolver: Fixed an issue where the compactor service’s DNS resolver watched for changes in all services in a K8s cluster. The DNS resolver now only monitors the compactor service, reducing unnecessary load. Also fixed the deregistration of the pod listener from the health manager.
  • K8s Metrics Collection: Corrected an issue where some metrics collectors did not check if metric items were nil, causing errors during K8s metrics collection.
  • Health Endpoints in HTTP(S) Input: Removed constraints on health endpoints.
  • Transform Node Updates: Fixed on-screen wording and updated examples for transform nodes.
  • Ingest Health Data Type: Allowed health data type ingestion by debug output.
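
The length check mentioned in the Stream Stats Calculation fix can be sketched in Go as follows. The averageSize function and its int64 inputs are assumptions made for illustration; the notes do not say which values the agent averages. The point is that integer division by a zero length panics at run time in Go, so the guard has to come before the division.

```go
package main

import "fmt"

// averageSize is a hypothetical helper (not the agent's code). It returns
// the integer mean of sizes and false when there is nothing to average.
func averageSize(sizes []int64) (int64, bool) {
	// Guard first: with an empty slice, the division below would be an
	// integer divide by zero, which panics at run time.
	if len(sizes) == 0 {
		return 0, false
	}
	var sum int64
	for _, s := range sizes {
		sum += s
	}
	return sum / int64(len(sizes)), true
}

func main() {
	if avg, ok := averageSize([]int64{2, 4, 6}); ok {
		fmt.Println("average:", avg)
	}
	if _, ok := averageSize(nil); !ok {
		fmt.Println("no data, average skipped")
	}
}
```

Returning a second boolean value lets callers tell "no data" apart from a true average of zero.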

Stability and Performance Improvements

  • Made several stability and performance improvements, including Zstd encoder thread safety. In addition, added error logging for output nodes, removed dependencies on deprecated ingest configuration fields, set transformation nodes to use a no-op poder, and added a feedback channel to the health manager to ensure proper stop procedures.

Maintenance

  • Pod Listener Testing: Updated the pod listener to function as a no-op during node testing, ensuring it does not interfere with test scenarios.