Manage Log Timestamps with Edge Delta

Understand and manage timestamps using Edge Delta.

Background

Log timestamps are generated by workloads at the moment the log is emitted. This timestamp is generally reflected in the item["body"] of the log when it is ingested by the Edge Delta agent.

The item["timestamp"] attribute is created as Unix time in milliseconds at the moment the log is ingested by the Edge Delta Fleet. The log search page renders this timestamp in your local timezone.

Overview

Sometimes there is a delay between when the log is created and when it is ingested by the fleet. This can lead to a slight difference between the two timestamps.

There are many factors that might affect the difference between the emitted and the ingested timestamps. Here are just two examples:

  • Network Latency and Log Batching Strategies: When logs are transported across a network, especially in distributed systems, network latency can play a significant role. Moreover, if log batching is employed to reduce the number of network requests, this may introduce additional wait times before logs are sent to the Edge Delta Fleet.

  • Serverless Architectures like AWS Lambda: When using serverless functions such as AWS Lambda, logs may not be emitted immediately. Instead, they are typically batched and processed at the end of the function execution, which can cause a delay in ingestion by the Edge Delta Fleet.

See Troubleshoot Latency for more information.

Consider the following log:

{2024-09-16T02:28:52.094Z} [ {1234} ] {{298312818-600780716}} <{WARN}> ({server}({demo-application})?|clickhouse-server): {User input validation failed}

This log was created by the workload at 02:28:52.094 UTC (or 11:58:52.094 UTC+9.5). However, when viewing the same log in Log Search, the timestamp shows a time of 11:59:02.127 (UTC+9.5).

When corrected for timezone, the timestamps show a difference of 10.033 seconds (11:59:02.127 - 11:58:52.094). While this may seem like a trivial difference, it could result in logs being viewed out of order in Log Search when sorted by timestamp. This could make it more difficult to determine the root cause of issues highlighted by the logs.

If this is an issue for your environment, consider transforming the item["timestamp"] to match the item["body"] timestamp using a Log Transform node, which is configured using CEL macros.

Process Overview

To convert the item["timestamp"] to the timestamp reflected in item["body"]:

  1. Evaluate item["body"] and its timestamp.
  2. Write a regex_capture CEL macro to capture the item["body"] timestamp.
  3. If necessary, write a CEL macro to convert the item["body"] timestamp to Unix Milli.
  4. Configure a Log Transform node.

1. Evaluate the Log

Consider the source log again:

{2024-09-16T02:28:52.094Z} [ {1234} ] {{298312818-600780716}} <{WARN}> ({server}({demo-application})?|clickhouse-server): {User input validation failed}

Note the location of the timestamp, which is important when creating the regex capture. Also note its format: it is ISO 8601, so it will need to be converted to Unix time in milliseconds before being upserted into item["timestamp"].

2. Capture the Timestamp

The Log Transform node is configured using CEL macros that determine the value of the field being upserted.

A regex_capture CEL macro can be used to capture the timestamp. As a reminder, this is the required syntax:

regex_capture(input string, regexWithCaptureGroups string)

The following macro will capture the timestamp:

regex_capture(item["body"],"\\{(?P<match>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z)\\}").match
  • item["body"]: contains the log string.
  • \{ and \}: Matches the literal curly braces { and } that enclose the timestamp. Double escaping is used because this regex is for YAML configurations where a single backslash must be escaped.
  • (?P<match> ... ): This is a named capture group with the name “match” that captures the timestamp contents.
  • \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z: This is the pattern for the timestamp:
    • \d{4}, \d{2}, \d{2}: Matches four digits for the year, and two digits each for the month and day.
    • T: Matches the ‘T’ character separating date and time.
    • \d{2}:\d{2}:\d{2}: Matches two digits each for hour, minute, and second, separated by colons.
    • \.\d{3}: Matches a literal dot followed by three digits for the milliseconds.
    • Z: Matches the literal character ‘Z’ indicating UTC timezone.
  • .match is how you access the value of the named capture group “match” in the map returned by the regex_capture function. This is similar to how you access entries in a dictionary or a map by their keys.

The captured timestamp is 2024-09-16T02:28:52.094Z, and it is assigned to the capture group “match”.
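If you want to sanity-check the pattern outside a pipeline, the following standalone Go sketch runs the same capture. Go's regexp package understands the same (?P<name>...) named-group syntax; the pattern is single-escaped here because it sits in a Go raw string rather than a YAML file:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	body := `{2024-09-16T02:28:52.094Z} [ {1234} ] {{298312818-600780716}} <{WARN}> ({server}({demo-application})?|clickhouse-server): {User input validation failed}`

	// Same pattern as the CEL macro, without the YAML double escaping.
	re := regexp.MustCompile(`\{(?P<match>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\}`)

	m := re.FindStringSubmatch(body)
	if m == nil {
		fmt.Println("no timestamp found")
		return
	}
	// Access the named capture group, analogous to .match in the CEL macro.
	fmt.Println(m[re.SubexpIndex("match")]) // 2024-09-16T02:28:52.094Z
}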

LLMs can assist in designing CEL macros or regex patterns.

3. Convert the Timestamp

The item["timestamp"] attribute is specified in Unix time in milliseconds so the captured timestamp from the previous step will also need to be converted using the convert_timestamp CEL macro.

As a reminder, this is the required syntax:

convert_timestamp(input location string, input format string, output format string)
  • input location string: The location of the input timestamp; in this instance, it is the regex_capture macro designed in the previous step.
  • input format string: An example illustrating the format of the current timestamp.
  • output format string: The required format of the output time.

For the input and output format strings, copy the appropriate example from the documented list.

The following macro will convert the timestamp:

convert_timestamp(regex_capture(item["body"],"\\{(?P<match>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z)\\}").match, "2006-01-02T15:04:05.000Z", "Unix Milli")

The input location is the regex_capture CEL macro defined in the previous step, the input format is the listed example that matches the sample log’s timestamp format, and the output format is “Unix Milli”, also taken from the listed examples.

Note: If the input timestamp does not specify a timezone, it is assumed to be UTC.
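The input format string follows Go's reference-time layout convention (note the fixed reference date 2006-01-02), so the conversion is easy to verify with a short, standalone Go sketch. This models the conversion only and is not pipeline configuration:

package main

import (
	"fmt"
	"time"
)

func main() {
	// The same layout string the macro uses as its input format.
	const layout = "2006-01-02T15:04:05.000Z"

	t, err := time.Parse(layout, "2024-09-16T02:28:52.094Z")
	if err != nil {
		panic(err)
	}
	// The layout carries no timezone offset, so the parsed time defaults to UTC.
	fmt.Println(t.UnixMilli()) // 1726453732094
}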

4. Configure a Log Transform Node

Configure a Log Transform node as follows:

  1. Click Pipelines.
  2. Select the fleet in which you want to convert timestamps and click View/Edit Pipeline.
  3. Click Edit Mode.
  4. Click Add Processor.
  5. Expand Transformations and select Log Transform.
  6. Specify a name such as timestamp and click Add New.
  7. Enter item["timestamp"] in the Field Path.
  8. Select Upsert for the Operation.
  9. Enter the convert_timestamp macro for the Value field:
convert_timestamp(regex_capture(item["body"],"\\{(?P<match>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{3}Z)\\}").match, "2006-01-02T15:04:05.000Z", "Unix Milli")
  10. Click Save Changes.
  11. Connect the Log Transform node to the pipeline.
  12. Click Review Changes.
  13. Click Save Changes.

The item["timestamp"] time will now match the item["body"] time, enabling you to search for and sort by the timestamp at which the log event was emitted from its data source.

Troubleshoot Latency

There are a number of reasons for latency from the moment of telemetry creation to the moment of data ingestion by the Edge Delta agent. Use these steps to troubleshoot this issue:

  1. Measure network latency. Monitor the network between the source and the destination to identify potential bottlenecks. You can add a field that captures the delay between timestamps using a math CEL function (the sketch after this list models the same calculation). In addition, use tools like ping, traceroute, or network monitoring software to assess latency.
  2. Review log batching and buffering configurations. If you have telemetry processing components upstream of the Edge Delta agent, such as an OTEL collector, examine their configuration and the configuration of your data sources to ensure batching and buffering settings align with your latency requirements. Adjust the size and timeout settings of buffers and batches to reduce delay. Also check any upstream log aggregation services for processing delays, and review their configurations.
  3. Synchronize system clocks. Verify time synchronization across all systems involved in log generation, transport, and ingestion. Ensure NTP is configured and operational on all systems.
  4. Inspect serverless functions. For environments like AWS Lambda, review function execution times and the timing of log events.
  5. Monitor resources. Monitor system resources to ensure there are no CPU, memory, or I/O bottlenecks.
  6. Perform end-to-end testing. Send test logs through the system and manually track their journey to identify where delays occur. Document the time at each stage from generation to ingestion.
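For step 1 above, this small Go sketch models the delay calculation you could express with a math CEL function: subtract the emitted (body) timestamp from the ingested item["timestamp"]. Both sample values are hypothetical:

package main

import "fmt"

func main() {
	// Hypothetical values: item["timestamp"] (ingestion) and the converted
	// body timestamp (emission), both as Unix milliseconds.
	var ingested int64 = 1726453742127 // 02:29:02.127 UTC
	var emitted int64 = 1726453732094  // 02:28:52.094 UTC

	delayMs := ingested - emitted
	fmt.Printf("ingestion delay: %d ms\n", delayMs) // ingestion delay: 10033 ms
}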

By systematically analyzing and addressing each potential cause, you can narrow down the reasons for the discrepancies and take appropriate actions to minimize delays.