Syslog Pack

This pack processes syslog messages to reduce size and extract fields.

Edge Delta Pipeline Pack for Syslog

Overview

The Edge Delta Syslog pack processes syslog messages by normalizing whitespace, parsing timestamps, and enriching logs with metadata to facilitate monitoring, searching, and alerting. It also attempts to fill in missing metadata based on host information. You can configure the pack to process either RFC5424 logs (the default) or RFC3164 and Linux format logs.

Pack Description

1. Data Ingestion

The data flow starts with the Source as the entry point into the pack.

2. Replace #011 with Space

The first transformation step is handled by the Replace #011 with space node, a Mask node.

- name: 'Replace #011 with space'
  type: mask
  pattern: '#011'
  mask: ' '

This replaces each “#011” sequence, an escaped representation of the tab character (octal 011), with a single space. This change helps standardize the log format.

3. Replace Multiple Whitespace with Space

The next node is Replace multiple whitespace with space, another Mask node.

- name: Replace multiple whitespace with space
  type: mask
  pattern: \s\s+
  mask: ' '

It condenses multiple whitespace characters into a single space. This results in cleaner logs.

4. Remove Leading Space

The third node, Remove leading space, is also a Mask node.

- name: Remove leading space
  type: mask
  pattern: ^\s+
  mask: ""

It removes any leading spaces from the log messages. This step ensures the messages are uniformly aligned.
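The three Mask nodes above can be approximated outside the pipeline to see their combined effect. The following is a minimal Python sketch, not the pack's implementation: it assumes the Mask node applies its pattern as an ordinary regular-expression substitution, and both the `normalize` helper and the input line are hypothetical.

```python
import re

def normalize(line: str) -> str:
    """Approximate the three Mask nodes, applied in pipeline order."""
    line = line.replace("#011", " ")    # Replace #011 with space
    line = re.sub(r"\s\s+", " ", line)  # Replace multiple whitespace with space
    line = re.sub(r"^\s+", "", line)    # Remove leading space
    return line

# Hypothetical input line, not taken from the pack.
raw = "  <34>Oct 11 22:14:15 myhostname#011su:   session   opened"
print(normalize(raw))
```

The order matters: the whitespace run left at the start of the line is first collapsed to one space and only then removed by the third step.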

From here, logs flow by default to Parse RFC5424 Format. To process RFC3164 and Linux format logs instead, edit the pack so that logs flow to the Parse RFC3164 and Linux Format node, which is orphaned (not connected) in the default configuration.

5.1. Parse RFC5424 Format

If you keep the default pack configuration, logs flow from Remove Leading Space to this node, Parse RFC5424 format, a Grok node.

- name: Parse RFC5424 format
  type: grok
  pattern: <%{POSINT:pri}>%{POSINT:version}%{SPACE}%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{SYSLOGHOST:host}%{SPACE}%{SYSLOG5424PRINTASCII:appname}%{SPACE}%{SYSLOG5424PRINTASCII:procid}%{SPACE}%{SYSLOG5424PRINTASCII:msgid}%{SPACE}(?:-|(?<structuredData>(\[.*?[^\\]\])+))%{SPACE}%{GREEDYDATA:message}

It applies a Grok pattern to extract fields from logs formatted according to the RFC5424 standard. Parsing based on RFC5424 enables you to extract structured data from logs.
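To illustrate which fields the pattern extracts, here is a rough Python sketch. It is an approximation under stated assumptions: the Grok macros (POSINT, TIMESTAMP_ISO8601, SYSLOGHOST, and so on) are replaced by simplified regular expressions that are not their exact definitions, and the input line is a hypothetical RFC5424-style message modeled on the Sample Input, with "-" as the process ID.

```python
import re

# Simplified stand-ins for the Grok macros; the real definitions are stricter.
RFC5424 = re.compile(
    r"<(?P<pri>[1-9][0-9]*)>(?P<version>[1-9][0-9]*)\s*"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
    r"(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)\s*"
    r"(?P<host>\S+)\s*"
    r"(?P<appname>[!-~]+)\s*"
    r"(?P<procid>[!-~]+)\s*"
    r"(?P<msgid>[!-~]+)\s*"
    r"(?:-|(?P<structuredData>(?:\[.*?[^\\]\])+))\s*"
    r"(?P<message>.*)"
)

# Hypothetical line modeled on the Sample Input.
line = (
    '<165>1 2003-10-11T22:14:15.003Z myhostname myapp - ID47 '
    '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] '
    'An application event log entry...'
)

fields = RFC5424.match(line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```

In this sketch the bracketed block lands in structuredData and the free text in message; when a message carries "-" in place of structured data, structuredData is left unset.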

5.2. Extract Timestamp ISO8601

Logs on the RFC5424 path continue to Extract Timestamp ISO8601, an OTTL Transform node.

- name: Extract Timestamp ISO8601
  type: ottl_transform
  statements: set(timestamp, UnixMilli(Time(attributes["timestamp"], "%Y-%m-%dT%H:%M:%SZ")))

This node converts the extracted timestamp into a standardized Unix Millisecond format as follows:

  • set: This function is used to assign a value to the timestamp field.
  • attributes["timestamp"]: This is accessing a value that is stored in the attributes map with the key timestamp.
  • Time: This function converts a string representation of a time to a time.Time object. The format used here is %Y-%m-%dT%H:%M:%SZ, which corresponds to the ISO 8601 standard format for representing date and time (e.g., 2023-01-01T00:00:00Z).
  • UnixMilli: This function converts a time.Time object into Unix time in milliseconds, that is, the number of milliseconds elapsed since 1970-01-01T00:00:00Z.

This step is crucial for ensuring that log entries are accurately correlated across distributed systems. See Manage Log Timestamps with Edge Delta.
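The conversion described above can be reproduced in Python as a sanity check. This is a sketch of the same idea, not the OTTL implementation: it assumes the trailing Z means UTC, and the helper name `iso8601_to_unix_milli` is hypothetical. Python's strptime is strict about this layout, so the example uses a timestamp without fractional seconds.

```python
from datetime import datetime, timezone

def iso8601_to_unix_milli(value: str) -> int:
    """Parse a %Y-%m-%dT%H:%M:%SZ string and return Unix milliseconds."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: Z means UTC
    return int(parsed.timestamp()) * 1000

print(iso8601_to_unix_milli("2023-01-01T00:00:00Z"))  # 1672531200000
```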

6.1. Parse RFC3164 and Linux Format

If you edit the configuration for RFC3164 and Linux Format, logs flow from Remove Leading Space to this node, Parse RFC3164 and Linux format, a Grok node.

- name: Parse RFC3164 and Linux format
  type: grok
  pattern: (<%{POSINT:pri}>)?%{SPACE}%{SYSLOGTIMESTAMP:timestamp}%{SPACE}%{SYSLOGHOST:host}%{SPACE}%{DATA:appname}(\[%{POSINT:procid}\])?:%{GREEDYDATA:message}

It extracts information from older syslog formats and parses it into structured data. Supporting RFC3164 ensures backward compatibility with legacy systems.
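As with the RFC5424 pattern, a simplified Python regular expression can show roughly what this Grok pattern captures. The expression below is an approximation, with the Grok macros replaced by looser stand-ins, and the two input lines are hypothetical.

```python
import re

# Simplified stand-ins for the Grok macros; the real definitions are stricter.
RFC3164 = re.compile(
    r"(?:<(?P<pri>[1-9][0-9]*)>)?\s*"
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})\s*"
    r"(?P<host>\S+)\s*"
    r"(?P<appname>.*?)(?:\[(?P<procid>[1-9][0-9]*)\])?:"
    r"(?P<message>.*)"
)

# Hypothetical lines: one with a priority and process ID, one without either.
with_pri = "<34>Oct 11 22:14:15 myhostname su[1234]: 'su root' failed"
without_pri = "Oct 11 22:14:15 myhostname cron: job started"

for line in (with_pri, without_pri):
    print(RFC3164.match(line).groupdict())
```

Because the pattern has no whitespace token after the colon, the captured message keeps the space that follows it in this sketch.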

6.2. Extract Timestamp Syslog

Logs on the RFC3164 and Linux Format path continue to Extract Timestamp Syslog, another OTTL Transform node.

- name: Extract Timestamp Syslog
  type: ottl_transform
  statements: set(timestamp, UnixMilli(Time(attributes["timestamp"], "%b %d %H:%M:%S")))

This node converts the syslog-format timestamp into Unix milliseconds as follows:

  • set: This function assigns a value to a specified telemetry field. In this case, it is setting the timestamp field.
  • attributes["timestamp"]: This is accessing a value in the attributes map with the key timestamp.
  • Time: This function converts a string representation of time into a time.Time object. The format used here is %b %d %H:%M:%S, where:
    • %b is the abbreviated month name (e.g., Jan, Feb).
    • %d is the day of the month as a zero-padded number.
    • %H:%M:%S represents the hour (24-hour format), minute, and second, respectively.
  • UnixMilli: This function converts the time.Time object into Unix time in milliseconds.

Utilizing a consistent timestamp format is vital for reliable event sorting and analysis. See Manage Log Timestamps with Edge Delta.
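One detail worth seeing in code is that the %b %d %H:%M:%S layout carries no year or time zone. The Python sketch below parses the same layout and makes both choices explicit. The `assumed_year` parameter, the UTC assumption, and the helper name are all hypothetical; how the OTTL Time function fills in the missing year is not described here.

```python
from datetime import datetime, timezone

def syslog_to_unix_milli(value: str, assumed_year: int) -> int:
    """Parse a '%b %d %H:%M:%S' string and return Unix milliseconds."""
    # Python fills the missing year with 1900, so it is replaced explicitly.
    parsed = datetime.strptime(value, "%b %d %H:%M:%S")
    parsed = parsed.replace(year=assumed_year, tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000

# Hypothetical usage: treat the timestamp as belonging to 2003, in UTC.
print(syslog_to_unix_milli("Oct 11 22:14:15", assumed_year=2003))
```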

7. Lookup by Host

All logs, whether they flow on the RFC5424 path or the RFC3164 and Linux Format path, are routed to Lookup by Host, a Lookup node. When you add this pack, a lookup table is automatically added to your lookup library in Edge Delta.

host        index    sourcetype    source
myhostname  index-k  linux-syslog  linux

You can edit the table to add your environment data.

- name: Lookup by host
  type: lookup
  location_path: ed://syslog_lookup.csv
  reload_period: 5m0s
  match_mode: exact
  regex_option: first
  key_fields:
  - event_field: item["attributes"]["host"]
    lookup_field: host
  out_fields:
  - event_field: item["attributes"]["index"]
    lookup_field: index
  - event_field: item["attributes"]["sourcetype"]
    lookup_field: sourcetype
  - event_field: item["attributes"]["source"]
    lookup_field: source

It enriches logs by matching the host attribute against the host column of the lookup table and, on an exact match, filling the index, sourcetype, and source attributes from that row. The enrichment is valuable for categorizing logs and lets downstream queries filter on these fields.
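The exact-match lookup can be sketched in a few lines of Python. This is an illustration only, not how the Lookup node is implemented: the CSV text stands in for ed://syslog_lookup.csv, its single row reflects one reading of the example table (host myhostname, index index-k, sourcetype linux-syslog, source linux), and the helper names are hypothetical.

```python
import csv
import io

# Stand-in for ed://syslog_lookup.csv; the column split is an assumption.
LOOKUP_CSV = """host,index,sourcetype,source
myhostname,index-k,linux-syslog,linux
"""

def load_table(text: str) -> dict:
    """Index the lookup rows by the host column (the key field)."""
    return {row["host"]: row for row in csv.DictReader(io.StringIO(text))}

def lookup_by_host(item: dict, table: dict) -> dict:
    """Copy index, sourcetype, and source from the row whose host matches exactly."""
    row = table.get(item["attributes"].get("host"))
    if row is not None:
        for field in ("index", "sourcetype", "source"):
            item["attributes"][field] = row[field]
    return item

table = load_table(LOOKUP_CSV)
print(lookup_by_host({"attributes": {"host": "myhostname"}}, table))
print(lookup_by_host({"attributes": {"host": "unknown-host"}}, table))
```

A host with no row in the table passes through unchanged, which is the case the later nodes are there to handle.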

8. Lookup by Source Host

The Lookup by Source Host node uses a similar Lookup operation.

- name: Lookup by source host
  type: lookup
  location_path: ed://syslog_lookup.csv
  reload_period: 5m0s
  match_mode: exact
  regex_option: first
  key_fields:
  - event_field: item["resource"]["host.ip"]
    lookup_field: host
  out_fields:
  - event_field: item["attributes"]["source_host"]["source"]
    lookup_field: source
  - event_field: item["attributes"]["source_host"]["sourcetype"]
    lookup_field: sourcetype
  - event_field: item["attributes"]["source_host"]["index"]
    lookup_field: index

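It attempts to find metadata based on the host.ip resource field rather than the host attribute, and writes any matches under attributes["source_host"] instead of overwriting the fields set by the previous node. This redundancy gives the pack an alternate path to fill metadata when the host attribute has no match in the table.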

9. Fill Missing Metadata

Finally, logs are processed by the Fill Missing Metadata node, an OTTL Transform node.

- name: Fill missing metadata
  type: ottl_transform
  statements: |-
    set(attributes["source"], attributes["source_host"]["source"]) where attributes["source"] == nil or attributes["source"] == ""
    set(attributes["source"], resource["host.ip"]) where attributes["source"] == nil or attributes["source"] == ""
    set(attributes["index"], attributes["source_host"]["index"]) where attributes["index"] == nil or attributes["index"] == ""
    set(attributes["index"], "syslog") where attributes["index"] == nil or attributes["index"] == ""
    set(attributes["sourcetype"], attributes["source_host"]["sourcetype"]) where attributes["sourcetype"] == nil or attributes["sourcetype"] == ""
    set(attributes["sourcetype"], "syslog") where attributes["sourcetype"] == nil or attributes["sourcetype"] == ""
    delete_key(attributes, "source_host")    

This node fills in any remaining gaps with default values, ensuring all logs have the necessary metadata for efficient querying and storage. The statements work as follows:

  1. Sets the attributes["source"] field to the value of attributes["source_host"]["source"] if attributes["source"] is either nil (not set) or an empty string.
  2. Sets the attributes["source"] field to resource["host.ip"] if attributes["source"] is still nil or an empty string.
  3. Sets attributes["index"] to attributes["source_host"]["index"] if attributes["index"] is nil or an empty string.
  4. Sets attributes["index"] to the literal string syslog if attributes["index"] is still nil or an empty string.
  5. Sets attributes["sourcetype"] to attributes["source_host"]["sourcetype"] if attributes["sourcetype"] is nil or an empty string.
  6. Sets attributes["sourcetype"] to the literal string syslog if attributes["sourcetype"] is still nil or an empty string.
  7. Removes the source_host key from the attributes map.

This ensures that logs have a baseline of metadata for later analysis or alerting.
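The fallback order in the statements above can be mirrored in Python to make the precedence easier to follow. This is a sketch of the logic only, with items modeled as plain dictionaries. The helper names are hypothetical, and skipping a fallback that is itself missing is an assumption about how the OTTL statements behave.

```python
def is_missing(value) -> bool:
    """Mirror the OTTL condition: nil or an empty string."""
    return value is None or value == ""

def fill_missing_metadata(item: dict) -> dict:
    attrs = item["attributes"]
    source_host = attrs.get("source_host") or {}
    # For each field: the source_host lookup result first, then a default.
    fallbacks = {
        "source": [source_host.get("source"), item["resource"].get("host.ip")],
        "index": [source_host.get("index"), "syslog"],
        "sourcetype": [source_host.get("sourcetype"), "syslog"],
    }
    for key, candidates in fallbacks.items():
        for candidate in candidates:
            if is_missing(attrs.get(key)) and not is_missing(candidate):
                attrs[key] = candidate
    attrs.pop("source_host", None)  # delete_key(attributes, "source_host")
    return item

# Hypothetical item: no lookup matched, so every field takes its default.
bare = {"attributes": {}, "resource": {"host.ip": "10.0.0.5"}}
print(fill_missing_metadata(bare))
```

Values already set by Lookup by Host are never overwritten, because every assignment is guarded by the missing-value check.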

10. Processed Output

The processed logs are finally routed to the Processed compound output for downstream handling in the pipeline.

Sample Input

<165>1 2003-10-11T22:14:15.003Z myhostname myapp 1234 ID47 - [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry...