MySQL Error Pack

This is a MySQL Error pack that ingests and structures log data for easier interpretation, routes logs based on severity, and converts errors and warnings into actionable metrics

6 minute read

Edge Delta Pipeline Pack for MySQL Error Logs

Overview

The MySQL Error Logs pack efficiently processes MySQL error logs to enhance troubleshooting and system analysis. It ingests and structures log data for easier interpretation, routes logs based on severity, and converts errors and warnings into actionable metrics. This pack also detects log patterns, enabling identification of recurring issues and facilitating comprehensive monitoring and proactive maintenance.

Pack Description

1. Data Ingestion

The data flow starts with the compound_input node. This node acts as the entry point into the pipeline, where it begins processing the incoming MySQL Error logs.

2. Field Extraction

Once ingested, logs move to the grok_extract_fields node, which is a Grok node. This node utilizes a specified Grok pattern to extract fields such as timestamp, thread, label, error_code, sub_system, and message. This transformation converts unstructured log data into structured data, enabling easier search, analysis, and visualization.

3. Timestamp Transformation

The log_transform_timestamp node, a Log Transform node, updates the log entries by converting the timestamp attribute to a Unix Milliseconds format using the Edge Delta convert_timestamp macro. This ensures that the original log timestamp is used for the log, rather than the timestamp generated by Edge Delta when the agent ingested the log.

  - name: log_transform_timestamp
    type: log_transform
    transformations:
      - field_path: item["timestamp"]
        operation: upsert
        value:
          convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999999Z",
          "Unix Milli")

4. Status-Based Routing

The logs then proceed to the status_router node, a Route node. This node routes logs based on the label attribute: logs where label matches “Warning” are routed to the warning path, and logs where label matches “ERROR” are routed to the error path. Logs that do not match these conditions are sent to the other_logs output.

  - name: status_router
    type: route
    paths:
      - path: warning
        condition: regex_match(item["attributes"]["label"], "(?i)Warning")
        exit_if_matched: true
      - path: error
        condition: regex_match(item["attributes"]["label"], "(?i)ERROR")
        exit_if_matched: true

5. Log to Metric Conversion for Error Logs

Error logs pass through the error_l2m node, a Log to Metric node. This node tracks and reports metrics such as the count of errors, categorized by error_code and sub_system.

  - name: error_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: error
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["error_code"]
          - item["attributes"]["sub_system"]
        custom_suffix: by_code_and_sub_system
        enabled_stats:
          - count
          - min
          - max
      - enabled_stats:
          - count
          - min
          - max

Explanation:

The pattern is a regex expression that matches all log entries. This catch-all pattern means that every log line will be considered for metric conversion.
The interval specifies that metrics are generated and reported every 1 minute. Setting a reporting interval helps you analyze metrics over consistent periods, supporting trend analysis and monitoring, which is crucial for maintaining oversight and performance tracking.
skip_empty_intervals: When set to false, metrics are reported for every interval, even if no matching logs are found. This ensures that all reporting periods are accounted for, allowing for a complete view of log and metric activity over time.
only_report_nonzeros: Metrics are reported even if their value is zero, painting a full picture of the log data and potential inactivity. This leads to comprehensive reports that include potential periods of outages or lack of errors, which might otherwise be skipped.
metric_name: The base name for metrics generated from this configuration. This is used to categorize and identify the type of metric clearly. Having distinct metric names helps in easily tracking and understanding the nature of each metric, aiding in troubleshooting and pinpointing issues faster.
The dimension_groups section specifies how metrics should be derived and grouped:
- field_dimensions utilizes fields such as item[“attributes”][“error_code”] and item[“attributes”][“sub_system”] to group metrics. Adding dimensions allows you to differentiate metrics by these attributes, providing detailed insights and enabling fine-grained analysis of errors by code and subsystem.
- custom_suffix: by_code_and_sub_system is appended to the metric name to further specify the categorization of the metric. This customized naming helps in quickly identifying the grouped statistic context in dashboards or reports.
- enabled_stats: count, min, max specifies the types of statistics to be reported, including the count of occurrences, minimum, and maximum values within the defined group. These statistics provide essential insights into the frequency and extremity of the log events, supporting effective monitoring and alerting strategies.

6. Log to Metric Conversion for Warning Logs

Similarly, warning logs are processed by the warning_l2m node. Metrics such as the count of warnings are tracked and reported, categorized by sub_system.

  - name: warning_l2m
    type: log_to_metric
    pattern: .*
    interval: 1m0s
    skip_empty_intervals: false
    only_report_nonzeros: false
    metric_name: warning
    dimension_groups:
      - field_dimensions:
          - item["attributes"]["sub_system"]
        custom_suffix: by_sub_system
        enabled_stats:
          - count
          - min
          - max
      - enabled_stats:
          - count
          - min
          - max

Explanation:

The pattern is a regex expression that matches all log entries, ensuring every log line is considered for metric conversion.
The interval is set to 1 minute, specifying that metrics are generated and reported at this frequency.
skip_empty_intervals: When set to false, it ensures metrics are reported for every interval, even if no matching logs are present.
only_report_nonzeros: Reports metrics even if their value is zero, thereby offering a complete picture of log data, including periods of inactivity or absence of warnings.
metric_name: Used to define the base name for generated metrics, ensuring clear categorization and identification.
The dimension_groups defines how metrics are derived and grouped:
- field_dimensions uses fields such as item["attributes"]["sub_system"] to group metrics.
- custom_suffix: by_sub_system is appended to the metric name to clarify the metric’s categorization further.
- enabled_stats: count, min, max specify the statistics types to report, including the count of occurrences, minimum, and maximum values within the defined group.

7. Pattern Detection

Both error and warning logs are analyzed by the log_to_pattern node, a Log to Pattern node. This node identifies patterns within the logs and clusters them, which helps you recognize recurring issues or behaviors.

  - name: log_to_pattern
    type: log_to_pattern
    num_of_clusters: 10
    samples_per_cluster: 5

8. Outputs

The pipeline includes several compound_output nodes that capture and store outputs for archival or further evaluation, each with a specific data source:

error_logs: This node outputs error logs routed from the status_router node when the label matches “ERROR”.
warning_logs: Warning logs are directed to this node, capturing logs when the label matches “Warning” from the status_router.
other_logs: Captures logs routed from the status_router node on the unmatched path, keeping logs that don’t meet the specific error or warning criteria.
error_metrics_output: Collects metrics generated by the error_l2m node.
warning_metrics_output: Gathers metrics created by the warning_l2m node.
patterns_output: This node captures the patterns detected and formed by the log_to_pattern node.

Sample Input

2023-08-28T14:35:42.123456Z 0 [ERROR] [MY-012345] [InnoDB] Database page corruption detected

2023-09-12T21:12:18.987456Z 3 [ERROR] [MY-010119] [Server] Aborting due to a fatal error

2023-09-12T20:15:22.654321Z 1 [ERROR] [MY-011092] [Server] Disk full error while writing '.\/mysql-bin.000003'