MySQL Slow Query Pack
Edge Delta Pipeline Pack for MySQL Slow Query Log
Overview
The MySQL Slow Query pack efficiently processes MySQL slow query logs, providing detailed insights into database performance. It ingests logs, extracts key fields, and structures the data for easier analysis. By clustering log patterns, converting query execution details into metrics, and categorizing this information by database, user, and source, this pack enables comprehensive monitoring of query performance, helping to optimize database operations and identify potential bottlenecks.
Pack Description
1. Data Ingestion
The data flow starts with the compound_input node. This node serves as the entry point into the pipeline, capturing incoming MySQL slow query logs for processing.
2. Field Extraction
Logs then flow into the grok_extract_fields node, a Grok node. This node uses a custom-defined pattern to extract fields such as timestamp, username, db_name, source_ip_v4, and source_ip_v6, structuring them as attributes. Converting unstructured log data into structured attributes makes searching, analysis, and visualization easier.
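The pack ships its own pattern definition; as a minimal sketch of the idea (not the pack's actual pattern), a Grok node that captures the statistics line of each slow query entry could look like this:
- name: grok_extract_fields
  type: grok
  # Illustrative only: matches the "# Query_time" header line of a slow query
  # entry. The pack's real pattern also covers the "# Time" and "# User@Host"
  # lines to capture timestamp, username, db_name, and the source IPs.
  pattern: '# Query_time: %{NUMBER:query_time} Lock_time: %{NUMBER:lock_time} Rows_sent: %{NUMBER:rows_sent} Rows_examined: %{NUMBER:rows_examined}'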
3. Timestamp Transformation
The log_transform_timestamp node, a Log Transform node, converts the timestamp attribute extracted by the Grok node into Unix milliseconds and upserts the result into the item["timestamp"] field. This ensures downstream nodes key off the time the query actually ran rather than the time the log was ingested.
- name: log_transform_timestamp
  type: log_transform
  transformations:
    - field_path: item["timestamp"]
      operation: upsert
      value: convert_timestamp(item["attributes"]["timestamp"], "2006-01-02T15:04:05.999999Z", "Unix Milli")
4. Log Pattern Clustering
Logs are processed by the log_to_pattern node, a Log to Pattern node, to generate clusters representing common log patterns.
- name: log_to_pattern
  type: log_to_pattern
  num_of_clusters: 10
  samples_per_cluster: 5
This node distinguishes between various log patterns, providing a structural understanding of log data which can assist in anomaly detection or refining alert mechanisms.
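As an illustration, statements that differ only in their literal values (for example, the same SELECT against the shop table run with different price constants) would typically collapse into a single cluster with the variable parts wildcarded; a sudden change in one cluster's volume then stands out as a candidate anomaly.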
5. Query Time Metric Conversion
Logs are further processed by the query_time_l2m node, a Log to Metric node, to convert query_time information into metrics.
- name: query_time_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: latency
  dimension_groups:
    - field_dimensions:
        - item["attributes"]["db_name"]
      field_numeric_dimension: item["attributes"]["query_time"]
      custom_suffix: by_database
      enabled_stats:
        - count
        - min
        - max
        - p95
        - p99
    - field_dimensions:
        - item["attributes"]["username"]
      field_numeric_dimension: item["attributes"]["query_time"]
      custom_suffix: by_user
      enabled_stats:
        - count
        - min
        - max
        - p95
        - p99
    - field_dimensions:
        - item["attributes"]["source_ip_v4"]
        - item["attributes"]["source_ip_v6"]
      field_numeric_dimension: item["attributes"]["query_time"]
      custom_suffix: by_source
      enabled_stats:
        - count
        - min
        - max
        - p95
        - p99
The dimensions provide insights into how queries perform across different categories:
- By Database (db_name): Enables performance tracking of queries per database, highlighting which databases are handling slower queries.
- By User (username): Allows you to discern whether certain users generate heavier loads on the system.
- By Source IP (source_ip_v4, source_ip_v6): Facilitates the identification of network sources contributing to the load, helping in network-level optimizations.
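For example, in a one-minute interval where a given database records query_time values of 120.4, 54.8, and 2.1 seconds, the by_database group reports count=3, min=2.1, max=120.4, plus p95 and p99 estimates, all tagged with that db_name; the by_user and by_source groups produce the same statistics keyed by username and source IP. (The timing values here are illustrative.)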
6. Lock Time Metric Conversion
Similarly, the lock_time_l2m node converts lock_time data into metrics that surface database lock contention.
- name: lock_time_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: query
  dimension_groups:
    - field_dimensions:
        - item["attributes"]["db_name"]
      field_numeric_dimension: item["attributes"]["lock_time"]
      custom_suffix: by_database
      enabled_stats:
        - count
        - min
        - max
        - p95
        - p99
    - field_dimensions:
        - item["attributes"]["username"]
      field_numeric_dimension: item["attributes"]["lock_time"]
      custom_suffix: by_user
      enabled_stats:
        - count
        - min
        - max
        - p95
        - p99
    - field_dimensions:
        - item["attributes"]["source_ip_v4"]
        - item["attributes"]["source_ip_v6"]
      field_numeric_dimension: item["attributes"]["lock_time"]
      custom_suffix: by_source
      enabled_stats:
        - count
        - min
        - max
        - p95
        - p99
7. Rows Sent Metric Conversion
The rows_sent_l2m node transforms rows_sent data into metrics that show how many rows queries return.
- name: rows_sent_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: query
  dimension_groups:
    - field_dimensions:
        - item["attributes"]["db_name"]
      field_numeric_dimension: item["attributes"]["rows_sent"]
      custom_suffix: by_database
      enabled_stats:
        - count
        - sum
    - field_dimensions:
        - item["attributes"]["username"]
      field_numeric_dimension: item["attributes"]["rows_sent"]
      custom_suffix: by_user
      enabled_stats:
        - count
        - sum
    - field_dimensions:
        - item["attributes"]["source_ip_v4"]
        - item["attributes"]["source_ip_v6"]
      field_numeric_dimension: item["attributes"]["rows_sent"]
      custom_suffix: by_source
      enabled_stats:
        - count
        - sum
8. Rows Examined Metric Conversion
Similarly, the rows_examined_l2m node converts rows_examined data into metrics that capture how many rows queries scan, a proxy for the work the server performs.
- name: rows_examined_l2m
  type: log_to_metric
  pattern: .*
  interval: 1m0s
  skip_empty_intervals: false
  only_report_nonzeros: false
  metric_name: query
  dimension_groups:
    - field_dimensions:
        - item["attributes"]["db_name"]
      field_numeric_dimension: item["attributes"]["rows_examined"]
      custom_suffix: by_database
      enabled_stats:
        - count
        - sum
    - field_dimensions:
        - item["attributes"]["username"]
      field_numeric_dimension: item["attributes"]["rows_examined"]
      custom_suffix: by_user
      enabled_stats:
        - count
        - sum
    - field_dimensions:
        - item["attributes"]["source_ip_v4"]
        - item["attributes"]["source_ip_v6"]
      field_numeric_dimension: item["attributes"]["rows_examined"]
      custom_suffix: by_source
      enabled_stats:
        - count
        - sum
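Taken together, rows_examined and rows_sent make scan efficiency visible: the first sample entry below examines 8,550,280 rows to return 268,756, roughly 32 rows scanned per row sent, a ratio that often points to a missing index or an unselective query plan.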
9. Logging and Metrics Outputs
The logs_output node collects all processed logs for further analysis. The patterns_output node collects the clusters produced by the log_to_pattern node, preserving structural insights into query execution. The query_time_metrics_output, lock_time_metrics_output, rows_sent_metrics_output, and rows_examined_metrics_output nodes capture the corresponding metrics for monitoring query performance and operational efficiency.
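For orientation, the wiring between these nodes can be sketched in Edge Delta's pipeline link syntax as follows. The node names match those above, but this is a simplified sketch, not the pack's literal manifest, and the output nodes themselves are defined by the pack:
links:
  - from: compound_input
    to: grok_extract_fields
  - from: grok_extract_fields
    to: log_transform_timestamp
  # Transformed logs fan out to the log output, clustering, and metric paths.
  - from: log_transform_timestamp
    to: logs_output
  - from: log_transform_timestamp
    to: log_to_pattern
  - from: log_transform_timestamp
    to: query_time_l2m
  - from: log_transform_timestamp
    to: lock_time_l2m
  - from: log_transform_timestamp
    to: rows_sent_l2m
  - from: log_transform_timestamp
    to: rows_examined_l2m
  # Each analysis node feeds its dedicated output.
  - from: log_to_pattern
    to: patterns_output
  - from: query_time_l2m
    to: query_time_metrics_output
  - from: lock_time_l2m
    to: lock_time_metrics_output
  - from: rows_sent_l2m
    to: rows_sent_metrics_output
  - from: rows_examined_l2m
    to: rows_examined_metrics_output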
Sample Input
# Time: 2024-09-17T17:58:31.649167Z
# User@Host: Quigley5177[root] @ localhost [d9f5:c1c1:9699:bff4:b8ae:91b3:7f67:8fa8] Id: 469835
# Query_time: 120.408058 Lock_time: 0.011555 Rows_sent: 268756 Rows_examined: 8550280
SELECT article, dealer, price
FROM shop
WHERE price=(SELECT MAX(price) FROM shop);
# Time: 2024-09-17T17:58:31.649201Z
# User@Host: Hartmann6021[example] @ 180.68.18.209 [26aa:ed3d:4d9d:9f42:45c9:aa7e:dea2:620c] Id: 917215
# Query_time: 54.781643 Lock_time: 0.013071 Rows_sent: 6396691 Rows_examined: 9678866
CREATE TABLE animals (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO animals (name) VALUES
('dog'),('cat'),('penguin'),
('lax'),('whale'),('ostrich');
SELECT * FROM animals;
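Given the field names used throughout the pack, the Grok extraction would turn the first entry above into attributes along these lines (an illustrative sketch, not literal node output; db_name and source_ip_v4 are populated when present in the entry):
timestamp: 2024-09-17T17:58:31.649167Z
username: Quigley5177
source_ip_v6: d9f5:c1c1:9699:bff4:b8ae:91b3:7f67:8fa8
query_time: 120.408058
lock_time: 0.011555
rows_sent: 268756
rows_examined: 8550280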