Ingest AWS RDS Metrics from CloudWatch
Overview
Amazon Relational Database Service (RDS) is a managed AWS service that simplifies deployment and scaling of relational databases like PostgreSQL and MariaDB. RDS automatically sends database metrics to AWS CloudWatch, including CPU utilization, replication status, and read/write IOPS.
Edge Delta’s Telemetry Pipelines enable teams to:
- Extract RDS metrics from CloudWatch via S3
- Standardize metrics using OpenTelemetry formats
- Correlate database metrics with external telemetry data pre-index
- Route metrics to cost-effective downstream destinations
Architecture
The ingestion flow consists of:
- CloudWatch Metric Streams send RDS metrics to Kinesis Data Firehose
- Kinesis Data Firehose batches and delivers metrics to an S3 bucket
- S3 Event Notifications notify an SQS queue when new data arrives
- Edge Delta Agent polls SQS and ingests metrics from S3
- Telemetry Pipeline processes and routes metrics to destinations
Prerequisites
- AWS account with RDS instances
- IAM permissions to create:
- CloudWatch Metric Streams
- S3 buckets and event notifications
- SQS queues
- Edge Delta account with cloud pipeline access
Configure AWS Components
Create an SQS Queue
- Open the Amazon SQS console
- Create a Standard queue with a descriptive name (e.g., rds-metrics-queue)
- Configure the access policy to allow S3 to send messages:
{
  "Sid": "s3_send_statement",
  "Effect": "Allow",
  "Principal": {
    "Service": "s3.amazonaws.com"
  },
  "Action": [
    "SQS:SendMessage"
  ],
  "Resource": "arn:aws:sqs:AWS_REGION:AWS_ACCOUNT_ID:QUEUE_NAME",
  "Condition": {
    "ArnLike": {
      "aws:SourceArn": "arn:aws:s3:::BUCKET_NAME"
    },
    "StringEquals": {
      "aws:SourceAccount": "AWS_ACCOUNT_ID"
    }
  }
}
- Save the queue URL for Edge Delta configuration
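If you manage these resources as infrastructure as code, a minimal CloudFormation sketch of the queue and its access policy might look like the following (resource names and the BUCKET_NAME placeholder are illustrative, not prescriptive):

Resources:
  RdsMetricsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: rds-metrics-queue
  RdsMetricsQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref RdsMetricsQueue   # !Ref returns the queue URL
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: s3_send_statement
            Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: SQS:SendMessage
            Resource: !GetAtt RdsMetricsQueue.Arn
            Condition:
              ArnLike:
                "aws:SourceArn": arn:aws:s3:::BUCKET_NAME
              StringEquals:
                "aws:SourceAccount": !Ref "AWS::AccountId"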
Create a CloudWatch Metric Stream
- Navigate to CloudWatch > Metrics > Streams
- Click Create metric stream
- Select metrics to include:
- For comprehensive monitoring: Choose AWS/RDS: All metric names
- For specific metrics: Select individual RDS metrics
- Configure the destination:
- Choose Amazon Kinesis Data Firehose
- Create or select a Firehose delivery stream with:
- Destination: Amazon S3
- Output format: JSON (recommended) or Parquet
- Compression: GZIP (recommended for cost savings)
- Buffer interval: 60 seconds (for near real-time delivery)
- Note the metric stream name and S3 bucket path
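For teams defining the stream in CloudFormation instead of the console, a minimal sketch might look like this (the Firehose delivery role and metric stream role are assumed to already exist; ARNs and names are placeholders):

Resources:
  RdsMetricsDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      DeliveryStreamName: rds-metrics-firehose
      DeliveryStreamType: DirectPut
      ExtendedS3DestinationConfiguration:
        BucketARN: arn:aws:s3:::BUCKET_NAME
        RoleARN: arn:aws:iam::AWS_ACCOUNT_ID:role/firehose-delivery-role   # assumed to exist
        CompressionFormat: GZIP        # matches the recommended compression
        BufferingHints:
          IntervalInSeconds: 60        # near real-time delivery
          SizeInMBs: 5
  RdsMetricStream:
    Type: AWS::CloudWatch::MetricStream
    Properties:
      Name: rds-metrics-stream
      FirehoseArn: !GetAtt RdsMetricsDeliveryStream.Arn
      RoleArn: arn:aws:iam::AWS_ACCOUNT_ID:role/metric-stream-role         # assumed to exist
      OutputFormat: json               # the recommended format above
      IncludeFilters:
        - Namespace: AWS/RDS           # stream all RDS metrics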
Configure S3 Event Notifications
- Navigate to the S3 bucket receiving metric stream data
- Go to Properties > Event notifications
- Click Create event notification:
- Event name: rds-metrics-notification
- Event types: Select All object create events
- Destination: Choose SQS queue
- Select the SQS queue created earlier
- Save the configuration
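If the bucket itself is managed in CloudFormation, the equivalent notification configuration is a short addition to the bucket definition (bucket and queue names are placeholders):

Resources:
  MetricsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: BUCKET_NAME
      NotificationConfiguration:
        QueueConfigurations:
          - Event: s3:ObjectCreated:*   # all object create events
            Queue: arn:aws:sqs:AWS_REGION:AWS_ACCOUNT_ID:rds-metrics-queue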
Configure IAM Permissions
Create an IAM policy for Edge Delta to access AWS resources:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EdgeDeltaRDSMetricsAccess",
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:DeleteMessageBatch",
        "sqs:ReceiveMessage",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET_NAME/*",
        "arn:aws:sqs:REGION:ACCOUNT_ID:QUEUE_NAME"
      ]
    }
  ]
}
Attach this policy to an IAM user or role for Edge Delta authentication.
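If you opt for role-based authentication (used in the pipeline configuration below), the role's trust policy must also allow the principal running Edge Delta to assume it, ideally constrained by an external ID. A minimal CloudFormation sketch, with the trusted principal left as a placeholder to fill in per Edge Delta's role-assumption guidance:

Resources:
  EdgeDeltaRdsMetricsRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: edge-delta-rds-metrics
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              AWS: arn:aws:iam::TRUSTED_ACCOUNT_ID:root       # placeholder principal
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                "sts:ExternalId": unique-external-id-12345    # must match external_id below

Attach the S3/SQS access policy above to this role so the assumed session can read from the queue and bucket.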
Configure Edge Delta Pipeline
Create a Cloud Pipeline
- Log into the Edge Delta web app
- Navigate to Pipelines > Cloud Pipelines
- Click Create Cloud Pipeline
- Provide a name (e.g., rds-metrics-pipeline)
Add S3 Source Node
In the pipeline editor, click Add Node and select S3 from the source nodes. The S3 source node configuration requires the SQS queue URL and AWS region as mandatory parameters. For detailed parameter descriptions, refer to the S3 Source Node documentation.
Kinesis Data Firehose compresses delivered objects when GZIP is enabled (as recommended above), so set the compression parameter to gzip accordingly. For authentication, you can use either an assumed IAM role or AWS access keys. IAM roles are recommended for production environments because they rely on short-lived temporary credentials and don’t require storing long-lived access keys.
When using IAM role authentication, the role_arn parameter specifies which role to assume. The external_id parameter is optional but strongly recommended when Edge Delta assumes roles across AWS accounts, as it provides an additional security layer to prevent confused deputy attacks. If you choose access key authentication instead, both aws_key_id and aws_sec_key must be provided together.
nodes:
  - name: rds_metrics_s3_input
    type: s3_input
    sqs_url: https://sqs.us-west-2.amazonaws.com/123456789/rds-metrics-queue
    region: us-west-2
    compression: gzip
    role_arn: arn:aws:iam::123456789:role/edge-delta-rds-metrics
    external_id: unique-external-id-12345
Process RDS Metrics
Configure processors to parse and transform CloudWatch metrics into OpenTelemetry format. The processing pipeline handles the Kinesis Firehose JSON structure and extracts meaningful metrics using Edge Delta’s transform processors.
Start with the JSON Unroll processor to extract individual metric records from the Firehose batch. CloudWatch Metric Streams via Kinesis Firehose wrap multiple metrics in a records array when using JSON output format. The unroll processor creates separate telemetry items for each array element, preserving all resource and attribute information. If using Parquet output format in Kinesis Firehose, adjust the processor configuration accordingly as the data structure will differ. For JSON format, ‘records’ is the standard field path.
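Each element of that records array is a single CloudWatch metric record. A representative record, based on the documented metric stream JSON format, looks roughly like this (values are illustrative):

{
  "metric_stream_name": "rds-metrics-stream",
  "account_id": "123456789",
  "region": "us-west-2",
  "namespace": "AWS/RDS",
  "metric_name": "CPUUtilization",
  "dimensions": {
    "DBInstanceIdentifier": "prod-mysql-01"
  },
  "timestamp": 1700000000000,
  "value": {
    "max": 12.5,
    "min": 10.1,
    "sum": 22.6,
    "count": 2.0
  },
  "unit": "Percent"
}

This structure is why the statements below read fields such as attributes["metric_record"]["metric_name"] and attributes["metric_record"]["value"]["sum"].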
Use the Copy Field processor to map CloudWatch metric fields to standardized attributes. The processor uses OTTL statements such as set(attributes["metric_name"], attributes["metric_record"]["metric_name"]) to copy values between fields. When copying numeric values for metrics, apply type conversion with OTTL functions such as Double() to ensure proper data types.
The Extract Metric processor generates proper metric items from the parsed data. Configure extraction rules for each RDS metric type with appropriate units and metric kinds. Gauges represent instantaneous values like CPU utilization, while sums work better for cumulative metrics like IOPS.
processors:
  - name: rds_metrics_pipeline
    type: sequence
    processors:
      - type: json_unroll
        metadata: '{"name":"Unroll Firehose Records"}'
        data_types:
          - log
        field_path: body
        json_field_path: records
        new_field_name: metric_record
      - type: ottl_transform
        metadata: '{"id":"map_cloudwatch","type":"copy-field","name":"Map CloudWatch Fields"}'
        data_types:
          - log
        statements: set(attributes["metric_name"], attributes["metric_record"]["metric_name"])
      - type: ottl_transform
        metadata: '{"id":"extract_value","type":"copy-field","name":"Extract Metric Value"}'
        data_types:
          - log
        statements: set(attributes["metric_value"], Double(attributes["metric_record"]["value"]["sum"]))
      - type: ottl_transform
        metadata: '{"id":"map_instance","type":"copy-field","name":"Map DB Instance"}'
        data_types:
          - log
        statements: set(resource["db.instance"], attributes["metric_record"]["dimensions"]["DBInstanceIdentifier"])
      - type: extract_metric
        metadata: '{"name":"Generate RDS Metrics"}'
        extract_metric_rules:
          - name: rds_cpu_utilization
            description: RDS instance CPU utilization percentage
            unit: "%"
            gauge:
              value: attributes["metric_value"]
            condition: attributes["metric_name"] == "CPUUtilization"
          - name: rds_database_connections
            description: Number of database connections in use
            unit: "1"
            gauge:
              value: attributes["metric_value"]
            condition: attributes["metric_name"] == "DatabaseConnections"
          - name: rds_read_iops
            description: Average number of disk read I/O operations per second
            unit: "1/s"
            sum:
              value: attributes["metric_value"]
            condition: attributes["metric_name"] == "ReadIOPS"
          - name: rds_write_iops
            description: Average number of disk write I/O operations per second
            unit: "1/s"
            sum:
              value: attributes["metric_value"]
            condition: attributes["metric_name"] == "WriteIOPS"
You can extend these extraction rules to include additional RDS metrics such as FreeStorageSpace (gauge type for available storage), ReplicaLag (gauge type for replication delay in seconds), SwapUsage (gauge type for swap space utilization), and BinLogDiskUsage (gauge type for binary log storage). Each metric should be configured with the appropriate unit and type based on whether it represents an instantaneous value or a cumulative counter.
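For instance, two additional rules appended to the extract_metric_rules list above might look like this (a sketch following the same pattern; units use UCUM conventions, By for bytes and s for seconds):

- name: rds_free_storage_space
  description: Available storage space for the RDS instance
  unit: "By"
  gauge:
    value: attributes["metric_value"]
  condition: attributes["metric_name"] == "FreeStorageSpace"
- name: rds_replica_lag
  description: Replication delay between the read replica and its source
  unit: "s"
  gauge:
    value: attributes["metric_value"]
  condition: attributes["metric_name"] == "ReplicaLag"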
For high-volume metrics, consider adding an Aggregate Metric processor to reduce data before sending to destinations. This can significantly reduce costs while maintaining metric fidelity through statistical aggregation.
Route to Destinations
Configure destination nodes to send processed metrics:
- Edge Delta Observability Platform: For monitoring and dashboards
- OpenTelemetry Collector: For further processing
- Third-party platforms: Datadog, Splunk, New Relic, etc.
Monitoring and Validation
Enrich RDS Metrics with Context
Use the Lookup processor to enrich RDS metrics with additional context from lookup tables. This processor can add metadata like database environment tags, cost center information, or alert thresholds based on instance identifiers.
Create a CSV lookup table mapping RDS instance names to metadata:
instance_id,environment,team,cost_center,cpu_threshold
prod-mysql-01,production,platform,eng-001,80
staging-postgres-02,staging,platform,eng-002,90
Configure the lookup processor to match on resource["db.instance"] and add enrichment fields:
- type: lookup
  metadata: '{"name":"Enrich RDS Metrics"}'
  location_path: ed://rds_metadata.csv
  reload_period: 10m0s
  match_mode: exact
  key_fields:
    - event_field: resource["db.instance"]
      lookup_field: instance_id
  out_fields:
    - event_field: attributes["environment"]
      lookup_field: environment
    - event_field: attributes["team"]
      lookup_field: team
    - event_field: attributes["cost_center"]
      lookup_field: cost_center
Use the Add Field processor to tag metrics based on conditions. For example, mark high CPU usage:
- type: ottl_transform
  metadata: '{"type":"add-field","name":"Tag High CPU"}'
  condition: attributes["metric_name"] == "CPUUtilization" and attributes["metric_value"] > 80
  statements: set(attributes["alert_severity"], "high")
The Aggregate Metric processor can then group enriched metrics by these new fields for better analysis and routing decisions.
Verify Data Flow
- Check SQS queue metrics for message activity
- Monitor pipeline logs for ingestion status
- Validate metrics appear in destination platforms
- Confirm correlation rules are matching expected patterns
Create Dashboards
Configure dashboards in Edge Delta to visualize RDS metrics. See Create a Dashboard for detailed instructions.
Navigate to Dashboards and click New Dashboard to start building. Add Dashboard Variables to make your dashboard interactive:
- Use Facet Option Variables to filter by resource["db.instance"] for specific database selection
- Add Metric Name Variables to switch between different RDS metrics dynamically
- Configure String Variables for environment selection (production, staging, development)
Drag widgets from the toolbox to visualize RDS metrics. Configure time series widgets to display:
- CPU utilization trends using the rds_cpu_utilization metric
- IOPS patterns by combining the rds_read_iops and rds_write_iops metrics
- Connection count using rds_database_connections with appropriate thresholds
- Storage capacity trends with percentage calculations
Group metrics by the enriched attributes from the lookup processor (environment, team, cost_center) to create filtered views. Reference variables in widget configurations using the $variable_key syntax to make dashboards respond to user selections.
Save custom views for specific database instances or environments using the Save View feature. This allows quick access to frequently monitored database configurations without reconfiguring variables each time.
Configure Monitors
Create monitors in Edge Delta’s Observability Platform to track RDS metrics and generate alerts. Configure different monitor types based on your alerting requirements.
Use Metric Threshold Monitors for RDS performance metrics. Set thresholds for CPU utilization, IOPS, and connection counts with appropriate evaluation windows:
- Database CPU utilization: Alert when above 80% for 5-minute evaluation window
- Read/Write IOPS: Warn when exceeding baseline by 50% using 15-minute rollup
- Database connections: Alert when approaching connection limit (e.g., above 90% of max_connections)
- Storage capacity: Warn at 85% full, alert at 95% full
Configure Pattern Anomaly Monitors to detect unusual database behavior patterns. These monitors use sensitivity settings to identify spikes in error patterns or unusual query patterns in RDS logs when processed alongside metrics.
For complex scenarios involving multiple metrics, create Composite Monitors that evaluate conditions across multiple monitors:
- Combine high CPU AND high connection count monitors to detect resource exhaustion
- Use OR logic to alert when either replication lag exceeds threshold OR primary instance shows errors
- Configure AND logic for correlated issues like high IOPS with increased error rates
Set appropriate aggregation methods (sum, average, max) and rollup windows based on metric characteristics. Use grouping by resource["db.instance"] to receive per-database alerts rather than aggregate notifications.
Best Practices
- Start with essential metrics: Begin with CPU, IOPS, and connection metrics, then expand coverage
- Configure retention wisely: Set appropriate S3 retention periods based on compliance requirements
- Optimize costs: Implement S3 lifecycle policies to transition old metrics to cheaper storage tiers (see the sketch after this list)
- Leverage pre-index processing: Because metrics are standardized and enriched pre-index, you can reduce downstream ingestion and storage costs while still retaining full context in Edge Delta
- Enrich with correlation: Connect RDS metrics with application traces and logs for full context
- Apply OpenTelemetry standards: Use semantic conventions for consistent metric naming and attributes
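As an illustration of the cost-optimization point above, a minimal lifecycle sketch for the metrics bucket (transition and expiration thresholds are assumptions; align them with your retention requirements):

Resources:
  MetricsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: BUCKET_NAME
      LifecycleConfiguration:
        Rules:
          - Id: transition-old-metrics
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA   # cheaper tier after 30 days (assumed)
                TransitionInDays: 30
            ExpirationInDays: 365           # delete after one year (assumed)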
Troubleshooting
No Metrics Appearing
- Verify CloudWatch Metric Stream shows “Active” status
- Check S3 bucket contains metric files in the configured path
- Confirm SQS queue shows message activity
- Review Edge Delta pipeline logs for ingestion errors
Authentication Issues
- Verify IAM policy contains all required S3 and SQS permissions
- Validate AWS credentials are correctly configured in Edge Delta
- Confirm region matches where resources are deployed
Data Format Problems
- Confirm Kinesis Firehose output format matches processor expectations
- Check JSON parsing for base64 decoding if needed
- Verify field mappings align with CloudWatch metric structure