# Reduce Metric Cardinality
## Overview
Metric cardinality (the number of unique time series) directly impacts costs, performance, and system health. This guide covers strategies for reducing cardinality at the edge, before metrics reach expensive downstream destinations.
For foundational concepts, see Metric Cardinality.
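As a quick intuition for the numbers involved: cardinality is the count of distinct attribute-value combinations a metric emits. A minimal Python sketch, using hypothetical datapoint dicts rather than any real SDK type:

```python
# Hypothetical sketch: cardinality is the number of distinct
# attribute-value combinations observed for a metric.
def cardinality(datapoints):
    """Count unique timeseries among datapoint attribute dicts."""
    return len({tuple(sorted(dp.items())) for dp in datapoints})

points = [
    {"service.name": "api", "pod_id": "pod-1"},
    {"service.name": "api", "pod_id": "pod-2"},
    {"service.name": "api", "pod_id": "pod-1"},  # same series as the first
]
print(cardinality(points))  # 2
```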
## Strategy 1: Drop high-cardinality attributes
The most effective way to reduce cardinality is to remove the attributes that generate excessive unique values.
### Common attributes to drop
| Attribute | Why drop | Alternative |
|---|---|---|
| `pod_id` | Unique per pod instance | Aggregate to `service.name` |
| `container_id` | Unique per container | Aggregate to pod or service |
| `request_id` | Unique per request | Use traces for request-level detail |
| `user_id` | Unique per user | Hash to buckets or remove |
| `session_id` | Unique per session | Remove from metrics |
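One alternative from the table, hashing `user_id` into a bounded set of buckets, can be sketched in Python. `N_BUCKETS` and the helper name are illustrative, not part of any processor:

```python
import hashlib

# Hypothetical sketch: replace a high-cardinality user_id with one of
# N stable hash buckets, keeping coarse per-cohort visibility.
N_BUCKETS = 16

def bucket_user_id(user_id: str) -> str:
    # sha256 gives a stable, evenly distributed digest per user
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return f"bucket-{int(digest, 16) % N_BUCKETS}"

attrs = {"user_id": "u-482913", "service.name": "checkout"}
attrs["user_bucket"] = bucket_user_id(attrs.pop("user_id"))
```

The same user always lands in the same bucket, so per-cohort trends survive while cardinality is capped at `N_BUCKETS`.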
### Using the Custom processor
The Custom processor executes OTTL statements to drop specific attributes from metrics:
```yaml
- name: drop_high_cardinality
  type: custom
  data_types:
    - metric
  statements:
    - delete_key(attributes, "pod_id")
    - delete_key(attributes, "container_id")
    - delete_key(attributes, "request_id")
```
### Using the Delete Field processor
For simpler cases, the Delete Field processor removes a single field:
```yaml
- name: delete_pod_id
  type: delete_field
  data_types:
    - metric
  field_path: attributes["pod_id"]
```
## Strategy 2: Normalize dynamic values
Dynamic values like URL paths create unbounded cardinality. Normalize them to bounded sets.
### URL path normalization
Convert dynamic path segments to placeholders:
| Before | After |
|---|---|
| `/users/12345` | `/users/{id}` |
| `/orders/abc-def-ghi` | `/orders/{id}` |
| `/products/SKU-99999` | `/products/{sku}` |
The Custom processor's `replace_pattern` statements normalize these paths. Apply the most specific patterns first: a numeric-ID rule that runs before the UUID rule would clip the leading digits of a UUID segment that starts with a digit:

```yaml
- name: normalize_urls
  type: custom
  data_types:
    - metric
  statements:
    # Replace SKUs
    - replace_pattern(attributes["url.path"], "/SKU-[A-Z0-9]+", "/{sku}")
    # Replace UUIDs (before the generic numeric rule)
    - replace_pattern(attributes["url.path"], "/[a-f0-9-]{36}", "/{uuid}")
    # Replace numeric IDs
    - replace_pattern(attributes["url.path"], "/[0-9]+", "/{id}")
```
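For intuition, the same rewrites can be sketched in plain Python with `re.sub`; the rule list and `normalize` helper are illustrative, and the sketch likewise applies the most specific patterns first so the numeric rule cannot clip a digit-leading UUID segment:

```python
import re

# Hypothetical sketch of the normalization rules: each
# (pattern, placeholder) pair rewrites one class of dynamic segment.
RULES = [
    (re.compile(r"/SKU-[A-Z0-9]+"), "/{sku}"),
    (re.compile(r"/[a-f0-9-]{36}"), "/{uuid}"),
    (re.compile(r"/[0-9]+"), "/{id}"),
]

def normalize(path: str) -> str:
    for pattern, placeholder in RULES:
        path = pattern.sub(placeholder, path)
    return path

print(normalize("/users/12345"))  # /users/{id}
```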
### Status code grouping
Reduce granularity by grouping similar values:
| Before | After |
|---|---|
| 200, 201, 204 | 2xx |
| 400, 401, 403, 404 | 4xx |
| 500, 502, 503 | 5xx |
The Custom processor groups status codes using `replace_pattern`:

```yaml
- name: group_status_codes
  type: custom
  data_types:
    - metric
  statements:
    - replace_pattern(attributes["http.status_code"], "^2[0-9]{2}$", "2xx")
    - replace_pattern(attributes["http.status_code"], "^3[0-9]{2}$", "3xx")
    - replace_pattern(attributes["http.status_code"], "^4[0-9]{2}$", "4xx")
    - replace_pattern(attributes["http.status_code"], "^5[0-9]{2}$", "5xx")
```
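The bucketing rule reduces to a single anchored substitution; `status_class` below is an illustrative Python helper, not a processor function:

```python
import re

# Hypothetical sketch: map any three-digit HTTP status to its class
# bucket ("2xx", "4xx", ...); non-matching values pass through unchanged.
def status_class(code: str) -> str:
    return re.sub(r"^([1-5])[0-9]{2}$", r"\1xx", code)

print(status_class("404"))  # 4xx
```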
## Strategy 3: Use aggregation processors
Aggregation naturally reduces cardinality by grouping metrics. The Aggregate Metric processor and Rollup Metric processor provide different levels of reduction.
### Aggregate Metric processor
The Aggregate Metric processor uses `group_by` to specify which attributes to preserve; all others are dropped:

```yaml
- name: aggregate_metrics
  type: aggregate_metric
  data_types:
    - metric
  aggregation_type: sum
  interval: 60s
  group_by:
    - service.name
    - http.method
    - http.status_code
  # Drop every attribute that is not a group_by key
  keep_only_group_by_keys: true
```
**Before aggregation:** metrics carry `pod_id`, `container_id`, and `request_id`, plus the `group_by` keys.

**After aggregation:** only `service.name`, `http.method`, and `http.status_code` remain.
### Rollup Metric processor
For maximum reduction, the Rollup Metric processor creates a single aggregated value without any grouping:
```yaml
- name: rollup_metrics
  type: rollup_metric
  data_types:
    - metric
  aggregation_type: sum
  interval: 60s
```
This produces one value per metric name per interval: the lowest possible cardinality.
## Strategy 4: Filter metrics by name
Some metrics are not worth the cardinality cost. The Filter processor drops them entirely:
```yaml
- name: filter_noisy_metrics
  type: filter
  data_types:
    - metric
  condition: 'not (name matches "debug\\..*" or name matches "internal\\..*")'
```
This keeps only metrics whose names do not start with `debug.` or `internal.`.
## Strategy 5: Conditional reduction by environment
Apply aggressive reduction in development and staging while preserving detail in production. Use the Route processor to direct metrics to different aggregation paths based on environment.
**Route processor:** separate metrics by environment using path conditions:

```yaml
paths:
  - path: production
    condition: resource["deployment.environment"] == "production"
  - path: staging
    condition: resource["deployment.environment"] == "staging"
# Unmatched items (dev) go to the default "unmatched" path
```
**Production Aggregate Metric:** high resolution with 30-second intervals; preserves detailed dimensions for troubleshooting:

```yaml
aggregation_type: sum
interval: 30s
group_by: [service.name, http.method, http.status_code, http.route]
keep_only_group_by_keys: true
```
**Staging Aggregate Metric:** moderate resolution with 60-second intervals; keeps essential dimensions for validation:

```yaml
aggregation_type: sum
interval: 60s
group_by: [service.name, http.method]
keep_only_group_by_keys: true
```
**Dev Aggregate Metric:** aggressive reduction with 5-minute intervals; minimizes costs while maintaining basic visibility:

```yaml
aggregation_type: sum
interval: 300s
group_by: [service.name]
keep_only_group_by_keys: true
```
## Example: Complete cardinality reduction pipeline
Combine strategies for comprehensive cardinality control.
**Drop high-cardinality attributes:** use the Custom processor with OTTL statements to remove attributes that generate excessive unique values:

```yaml
statements:
  - delete_key(attributes, "pod_id")
  - delete_key(attributes, "container_id")
  - delete_key(attributes, "request_id")
  - delete_key(attributes, "trace_id")
```
**Normalize dynamic values:** the Custom processor uses `replace_pattern` to convert dynamic URL segments to placeholders (UUIDs before numeric IDs, so the numeric rule cannot clip a digit-leading UUID):

```yaml
statements:
  - replace_pattern(attributes["url.path"], "/[a-f0-9-]{36}", "/{uuid}")
  - replace_pattern(attributes["url.path"], "/[0-9]+", "/{id}")
```
**Aggregate metrics:** use the Aggregate Metric processor with `keep_only_group_by_keys: true` to preserve only the dimensions you need:

```yaml
aggregation_type: sum
interval: 60s
group_by: [service.name, http.method, http.status_code, url.path]
keep_only_group_by_keys: true
```
## Measuring reduction effectiveness
Track cardinality before and after your pipeline to measure effectiveness:
- Count distinct fingerprints at pipeline input
- Count distinct fingerprints at pipeline output
- Calculate the reduction percentage: `(before - after) / before × 100`
Use the Pipelines Dashboard to monitor input and output rates.
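The percentage calculation is trivial, but a small hypothetical helper makes the guard for an empty input explicit:

```python
# Hypothetical sketch: compute the reduction percentage from distinct
# series fingerprints counted at pipeline input and output.
def reduction_pct(before: int, after: int) -> float:
    """(before - after) / before * 100, guarding against empty input."""
    if before == 0:
        return 0.0
    return (before - after) / before * 100

print(reduction_pct(50_000, 5_000))  # 90.0
```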
## Best practices
When reducing metric cardinality:
- Start with the highest-cardinality attributes first
- Test reduction in non-production environments
- Preserve attributes you need for alerting and dashboards
- Document which attributes are dropped and why
- Monitor for unexpected cardinality growth from new attributes
## See also
- Metric Cardinality - Understand cardinality concepts
- Aggregate Metric Processor - Group and summarize metrics
- Rollup Metric Processor - Create single aggregated values
- Custom Processor - OTTL transformations for metrics
- Delete Field Processor - Remove specific fields
- Filter Processor - Drop items by condition
- Data Reduction - General data reduction strategies