Data Reduction Examples

Examples of how to reduce data volumes with Edge Delta.

Overview

These examples demonstrate practical data reduction techniques using Edge Delta’s Telemetry Pipelines to achieve 20-90% volume reduction while maintaining observability. Each scenario shows real test results from production workloads, complete with before/after comparisons and working pipeline configurations. For a comprehensive introduction to data reduction strategies and concepts, see the Data Reduction overview page.

Scenario 1: Small Logs with Light Reduction

Consider this log:

{
  "_type": "log",
  "timestamp": 1757678937068,
  "body": "2025-09-12T12:08:57.067Z ERROR middleware/authz.go:383 request flog_log_generator spec:{uri:/v1/orgs/b9df8fc0-084b-11ee-be56-0242ac120002/confs/- method:POST password:4o7sOIGH} latency:401ms",
  "resource": {
    "ed.demo": "",
    "ed.source.name": "telemetrygen_input_d9f7",
    "ed.source.type": "telemetrygen_input",
    "host.ip": "172.19.0.4",
    "host.name": "data-reduction-test-control-plane",
    "service.name": "telemetry-gen-telemetrygen_input_d9f7"
  },
  "attributes": {}
}

When this configuration is applied, log size is reduced by about 21%.

This stack parses only the essentials from each line (timestamp, level, method, URI, latency) and then synthesizes a compact message from those fields. It replaces the verbose body with that compact form, so downstream processors and destinations handle a much smaller payload. Finally, it cleans up by dropping all temporary attributes and removing the empty ed.demo resource key, leaving a lean body plus only the metadata you intend to keep. The body itself shrinks by roughly half; the 21% figure applies to the full event, whose resource metadata is largely untouched. The compact form keeps everything needed operationally while deliberately discarding parse artifacts, the source-code location, and the password value.

Resulting in logs like this:

{
  "_type": "log",
  "timestamp": 1757678937068,
  "body": "2025-09-12T12:08:57.067Z ERROR POST /v1/orgs/b9df8fc0-084b-11ee-be56-0242ac120002/confs/- latency=401ms",
  "resource": {
    "ed.source.name": "telemetrygen_input_d9f7",
    "ed.source.type": "telemetrygen_input",
    "host.ip": "172.19.0.4",
    "host.name": "data-reduction-test-control-plane",
    "service.name": "telemetry-gen-telemetrygen_input_d9f7"
  },
  "attributes": {}
}
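
The reduction figures in these scenarios are easy to reproduce yourself: serialize a before/after pair of events and compare byte counts. A minimal sketch in Python (the file names are placeholders for wherever you capture sample events):

import json

def event_size(event: dict) -> int:
    # Compact serialization, measured in UTF-8 bytes
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))

# Hypothetical capture files holding one event each
with open("before.json") as f:
    before = json.load(f)
with open("after.json") as f:
    after = json.load(f)

print(f"reduction: {1 - event_size(after) / event_size(before):.0%}")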

The GUI generates a pipeline with the following YAML in the Multi-processor node:

- type: comment
  metadata: '{"id":"ZRhKj7-SE6u5TYz1tX-9o","type":"comment","name":"Keep: timestamp, level, method, uri, latency"}'
  data_types:
  - log
- type: ottl_transform
  metadata: '{"id":"uOkKAdLBzduEcwDMDNg35","type":"parse-grok","name":"Parse Grok"}'
  data_types:
  - log
  statements: |-
    merge_maps(attributes, ExtractGrokPatterns(body, "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{DATA} request %{DATA} spec:\\{uri:%{DATA:uri} method:%{WORD:method} password:%{DATA}\\} latency:%{NUMBER:latency}ms", true), "upsert") where IsMap(attributes)
    set(attributes, ExtractGrokPatterns(body, "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{DATA} request %{DATA} spec:\\{uri:%{DATA:uri} method:%{WORD:method} password:%{DATA}\\} latency:%{NUMBER:latency}ms", true)) where not IsMap(attributes)    
- type: ottl_transform
  metadata: '{"id":"8XuK6dYIkpiayHx59Ap0c","type":"ottl_transform","name":"Create
    new body"}'
  data_types:
  - log
  statements: |
    set(attributes["essential"], Format("%s %s %s %s latency=%sms", [attributes["timestamp"], attributes["level"], attributes["method"], attributes["uri"], String(attributes["latency"])])) where attributes["timestamp"] != nil and attributes["level"] != nil and attributes["method"] != nil and attributes["uri"] != nil and attributes["latency"] != nil    
- type: ottl_transform
  metadata: '{"id":"1NcSk_6yX8HAvXuOyLkWl","type":"copy-field","name":"Copy Field"}'
  data_types:
  - log
  statements: set(body, attributes["essential"])
- type: ottl_transform
  metadata: '{"id":"KGyDcK2Xrr239u3gCGUwD","type":"ottl_transform","name":"Delete
    Attributes"}'
  data_types:
  - log
  statements: delete_matching_keys(attributes, ".*")
- type: ottl_transform
  metadata: '{"id":"X69__BNiX1W7RuYcOFrDS","type":"delete-field","name":"Delete ed.demo"}'
  data_types:
  - log
  statements: delete_key(resource, "ed.demo")

Note: This could be reduced into a single OTTL Custom Processor with multiple statements, but it is useful to have the GUI available to quickly reorder, disable, or otherwise configure processors.

Optimized processor:

- name: telemetrygen_input_c47a_multiprocessor
  type: sequence
  user_description: Multi Processor
  processors:
  - type: ottl_transform
    metadata: '{"id":"oY9Q1NPiR_dXscOR6YPFX","type":"ottl_transform","name":"Custom"}'
    data_types:
    - log
    statements: |
      // Parse once, format directly to body, skip intermediate storage
      set(cache["p"], ExtractGrokPatterns(body, "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{DATA} request %{DATA} spec:\\{uri:%{DATA:uri} method:%{WORD:method} password:%{DATA}\\} latency:%{NUMBER:latency}ms", true))
      set(body, Concat([cache["p"]["timestamp"], " ", cache["p"]["level"], " ", cache["p"]["method"], " ", cache["p"]["uri"], " latency=", String(cache["p"]["latency"]), "ms"], "")) where cache["p"]["timestamp"] != nil
      delete_key(resource, "ed.demo")
      delete_key(cache, "p")      

This optimized processor achieves the same 21% reduction with four statements instead of six. It parses the log line once using ExtractGrokPatterns and stores the result in a temporary cache, then constructs the compact body directly from the cached values using Concat, bypassing intermediate attribute storage entirely. Finally, it cleans up by removing the empty ed.demo field and clearing the cache. The result is a third fewer statements than the GUI-generated version, with identical output.

Scenario 2: Large Logs with Moderate Reduction

Consider this Apache JSON log:

{
  "_type": "log",
  "timestamp": 1757681921433,
  "body": "{\n\"host\": \"116.210.217.185\",\n\"user\": \"-\",\n\"time_local\": \"12/09/2025:12:58:41 +0000\",\n\"method\": \"POST\",\n\"request\": \"/v1/logout\",\n\"protocol\": \"HTTP/1.1\",\n\"response-code\": 201,\n\"bytes\": 1890,\n\"referrer\": \"https://www.investorholistic.biz/sticky/web-readiness\",\n\"agent\": \"Opera/10.56 (X11; Linux x86_64; en-US) Presto/2.8.161 Version/10.00\"\n}",
  "resource": {
    "ed.demo": "",
    "ed.source.name": "telemetrygen_input_c47a",
    "ed.source.type": "telemetrygen_input",
    "host.ip": "172.19.0.4",
    "host.name": "data-reduction-test-control-plane",
    "service.name": "telemetry-gen-telemetrygen_input_c47a"
  },
  "attributes": {}
}

When this configuration is applied, log size is reduced by about 40%.

This stack parses the JSON body to extract fields, then uses a lookup table to map verbose user agent strings to descriptive codes (e.g., “Opera on Linux” instead of the full 80+ character agent string). It removes low-value fields like referrer, protocol, and time_local that either duplicate information or aren’t needed for analysis. Finally, it rebuilds a compact JSON body with just the essential fields and cleans up all temporary attributes. The result maintains queryable JSON structure while cutting size nearly in half.

Resulting in logs like this:

{
  "_type": "log",
  "timestamp": 1757681921433,
  "body": "{\"host\":\"116.210.217.185\",\"method\":\"POST\",\"path\":\"/v1/logout\",\"status\":201,\"bytes\":1890,\"ua\":\"Opera on Linux\"}",
  "resource": {
    "ed.source.name": "telemetrygen_input_c47a",
    "ed.source.type": "telemetrygen_input",
    "host.ip": "172.19.0.4",
    "host.name": "data-reduction-test-control-plane",
    "service.name": "telemetry-gen-telemetrygen_input_c47a"
  },
  "attributes": {}
}

The GUI generates a pipeline with the following YAML in the Multi-processor node:

- type: ottl_transform
  metadata: '{"id":"32ZrlYDO9o2NbDLDHtfVu8M4cjm","type":"parse-json","name":"Parse JSON","isRecommendation":true}'
  data_types:
  - log
  statements: |-
    set(cache["parsed-json"], ParseJSON(body))
    merge_maps(attributes, cache["parsed-json"], "upsert") where IsMap(attributes) and IsMap(cache["parsed-json"])
    set(attributes, cache["parsed-json"]) where not (IsMap(attributes) and IsMap(cache["parsed-json"]))    
- type: lookup
  metadata: '{"id":"pIAlUUWI-Uwh4YjTW4uYZ","type":"lookup","name":"Lookup"}'
  data_types:
  - log
  location_path: ed://user-agent-lookup.csv
  match_mode: regex
  key_fields:
  - event_field: attributes["agent"]
    lookup_field: pattern
  out_fields:
  - event_field: attributes["ua_code"]
    lookup_field: description
    default_value: UNKNOWN
- type: ottl_transform
  metadata: '{"id":"L1iI_zYCM65dr0HBJfpyX","type":"delete-field","name":"Delete referrer"}'
  data_types:
  - log
  statements: delete_key(attributes, "referrer")
- type: ottl_transform
  metadata: '{"id":"86qKJVwikulm2W-b-itVP","type":"delete-field","name":"Delete agent"}'
  data_types:
  - log
  statements: delete_key(attributes, "agent")
- type: ottl_transform
  metadata: '{"id":"fAKhdxZ5hAGFBuUa1cE6j","type":"delete-field","name":"Delete protocol"}'
  data_types:
  - log
  statements: delete_key(attributes, "protocol")
- type: ottl_transform
  metadata: '{"id":"aoI0D7etoJXTOqq9Plirv","type":"delete-field","name":"Delete time_local"}'
  data_types:
  - log
  statements: delete_key(attributes, "time_local")
- type: ottl_transform
  metadata: '{"id":"Q8gN2A-HuMEOOXNCOuP4w","type":"delete-field","name":"Delete ed.demo"}'
  data_types:
  - log
  statements: delete_key(resource, "ed.demo")
- type: ottl_transform
  metadata: '{"id":"yt0jnUXRrhwJx7QWThVMh","type":"ottl_transform","name":"Custom"}'
  data_types:
  - log
  statements: |-
    set(body, Format("{\"host\":\"%s\",\"method\":\"%s\",\"path\":\"%s\",\"status\":%d,\"bytes\":%d,\"ua\":\"%s\"}", [attributes["host"], attributes["method"], attributes["request"], attributes["response-code"], attributes["bytes"], attributes["ua_code"]]))
    delete_matching_keys(attributes, ".*")    

The lookup table (user-agent-lookup.csv) maps common user agent patterns to short descriptive codes, reducing strings from 80-200 characters to 10-15 characters while maintaining analytical value.
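
For reference, a minimal sketch of what user-agent-lookup.csv could contain is shown below. These rows are illustrative rather than the actual file contents; since match_mode is regex, the pattern column holds regular expressions tested against attributes["agent"], and the matching row's description becomes attributes["ua_code"].

pattern,description
.*Chrome.*,Chrome on Desktop
.*Firefox.*,Firefox on Desktop
Opera.*Linux.*,Opera on Linux
.*(bot|crawler|spider).*,Bot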

Again, this processor stack could be reduced into a single OTTL Custom Processor with multiple statements, but it is useful to have the GUI available to quickly reorder, disable, or otherwise configure processors.

Optimized processor:

- name: telemetrygen_input_4aa9_multiprocessor
  type: sequence
  user_description: Multi Processor
  processors:
  - type: ottl_transform
    metadata: '{"id":"xLmTEbB_j0e7UOzywBUwB","type":"ottl_transform","name":"Custom"}'
    data_types:
    - log
    statements: |-
      // Parse JSON and extract fields in one step
      set(cache["a"], ParseJSON(body))

      // Map user agent to code inline using where conditions
      set(cache["ua"], "Chrome") where IsMatch(String(cache["a"]["agent"]), ".*Chrome.*")
      set(cache["ua"], "Firefox") where IsMatch(String(cache["a"]["agent"]), ".*Firefox.*") and cache["ua"] == nil
      set(cache["ua"], "Safari") where IsMatch(String(cache["a"]["agent"]), ".*Safari.*") and not IsMatch(String(cache["a"]["agent"]), ".*Chrome.*") and cache["ua"] == nil
      set(cache["ua"], "Opera") where IsMatch(String(cache["a"]["agent"]), ".*Opera.*") and cache["ua"] == nil
      set(cache["ua"], "Bot") where IsMatch(String(cache["a"]["agent"]), ".*(bot|crawler|spider).*") and cache["ua"] == nil
      set(cache["ua"], "Other") where cache["ua"] == nil

      // Build compact JSON using Concat
      set(body, Concat(["{\"host\":\"", String(cache["a"]["host"]), "\",\"method\":\"", String(cache["a"]["method"]), "\",\"path\":\"", String(cache["a"]["request"]), "\",\"status\":", String(cache["a"]["response-code"]), ",\"bytes\":", String(cache["a"]["bytes"]), ",\"ua\":\"", cache["ua"], "\"}"], ""))

      // IMPORTANT: Clear all attributes to reduce size
      delete_matching_keys(attributes, ".*")

      // Clean up cache and resource
      delete_key(cache, "a")
      delete_key(cache, "ua")
      delete_key(resource, "ed.demo")      

This optimized processor achieves 45% reduction (better than the GUI’s 40%) with cleaner code. It parses the JSON once, uses conditional where statements for user agent mapping instead of an external lookup table, and builds the compact body with Concat. The extra compression comes partly from the shorter inline codes (for example, “Opera” instead of “Opera on Linux”). Note that match order matters: Chrome user agents typically also contain the string “Safari”, which is why the Safari rule explicitly excludes Chrome matches. This single-processor approach outperforms the eight-processor GUI version in both efficiency and compression.

Scenario 3: Extreme Aggregation/Optimization

Consider this GCP Audit log:

{
  "_type": "log",
  "timestamp": 1757684811048,
  "body": "{\"insertId\":\"WpjlIoyaKV\",\"logName\":\"projects/affable-alpha-386416/logs/cloudaudit.googleapis.com%2Factivity\",\"protoPayload\":{\"@type\":\"type.googleapis.com/google.cloud.audit.AuditLog\",\"authenticationInfo\":{\"principalEmail\":\"153750138338-compute@developer.gserviceaccount.com\",\"principalSubject\":\"serviceAccount:153750138338-compute@developer.gserviceaccount.com\",\"serviceAccountKeyName\":\"//iam.googleapis.com/projects/affable-alpha-386416/serviceAccounts/153750138338-compute@developer.gserviceaccount.com/keys/76b9aeba5cdb268d4fe4e0e9ddfeea35eea0647f\"},\"authorizationInfo\":[{\"granted\":true,\"permission\":\"pubsub.topics.attachSubscription\",\"resource\":\"projects/affable-alpha-386416/topics/all4\",\"resourceAttributes\":{}}],\"methodName\":\"google.pubsub.v1.Subscriber.CreateSubscription\",\"request\":{\"@type\":\"type.googleapis.com/google.pubsub.v1.Subscription\",\"name\":\"projects/affable-alpha-415524/subscriptions/all4sub\",\"topic\":\"projects/affable-alpha-910731/topics/all4\"},\"requestMetadata\":{\"callerIp\":\"175.55.149.113\",\"callerSuppliedUserAgent\":\"grpc-node-js/1.6.8,gzip(gfe)\",\"destinationAttributes\":{},\"requestAttributes\":{\"auth\":{},\"time\":\"2025-09-12T13:46:51.048270838Z\"}},\"resourceName\":\"projects/affable-alpha-703700/topics/all4\",\"response\":{\"@type\":\"type.googleapis.com/google.pubsub.v1.Subscription\"},\"serviceName\":\"pubsub.googleapis.com\"},\"receiveTimestamp\":\"2025-09-12T13:46:51.048270838Z\",\"resource\":{\"labels\":{\"project_id\":\"affable-alpha-312617\",\"topic_id\":\"projects/affable-alpha-894811/topics/all4\"},\"type\":\"pubsub_topic\"},\"severity\":\"NOTICE\",\"timestamp\":\"2025-09-12T13:46:51.048270838Z\"}",
  "resource": {
    "ed.demo": "",
    "ed.source.name": "telemetrygen_input_4aa9",
    "ed.source.type": "telemetrygen_input",
    "host.ip": "172.19.0.4",
    "host.name": "data-reduction-test-control-plane",
    "service.name": "telemetry-gen-telemetrygen_input_4aa9"
  },
  "attributes": {}
}

When this configuration is applied, log size is reduced by about 88% and the logs are converted to metrics.

This stack parses the massive nested JSON to extract just five critical fields (service, method, user, severity, resource type) from the 50-plus fields in the audit payload. It deletes the verbose protoPayload and other nested structures immediately, before any metric extraction occurs; this ordering is crucial, since it prevents the extracted metrics from carrying the unnecessary data along as attributes. The pipeline then generates both aggregated metrics for dashboards and pattern metrics for anomaly detection, replacing thousands of bytes of audit detail per event with compact, actionable signals. The result transforms verbose audit logs into lean metrics while aggregation preserves visibility into who did what, where, and how often.

Resulting in metrics like this:

{
  "_type": "metric",
  "timestamp": 1757684820000,
  "resource": {},
  "attributes": {
    "method": "google.pubsub.v1.Subscriber.CreateSubscription",
    "pattern": "pubsub.googleapis.com google.pubsub.v1.Subscriber.CreateSubscription by 153750138338-compute@developer.gserviceaccount.com - NOTICE",
    "service": "pubsub.googleapis.com",
    "severity": "NOTICE"
  },
  "kind": "sum",
  "name": "gcp_audit_aggregated",
  "start_timestamp": 1757684810000,
  "sum": {
    "aggregation_temporality": "cumulative",
    "is_monotonic": false,
    "value": 4
  },
  "unit": "1",
  "_stat_type": "sum"
}

In the example above, value: 4 means that four audit events sharing this service, method, and severity combination were counted in one 10-second aggregation window (note that start_timestamp and timestamp are 10 seconds apart).

The GUI generates a pipeline with the following YAML in the Multi-processor node:

- type: ottl_transform
  metadata: '{"id":"32ZqUErKPAOS80x4ftfV17vC6zi","type":"parse-json","name":"Parse JSON"}'
  data_types:
  - log
  statements: |-
    set(cache["parsed-json"], ParseJSON(body))
    merge_maps(attributes, cache["parsed-json"], "upsert") where IsMap(attributes) and IsMap(cache["parsed-json"])
    set(attributes, cache["parsed-json"]) where not (IsMap(attributes) and IsMap(cache["parsed-json"]))    
- type: ottl_transform
  metadata: '{"id":"Ooji6H0r2e89z9d1_40yy","type":"ottl_transform","name":"Extract and Clean"}'
  data_types:
  - log
  statements: |-
    // Extract only the essential fields
    set(cache["service"], String(attributes["protoPayload"]["serviceName"]))
    set(cache["method"], String(attributes["protoPayload"]["methodName"]))
    set(cache["user"], String(attributes["protoPayload"]["authenticationInfo"]["principalEmail"]))
    set(cache["severity"], String(attributes["severity"]))
    set(cache["resource_type"], String(attributes["resource"]["type"]))
    
    // Create compact pattern
    set(attributes["pattern"], Concat([cache["service"], " ", cache["method"], " by ", cache["user"], " - ", cache["severity"]], ""))
    set(body, attributes["pattern"])
    
    // DELETE VERBOSE FIELDS IMMEDIATELY (before metrics)
    delete_key(attributes, "protoPayload")
    delete_key(attributes, "insertId")
    delete_key(attributes, "logName")
    delete_key(attributes, "receiveTimestamp")
    delete_key(attributes, "resource")
    delete_key(attributes, "timestamp")
    
    // Set dimensions for metrics
    set(attributes["service"], cache["service"])
    set(attributes["method"], cache["method"])
    set(attributes["severity"], cache["severity"])    
- type: extract_metric
  metadata: '{"id":"hk1ewFWnss-jytQr_kZ1x","type":"extract_metric","name":"Extract Metric"}'
  keep_item: true
  data_types:
  - log
  extract_metric_rules:
  - name: gcp_audit_events
    unit: "1"
    sum:
      aggregation_temporality: cumulative
      value: "1"
- type: aggregate_metric
  metadata: '{"id":"Q89VVlmBfkh75nzuDvl6P","type":"aggregate_metric","name":"Aggregate Metric"}'
  data_types:
  - metric
  aggregate_metric_rules:
  - name: gcp_audit_aggregated
    interval: 10s
    aggregation_type: count
    group_by:
    - attributes["service"]
    - attributes["method"]
    - attributes["severity"]
- type: log_to_pattern_metric
  metadata: '{"id":"6QCbncbPA8i01vrerrH96","type":"log_to_pattern_metric","name":"Log to Pattern"}'
  data_types:
  - log
  reporting_frequency: 30s