Security Monitoring with Kafka and Edge Delta

Deploy Edge Delta agents per Kafka topic to process security events from authentication systems, network firewalls, and endpoint detection tools that cannot run agents directly.

Overview

Security events originate from systems that cannot run observability agents directly: perimeter firewalls, SSO gateways, VPN concentrators, IDS appliances, and endpoint detection tools installed on managed workstations. These far-edge systems export events to Apache Kafka topics, creating a central event bus that decouples producers from consumers.

Edge Delta agents consume from individual Kafka topics and apply domain-specific processing before routing events to their destinations. Each agent operates independently with its own pipeline, enabling isolated scaling and failure domains per security domain.

Architecture

flowchart LR
    subgraph "Far Edge (no agents)"
        A1[SSO Gateway]
        A2[VPN Concentrator]
        A3[IAM Service]
        B1[Perimeter Firewall]
        B2[IDS / Suricata]
        B3[DNS Server]
        C1[CrowdStrike EDR]
        C2[Workstations]
    end

    subgraph "Kafka Event Bus"
        T1[auth-events]
        T2[network-events]
        T3[endpoint-events]
    end

    subgraph "Edge Delta Agents"
        E1[Auth Agent]
        E2[Network Agent]
        E3[Endpoint Agent]
    end

    subgraph "Destinations"
        S1[SIEM]
        S2[Data Lake / Archive]
    end

    A1 & A2 & A3 --> T1
    B1 & B2 & B3 --> T2
    C1 & C2 --> T3

    T1 --> E1
    T2 --> E2
    T3 --> E3

    E1 -->|Critical/High| S1
    E1 -->|All| S2
    E2 -->|Critical/High| S1
    E2 -->|All| S2
    E3 -->|Critical/High| S1
    E3 -->|All| S2

Agent-per-Topic Pattern

Assigning one Edge Delta agent to each Kafka topic provides several advantages over a single agent consuming from all topics:

  • Domain-specific processing: Authentication events need brute-force detection logic. Network events need port-scan identification. Endpoint events need process-name matching. Each pipeline contains only the OTTL statements relevant to its domain.
  • Independent scaling: If endpoint telemetry volume spikes (common during threat hunting), scale the endpoint agent without affecting authentication or network processing.
  • Failure isolation: A misconfigured pipeline for one domain does not disrupt processing of the others.
  • Consumer group separation: Each agent joins its own Kafka consumer group, so topic offsets are tracked independently.

Kafka source configuration

Each agent uses the Kafka source node, pointing at the same brokers but consuming from a different topic:

- name: kafka_auth_input
  type: kafka_input
  brokers:
  - "kafka.example.com:9092"
  topics:
  - "auth-events"
  group_id: "ed-auth-consumer"
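A sibling agent for the network domain repeats the same shape; only the topic and consumer group change (the node name and group ID below are illustrative):

```yaml
- name: kafka_network_input
  type: kafka_input
  brokers:
  - "kafka.example.com:9092"
  topics:
  - "network-events"
  # A distinct consumer group, so this agent's offsets and lag are
  # tracked independently of the auth agent's.
  group_id: "ed-network-consumer"
```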

Processing Patterns

Severity classification

OTTL transform statements classify each event into a severity level based on its type and attributes:

- type: ottl_transform
  data_types: [log]
  statements: |-
    set(attributes["severity"], "medium") where body["event_type"] == "auth_failure"
    set(attributes["severity"], "high") where body["event_type"] == "privilege_escalation"
    set(attributes["severity"], "critical") where body["event_type"] == "auth_failure" and body["attempt_count"] > 10

This converts raw events into a uniform severity model that downstream systems (SIEM rules, alerting, dashboards) can consume without parsing domain-specific fields.
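Because OTTL statements execute in order, the critical rule overwrites the medium one for repeated failures. A plain-Python mirror of the three statements (a hypothetical test harness, not Edge Delta code) makes the ordering explicit:

```python
def classify_severity(event):
    """Mirror the three OTTL statements, applied in the same order."""
    severity = None
    if event.get("event_type") == "auth_failure":
        severity = "medium"
    if event.get("event_type") == "privilege_escalation":
        severity = "high"
    # Evaluated last, so a high attempt count escalates medium -> critical.
    if event.get("event_type") == "auth_failure" and event.get("attempt_count", 0) > 10:
        severity = "critical"
    return severity

print(classify_severity({"event_type": "auth_failure", "attempt_count": 3}))   # medium
print(classify_severity({"event_type": "auth_failure", "attempt_count": 25}))  # critical
```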

Volume reduction

Security telemetry is dominated by noise. Firewall logs are mostly connection_allowed entries. EDR telemetry is mostly normal_process executions. Filtering these at the pipeline level before data reaches the SIEM dramatically reduces storage costs and improves signal-to-noise ratio:

- type: ottl_filter
  data_types: [log]
  condition: body["event_type"] == "connection_allowed"

In practice, this pattern reduces network event volume by approximately 50-90% and endpoint event volume by 50-95%, depending on the environment.

Content-based routing

The route processor splits the event stream by severity. Critical and high severity events route to the SIEM for immediate analyst attention. All processed events (minus filtered noise) route to the data lake for forensic and compliance purposes:

- name: severity_router
  type: route_ottl
  paths:
  - path: critical_and_high
    condition: attributes["severity"] == "critical" or attributes["severity"] == "high"
  - path: all_events
    condition: "true"

The Cost Problem

A modest Kafka deployment of five brokers can produce over 1.5 TB of log data per month before accounting for the security event topics themselves. When firewall, authentication, and EDR streams are added, raw volume can easily double. Forwarding all of this to a SIEM at ingestion pricing is unsustainable.

Edge Delta addresses this by acting as a preprocessing layer between Kafka and the SIEM. By filtering noise, enriching events, and routing only high-value signals to expensive destinations, the agent-per-topic pattern is as much a cost optimization strategy as a security architecture.
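As a back-of-the-envelope check on these numbers (the reduction rate and SIEM share below are illustrative assumptions, not measurements):

```python
# Illustrative model of monthly SIEM ingest after pipeline filtering.
broker_logs_tb = 1.5        # the cluster's own log output, from the text
security_events_tb = 1.5    # firewall/auth/EDR streams roughly doubling raw volume
raw_tb = broker_logs_tb + security_events_tb

noise_reduction = 0.70      # mid-range of the 50-90% reduction cited earlier
critical_high_share = 0.05  # assumed fraction of processed events that is critical/high

processed_tb = raw_tb * (1 - noise_reduction)   # events surviving the filters
archive_tb = processed_tb                       # everything processed goes to the data lake
siem_tb = processed_tb * critical_high_share    # only high-value signal hits SIEM pricing

print(f"raw: {raw_tb:.2f} TB, archive: {archive_tb:.2f} TB, SIEM: {siem_tb:.3f} TB")
```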

Monitoring the Agents

Each Edge Delta agent operates as a Kafka consumer with its own consumer group. Monitor these metrics to ensure the agents keep pace with event production:

  • Consumer lag: The difference between the log end offset and the consumer’s committed offset. Track maximum lag per consumer group (not per partition) to avoid high-cardinality metric explosions. Alert if lag exceeds 100,000 for more than 5 minutes, or if the consumer offset stops advancing for 10 minutes.
  • ISR shrink rate: A leading indicator that the Kafka cluster is under stress. If in-sync replicas are shrinking, investigate broker health before agent processing is affected.
  • Request latency (P99): Elevated fetch latency means the agent is waiting longer for messages, which can delay security event processing.
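The lag alerting rule above can be sketched as a small evaluation loop. The thresholds come from the text; the `LagAlerter` helper is hypothetical, and fetching per-partition offsets (e.g. via `kafka-consumer-groups` or an admin client) is assumed to happen elsewhere:

```python
LAG_THRESHOLD = 100_000
LAG_WINDOW_S = 5 * 60       # lag must exceed the threshold this long before alerting
STALL_WINDOW_S = 10 * 60    # committed offsets must advance within this window

def max_group_lag(partition_offsets):
    """Max (log_end_offset - committed_offset) across a group's partitions.

    Tracking only the per-group maximum avoids emitting one metric per
    partition (the high-cardinality problem noted above).
    """
    return max(end - committed for end, committed in partition_offsets.values())

class LagAlerter:
    def __init__(self):
        self.lag_exceeded_since = None
        self.last_committed = -1
        self.last_advance = 0.0

    def evaluate(self, partition_offsets, now):
        """partition_offsets: {partition: (log_end_offset, committed_offset)}."""
        alerts = []
        lag = max_group_lag(partition_offsets)
        if lag > LAG_THRESHOLD:
            if self.lag_exceeded_since is None:
                self.lag_exceeded_since = now
            if now - self.lag_exceeded_since > LAG_WINDOW_S:
                alerts.append(f"lag {lag} above {LAG_THRESHOLD} for over 5 minutes")
        else:
            self.lag_exceeded_since = None
        committed = sum(c for _, c in partition_offsets.values())
        if committed > self.last_committed:
            self.last_committed = committed
            self.last_advance = now
        elif now - self.last_advance > STALL_WINDOW_S:
            alerts.append("consumer offsets stalled for over 10 minutes")
        return alerts
```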

These operational metrics complement the security telemetry flowing through the pipelines. Edge Delta’s self-telemetry nodes (ed_self_telemetry_input, ed_system_stats_input) emit agent health data that routes to the same backend as the security events, providing a unified view.

Kafka Cluster Security

The Kafka cluster itself is a security boundary. Common gaps to address:

  • JMX authentication: Many deployments leave JMX monitoring endpoints with authenticate=false and ssl=false for convenience. In production, enable authentication and TLS on JMX to prevent unauthorized access to cluster management.
  • SASL/TLS for producers and consumers: Configure the Kafka source with SASL credentials and TLS certificates. Edge Delta supports SASL/PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, and Kerberos (GSSAPI) authentication. See Kafka secrets management for credential configuration.
  • Topic-level ACLs: Restrict which consumer groups can read from security topics. Each Edge Delta agent should authenticate with credentials scoped to its assigned topic.
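Putting these together, a source configured for SASL/SCRAM over TLS might look like the following sketch. Field names here are illustrative; consult the Kafka source reference and secrets management documentation for the exact schema:

```yaml
- name: kafka_auth_input
  type: kafka_input
  brokers:
  - "kafka.example.com:9093"    # TLS listener port
  topics:
  - "auth-events"
  group_id: "ed-auth-consumer"
  sasl:
    mechanism: SCRAM-SHA-512
    # Credentials scoped to auth-events via topic-level ACLs;
    # load the password from secrets management, never inline it.
    username: ed-auth-reader
    password: <from secrets management>
  tls:
    ca_file: /etc/edgedelta/certs/ca.pem
```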

Security Domains

Domain | Kafka Topic | Key Events | Processing
Authentication | auth-events | Login failures, privilege escalation, MFA challenges, session expiry | Brute-force detection (high attempt count), severity classification
Network | network-events | Firewall blocks, port scans, SYN floods, DNS anomalies, DNS tunneling | Drop allowed connections, flag active threats
Endpoint | endpoint-events | Suspicious process execution, file changes, malware detection, lateral movement, data exfiltration | Drop normal processes, flag known-bad process names

Next Steps