Security Monitoring with Kafka and Edge Delta
Overview
Security events originate from systems that cannot run observability agents directly: perimeter firewalls, SSO gateways, VPN concentrators, IDS appliances, and endpoint detection tools installed on managed workstations. These far-edge systems export events to Apache Kafka topics, creating a central event bus that decouples producers from consumers.
Edge Delta agents consume from individual Kafka topics and apply domain-specific processing before routing events to their destinations. Each agent operates independently with its own pipeline, enabling isolated scaling and failure domains per security domain.
Architecture
```mermaid
flowchart LR
  subgraph "Far Edge (no agents)"
    A1[SSO Gateway]
    A2[VPN Concentrator]
    A3[IAM Service]
    B1[Perimeter Firewall]
    B2[IDS / Suricata]
    B3[DNS Server]
    C1[CrowdStrike EDR]
    C2[Workstations]
  end
  subgraph "Kafka Event Bus"
    T1[auth-events]
    T2[network-events]
    T3[endpoint-events]
  end
  subgraph "Edge Delta Agents"
    E1[Auth Agent]
    E2[Network Agent]
    E3[Endpoint Agent]
  end
  subgraph "Destinations"
    S1[SIEM]
    S2[Data Lake / Archive]
  end
  A1 & A2 & A3 --> T1
  B1 & B2 & B3 --> T2
  C1 & C2 --> T3
  T1 --> E1
  T2 --> E2
  T3 --> E3
  E1 -->|Critical/High| S1
  E1 -->|All| S2
  E2 -->|Critical/High| S1
  E2 -->|All| S2
  E3 -->|Critical/High| S1
  E3 -->|All| S2
```

Agent-per-Topic Pattern
Assigning one Edge Delta agent to each Kafka topic provides several advantages over a single agent consuming from all topics:
- Domain-specific processing: Authentication events need brute-force detection logic. Network events need port-scan identification. Endpoint events need process-name matching. Each pipeline contains only the OTTL statements relevant to its domain.
- Independent scaling: If endpoint telemetry volume spikes (common during threat hunting), scale the endpoint agent without affecting authentication or network processing.
- Failure isolation: A misconfigured pipeline for one domain does not disrupt processing of the others.
- Consumer group separation: Each agent joins its own Kafka consumer group, so topic offsets are tracked independently.
Kafka source configuration
Each agent uses the Kafka source node, pointing at the same brokers but consuming a different topic:
```yaml
- name: kafka_auth_input
  type: kafka_input
  brokers:
    - "kafka.example.com:9092"
  topics:
    - "auth-events"
  group_id: "ed-auth-consumer"
```
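The other agents follow the same schema, changing only the topic and consumer group. A sketch of the network agent's source (the node and group names here are illustrative, mirroring the naming above) shows how consumer-group separation keeps offset tracking independent per domain:

```yaml
- name: kafka_network_input
  type: kafka_input
  brokers:
    - "kafka.example.com:9092"
  topics:
    - "network-events"
  group_id: "ed-network-consumer"
```

Because `ed-network-consumer` and `ed-auth-consumer` are distinct consumer groups, lag or a restart in one pipeline never shifts the other's committed offsets.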
Processing Patterns
Severity classification
OTTL transform statements classify each event into a severity level based on its type and attributes:
```yaml
- type: ottl_transform
  data_types: [log]
  statements: |-
    set(attributes["severity"], "medium") where body["event_type"] == "auth_failure"
    set(attributes["severity"], "high") where body["event_type"] == "privilege_escalation"
    set(attributes["severity"], "critical") where body["event_type"] == "auth_failure" and body["attempt_count"] > 10
```
This converts raw events into a uniform severity model that downstream systems (SIEM rules, alerting, dashboards) can consume without parsing domain-specific fields.
Volume reduction
Security telemetry is dominated by noise. Firewall logs are mostly connection_allowed entries. EDR telemetry is mostly normal_process executions. Filtering these at the pipeline level before data reaches the SIEM dramatically reduces storage costs and improves signal-to-noise ratio:
```yaml
- type: ottl_filter
  data_types: [log]
  condition: body["event_type"] == "connection_allowed"
```
In practice, this pattern reduces network event volume by approximately 50-90% and endpoint event volume by 50-95%, depending on the environment.
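The endpoint pipeline applies the same pattern to EDR noise. A sketch, assuming endpoint events carry the `normal_process` event type described above:

```yaml
- type: ottl_filter
  data_types: [log]
  condition: body["event_type"] == "normal_process"
```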
Content-based routing
The route processor splits the event stream by severity. Critical and high severity events route to the SIEM for immediate analyst attention. All processed events (minus filtered noise) route to the data lake for forensic and compliance purposes:
```yaml
- name: severity_router
  type: route_ottl
  paths:
    - path: critical_and_high
      condition: attributes["severity"] == "critical" or attributes["severity"] == "high"
    - path: all_events
      condition: "true"
```
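One way the router's paths could be wired to the two destinations, assuming a simple from/to link syntax and hypothetical output node names (`siem_output`, `datalake_output`); consult the pipeline reference for the exact schema:

```yaml
links:
  - from: severity_router.critical_and_high
    to: siem_output
  - from: severity_router.all_events
    to: datalake_output
```

Note that an event matching `critical_and_high` also matches `all_events` (`"true"`), which is intentional: high-severity events reach the SIEM and are still archived in the data lake.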
The Cost Problem
A modest Kafka deployment of five brokers can produce over 1.5 TB of log data per month before accounting for the security event topics themselves. When firewall, authentication, and EDR streams are added, raw volume can easily double. Forwarding all of this to a SIEM at ingestion pricing is unsustainable.
Edge Delta’s pipelines address this as a preprocessing layer between Kafka and the SIEM. By filtering noise, enriching events, and routing only high-value signals to expensive destinations, the agent-per-topic pattern becomes a cost optimization strategy as much as a security architecture.
Monitoring the Agents
Each Edge Delta agent operates as a Kafka consumer with its own consumer group. Monitor these metrics to ensure the agents keep pace with event production:
- Consumer lag: The difference between the log end offset and the consumer’s committed offset. Track maximum lag per consumer group (not per partition) to avoid high-cardinality metric explosions. Alert if lag exceeds 100,000 for more than 5 minutes, or if the consumer offset stops advancing for 10 minutes.
- ISR shrink rate: A leading indicator that the Kafka cluster is under stress. If in-sync replicas are shrinking, investigate broker health before agent processing is affected.
- Request latency (P99): Elevated fetch latency means the agent is waiting longer for messages, which can delay security event processing.
These operational metrics complement the security telemetry flowing through the pipelines. Edge Delta's self-telemetry nodes (`ed_self_telemetry_input`, `ed_system_stats_input`) emit agent health data that routes to the same backend as the security events, providing a unified view.
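A minimal sketch of enabling those self-telemetry nodes alongside the Kafka sources (node names are taken from above; the `name` values and pipeline placement are illustrative):

```yaml
- name: agent_health
  type: ed_self_telemetry_input
- name: host_stats
  type: ed_system_stats_input
```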
Kafka Cluster Security
The Kafka cluster itself is a security boundary. Common gaps to address:
- JMX authentication: Many deployments leave JMX monitoring endpoints with `authenticate=false` and `ssl=false` for convenience. In production, enable authentication and TLS on JMX to prevent unauthorized access to cluster management.
- SASL/TLS for producers and consumers: Configure the Kafka source with SASL credentials and TLS certificates. Edge Delta supports SASL/PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, and Kerberos (GSSAPI) authentication. See Kafka secrets management for credential configuration.
- Topic-level ACLs: Restrict which consumer groups can read from security topics. Each Edge Delta agent should authenticate with credentials scoped to its assigned topic.
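Putting the last two points together, a hedged sketch of the auth agent's source with SASL/SCRAM and TLS enabled. The `sasl` and `tls` field names, the port, the username, and the environment-variable reference are illustrative assumptions; see the Kafka source and secrets management documentation for the exact schema:

```yaml
- name: kafka_auth_input
  type: kafka_input
  brokers:
    - "kafka.example.com:9093"   # TLS listener port (assumed)
  topics:
    - "auth-events"
  group_id: "ed-auth-consumer"
  sasl:
    mechanism: "SCRAM-SHA-512"
    username: "ed-auth-agent"              # credential scoped to auth-events only
    password: "${KAFKA_AUTH_PASSWORD}"     # resolved from secrets, never hardcoded
  tls:
    ca_file: "/etc/edgedelta/certs/kafka-ca.pem"
```

Pairing a per-agent SASL identity with topic-level ACLs means a compromised or misconfigured agent can read only its own domain's events.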
Security Domains
| Domain | Kafka Topic | Key Events | Processing |
|---|---|---|---|
| Authentication | auth-events | Login failures, privilege escalation, MFA challenges, session expiry | Brute-force detection (high attempt count), severity classification |
| Network | network-events | Firewall blocks, port scans, SYN floods, DNS anomalies, DNS tunneling | Drop allowed connections, flag active threats |
| Endpoint | endpoint-events | Suspicious process execution, file changes, malware detection, lateral movement, data exfiltration | Drop normal processes, flag known-bad process names |
Next Steps
- Configure the Kafka source for your broker and topics
- Use OTTL transforms for enrichment and classification
- Set up route processing for severity-based routing
- Review security and compliance for data governance best practices
- Read the Kafka metrics monitoring guide for broker and consumer group health metrics