Datadog Agent Connector

Configure the Datadog Agent connector to receive metrics, logs, and traces from Datadog agents using the native Datadog protocol.


Overview

The Datadog Agent connector collects metrics, logs, and APM traces from Datadog-instrumented applications and infrastructure over the native Datadog protocol. AI teammates can then query that data through the Edge Delta MCP connector.

When you add this streaming connector, it appears as a Datadog Agent source in your selected pipeline. AI teammates access this data by querying the Edge Delta backend with the Edge Delta MCP connector.

Add the Datadog Agent Connector

To add the Datadog Agent connector, you configure a network endpoint where Edge Delta will listen for Datadog protocol traffic, then reconfigure your Datadog agents to send data to that endpoint.

Prerequisites

Before configuring the connector, ensure you have:

  • Datadog agents or applications instrumented with Datadog client libraries
  • Network connectivity between Datadog agents and Edge Delta infrastructure
  • Firewall rules allowing inbound traffic on the connector port

Configuration Steps

  1. Navigate to AI Team > Connectors in the Edge Delta application
  2. Find the Datadog Agent connector in Streaming Connectors
  3. Click the connector card
  4. Select the pipeline (environment) to receive this data
  5. Configure network and timeout options (see below)
  6. Click Save

The connector is now ready to receive Datadog protocol traffic.

Figure: Datadog Agent connector configuration showing listen address, port, and timeout settings.

Reconfiguring Datadog Agents

After deploying the connector, reconfigure your Datadog agents to send data to Edge Delta instead of the Datadog backend.

For Datadog Agents on Servers

Edit /etc/datadog-agent/datadog.yaml:

dd_url: http://edge-delta-host:3421
apm_config:
  apm_dd_url: http://edge-delta-host:3421

Restart the agent: sudo systemctl restart datadog-agent

For DogStatsD Client Libraries

Python:

from datadog import initialize
initialize(statsd_host='edge-delta-host', statsd_port=3421)

Node.js:

const StatsD = require('node-statsd');
const client = new StatsD('edge-delta-host', 3421);

For Datadog APM Libraries

Java: Add JVM arguments:

-Ddd.agent.host=edge-delta-host -Ddd.agent.port=3421

Python:

from ddtrace import tracer
tracer.configure(hostname='edge-delta-host', port=3421)

Go:

tracer.Start(tracer.WithAgentAddr("edge-delta-host:3421"))

Configuration Options

Connector Name

Name to identify this Datadog Agent connector instance.

Listen

Network address where the connector listens for incoming Datadog agent traffic. Use 0.0.0.0 to accept connections on all network interfaces.

Default: 0.0.0.0

Format: Valid IP address (e.g., 0.0.0.0, 192.168.1.100, 127.0.0.1)

Port

TCP port number where the connector listens for Datadog protocol connections.

Default: 3421

Format: Port number between 1 and 65535
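
The Listen and Port formats above can be checked before you save the connector. The following is a minimal sketch using only the Python standard library; validate_listen is a hypothetical helper, not part of Edge Delta. Note that ipaddress also accepts IPv6 literals, and this page does not say whether the connector does.

```python
import ipaddress


def validate_listen(address: str, port: int) -> None:
    """Raise ValueError if the values fall outside the documented formats.

    Hypothetical helper for illustration only; not an Edge Delta API.
    """
    # ip_address raises ValueError for anything that is not a valid IP literal.
    ipaddress.ip_address(address)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"port must be an integer between 1 and 65535, got {port!r}")


# The documented defaults pass silently.
validate_listen("0.0.0.0", 3421)
```

A hostname such as edge-delta-host fails this check, which matches the Listen format: the field expects an IP address, not a DNS name.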

Read Timeout

How long the connector waits for data from an established connection before closing it due to inactivity.

Default: 1m (1 minute)

Format: Duration with unit suffix (e.g., 1m, 30s, 2m, 5000ms)
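
To make the duration format concrete, here is a small sketch that converts the example values into seconds. It assumes a single whole number followed by one of the suffixes shown above (ms, s, m); whether the connector accepts other suffixes or combined values such as 1m30s is not stated on this page. duration_seconds is a hypothetical name.

```python
import re

# Milliseconds per unit, limited to the suffixes shown in the examples above.
_UNIT_MILLIS = {"ms": 1, "s": 1000, "m": 60_000}
_DURATION = re.compile(r"(\d+)(ms|s|m)")


def duration_seconds(text: str) -> float:
    """Convert a duration such as '30s' or '5000ms' into seconds."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"expected a number followed by ms, s, or m, got {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MILLIS[unit] / 1000


print(duration_seconds("1m"))      # 60.0
print(duration_seconds("5000ms"))  # 5.0
```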

Target Environments

Select the Edge Delta pipeline (environment) where you want to deploy this connector.

Advanced Settings

TLS

Optional TLS/SSL configuration for encrypted communication between Datadog agents and Edge Delta. When enabled, all telemetry data is transmitted over an encrypted connection.

TLS Options:

  • Ignore Certificate Check: Disables SSL/TLS certificate verification (use with caution)
  • CA File: Absolute file path to the CA certificate for SSL/TLS connections
  • CA Path: Absolute path where CA certificate files are located
  • CRT File: Absolute path to the SSL/TLS certificate file
  • Key File: Absolute path to the private key file
  • Key Password: Optional password for the key file
  • Client Auth Type: Client authentication type (default: noclientcert)
  • Minimum Version: Minimum TLS version (default: TLSv1_2)
  • Maximum Version: Maximum TLS version

Metadata Level

Defines which detected resources and attributes Edge Delta adds to each data item as it is ingested. You can select:

  • Required Only: This option includes the minimum required resources and attributes for Edge Delta to operate.
  • Default: This option includes the required resources and attributes plus those selected by Edge Delta.
  • High: This option includes the required resources and attributes along with a larger selection of common optional fields.
  • Custom: With this option selected, you can choose which attributes and resources to include. The required fields are selected by default and can’t be unchecked.

Based on your selection in the GUI, the source_metadata YAML is populated as two dictionaries (resource_attributes and attributes) with Boolean values.

See Choose Data Item Metadata for more information on selecting metadata.

Rate Limit

The rate_limit parameter enables you to control data ingestion based on system resource usage. This advanced setting helps prevent source nodes from overwhelming the agent by automatically throttling or stopping data collection when CPU or memory thresholds are exceeded.

Use rate limiting to:

  • Prevent runaway log collection from overwhelming the agent in high-volume sources
  • Protect agent stability in resource-constrained environments with limited CPU or memory
  • Throttle automatically during bursty traffic patterns
  • Ensure fair resource allocation across source nodes in multi-tenant deployments

When rate limiting triggers, pull-based sources (File, S3, HTTP Pull) stop fetching new data, push-based sources (HTTP, TCP, UDP, OTLP) reject incoming data, and stream-based sources (Kafka, Pub/Sub) pause consumption. Rate limiting operates at the source node level, where each source with rate limiting enabled independently monitors and enforces its own thresholds.

Configuration Steps:

  1. Click Add New in the Rate Limit section
  2. Click Add New for Evaluation Policy
  3. Select Policy Type:
    • CPU Usage: Monitors CPU consumption and rate limits when usage exceeds defined thresholds. Use for CPU-intensive sources like file parsing or complex transformations.
    • Memory Usage: Monitors memory consumption and rate limits when usage exceeds defined thresholds. Use for memory-intensive sources like large message buffers or caching.
    • AND (composite): Combines multiple sub-policies with AND logic. All sub-policies must be true simultaneously to trigger rate limiting. Use when you want conservative rate limiting (both CPU and memory must be high).
    • OR (composite): Combines multiple sub-policies with OR logic. Any sub-policy can trigger rate limiting. Use when you want aggressive rate limiting (either CPU or memory being high triggers).
  4. Select Evaluation Mode. Choose how the policy behaves when thresholds are exceeded:
    • Enforce (default): Actively applies rate limiting when thresholds are met. Pull-based sources (File, S3, HTTP Pull) stop fetching new data, push-based sources (HTTP, TCP, UDP, OTLP) reject incoming data, and stream-based sources (Kafka, Pub/Sub) pause consumption. Use in production to protect agent resources.
    • Monitor: Logs when rate limiting would occur without actually limiting data flow. Use for testing thresholds before enforcing them in production.
    • Passthrough: Disables rate limiting entirely while keeping the configuration in place. Use to temporarily disable rate limiting without removing configuration.
  5. Set Absolute Limits and Relative Limits (for CPU Usage and Memory Usage policies)

Note: If you specify both absolute and relative limits, the system evaluates both conditions, and rate limiting triggers when either condition is met (OR logic). For example, if you set the absolute limit to 1.0 CPU cores and the relative limit to 50%, rate limiting triggers when the source uses either 1 full core or 50% of available CPU, whichever happens first.

  • For CPU Absolute Limits: Enter value in full core units:

    • 0.1 = one-tenth of a CPU core
    • 0.5 = half a CPU core
    • 1.0 = one full CPU core
    • 2.0 = two full CPU cores
  • For CPU Relative Limits: Enter percentage of total available CPU (0-100):

    • 50 = 50% of available CPU
    • 75 = 75% of available CPU
    • 85 = 85% of available CPU
  • For Memory Absolute Limits: Enter value in bytes

    • 104857600 = 100Mi (100 × 1024 × 1024)
    • 536870912 = 512Mi (512 × 1024 × 1024)
    • 1073741824 = 1Gi (1 × 1024 × 1024 × 1024)
  • For Memory Relative Limits: Enter percentage of total available memory (0-100)

    • 60 = 60% of available memory
    • 75 = 75% of available memory
    • 80 = 80% of available memory
  6. Set Refresh Interval (for CPU Usage and Memory Usage policies). Specify how frequently the system checks resource usage:
  • Recommended Values:
    • 10s to 30s for most use cases
    • 5s to 10s for high-volume sources requiring quick response
    • 1m or higher for stable, low-volume sources

The system fetches current CPU/memory usage at the specified refresh interval and uses that value for evaluation until the next refresh. Shorter intervals provide more responsive rate limiting but incur slightly higher overhead, while longer intervals are more efficient but slower to react to sudden resource spikes.
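
The way absolute limits, relative limits, and the refresh interval interact can be sketched in a few lines. This is an illustration of the behavior described above, not Edge Delta's implementation: exceeds_limits and CachedUsage are hypothetical names, the inclusive comparison (>=) is an assumption, and "available" stands for whatever total the agent measures against.

```python
import time
from typing import Callable, Optional


def exceeds_limits(used: float, available: float,
                   absolute_limit: Optional[float] = None,
                   relative_limit: Optional[float] = None) -> bool:
    """Return True when either configured limit is reached (OR logic).

    used and available share a unit: CPU cores, or bytes of memory.
    relative_limit is a percentage between 0 and 100.
    """
    # Assumption: a reading equal to the limit counts as reaching it.
    over_absolute = absolute_limit is not None and used >= absolute_limit
    over_relative = (relative_limit is not None and available > 0
                     and used / available * 100 >= relative_limit)
    return over_absolute or over_relative


class CachedUsage:
    """Re-read a usage value at most once per refresh interval."""

    def __init__(self, read_usage: Callable[[], float], refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read_usage = read_usage
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._value: Optional[float] = None
        self._fetched_at = 0.0

    def current(self) -> float:
        now = self._clock()
        # Between refreshes, every evaluation sees the same cached reading.
        if self._value is None or now - self._fetched_at >= self._refresh_seconds:
            self._value = self._read_usage()
            self._fetched_at = now
        return self._value


# The example from the note: absolute limit 1.0 core, relative limit 50%.
print(exceeds_limits(0.8, 4.0, absolute_limit=1.0, relative_limit=50))  # False
print(exceeds_limits(1.2, 4.0, absolute_limit=1.0, relative_limit=50))  # True
```

On a four-core host the absolute limit is reached first (1 core is only 25% of available CPU). On a single-core host the relative limit would be reached first, at 0.5 cores.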

The GUI generates YAML as follows:

# Simple CPU-based rate limiting
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: cpu_usage
        evaluation_mode: enforce
        absolute_limit: 0.5  # Limit to half a CPU core
        refresh_interval: 10s

# Simple memory-based rate limiting
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: memory_usage
        evaluation_mode: enforce
        absolute_limit: 536870912  # 512Mi in bytes
        refresh_interval: 30s
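
Memory absolute limits are entered in bytes, so it can help to compute them rather than type them by hand. The helper below is a hypothetical convenience, not an Edge Delta tool; it reproduces the byte values listed earlier, and the Ki unit is included only for completeness.

```python
# Binary (power-of-two) units, matching the Mi and Gi examples on this page.
_BINARY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '512Mi' into a whole number of bytes."""
    number, unit = quantity[:-2], quantity[-2:]
    if unit not in _BINARY_UNITS or not number.isdecimal():
        raise ValueError(f"expected a whole number followed by Ki, Mi, or Gi, got {quantity!r}")
    return int(number) * _BINARY_UNITS[unit]


print(to_bytes("100Mi"))  # 104857600
print(to_bytes("512Mi"))  # 536870912
print(to_bytes("1Gi"))    # 1073741824
```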

Composite Policies (AND / OR)

When using AND or OR policy types, you define sub-policies instead of limits. Sub-policies must be siblings (at the same level)—do not nest sub-policies within other sub-policies. Each sub-policy is independently evaluated, and the parent policy’s evaluation mode applies to the composite result.

  • AND Logic: All sub-policies must evaluate to true at the same time to trigger rate limiting. Use when you want conservative rate limiting (limit only when CPU AND memory are both high).
  • OR Logic: Any sub-policy evaluating to true triggers rate limiting. Use when you want aggressive protection (limit when either CPU OR memory is high).
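
The difference between the two composite types comes down to all versus any. The sketch below models policies with a hypothetical Policy dataclass whose field names mirror the YAML keys; it only handles relative limits, and treating a composite with no sub-policies as never triggering is an assumption made here for safety.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Policy:
    policy_type: str                  # "cpu_usage", "memory_usage", "and", or "or"
    relative_limit: float = 0.0       # percentage; used by leaf policies only
    sub_policies: List["Policy"] = field(default_factory=list)


def triggered(policy: Policy, usage_percent: Dict[str, float]) -> bool:
    """Evaluate a policy against current usage, keyed by leaf policy type."""
    if policy.policy_type in ("and", "or"):
        if not policy.sub_policies:
            return False  # assumption: an empty composite never triggers
        results = (triggered(sub, usage_percent) for sub in policy.sub_policies)
        return all(results) if policy.policy_type == "and" else any(results)
    return usage_percent[policy.policy_type] >= policy.relative_limit


subs = [Policy("cpu_usage", relative_limit=85), Policy("memory_usage", relative_limit=80)]
usage = {"cpu_usage": 90.0, "memory_usage": 40.0}  # CPU is high, memory is not

print(triggered(Policy("and", sub_policies=subs), usage))  # False
print(triggered(Policy("or", sub_policies=subs), usage))   # True
```

With the same readings, the AND policy keeps ingesting while the OR policy limits, which is why AND is the conservative choice and OR the aggressive one.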

Configuration Steps:

  1. Select AND (composite) or OR (composite) as the Policy Type
  2. Choose the Evaluation Mode (typically Enforce)
  3. Click Add New under Sub-Policies to add the first condition
  4. Configure the first sub-policy by selecting policy type (CPU Usage or Memory Usage), selecting evaluation mode, setting absolute and/or relative limits, and setting refresh interval
  5. In the parent policy (not within the child), click Add New again to add a sibling sub-policy
  6. Configure additional sub-policies following the same pattern

The GUI generates YAML as follows:

# AND composite policy - both CPU AND memory must exceed limits
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: and
        evaluation_mode: enforce
        sub_policies:
          # First sub-policy (sibling)
          - policy_type: cpu_usage
            evaluation_mode: enforce
            absolute_limit: 0.75  # Limit to 75% of one core
            refresh_interval: 15s
          # Second sub-policy (sibling)
          - policy_type: memory_usage
            evaluation_mode: enforce
            absolute_limit: 1073741824  # 1Gi in bytes
            refresh_interval: 15s

# OR composite policy - either CPU OR memory can trigger
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: or
        evaluation_mode: enforce
        sub_policies:
          - policy_type: cpu_usage
            evaluation_mode: enforce
            relative_limit: 85  # 85% of available CPU
            refresh_interval: 20s
          - policy_type: memory_usage
            evaluation_mode: enforce
            relative_limit: 80  # 80% of available memory
            refresh_interval: 20s

# Monitor mode for testing thresholds
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: memory_usage
        evaluation_mode: monitor  # Only logs, doesn't limit
        relative_limit: 70  # Test at 70% before enforcing
        refresh_interval: 30s
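
The three evaluation modes used in these examples differ only in what happens once a threshold is exceeded. A minimal sketch of that decision follows, with a hypothetical should_limit function and logger name; the actual log output of the agent is not documented on this page.

```python
import logging

logger = logging.getLogger("rate_limit_sketch")  # hypothetical logger name


def should_limit(threshold_exceeded: bool, evaluation_mode: str = "enforce") -> bool:
    """Return True only when data flow should actually be limited."""
    if evaluation_mode == "passthrough":
        # Configuration stays in place, but rate limiting is disabled.
        return False
    if evaluation_mode == "monitor":
        # Record that limiting would have happened, without limiting.
        if threshold_exceeded:
            logger.info("rate limiting would trigger (monitor mode)")
        return False
    if evaluation_mode == "enforce":
        return threshold_exceeded
    raise ValueError(f"unknown evaluation mode: {evaluation_mode!r}")
```

For a push-based source such as this connector, a True result corresponds to rejecting incoming data.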

How to Use the Datadog Agent Connector

The Datadog Agent connector feeds metrics, logs, and traces from Datadog-instrumented infrastructure to AI Team. AI teammates draw on the ingested data based on the queries they receive and the context of the conversation.

Use Case: Microservices Performance Analysis

Applications using DogStatsD client libraries emit custom metrics like request counts, error rates, and response times. AI teammates can analyze these metrics to identify performance issues. For example, when investigating slow API responses, teammates can correlate DogStatsD metrics with APM traces to identify database bottlenecks or external service delays.

Use Case: Infrastructure Health Monitoring

Datadog agents running on servers send system-level metrics (CPU, memory, disk, network). AI teammates can analyze resource utilization patterns and predict capacity issues. When combined with alert connectors like PagerDuty, teammates can investigate infrastructure alerts by querying recent metrics from affected hosts.

Use Case: Distributed Trace Analysis

Applications instrumented with Datadog APM libraries send detailed trace data. AI teammates can analyze latency patterns across microservices, identify slow database queries, and detect bottlenecks in external API calls. This is valuable when investigating production incidents—teammates correlate traces with logs and metrics for root cause analysis.

Troubleshooting

No data appearing: Verify Datadog agents are configured to send data to the Edge Delta endpoint (check /etc/datadog-agent/datadog.yaml). Test connectivity with telnet edge-delta-host 3421.

Connection refused errors: Confirm Edge Delta is listening on the configured port (netstat -tuln | grep 3421). Check that firewall rules allow inbound traffic on the connector port.
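
Where telnet is not installed, the same TCP reachability check can be scripted. This is a minimal sketch using the Python standard library; can_connect is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that it speaks the Datadog protocol.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run can_connect("edge-delta-host", 3421) from the machine hosting the Datadog agent. A False result points to the listener, DNS, or firewall rules rather than the agent configuration.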

Missing tags or metadata: Verify Datadog agents include tags in metric submissions. Check that DogStatsD metrics use proper tag format: metric_name:value|type|#tag1:value1,tag2:value2.
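
To compare what your application sends against the expected tag format, you can build a reference line by hand. The function below is a hypothetical formatter for illustration; it omits the optional sample-rate field, and the metric name and tags are made up.

```python
from typing import Dict, Optional


def format_dogstatsd(name: str, value: float, metric_type: str,
                     tags: Optional[Dict[str, str]] = None) -> str:
    """Build one DogStatsD line: metric_name:value|type|#tag1:value1,tag2:value2."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags follow a "#" and are comma-separated key:value pairs.
        line += "|#" + ",".join(f"{key}:{val}" for key, val in tags.items())
    return line


print(format_dogstatsd("api.requests", 1, "c", {"env": "prod", "service": "checkout"}))
# api.requests:1|c|#env:prod,service:checkout
```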

APM traces not appearing: Confirm tracer libraries are configured with the Edge Delta endpoint (check DD_AGENT_HOST and DD_TRACE_AGENT_PORT environment variables). Increase read timeout to 2m for large trace payloads.
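
A quick way to see where a tracer would send data is to read the same environment variables it reads. This sketch assumes the usual Datadog tracer defaults of localhost and port 8126 when the variables are unset, and it does not account for other settings, such as DD_TRACE_AGENT_URL or in-code configuration, that can override them. tracer_target is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def tracer_target(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return the (host, port) implied by DD_AGENT_HOST and DD_TRACE_AGENT_PORT."""
    host = env.get("DD_AGENT_HOST", "localhost")       # assumed tracer default
    port = int(env.get("DD_TRACE_AGENT_PORT", "8126"))  # assumed tracer default
    return host, port


print(tracer_target({"DD_AGENT_HOST": "edge-delta-host", "DD_TRACE_AGENT_PORT": "3421"}))
# ('edge-delta-host', 3421)
```

If the result is ('localhost', 8126) inside your application's environment, the variables are not set there and traces are likely still going to a local Datadog agent.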

High memory usage: Reduce the read timeout so idle connections are released sooner. Configure rate limiting in your pipeline if you receive unexpectedly high volumes.

Next Steps

For additional help, visit AI Team Support.