Edge Delta HTTP Pull Source

Fetch HTTP log traffic from an endpoint.

Overview

The HTTP Pull node allows the Edge Delta agent to periodically send HTTP requests to an endpoint in order to pull data. This type of data ingestion is useful when you need to retrieve logs or data from an HTTP-based API.

The HTTP Pull input supports OTTL (OpenTelemetry Transformation Language) expressions through dedicated configuration fields, allowing you to use environment variables, timestamps, and other dynamic data without hardcoding values in your configuration.

  • outgoing_data_types: log

OTTL Expressions

OTTL expressions are evaluated on each pull cycle, which enables secure credential management and time-based queries. They are not auto-detected in regular configuration fields, so you must place them in one of three dedicated fields:

  • endpoint_expression: dynamic endpoint URLs
  • header_expressions: runtime-evaluated headers
  • parameter_expressions: dynamic query parameters

Each expression field accepts OTTL functions and is fully defined in the Optional Parameters section below.

If you’re migrating from v2 configurations that use the {{ Env }} syntax, see the Migration from Environment Variable Syntax section for conversion guidance.

The following functions are supported:

Function        Description                         Example
Now()           Current timestamp                   Now()
UnixSeconds()   Unix timestamp in seconds           UnixSeconds(Now())
UnixMilli()     Unix timestamp in milliseconds      UnixMilli(Now())
Duration()      Time duration parsing               Duration("10m"), Duration("1h")
EDXEnv()        Environment variable with fallback  EDXEnv("API_KEY", "default")
Concat()        String concatenation                Concat(["Bearer ", EDXEnv("TOKEN", "")], "")
FormatTime()    Time formatting                     FormatTime(Now(), "%Y-%m-%d")
String()        Convert to string                   String(123)
Time()          Parse time strings                  Time("2024-01-01")
ToLowerCase()   Convert to lowercase                ToLowerCase("HELLO")
ToUpperCase()   Convert to uppercase                ToUpperCase("hello")
Replace()       String replacement                  Replace("hello world", "world", "universe")
Substring()     Extract substring                   Substring("hello", 0, 2)

Best practices for OTTL expressions:

  • Secure Credentials: Always use EDXEnv() for API tokens and sensitive values instead of hardcoding them
  • Use Dedicated Fields: Always use *_expression fields for OTTL expressions, not the regular fields
  • Time Windows: Use Duration() with Now() for relative time queries instead of absolute timestamps
  • Fallback Values: Provide meaningful fallback values in EDXEnv() calls for better error handling
  • Expression Testing: Test OTTL expressions in a development environment before deploying to production
  • Mix Static and Dynamic: You can use both static fields and expression fields in the same configuration

Pagination

The HTTP Pull input can automatically paginate APIs that return data across multiple pages, so you do not need to handle pagination logic manually in your configuration. Once pagination is configured, the node follows pagination links, whether they are embedded in JSON response bodies or provided through standard Link headers (RFC 5988, since superseded by RFC 8288), until no further pages are reported.

When you configure pagination using the pagination field, the HTTP Pull input starts with the initial endpoint request, then automatically discovers and fetches additional pages based on your chosen pagination method. You can specify either url_json_path to extract next-page URLs from JSON responses (common with APIs like Microsoft Graph), or link_relation to follow standardized Link headers used by GitHub, GitLab, and similar APIs. The max_parallel_requests parameter allows you to control concurrency, balancing between faster data retrieval and respecting API rate limits. Full parameter definitions are available in the Optional Parameters section.

Pagination behaves as follows:

  1. Initial Request: The first request is made to the configured endpoint
  2. Page Discovery: The response is checked for pagination information
  3. Concurrent Fetching: Additional pages are fetched concurrently (up to max_parallel_requests)
  4. Completion: Pagination stops when no more pages are found or on error
  5. Data Processing: All retrieved data is processed and forwarded as logs
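
To connect these steps to configuration, the sketch below annotates a Link header setup with the step each field influences. The node name and endpoint are placeholders, not values from a real deployment.

```yaml
nodes:
- name: paginated_pull_walkthrough   # hypothetical node name
  type: http_pull_input
  # Step 1: the first request goes to this endpoint
  endpoint: https://api.example.com/data
  method: GET
  pagination:
    # Step 2: each response is checked for a Link header entry with rel="next"
    link_relation: "next"
    # Step 3: at most three additional pages are fetched at the same time
    max_parallel_requests: 3
  # Steps 4 and 5 need no configuration: pagination ends when no next page
  # is found or an error occurs, and the retrieved data is forwarded as logs
```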

Example Configurations

Static Configuration

This example demonstrates a straightforward HTTP Pull configuration with static values for all parameters.

nodes:
- name: my_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET
  headers:
    - header: Accept
      value: application/json
  parameters:
    - name: tag
      value: source_id
  pull_interval: 1m
  retry_http_code:
    - 409
    - 429

This configuration sets up periodic data retrieval from https://api.yourapp.com using a GET request. The node includes an Accept header to request JSON responses and adds a tag query parameter with the value source_id to identify the data source. The pull_interval of 1 minute determines how frequently the endpoint is queried for new data. The retry_http_code array specifies that HTTP 409 (Conflict) and 429 (Too Many Requests) responses should trigger automatic retries, ensuring resilience against temporary server issues or rate limiting.

Basic Dynamic Configuration with OTTL

This example showcases the power of OTTL expressions for dynamic configuration, enabling secure credential management and time-based queries without hardcoding sensitive values.

nodes:
- name: github_events_pull
  type: http_pull_input
  endpoint: https://api.github.com/orgs/edgedelta/events
  method: GET
  # Static headers
  headers:
    - header: Accept
      value: application/json
  # Dynamic headers using OTTL expressions
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
  # Dynamic parameters using OTTL expressions
  parameter_expressions:
    since: FormatTime(Now() - Duration("10m"), "%Y-%m-%dT%H:%M:%SZ")
    per_page: "100"
  pull_interval: 5m

Unlike the static configuration, this example leverages OTTL expressions to inject dynamic values at runtime. The header_expressions field securely retrieves the GitHub API token from environment variables using EDXEnv(), avoiding hardcoded credentials in the configuration. The parameter_expressions field dynamically calculates a timestamp for the since parameter, requesting only events from the last 10 minutes relative to each pull cycle. This approach ensures the configuration remains secure and adaptable, with the since parameter automatically adjusting to capture recent data on every 5-minute pull interval, while the static per_page parameter limits results to 100 events per request.

Link Header Pagination

Many APIs (such as GitHub and GitLab) use the Link header for pagination following RFC 5988:

nodes:
- name: github_repos
  type: http_pull_input
  endpoint: https://api.github.com/orgs/edgedelta/repos
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
  parameter_expressions:
    per_page: "100"
  pagination:
    link_relation: "next"  # Follows the "next" link from Link header
    max_parallel_requests: 3

Example Link header response:

Link: <https://api.github.com/organizations/123/repos?page=2>; rel="next",
      <https://api.github.com/organizations/123/repos?page=10>; rel="last"

GitHub API with Dynamic Timestamps

This configuration demonstrates how to continuously monitor recent GitHub organization events using time-windowed queries that automatically adjust with each polling cycle.

nodes:
- name: github_events
  type: http_pull_input
  endpoint: https://api.github.com/orgs/myorg/events
  # Static headers
  headers:
    - header: Accept
      value: application/vnd.github.v3+json
  # Dynamic headers using OTTL expressions
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
  # Dynamic parameters using OTTL expressions
  parameter_expressions:
    # Get events from the last 10 minutes
    since: FormatTime(Now() - Duration("10m"), "%Y-%m-%dT%H:%M:%SZ")
    per_page: "100"
  pull_interval: 5m

This configuration combines static and dynamic elements to monitor GitHub organization events. The static Accept header ensures the API returns data in the expected JSON format, while the dynamic Authorization header retrieves credentials from the GITHUB_TOKEN environment variable at runtime. The since parameter uses OTTL’s time functions to create a sliding 10-minute window, calculating the timestamp for each request from the current time. With a 5-minute pull interval, consecutive windows overlap by 5 minutes, which reduces the risk of missing events between pulls but means the same event can be returned by two consecutive pulls. The per_page parameter requests up to 100 events per API call.

Dynamic Endpoint with Environment Variables

This example shows how to build fully dynamic configurations where even the endpoint URL adapts to different environments without modifying the configuration file.

nodes:
- name: api_monitor
  type: http_pull_input
  # Dynamic endpoint based on environment
  endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/v1/metrics"], "")
  # Dynamic headers
  header_expressions:
    X-API-Key: EDXEnv("API_KEY", "")
    X-Request-ID: Concat(["req-", String(UnixMilli(Now()))], "")
  pull_interval: 30s

This configuration demonstrates an environment-driven setup in which neither the host nor the credentials are hardcoded. The endpoint_expression constructs the full URL from the API_HOST environment variable (falling back to api.example.com), so the same configuration works across development, staging, and production environments by changing environment variables. The X-API-Key header pulls authentication credentials from the environment, keeping sensitive values out of the configuration. The X-Request-ID header builds an identifier from the current timestamp in milliseconds, which aids request tracing and debugging across distributed systems. This pattern is particularly valuable for multi-environment deployments where the same configuration needs to adapt to different API endpoints and credentials.

Microsoft Graph API with Time Windows

This configuration demonstrates how to query Microsoft Graph API for Azure AD sign-in logs using OData filters with dynamic time ranges for continuous security monitoring.

nodes:
- name: office365_audit_logs
  type: http_pull_input
  endpoint: https://graph.microsoft.com/v1.0/auditLogs/signIns
  # Dynamic headers
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GRAPH_TOKEN", "")], "")
  # Dynamic parameters
  parameter_expressions:
    # Query for sign-ins in the last hour
    $filter: Concat(["createdDateTime ge ", FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%S.000Z")], "")
    $top: "100"
  pull_interval: 15m

This configuration leverages Microsoft Graph’s OData query syntax to retrieve Azure AD sign-in events within a rolling one-hour window. The $filter parameter dynamically constructs an OData filter expression that requests only events created after a timestamp calculated as one hour before the current time, formatted to match Microsoft’s ISO 8601 requirements. The GRAPH_TOKEN environment variable provides secure authentication without exposing credentials, while the $top parameter limits each request to 100 records to manage response size. With a 15-minute pull interval and a one-hour lookback window, this creates substantial overlap between queries, ensuring comprehensive audit log coverage even if there are brief connectivity issues or processing delays.

Unix Timestamp Parameters

Many APIs require Unix timestamps for time-based queries, and this example shows how to generate both seconds and milliseconds precision timestamps dynamically.

nodes:
- name: api_with_unix_time
  type: http_pull_input
  endpoint: https://api.example.com/logs
  parameter_expressions:
    # Unix timestamp in seconds - last 24 hours
    start_time: String(UnixSeconds(Now() - Duration("24h")))
    # Unix timestamp in milliseconds - current time
    end_time: String(UnixMilli(Now()))
  pull_interval: 1h

This configuration demonstrates how to work with APIs that expect Unix timestamps rather than formatted date strings. The start_time parameter calculates a timestamp 24 hours before the current time and converts it to Unix seconds using UnixSeconds(), while end_time captures the current moment with millisecond precision using UnixMilli(). Both values are then converted to strings since query parameters must be text values. The two bounds use different units here (seconds and milliseconds) only to illustrate both functions; a real API normally expects one unit for both. The 24-hour lookback advances with each hourly pull, so consecutive windows overlap by 23 hours: coverage is continuous, but most records are returned repeatedly unless the lookback is narrowed toward the pull interval or duplicates are handled downstream. Numeric timestamps of this kind are common in logging and monitoring APIs.
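
If an API expects both bounds in the same unit, a variant of the configuration above might look like the following sketch. It uses only functions from the supported list. The node name, the endpoint, and the start_time and end_time parameter names are illustrative assumptions carried over from the example, and the one-hour lookback is chosen simply to match the pull interval.

```yaml
nodes:
- name: api_with_unix_seconds   # hypothetical node name
  type: http_pull_input
  endpoint: https://api.example.com/logs   # placeholder endpoint
  method: GET
  parameter_expressions:
    # Both bounds in Unix seconds; the lookback equals the pull interval
    start_time: String(UnixSeconds(Now() - Duration("1h")))
    end_time: String(UnixSeconds(Now()))
  pull_interval: 1h
```

Matching the lookback to the interval keeps duplication low but leaves no margin for a delayed pull; a somewhat longer lookback trades some duplication for that margin.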

GitHub API with Link Header Pagination

This example demonstrates automatic pagination using RFC 5988 Link headers, which GitHub and similar APIs use to provide navigation links between result pages.

nodes:
- name: github_events
  type: http_pull_input
  endpoint: https://api.github.com/orgs/edgedelta/events
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
  parameter_expressions:
    per_page: "100"
  pagination:
    link_relation: "next"

This streamlined configuration shows how pagination can be effortlessly handled for APIs that follow the Link header standard. By specifying link_relation: "next", the HTTP Pull input automatically parses the Link header from each response and follows the URL marked with rel="next" to retrieve subsequent pages. The per_page parameter maximizes efficiency by requesting 100 items per API call, reducing the total number of requests needed. Combined with secure token authentication from environment variables, this configuration will automatically retrieve all available event data across multiple pages without any manual pagination logic, making it ideal for complete data extraction from GitHub’s event streams.

JSON URL Extraction Pagination

For APIs that return the next page URL in the response body:

nodes:
- name: office365_logs
  type: http_pull_input
  endpoint: https://graph.microsoft.com/v1.0/auditLogs/signIns
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GRAPH_TOKEN", "")], "")
  pagination:
    url_json_path: "$['@odata.nextLink']"  # Microsoft Graph pagination
    max_parallel_requests: 5

Common JSONPath patterns:

  • Microsoft Graph: "$['@odata.nextLink']"
  • Simple next URL: "$.next"
  • Nested pagination: "$.pagination.next_url"
  • Array of URLs: "$.urls[*]"

Microsoft Graph with JSON Pagination

This configuration combines time-windowed filtering with JSON-based pagination to comprehensively collect Azure AD sign-in logs from Microsoft Graph API.

nodes:
- name: azure_ad_signins
  type: http_pull_input
  endpoint: https://graph.microsoft.com/v1.0/auditLogs/signIns
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("GRAPH_TOKEN", "")], "")
  parameter_expressions:
    $top: "100"
    $filter: Concat(["createdDateTime ge ", FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")], "")
  pagination:
    url_json_path: "$['@odata.nextLink']"

This configuration shows how to handle Microsoft Graph’s OData pagination model alongside dynamic filtering. The url_json_path parameter uses a JSONPath expression to extract the continuation URL from the @odata.nextLink field that Microsoft includes in responses when more data is available. The bracket notation $['@odata.nextLink'] is necessary because the field name contains special characters: the @ sign, and a dot that dot notation would otherwise read as a path separator. Combined with the one-hour time filter and 100-record page size, this configuration follows pagination links until all matching sign-in events within the time window are collected. This pattern matters for security monitoring scenarios where missing even a single sign-in event could impact audit compliance or threat detection.

Custom API with Nested Pagination

This example illustrates how to handle APIs with nested pagination metadata and optimize retrieval speed through increased parallelism.

nodes:
- name: custom_api
  type: http_pull_input
  endpoint: https://api.custom.com/v1/logs
  header_expressions:
    X-API-Key: EDXEnv("CUSTOM_API_KEY", "")
  pagination:
    url_json_path: "$.meta.pagination.next_url"
    max_parallel_requests: 10

This configuration demonstrates pagination for APIs that nest their navigation links within structured metadata objects. The JSONPath expression $.meta.pagination.next_url navigates through the response structure to locate the next page URL, showing how to handle APIs that organize their pagination information differently from the more common top-level field. Raising max_parallel_requests to 10 allows more pages to be fetched at the same time, which can shorten collection for large datasets spread across many pages, provided the API has generous rate limits. The environment-based API key is applied to all concurrent requests without appearing in the configuration.

Required Parameters

name

A descriptive name for the node. This name appears in the pipeline builder, and you use it to reference the node elsewhere in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a hyphen and a space, followed by the name string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: http_pull_input

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

endpoint

The endpoint specifies the URL to which the HTTP requests are sent. It is a required parameter and must be specified as a valid URL. For dynamic endpoints, use the endpoint_expression field instead.

nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET

method

The method parameter defines the HTTP method used for requests. Supported values are GET and POST. This is a required parameter.

nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET

Optional Parameters

endpoint_expression

The endpoint_expression parameter enables you to construct dynamic endpoint URLs using OTTL expressions. This is useful when the endpoint URL needs to be determined at runtime based on environment variables or other dynamic values. When specified, this field takes precedence over the static endpoint field.

nodes:
- name: dynamic_api_pull
  type: http_pull_input
  endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/v1/data"], "")
  method: GET
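
Because endpoint_expression takes precedence over endpoint, a node can in principle carry both fields. The sketch below assumes the agent accepts both on the same node, which this page does not state explicitly; the node name and URLs are placeholders.

```yaml
nodes:
- name: dynamic_api_pull_with_static   # hypothetical node name
  type: http_pull_input
  # Static value; superseded when endpoint_expression is also set
  endpoint: https://api.example.com/v1/data
  # Evaluated at runtime and used in preference to endpoint
  endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/v1/data"], "")
  method: GET
```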

headers

The headers parameter allows adding static HTTP headers to the requests. It is specified as a list of key-value pairs. For dynamic headers, use the header_expressions field instead.

nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET
  headers:
    - header: Accept
      value: application/json
    - header: User-Agent
      value: EdgeDelta/1.0

header_expressions

The header_expressions parameter allows you to define HTTP headers using OTTL expressions that are evaluated at runtime. This is essential for secure credential management and dynamic header values. It is specified as a map of header names to OTTL expressions.

nodes:
- name: secure_api_pull
  type: http_pull_input
  endpoint: https://api.example.com/data
  method: GET
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("API_TOKEN", "")], "")
    X-Request-ID: Concat(["req-", String(UnixMilli(Now()))], "")
    X-Client-Version: EDXEnv("CLIENT_VERSION", "1.0.0")

parameters

The parameters field enables you to add static query parameters to the requests. It is specified as a list of key-value pairs. For dynamic parameters, use the parameter_expressions field instead.

nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET
  parameters:
    - name: tag
      value: source_id
    - name: limit
      value: "100"

parameter_expressions

The parameter_expressions parameter enables you to define query parameters using OTTL expressions that are evaluated on each pull cycle. This is particularly useful for time-based queries, pagination tokens, and other dynamic values. It is specified as a map of parameter names to OTTL expressions.

nodes:
- name: time_windowed_api
  type: http_pull_input
  endpoint: https://api.example.com/events
  method: GET
  parameter_expressions:
    since: FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")
    until: FormatTime(Now(), "%Y-%m-%dT%H:%M:%SZ")
    limit: "500"
    offset: EDXEnv("INITIAL_OFFSET", "0")

pull_interval

The pull_interval is the frequency at which HTTP requests are sent to the endpoint. The default is 1m (1 minute) and it is specified as a duration.

nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET
  pull_interval: 1m

retry_http_code

The retry_http_code parameter specifies additional HTTP status codes that will trigger a retry of the request. It is specified as a list of integers and is optional.

nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET
  retry_http_code:
    - 409
    - 429

pagination

The pagination parameter configures automatic pagination for APIs that return data across multiple pages. When specified, the HTTP Pull input will automatically follow pagination links to retrieve all available data. It is specified as an object with configuration options and is optional.

nodes:
- name: paginated_api
  type: http_pull_input
  endpoint: https://api.example.com/data
  pagination:
    url_json_path: "$.next"  # For JSON-based pagination
    # OR, instead of url_json_path:
    # link_relation: "next"  # For Link header pagination
    max_parallel_requests: 5

url_json_path

The url_json_path parameter specifies a JSONPath expression to extract the next page URL from the response body. Use this for APIs that include pagination URLs in their JSON responses. It is specified as a string and is optional.

nodes:
- name: json_paginated_api
  type: http_pull_input
  endpoint: https://api.example.com/data
  pagination:
    url_json_path: "$.pagination.next_url"

Common patterns:

  • Microsoft Graph: "$['@odata.nextLink']"
  • Simple next URL: "$.next"
  • Nested pagination: "$.meta.pagination.next"

link_relation

The link_relation parameter specifies which link relation to follow from the Link header (RFC 5988). Use this for APIs that provide pagination through Link headers, such as GitHub and GitLab. It is specified as a string and is optional.

nodes:
- name: link_paginated_api
  type: http_pull_input
  endpoint: https://api.github.com/repos
  pagination:
    link_relation: "next"

The agent will parse Link headers like:

Link: <https://api.example.com/data?page=2>; rel="next"

max_parallel_requests

The max_parallel_requests parameter limits the number of concurrent requests when fetching additional pages. This helps prevent overwhelming the API server or hitting rate limits. Default is 5. It is specified as an integer and is optional.

nodes:
- name: rate_limited_api
  type: http_pull_input
  endpoint: https://api.example.com/data
  pagination:
    link_relation: "next"
    max_parallel_requests: 3  # Reduce for rate-limited APIs

source_metadata

The source_metadata parameter is used to define which detected resources and attributes to add to each data item as it is ingested by the Edge Delta agent. In the GUI you can select:

  • Required Only: This option includes the minimum required resources and attributes for Edge Delta to operate.
  • Default: This option includes the required resources and attributes plus those selected by Edge Delta.
  • High: This option includes the required resources and attributes along with a larger selection of common optional fields.
  • Custom: With this option selected, you can choose which attributes and resources to include. The required fields are selected by default and can’t be unchecked.

Based on your selection in the GUI, the source_metadata YAML is populated as two dictionaries (resource_attributes and attributes) with Boolean values.
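
As a rough sketch of that shape, the fragment below shows the two dictionaries with Boolean values. The keys are deliberately left as placeholders, because the actual resource and attribute names depend on the source and on your GUI selection; the node name and endpoint are also placeholders.

```yaml
nodes:
- name: ed_api_http_pull
  type: http_pull_input
  endpoint: https://api.yourapp.com
  method: GET
  source_metadata:
    # Placeholder keys: real names come from the metadata selection in the GUI
    resource_attributes:
      <resource attribute name>: true
      <another resource attribute name>: false
    attributes:
      <attribute name>: true
```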

See Choose Data Item Metadata for more information on selecting metadata.

Migration from Environment Variable Syntax

If you’re migrating from v2 configs that use the {{ Env }} syntax, here’s how to convert to OTTL expressions:

Old v2 Syntax:

nodes:
- name: api_pull
  type: http_pull_input
  endpoint: "https://{{ Env \"API_HOST\" \"api.example.com\" }}/data"
  headers:
    - header: Authorization
      value: "Bearer {{ Env \"API_TOKEN\" \"default_token\" }}"

New OTTL Syntax:

nodes:
- name: api_pull
  type: http_pull_input
  # Use endpoint_expression for dynamic endpoints
  endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/data"], "")
  # Use header_expressions for dynamic headers
  header_expressions:
    Authorization: Concat(["Bearer ", EDXEnv("API_TOKEN", "default_token")], "")

Troubleshooting Pagination

To troubleshoot pagination issues, enable debug logging for your Edge Delta agent by setting the log level in your agent configuration. Debug logs will reveal detailed pagination behavior including which URLs are being followed and any errors encountered.

nodes:
- name: debug_pagination
  type: http_pull_input
  endpoint: https://api.example.com/data
  pagination:
    link_relation: "next"

When debug logging is enabled, you’ll see messages like:

  • "Following pagination URL: <url> (page N)" - Shows each page being fetched
  • "Total pages retrieved: X" - Summary of pagination results
  • "Pagination error: <details>" - Any issues encountered during pagination

Common pagination issues and solutions:

  • No pagination detected: Verify the API response contains the expected Link header or JSON field at the path specified in your configuration
  • Infinite loops: The agent automatically detects and prevents circular pagination, but check your API documentation for proper pagination handling
  • Rate limiting errors: Reduce max_parallel_requests to stay within API rate limits
  • Authentication failures on subsequent pages: Ensure tokens/credentials remain valid for the entire pagination process and aren’t request-specific
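
As a starting point for a rate-limited API, the settings mentioned above can be combined. This is a sketch rather than a recommended configuration: the node name, endpoint, and API_TOKEN variable are placeholders, and suitable values depend on the limits your API documents.

```yaml
nodes:
- name: cautious_paginated_pull   # hypothetical node name
  type: http_pull_input
  endpoint: https://api.example.com/data   # placeholder endpoint
  method: GET
  header_expressions:
    # Credentials come from the environment rather than the config file
    Authorization: Concat(["Bearer ", EDXEnv("API_TOKEN", "")], "")
  pagination:
    link_relation: "next"
    max_parallel_requests: 1   # fetch one additional page at a time
  retry_http_code:
    - 429   # retry when the API answers Too Many Requests
  pull_interval: 5m
```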

Testing an Endpoint

You can test your endpoint with an HTTP request using curl, a command-line tool for transferring data with URLs. The command contents depend on how your endpoint is configured, such as the HTTP method, headers, query parameters, and whether authentication is required.

Syntax:

curl -X <HTTP_METHOD> "<ENDPOINT_URL>?<QUERY_PARAMETERS>" -H "<HEADER>: <HEADER_VALUE>"

  • HTTP_METHOD: GET or POST, depending on the method the endpoint supports.
  • ENDPOINT_URL: The URL of the endpoint to which the request is sent.
  • QUERY_PARAMETERS: Optional; used if the endpoint requires query parameters in the URL.
  • HEADER and HEADER_VALUE: A header name and its value, passed with the -H flag. Repeat -H for each additional header. Headers are often needed for specifying content types or for authentication.

If using POST, include -d to specify data to send in the request body:

-d '{"key1":"value1", "key2":"value2"}'

Example Command:

curl -X GET "https://api.your-company.com/data?tag=value" -H "Accept: application/json" -H "Authorization: Bearer XYZ123"