Edge Delta HTTP Pull Source
Overview
The HTTP Pull node allows the Edge Delta agent to periodically send HTTP requests to an endpoint in order to pull data. This type of data ingestion is useful when you need to retrieve logs or data from an HTTP-based API.
The HTTP Pull input supports OTTL (OpenTelemetry Transformation Language) expressions through dedicated configuration fields, allowing you to use environment variables, timestamps, and other dynamic data without hardcoding values in your configuration.
- outgoing_data_types: log

OTTL Expressions
The HTTP Pull input supports OTTL (OpenTelemetry Transformation Language) expressions through three dedicated configuration fields that are evaluated on each pull cycle, enabling secure credential management and time-based queries. These fields are endpoint_expression for dynamic endpoint URLs, header_expressions for runtime-evaluated headers, and parameter_expressions for dynamic query parameters. They must be used explicitly, because OTTL expressions are not auto-detected in regular configuration fields. Each expression field accepts OTTL functions and is fully defined in the Optional Parameters section below.
If you’re migrating from v2 configurations that use the {{ Env }} syntax, see the Migration from Environment Variable Syntax section for conversion guidance.
The following functions are supported:
Function | Description | Example |
---|---|---|
Now() | Current timestamp | Now() |
UnixSeconds() | Unix timestamp in seconds | UnixSeconds(Now()) |
UnixMilli() | Unix timestamp in milliseconds | UnixMilli(Now()) |
Duration() | Time duration parsing | Duration("10m"), Duration("1h") |
EDXEnv() | Environment variable with fallback | EDXEnv("API_KEY", "default") |
Concat() | String concatenation | Concat(["Bearer ", EDXEnv("TOKEN", "")], "") |
FormatTime() | Time formatting | FormatTime(Now(), "%Y-%m-%d") |
String() | Convert to string | String(123) |
Time() | Parse time strings | Time("2024-01-01") |
ToLowerCase() | Convert to lowercase | ToLowerCase("HELLO") |
ToUpperCase() | Convert to uppercase | ToUpperCase("hello") |
Replace() | String replacement | Replace("hello world", "world", "universe") |
Substring() | Extract substring | Substring("hello", 0, 2) |
Best practices for OTTL expressions:
- Secure Credentials: Always use EDXEnv() for API tokens and sensitive values instead of hardcoding them
- Use Dedicated Fields: Always use the dedicated expression fields (endpoint_expression, header_expressions, parameter_expressions) for OTTL expressions, not the regular fields
- Time Windows: Use Duration() with Now() for relative time queries instead of absolute timestamps
- Fallback Values: Provide meaningful fallback values in EDXEnv() calls for better error handling
- Expression Testing: Test OTTL expressions in a development environment before deploying to production
- Mix Static and Dynamic: You can use both static fields and expression fields in the same configuration
Pagination
The HTTP Pull input provides automatic pagination support for APIs that return data across multiple pages, eliminating the need to manually handle pagination logic in your configuration. The node intelligently detects and follows pagination links, whether they’re embedded in JSON response bodies or provided through standard Link headers (RFC 5988), ensuring complete data retrieval from paginated endpoints.
When you configure pagination using the pagination field, the HTTP Pull input starts with the initial endpoint request, then automatically discovers and fetches additional pages based on your chosen pagination method. You can specify either url_json_path to extract next-page URLs from JSON responses (common with APIs like Microsoft Graph), or link_relation to follow standardized Link headers used by GitHub, GitLab, and similar APIs. The max_parallel_requests parameter allows you to control concurrency, balancing between faster data retrieval and respecting API rate limits. Full parameter definitions are available in the Optional Parameters section.
Pagination behaves as follows:
- Initial Request: The first request is made to the configured endpoint
- Page Discovery: The response is checked for pagination information
- Concurrent Fetching: Additional pages are fetched concurrently (up to max_parallel_requests)
- Completion: Pagination stops when no more pages are found or on error
- Data Processing: All retrieved data is processed and forwarded as logs
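The steps above can be sketched in a few lines of Python. This is an illustration of the general follow-the-next-link pattern, not the agent's implementation: the `pull_all_pages` function, the `Page` shape, and the in-memory `FAKE_API` are hypothetical, pages are fetched one at a time rather than concurrently, and keeping the records gathered before an error is an assumption.

```python
from typing import Callable, Optional

# Hypothetical page shape: (records, next_url). A real client would derive
# next_url from a Link header or a JSON field; here it is supplied directly.
Page = tuple[list[str], Optional[str]]


def pull_all_pages(first_url: str, fetch: Callable[[str], Page]) -> list[str]:
    """Follow next-page URLs until none remain, an error occurs, or a URL repeats."""
    records: list[str] = []
    seen: set[str] = set()
    url: Optional[str] = first_url
    while url is not None and url not in seen:
        seen.add(url)  # remember visited URLs so circular links terminate
        try:
            page_records, next_url = fetch(url)
        except Exception:
            break  # stop on error; records gathered so far are kept (assumption)
        records.extend(page_records)
        url = next_url
    return records


# In-memory stand-in for an API with three pages.
FAKE_API: dict[str, Page] = {
    "https://api.example.com/data": (["a", "b"], "https://api.example.com/data?page=2"),
    "https://api.example.com/data?page=2": (["c"], "https://api.example.com/data?page=3"),
    "https://api.example.com/data?page=3": (["d"], None),
}

all_records = pull_all_pages("https://api.example.com/data", FAKE_API.__getitem__)
print(all_records)  # ['a', 'b', 'c', 'd']
```

The `seen` set is what keeps a self-referencing next link from looping forever in this sketch.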
Example Configurations
Static Configuration
This example demonstrates a straightforward HTTP Pull configuration with static values for all parameters.
nodes:
  - name: my_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
    headers:
      - header: Accept
        value: application/json
    parameters:
      - name: tag
        value: source_id
    pull_interval: 1m
    retry_http_code:
      - 409
      - 429
This configuration sets up periodic data retrieval from https://api.yourapp.com using a GET request. The node includes an Accept header to request JSON responses and adds a tag query parameter with the value source_id to identify the data source. The pull_interval of 1 minute determines how frequently the endpoint is queried for new data. The retry_http_code array specifies that HTTP 409 (Conflict) and 429 (Too Many Requests) responses should trigger automatic retries, ensuring resilience against temporary server issues or rate limiting.
Basic Dynamic Configuration with OTTL
This example showcases the power of OTTL expressions for dynamic configuration, enabling secure credential management and time-based queries without hardcoding sensitive values.
nodes:
  - name: github_events_pull
    type: http_pull_input
    endpoint: https://api.github.com/orgs/edgedelta/events
    method: GET
    # Static headers
    headers:
      - header: Accept
        value: application/json
    # Dynamic headers using OTTL expressions
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    # Dynamic parameters using OTTL expressions
    parameter_expressions:
      since: FormatTime(Now() - Duration("10m"), "%Y-%m-%dT%H:%M:%SZ")
      per_page: "100"
    pull_interval: 5m
Unlike the static configuration, this example uses OTTL expressions to inject dynamic values at runtime. The header_expressions field retrieves the GitHub API token from environment variables using EDXEnv(), avoiding hardcoded credentials in the configuration. The parameter_expressions field calculates a timestamp for the since parameter, requesting only events from the last 10 minutes relative to each pull cycle. The since parameter is recalculated on every 5-minute pull, while the per_page parameter, written here as a constant string expression, limits results to 100 events per request.
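To make the sliding window concrete, the following Python sketch reproduces what the since expression computes on two consecutive pull cycles. It assumes the format string behaves like strftime and that timestamps are in UTC; the `since_value` helper and the sample pull times are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def since_value(now: datetime, lookback: timedelta) -> str:
    """Format (now - lookback) the way the since expression above does."""
    return (now - lookback).strftime("%Y-%m-%dT%H:%M:%SZ")


# Two consecutive pull cycles, 5 minutes apart, each looking back 10 minutes.
first_pull = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
second_pull = first_pull + timedelta(minutes=5)

print(since_value(first_pull, timedelta(minutes=10)))   # 2024-01-01T11:50:00Z
print(since_value(second_pull, timedelta(minutes=10)))  # 2024-01-01T11:55:00Z
```

The second window starts five minutes before the first pull ran, so the two windows share a five-minute overlap.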
Link header for pagination
Many APIs (GitHub and GitLab, for example) use the Link header for pagination following RFC 5988:
nodes:
  - name: github_repos
    type: http_pull_input
    endpoint: https://api.github.com/orgs/edgedelta/repos
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameter_expressions:
      per_page: "100"
    pagination:
      link_relation: "next" # Follows the "next" link from Link header
      max_parallel_requests: 3
Example Link header response:
Link: <https://api.github.com/organizations/123/repos?page=2>; rel="next",
<https://api.github.com/organizations/123/repos?page=10>; rel="last"
GitHub API with Dynamic Timestamps
This configuration demonstrates how to continuously monitor recent GitHub organization events using time-windowed queries that automatically adjust with each polling cycle.
nodes:
  - name: github_events
    type: http_pull_input
    endpoint: https://api.github.com/orgs/myorg/events
    # Static headers
    headers:
      - header: Accept
        value: application/vnd.github.v3+json
    # Dynamic headers using OTTL expressions
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    # Dynamic parameters using OTTL expressions
    parameter_expressions:
      # Get events from the last 10 minutes
      since: FormatTime(Now() - Duration("10m"), "%Y-%m-%dT%H:%M:%SZ")
      per_page: "100"
    pull_interval: 5m
This configuration combines static and dynamic elements. The static Accept header asks the API to return data in the expected JSON format, while the dynamic Authorization header retrieves credentials from the GITHUB_TOKEN environment variable at runtime. The since parameter uses OTTL’s time functions to create a sliding 10-minute window, calculating the timestamp for each request from the current time. With a 5-minute pull interval, consecutive windows overlap by five minutes, which reduces the chance of missing events but means an event can be returned by two consecutive pulls. The per_page parameter requests up to 100 events per API call.
Dynamic Endpoint with Environment Variables
This example shows how to build fully dynamic configurations where even the endpoint URL adapts to different environments without modifying the configuration file.
nodes:
  - name: api_monitor
    type: http_pull_input
    # Dynamic endpoint based on environment
    endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/v1/metrics"], "")
    # Dynamic headers
    header_expressions:
      X-API-Key: EDXEnv("API_KEY", "")
      X-Request-ID: Concat(["req-", String(UnixMilli(Now()))], "")
    pull_interval: 30s
This configuration keeps the host name and credentials out of the configuration file. The endpoint_expression constructs the full URL from the API_HOST environment variable (falling back to api.example.com when it is not set), allowing the same configuration to work across development, staging, and production environments by changing environment variables. The X-API-Key header pulls authentication credentials from the environment, so sensitive values never appear in the configuration. The X-Request-ID header generates an identifier for each request from the current timestamp in milliseconds, which aids in request tracing and debugging across distributed systems. This pattern is particularly valuable for multi-environment deployments where the same configuration needs to adapt to different API endpoints and credentials.
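The following Python sketch mirrors how the endpoint and headers in this example resolve under different environments. It is an analogy rather than the agent's code: `edx_env` and `build_request` are hypothetical stand-ins for EDXEnv() and the expression evaluation, and the sketch assumes the fallback applies only when the variable is unset.

```python
import os


def edx_env(name: str, default: str) -> str:
    """Stand-in for EDXEnv(): the environment value, or the fallback when unset."""
    return os.environ.get(name, default)


def build_request(now_ms: int) -> tuple[str, dict[str, str]]:
    """Resolve the URL and headers the way the expressions above describe."""
    url = "".join(["https://", edx_env("API_HOST", "api.example.com"), "/v1/metrics"])
    headers = {
        "X-API-Key": edx_env("API_KEY", ""),
        "X-Request-ID": "req-" + str(now_ms),
    }
    return url, headers


# Without API_HOST set, the fallback host is used.
os.environ.pop("API_HOST", None)
default_url, _ = build_request(now_ms=1700000000000)

# With API_HOST set, the same configuration targets another environment.
os.environ["API_HOST"] = "staging.example.com"
staging_url, staging_headers = build_request(now_ms=1700000000000)

print(default_url)                      # https://api.example.com/v1/metrics
print(staging_url)                      # https://staging.example.com/v1/metrics
print(staging_headers["X-Request-ID"])  # req-1700000000000
```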
Microsoft Graph API with Time Windows
This configuration demonstrates how to query Microsoft Graph API for Azure AD sign-in logs using OData filters with dynamic time ranges for continuous security monitoring.
nodes:
  - name: office365_audit_logs
    type: http_pull_input
    endpoint: https://graph.microsoft.com/v1.0/auditLogs/signIns
    # Dynamic headers
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GRAPH_TOKEN", "")], "")
    # Dynamic parameters
    parameter_expressions:
      # Query for sign-ins in the last hour
      $filter: Concat(["createdDateTime ge ", FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%S.000Z")], "")
      $top: "100"
    pull_interval: 15m
This configuration uses Microsoft Graph’s OData query syntax to retrieve Azure AD sign-in events within a rolling one-hour window. The $filter parameter constructs an OData filter expression that requests only events created after a timestamp calculated as one hour before the current time, formatted to match Microsoft’s ISO 8601 requirements. The GRAPH_TOKEN environment variable provides authentication without exposing credentials, while the $top parameter limits each request to 100 records to manage response size. With a 15-minute pull interval and a one-hour lookback window, consecutive queries overlap by 45 minutes. This helps cover brief connectivity issues or processing delays, but it also means the same event can be returned by several consecutive pulls.
Unix Timestamp Parameters
Many APIs require Unix timestamps for time-based queries, and this example shows how to generate both seconds and milliseconds precision timestamps dynamically.
nodes:
  - name: api_with_unix_time
    type: http_pull_input
    endpoint: https://api.example.com/logs
    parameter_expressions:
      # Unix timestamp in seconds - last 24 hours
      start_time: String(UnixSeconds(Now() - Duration("24h")))
      # Unix timestamp in milliseconds - current time
      end_time: String(UnixMilli(Now()))
    pull_interval: 1h
This configuration demonstrates how to work with APIs that expect Unix timestamps rather than formatted date strings. The start_time parameter calculates a timestamp 24 hours before the current time and converts it to Unix seconds using UnixSeconds(), while end_time captures the current moment with millisecond precision using UnixMilli(). Both values are then converted to strings since query parameters must be text values. This creates a rolling 24-hour window that advances with each hourly pull. Because consecutive windows overlap by 23 hours, the same records are returned by many successive pulls; shorten the lookback toward the pull interval, or deduplicate downstream, if repeated records are a problem. Numeric timestamps like these are common with logging and monitoring APIs that use them for time-range queries.
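As a rough cross-check of the values these expressions produce, the same arithmetic can be written in Python. The helper names below are hypothetical, and the sketch assumes UTC timestamps.

```python
from datetime import datetime, timedelta, timezone


def unix_seconds(t: datetime) -> int:
    """Whole seconds since the Unix epoch."""
    return int(t.timestamp())


def unix_milli(t: datetime) -> int:
    """Whole milliseconds since the Unix epoch, avoiding float rounding."""
    return int(t.timestamp()) * 1000 + t.microsecond // 1000


now = datetime(2024, 1, 2, 0, 0, 0, tzinfo=timezone.utc)

# Query parameters are text, so both numbers are converted to strings.
start_time = str(unix_seconds(now - timedelta(hours=24)))
end_time = str(unix_milli(now))

print(start_time)  # 1704067200
print(end_time)    # 1704153600000
```

Note that the two parameters differ by a factor of 1000 in unit, so an API that expects both bounds in the same unit would need both expressions to use the same function.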
GitHub API with Link Headers
This example demonstrates automatic pagination using RFC 5988 Link headers, which GitHub and similar APIs use to provide navigation links between result pages.
nodes:
  - name: github_events
    type: http_pull_input
    endpoint: https://api.github.com/orgs/edgedelta/events
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GITHUB_TOKEN", "")], "")
    parameter_expressions:
      per_page: "100"
    pagination:
      link_relation: "next"
This streamlined configuration shows how pagination can be handled for APIs that follow the Link header standard. By specifying link_relation: "next", the HTTP Pull input automatically parses the Link header from each response and follows the URL marked with rel="next" to retrieve subsequent pages. The per_page parameter requests 100 items per API call, reducing the total number of requests needed. Combined with token authentication from environment variables, this configuration retrieves all available event data across multiple pages without any manual pagination logic.
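For readers unfamiliar with the Link header format, the sketch below shows one way to pull the rel="next" URL out of a header value in Python. It is a simplified illustration, not the agent's parser: `parse_link_header` is a hypothetical helper that handles only quoted, single-valued rel parameters.

```python
import re

# Matches one `<url>; rel="name"` entry. Other link parameters and
# unquoted or multi-valued rel attributes are ignored in this sketch.
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value: str) -> dict[str, str]:
    """Map each rel name to its URL."""
    return {rel: url for url, rel in _LINK_ENTRY.findall(value)}


header = (
    '<https://api.github.com/organizations/123/repos?page=2>; rel="next", '
    '<https://api.github.com/organizations/123/repos?page=10>; rel="last"'
)
links = parse_link_header(header)

print(links.get("next"))  # https://api.github.com/organizations/123/repos?page=2
print(links.get("prev"))  # None
```

When the header has no entry for the requested relation, as with "prev" here, there is no further page to follow.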
JSON URL Extraction Pagination
For APIs that return the next page URL in the response body:
nodes:
  - name: office365_logs
    type: http_pull_input
    endpoint: https://graph.microsoft.com/v1.0/auditLogs/signIns
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GRAPH_TOKEN", "")], "")
    pagination:
      url_json_path: "$['@odata.nextLink']" # Microsoft Graph pagination
      max_parallel_requests: 5
Common JSONPath patterns:
- Microsoft Graph: "$['@odata.nextLink']"
- Simple next URL: "$.next"
- Nested pagination: "$.pagination.next_url"
- Array of URLs: "$.urls[*]"
Microsoft Graph with JSON Pagination
This configuration combines time-windowed filtering with JSON-based pagination to comprehensively collect Azure AD sign-in logs from Microsoft Graph API.
nodes:
  - name: azure_ad_signins
    type: http_pull_input
    endpoint: https://graph.microsoft.com/v1.0/auditLogs/signIns
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("GRAPH_TOKEN", "")], "")
    parameter_expressions:
      $top: "100"
      $filter: Concat(["createdDateTime ge ", FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")], "")
    pagination:
      url_json_path: "$['@odata.nextLink']"
This configuration shows how to handle Microsoft Graph’s OData pagination model alongside dynamic filtering. The url_json_path parameter uses a JSONPath expression to extract the continuation URL from the @odata.nextLink field that Microsoft includes in responses when more data is available. The bracket notation $['@odata.nextLink'] is necessary because the field name contains a special character (@). Combined with the one-hour time filter and 100-record page size, this configuration retrieves all sign-in events within the time window, automatically following pagination links until all matching records are collected. This pattern is essential for security monitoring scenarios where missing even a single sign-in event could impact audit compliance or threat detection.
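The extraction step can be illustrated with plain Python dictionaries. This sketch is not a JSONPath implementation: the hypothetical `next_link` helper takes a list of object keys, so `$['@odata.nextLink']` corresponds to `["@odata.nextLink"]` and a nested path such as `$.meta.pagination.next_url` to `["meta", "pagination", "next_url"]`. The sample response bodies are invented for the example.

```python
import json
from typing import Any, Optional


def next_link(body: str, keys: list[str]) -> Optional[str]:
    """Walk a list of object keys and return the string found there, if any."""
    node: Any = json.loads(body)
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # field absent: no further page
        node = node[key]
    return node if isinstance(node, str) else None


# Invented sample bodies: one with a continuation link, one without.
graph_page = json.dumps({
    "value": [{"id": "1"}],
    "@odata.nextLink": "https://graph.microsoft.com/v1.0/auditLogs/signIns?$skiptoken=abc",
})
last_page = json.dumps({"value": [{"id": "2"}]})

print(next_link(graph_page, ["@odata.nextLink"]))
print(next_link(last_page, ["@odata.nextLink"]))  # None
```

The missing field on the last page is what signals the end of pagination in this model.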
Custom API with Nested Pagination
This example illustrates how to handle APIs with nested pagination metadata and optimize retrieval speed through increased parallelism.
nodes:
  - name: custom_api
    type: http_pull_input
    endpoint: https://api.custom.com/v1/logs
    header_expressions:
      X-API-Key: EDXEnv("CUSTOM_API_KEY", "")
    pagination:
      url_json_path: "$.meta.pagination.next_url"
      max_parallel_requests: 10
This configuration demonstrates pagination for APIs that nest their navigation links within structured metadata objects. The JSONPath expression $.meta.pagination.next_url navigates through the response structure to locate the next page URL, showing how to handle APIs that organize their pagination information differently than standard implementations. The max_parallel_requests value of 10, double the default of 5, allows more pages to be fetched simultaneously, which helps with large datasets spread across many pages when the API’s rate limits allow it. The actual reduction in collection time depends on the API, and the environment-based API key is used for all concurrent requests.
Required Parameters
name
A descriptive name for the node. This is the name that will appear in pipeline builder and you can reference this node in the YAML using the name. It must be unique across all nodes. It is a YAML list element so it begins with a - and a space followed by the string. It is a required parameter for all nodes.
nodes:
  - name: <node name>
    type: <node type>
type: http_pull_input
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
  - name: <node name>
    type: <node type>
endpoint
The endpoint specifies the URL to which the HTTP requests are sent. It is a required parameter and must be specified as a valid URL. For dynamic endpoints, use the endpoint_expression field instead.
nodes:
  - name: ed_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
method
The method parameter defines the HTTP method used for requests. Supported values are GET and POST. This is a required parameter.
nodes:
  - name: ed_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
Optional Parameters
endpoint_expression
The endpoint_expression parameter enables you to construct dynamic endpoint URLs using OTTL expressions. This is useful when the endpoint URL needs to be determined at runtime based on environment variables or other dynamic values. When specified, this field takes precedence over the static endpoint field.
nodes:
  - name: dynamic_api_pull
    type: http_pull_input
    endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/v1/data"], "")
    method: GET
headers
The headers parameter allows adding static HTTP headers to the requests. It is specified as a list of key-value pairs. For dynamic headers, use the header_expressions field instead.
nodes:
  - name: ed_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
    headers:
      - header: Accept
        value: application/json
      - header: User-Agent
        value: EdgeDelta/1.0
header_expressions
The header_expressions parameter allows you to define HTTP headers using OTTL expressions that are evaluated at runtime. This is essential for secure credential management and dynamic header values. It is specified as a map of header names to OTTL expressions.
nodes:
  - name: secure_api_pull
    type: http_pull_input
    endpoint: https://api.example.com/data
    method: GET
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("API_TOKEN", "")], "")
      X-Request-ID: Concat(["req-", String(UnixMilli(Now()))], "")
      X-Client-Version: EDXEnv("CLIENT_VERSION", "1.0.0")
parameters
The parameters field enables you to add static query parameters to the requests. It is specified as a list of key-value pairs. For dynamic parameters, use the parameter_expressions field instead.
nodes:
  - name: ed_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
    parameters:
      - name: tag
        value: source_id
      - name: limit
        value: "100"
parameter_expressions
The parameter_expressions parameter enables you to define query parameters using OTTL expressions that are evaluated on each pull cycle. This is particularly useful for time-based queries, pagination tokens, and other dynamic values. It is specified as a map of parameter names to OTTL expressions.
nodes:
  - name: time_windowed_api
    type: http_pull_input
    endpoint: https://api.example.com/events
    method: GET
    parameter_expressions:
      since: FormatTime(Now() - Duration("1h"), "%Y-%m-%dT%H:%M:%SZ")
      until: FormatTime(Now(), "%Y-%m-%dT%H:%M:%SZ")
      limit: "500"
      offset: EDXEnv("INITIAL_OFFSET", "0")
pull_interval
The pull_interval is the frequency at which HTTP requests are sent to the endpoint. The default is 1m (1 minute) and it is specified as a duration.
nodes:
  - name: ed_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
    pull_interval: 1m
retry_http_code
The retry_http_code parameter specifies additional HTTP status codes that will trigger a retry of the request. It is specified as a list of integers and is optional.
nodes:
  - name: ed_api_http_pull
    type: http_pull_input
    endpoint: https://api.yourapp.com
    method: GET
    retry_http_code:
      - 409
      - 429
pagination
The pagination parameter configures automatic pagination for APIs that return data across multiple pages. When specified, the HTTP Pull input will automatically follow pagination links to retrieve all available data. It is specified as an object with configuration options and is optional.
nodes:
  - name: paginated_api
    type: http_pull_input
    endpoint: https://api.example.com/data
    pagination:
      url_json_path: "$.next" # For JSON-based pagination
      # OR
      link_relation: "next" # For Link header pagination
      max_parallel_requests: 5
url_json_path
The url_json_path parameter specifies a JSONPath expression to extract the next page URL from the response body. Use this for APIs that include pagination URLs in their JSON responses. It is specified as a string and is optional.
nodes:
  - name: json_paginated_api
    type: http_pull_input
    endpoint: https://api.example.com/data
    pagination:
      url_json_path: "$.pagination.next_url"
Common patterns:
- Microsoft Graph: "$['@odata.nextLink']"
- Simple next URL: "$.next"
- Nested pagination: "$.meta.pagination.next"
link_relation
The link_relation parameter specifies which link relation to follow from the Link header (RFC 5988). Use this for APIs that provide pagination through Link headers, such as GitHub and GitLab. It is specified as a string and is optional.
nodes:
  - name: link_paginated_api
    type: http_pull_input
    endpoint: https://api.github.com/repos
    pagination:
      link_relation: "next"
The agent will parse Link headers like:
Link: <https://api.example.com/data?page=2>; rel="next"
max_parallel_requests
The max_parallel_requests parameter limits the number of concurrent requests when fetching additional pages. This helps prevent overwhelming the API server or hitting rate limits. Default is 5. It is specified as an integer and is optional.
nodes:
  - name: rate_limited_api
    type: http_pull_input
    endpoint: https://api.example.com/data
    pagination:
      link_relation: "next"
      max_parallel_requests: 3 # Reduce for rate-limited APIs
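The effect of a concurrency cap can be sketched with a bounded thread pool in Python. This illustrates the general technique only and makes no claim about how the agent schedules requests: `fetch_pages` and `fake_fetch` are hypothetical, and the fake fetcher sleeps briefly instead of making a network call so that the peak number of in-flight requests can be observed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def fake_fetch(url: str) -> str:
    """Pretend to fetch a page while recording how many calls overlap."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # simulated network latency
    with _lock:
        _in_flight -= 1
    return "body of " + url


def fetch_pages(urls: list[str], fetch: Callable[[str], str],
                max_parallel_requests: int = 5) -> list[str]:
    """Fetch all URLs with at most max_parallel_requests running at once."""
    with ThreadPoolExecutor(max_workers=max_parallel_requests) as pool:
        return list(pool.map(fetch, urls))  # results keep the input order


page_urls = ["https://api.example.com/data?page=%d" % n for n in range(2, 12)]
results = fetch_pages(page_urls, fake_fetch, max_parallel_requests=3)

print(len(results))         # 10
print(peak_in_flight <= 3)  # True
```

Lowering the cap trades total retrieval time for fewer simultaneous requests against the API.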
source_metadata
The source_metadata parameter is used to define which detected resources and attributes to add to each data item as it is ingested by the Edge Delta agent. In the GUI you can select:
- Required Only: This option includes the minimum required resources and attributes for Edge Delta to operate.
- Default: This option includes the required resources and attributes plus those selected by Edge Delta.
- High: This option includes the required resources and attributes along with a larger selection of common optional fields.
- Custom: With this option selected, you can choose which attributes and resources to include. The required fields are selected by default and can’t be unchecked.
Based on your selection in the GUI, the source_metadata YAML is populated as two dictionaries (resource_attributes and attributes) with Boolean values.
See Choose Data Item Metadata for more information on selecting metadata.
Migration from Environment Variable Syntax
If you’re migrating from v2 configs that use the {{ Env }} syntax, here’s how to convert to OTTL expressions:
Old v2 Syntax:
nodes:
  - name: api_pull
    type: http_pull_input
    endpoint: "https://{{ Env \"API_HOST\" \"api.example.com\" }}/data"
    headers:
      - header: Authorization
        value: "Bearer {{ Env \"API_TOKEN\" \"default_token\" }}"
New OTTL Syntax:
nodes:
  - name: api_pull
    type: http_pull_input
    # Use endpoint_expression for dynamic endpoints
    endpoint_expression: Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/data"], "")
    # Use header_expressions for dynamic headers
    header_expressions:
      Authorization: Concat(["Bearer ", EDXEnv("API_TOKEN", "default_token")], "")
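When many values need converting, the rewrite shown above is mechanical enough to script. The Python helper below is a hypothetical sketch, not an Edge Delta tool: `to_ottl` handles only the two-argument {{ Env "NAME" "default" }} form from the example, expects the value as it reads after YAML unescaping, and does not escape quotes inside literal text.

```python
import re

# Matches {{ Env "NAME" "default" }} as used in the v2 example above.
_ENV_TEMPLATE = re.compile(r'\{\{\s*Env\s+"([^"]+)"\s+"([^"]*)"\s*\}\}')


def to_ottl(value: str) -> str:
    """Rewrite a v2 templated string as an OTTL Concat() expression."""
    parts: list[str] = []
    pos = 0
    for match in _ENV_TEMPLATE.finditer(value):
        literal = value[pos:match.start()]
        if literal:
            parts.append('"' + literal + '"')  # static text before the template
        parts.append('EDXEnv("{}", "{}")'.format(match.group(1), match.group(2)))
        pos = match.end()
    if value[pos:]:
        parts.append('"' + value[pos:] + '"')  # static text after the last template
    return "Concat([" + ", ".join(parts) + '], "")'


print(to_ottl('https://{{ Env "API_HOST" "api.example.com" }}/data'))
# Concat(["https://", EDXEnv("API_HOST", "api.example.com"), "/data"], "")
print(to_ottl('Bearer {{ Env "API_TOKEN" "default_token" }}'))
# Concat(["Bearer ", EDXEnv("API_TOKEN", "default_token")], "")
```

The generated expression still has to be moved by hand into the matching expression field (endpoint_expression, header_expressions, or parameter_expressions).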
Troubleshooting Pagination
To troubleshoot pagination issues, enable debug logging for your Edge Delta agent by setting the log level in your agent configuration. Debug logs will reveal detailed pagination behavior including which URLs are being followed and any errors encountered.
nodes:
  - name: debug_pagination
    type: http_pull_input
    endpoint: https://api.example.com/data
    pagination:
      link_relation: "next"
When debug logging is enabled, you’ll see messages like:
- "Following pagination URL: <url> (page N)" - Shows each page being fetched
- "Total pages retrieved: X" - Summary of pagination results
- "Pagination error: <details>" - Any issues encountered during pagination
Common pagination issues and solutions:
- No pagination detected: Verify the API response contains the expected Link header or JSON field at the path specified in your configuration
- Infinite loops: The agent automatically detects and prevents circular pagination, but check your API documentation for proper pagination handling
- Rate limiting errors: Reduce max_parallel_requests to stay within API rate limits
- Authentication failures on subsequent pages: Ensure tokens/credentials remain valid for the entire pagination process and aren’t request-specific
Testing an Endpoint
You can test your endpoint with an HTTP request using curl, a command-line tool for transferring data with URLs. The command contents depend on how your endpoint is configured, such as the HTTP method, headers, query parameters, and whether authentication is required.
Syntax:
curl -X <HTTP_METHOD> "<ENDPOINT_URL>?<QUERY_PARAMETERS>" -H "<HEADER>: <HEADER_VALUE>"
- HTTP_METHOD: This could be GET or POST, depending on the method supported by the endpoint.
- ENDPOINT_URL: The URL of the endpoint to which the request is being sent.
- QUERY_PARAMETERS: Optional, used if the endpoint requires query parameters in the URL.
- HEADER and HEADER_VALUE: The header name and its value, passed with the -H flag, which adds a custom header field to the request. Repeat -H for each header. Headers are often needed for specifying content types or for authentication.
If using POST, include -d to specify data to send in the request body:
-d '{"key1":"value1", "key2":"value2"}'
Example Command:
curl -X GET "https://api.your-company.com/data?tag=value" -H "Accept: application/json" -H "Authorization: Bearer XYZ123"
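If curl is not available, a roughly equivalent request can be assembled with Python's standard library. The sketch below only builds the request so its URL and headers can be inspected; the `build_test_request` helper is hypothetical, and the endpoint and token are the placeholder values from the curl example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_test_request(endpoint: str, params: dict[str, str],
                       headers: dict[str, str], method: str = "GET") -> Request:
    """Assemble a request equivalent to the curl command above."""
    url = endpoint + ("?" + urlencode(params) if params else "")
    return Request(url, headers=headers, method=method)


req = build_test_request(
    "https://api.your-company.com/data",
    {"tag": "value"},
    {"Accept": "application/json", "Authorization": "Bearer XYZ123"},
)

print(req.get_method(), req.full_url)
# GET https://api.your-company.com/data?tag=value

# To actually send it, pass the request to urllib.request.urlopen(req, timeout=10).
```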