Edge Delta S3 Source
Overview
The S3 source node allows the Edge Delta agent to read data from an S3 bucket. This node is essential for ingesting log data stored in S3 and processing it within the Edge Delta ecosystem.
AI Team: Configure this source using the S3 connector for streamlined setup in AI Team.
- outgoing_data_types: log
Configure S3
See Prepare for an S3 Source for information on setting up your environment.
Example Edge Delta Pipeline configuration
Simple version

nodes:
- name: my_s3_input
  type: s3_input
  sqs_url: https://sqs.example-queue-123.amazonaws.com
  region: us-example-1
Advanced version

nodes:
- name: my_s3_input
  type: s3_input
  sqs_url: https://sqs.example-queue-123.amazonaws.com
  region: us-example-1
  aws_key_id: EXAMPLEAWSKEYID1234
  aws_sec_key: exampleAwsSecKey9876
  role_arn: arn:aws:iam::example-account-123:role/example-role
  external_id: example-external-id-5678
Cross-region configuration
When your S3 bucket and SQS queue are in different AWS regions, use the s3_config and sqs_config parameters to specify region-specific settings:
nodes:
- name: my_s3_input
  type: s3_input
  sqs_url: https://sqs.us-west-2.amazonaws.com/123456789/my-queue
  region: us-east-1
  s3_config:
    region: us-west-2
  sqs_config:
    region: us-east-1
Required Parameters
name
A descriptive name for the node. This is the name that appears in the pipeline builder, and you can reference the node in the YAML using this name. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space followed by the string. It is a required parameter for all nodes.
nodes:
- name: <node name>
  type: <node type>
type: s3_input
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
- name: <node name>
  type: <node type>
sqs_url
The sqs_url parameter specifies the URL of the SQS queue that receives S3 event notifications for the bucket. It is specified as a string and is required.
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
region
The region parameter specifies the AWS region where the S3 bucket and SQS queue are located. It is specified as a string and is required.
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
Optional Parameters
aws_key_id
The aws_key_id parameter is the AWS access key ID that has all four IAM permissions for the target bucket. It is used with aws_sec_key. It is specified as a string and is optional.
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  aws_key_id: <key>
  aws_sec_key: <secure key>
aws_sec_key
The aws_sec_key parameter is the AWS secret access key that has all four IAM permissions for the target bucket. It is used with aws_key_id. It is specified as a string and is optional.
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  aws_key_id: <key>
  aws_sec_key: <secure key>
compression
The compression parameter is used to define the compression type for incoming logs. You can specify gzip, zstd, snappy, or uncompressed. It is specified as a string. It is optional and the default is uncompressed.
nodes:
- name: s3_input
  type: s3_input
  region: us-west-2
  sqs_url: <REDACTED>
  compression: gzip
role_arn
The role_arn parameter is used if authentication and authorization are performed using an assumed AWS IAM role. It should consist of the account ID and role name. A role_arn is optional for this source node, depending on the access configuration.
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  role_arn: <role ARN>
external_id
The external_id parameter is a unique identifier used to avoid a confused deputy attack. It is specified as a string and is optional. While external_id is optional, when configured it must be used with role_arn.
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  external_id: <ID>
  role_arn: <role ARN>
rate_limit
The rate_limit parameter enables you to control data ingestion based on system resource usage. This advanced setting helps prevent source nodes from overwhelming the agent by automatically throttling or stopping data collection when CPU or memory thresholds are exceeded.
Use rate limiting to prevent runaway log collection from overwhelming the agent in high-volume sources, protect agent stability in resource-constrained environments with limited CPU/memory, automatically throttle during bursty traffic patterns, and ensure fair resource allocation across source nodes in multi-tenant deployments.
When rate limiting triggers, pull-based sources (File, S3, HTTP Pull) stop fetching new data, push-based sources (HTTP, TCP, UDP, OTLP) reject incoming data, and stream-based sources (Kafka, Pub/Sub) pause consumption. Rate limiting operates at the source node level, where each source with rate limiting enabled independently monitors and enforces its own thresholds.
Configuration Steps:
- Click Add New in the Rate Limit section
- Click Add New for Evaluation Policy
- Select Policy Type:
- CPU Usage: Monitors CPU consumption and rate limits when usage exceeds defined thresholds. Use for CPU-intensive sources like file parsing or complex transformations.
- Memory Usage: Monitors memory consumption and rate limits when usage exceeds defined thresholds. Use for memory-intensive sources like large message buffers or caching.
- AND (composite): Combines multiple sub-policies with AND logic. All sub-policies must be true simultaneously to trigger rate limiting. Use when you want conservative rate limiting (both CPU and memory must be high).
- OR (composite): Combines multiple sub-policies with OR logic. Any sub-policy can trigger rate limiting. Use when you want aggressive rate limiting (either CPU or memory being high triggers).
- Select Evaluation Mode. Choose how the policy behaves when thresholds are exceeded:
- Enforce (default): Actively applies rate limiting when thresholds are met. Pull-based sources (File, S3, HTTP Pull) stop fetching new data, push-based sources (HTTP, TCP, UDP, OTLP) reject incoming data, and stream-based sources (Kafka, Pub/Sub) pause consumption. Use in production to protect agent resources.
- Monitor: Logs when rate limiting would occur without actually limiting data flow. Use for testing thresholds before enforcing them in production.
- Passthrough: Disables rate limiting entirely while keeping the configuration in place. Use to temporarily disable rate limiting without removing configuration.
- Set Absolute Limits and Relative Limits (for CPU Usage and Memory Usage policies)
Note: If you specify both absolute and relative limits, the system evaluates both conditions, and rate limiting triggers when either condition is met (OR logic). For example, if you set the absolute limit to 1.0 CPU cores and the relative limit to 50%, rate limiting triggers when the source uses either one full core or 50% of available CPU, whichever happens first. A combined example is shown after the generated YAML below.
For CPU Absolute Limits: Enter the value in full core units:
- 0.1 = one-tenth of a CPU core
- 0.5 = half a CPU core
- 1.0 = one full CPU core
- 2.0 = two full CPU cores
For CPU Relative Limits: Enter a percentage of total available CPU (0-100):
- 50 = 50% of available CPU
- 75 = 75% of available CPU
- 85 = 85% of available CPU
For Memory Absolute Limits: Enter the value in bytes:
- 104857600 = 100Mi (100 × 1024 × 1024)
- 536870912 = 512Mi (512 × 1024 × 1024)
- 1073741824 = 1Gi (1 × 1024 × 1024 × 1024)
For Memory Relative Limits: Enter a percentage of total available memory (0-100):
- 60 = 60% of available memory
- 75 = 75% of available memory
- 80 = 80% of available memory
- Set Refresh Interval (for CPU Usage and Memory Usage policies). Specify how frequently the system checks resource usage:
- Recommended Values:
  - 10s to 30s for most use cases
  - 5s to 10s for high-volume sources requiring quick response
  - 1m or higher for stable, low-volume sources
The system fetches current CPU/memory usage at the specified refresh interval and uses that value for evaluation until the next refresh. Shorter intervals provide more responsive rate limiting but incur slightly higher overhead, while longer intervals are more efficient but slower to react to sudden resource spikes.
The GUI generates YAML as follows:
# Simple CPU-based rate limiting
nodes:
- name: <node name>
  type: <node type>
  rate_limit:
    evaluation_policy:
      policy_type: cpu_usage
      evaluation_mode: enforce
      absolute_limit: 0.5 # Limit to half a CPU core
      refresh_interval: 10s

# Simple memory-based rate limiting
nodes:
- name: <node name>
  type: <node type>
  rate_limit:
    evaluation_policy:
      policy_type: memory_usage
      evaluation_mode: enforce
      absolute_limit: 536870912 # 512Mi in bytes
      refresh_interval: 30s
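As noted above, a single CPU or memory policy can set both an absolute and a relative limit, and the two conditions combine with OR logic. A minimal sketch of such a policy, using the same fields as the generated YAML above with placeholder values:

# Combined absolute and relative limits - either condition triggers
nodes:
- name: <node name>
  type: <node type>
  rate_limit:
    evaluation_policy:
      policy_type: cpu_usage
      evaluation_mode: enforce
      absolute_limit: 1.0 # one full CPU core
      relative_limit: 50 # 50% of available CPU
      refresh_interval: 10s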
Composite Policies (AND / OR)
When using AND or OR policy types, you define sub-policies instead of limits. Sub-policies must be siblings (at the same level)—do not nest sub-policies within other sub-policies. Each sub-policy is independently evaluated, and the parent policy’s evaluation mode applies to the composite result.
- AND Logic: All sub-policies must evaluate to true at the same time to trigger rate limiting. Use when you want conservative rate limiting (limit only when CPU AND memory are both high).
- OR Logic: Any sub-policy evaluating to true triggers rate limiting. Use when you want aggressive protection (limit when either CPU OR memory is high).
Configuration Steps:
- Select AND (composite) or OR (composite) as the Policy Type
- Choose the Evaluation Mode (typically Enforce)
- Click Add New under Sub-Policies to add the first condition
- Configure the first sub-policy by selecting policy type (CPU Usage or Memory Usage), selecting evaluation mode, setting absolute and/or relative limits, and setting refresh interval
- In the parent policy (not within the child), click Add New again to add a sibling sub-policy
- Configure additional sub-policies following the same pattern
The GUI generates YAML as follows:
# AND composite policy - both CPU AND memory must exceed limits
nodes:
- name: <node name>
  type: <node type>
  rate_limit:
    evaluation_policy:
      policy_type: and
      evaluation_mode: enforce
      sub_policies:
        # First sub-policy (sibling)
        - policy_type: cpu_usage
          evaluation_mode: enforce
          absolute_limit: 0.75 # Limit to 75% of one core
          refresh_interval: 15s
        # Second sub-policy (sibling)
        - policy_type: memory_usage
          evaluation_mode: enforce
          absolute_limit: 1073741824 # 1Gi in bytes
          refresh_interval: 15s

# OR composite policy - either CPU OR memory can trigger
nodes:
- name: <node name>
  type: <node type>
  rate_limit:
    evaluation_policy:
      policy_type: or
      evaluation_mode: enforce
      sub_policies:
        - policy_type: cpu_usage
          evaluation_mode: enforce
          relative_limit: 85 # 85% of available CPU
          refresh_interval: 20s
        - policy_type: memory_usage
          evaluation_mode: enforce
          relative_limit: 80 # 80% of available memory
          refresh_interval: 20s

# Monitor mode for testing thresholds
nodes:
- name: <node name>
  type: <node type>
  rate_limit:
    evaluation_policy:
      policy_type: memory_usage
      evaluation_mode: monitor # Only logs, doesn't limit
      relative_limit: 70 # Test at 70% before enforcing
      refresh_interval: 30s
s3_config
The s3_config parameter allows you to specify AWS configuration specific to the S3 service. When provided, these settings override the base-level region, aws_key_id, aws_sec_key, role_arn, and external_id parameters for S3 operations only. This is useful for cross-region deployments where your S3 bucket is in a different region than your SQS queue, or when S3 requires different authentication credentials. It is specified as a nested configuration block and is optional.
The s3_config block supports the following fields:
- region - AWS region for S3 access
- aws_key_id - AWS access key ID for S3 (optional if using role-based authentication)
- aws_sec_key - AWS secret access key for S3 (optional if using role-based authentication)
- role_arn - IAM role ARN for S3 access (alternative to access keys)
- external_id - External ID for role assumption (required when role_arn is specified)
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <base aws region>
  s3_config:
    region: <s3 specific region>
    aws_key_id: <s3 key>
    aws_sec_key: <s3 secret>
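If S3 access is role-based rather than key-based, the same block can carry role_arn and external_id in place of access keys, per the field list above. A sketch with placeholder values:

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <base aws region>
  s3_config:
    region: <s3 specific region>
    role_arn: <s3 role ARN>
    external_id: <s3 external ID>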
sqs_config
The sqs_config parameter allows you to specify AWS configuration specific to the SQS service. When provided, these settings override the base-level region, aws_key_id, aws_sec_key, role_arn, and external_id parameters for SQS operations only. This is useful for cross-region deployments where your SQS queue is in a different region than your S3 bucket, or when SQS requires different authentication credentials. It is specified as a nested configuration block and is optional.
The sqs_config block supports the following fields:
- region - AWS region for SQS access
- aws_key_id - AWS access key ID for SQS (optional if using role-based authentication)
- aws_sec_key - AWS secret access key for SQS (optional if using role-based authentication)
- role_arn - IAM role ARN for SQS access (alternative to access keys)
- external_id - External ID for role assumption (required when role_arn is specified)
nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <base aws region>
  sqs_config:
    region: <sqs specific region>
    aws_key_id: <sqs key>
    aws_sec_key: <sqs secret>
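A common scenario is an SQS queue owned by a different account than the bucket, where only SQS requires an assumed role while S3 uses the base credentials. A sketch with placeholder values, assuming the fields listed above:

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <base aws region>
  sqs_config:
    region: <sqs specific region>
    role_arn: <sqs role ARN>
    external_id: <sqs external ID>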
source_metadata
This option is used to define which detected resources and attributes to add to each data item as it is ingested by Edge Delta. You can select:
- Required Only: This option includes the minimum required resources and attributes for Edge Delta to operate.
- Default: This option includes the required resources and attributes plus those selected by Edge Delta.
- High: This option includes the required resources and attributes along with a larger selection of common optional fields.
- Custom: With this option selected, you can choose which attributes and resources to include. The required fields are selected by default and can’t be unchecked.
Based on your selection in the GUI, the source_metadata YAML is populated as two dictionaries (resource_attributes and attributes) with Boolean values.
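For illustration, a sketch of how the generated block might look. The exact keys depend on your selection, and the attribute names shown here are assumptions rather than a definitive list:

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  source_metadata:
    resource_attributes:
      ed.source.name: true # illustrative key; actual keys depend on your selection
      ed.source.type: true # illustrative key
    attributes:
      ed.filepath: true # illustrative key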
See Choose Data Item Metadata for more information on selecting metadata.
For advanced authentication options, see AWS IAM Role Authentication.