Edge Delta S3 Source
Overview
The S3 source node allows the Edge Delta agent to read data from an S3 bucket. Use it to ingest log data stored in S3 and process it in an Edge Delta pipeline.
- outgoing_data_types: log
Configure SQS
Set up an Amazon Simple Queue Service (SQS) queue to facilitate communication between Amazon S3 and the Edge Delta agent:
Create an Amazon SQS Standard Queue
- Open the Amazon SQS console.
- Create a queue and use the default Standard queue type.
- Provide a name for your queue.
- Optionally, configure additional parameters such as visibility timeout, message retention period, delivery delay, and maximum message size according to your requirements. Default values are provided by the console.
Define an Access Policy
Configure who can send messages to and receive messages from the queue. Add the following policy statement to allow Amazon S3 to send event notifications to the SQS queue:
{
  "Sid": "s3_send_statement",
  "Effect": "Allow",
  "Principal": {
    "Service": "s3.amazonaws.com"
  },
  "Action": [
    "SQS:SendMessage"
  ],
  "Resource": "arn:aws:sqs:AWS_REGION:AWS_ACCOUNT_ID:SQS_NAME",
  "Condition": {
    "ArnLike": {
      "aws:SourceArn": "arn:aws:s3:*:*:S3_BUCKET_NAME"
    },
    "StringEquals": {
      "aws:SourceAccount": "AWS_ACCOUNT_ID"
    }
  }
}
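The placeholders in the statement above (AWS_REGION, AWS_ACCOUNT_ID, SQS_NAME, S3_BUCKET_NAME) can also be filled in programmatically. The following is a minimal Python sketch and not part of any Edge Delta tooling. The function name build_sqs_access_policy and the sample values are hypothetical, and the sketch assumes the statement is wrapped in a standard policy document with Version and Statement keys.

```python
import json


def build_sqs_access_policy(region, account_id, queue_name, bucket_name):
    """Return a queue access policy that lets S3 send messages to the queue.

    Hypothetical helper: it only fills in the placeholders of the
    statement shown above and wraps it in a policy document.
    """
    statement = {
        "Sid": "s3_send_statement",
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": ["SQS:SendMessage"],
        "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
        "Condition": {
            "ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket_name}"},
            "StringEquals": {"aws:SourceAccount": account_id},
        },
    }
    # Assumption: the statement is embedded in a standard policy document.
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    policy = build_sqs_access_policy(
        "us-west-2", "123456789012", "my-queue", "my-bucket"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the queue's access policy editor after checking each value against your own account.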
Create S3 Event Notification
Configure your S3 bucket to send event notifications to the SQS queue using the S3 bucket’s event notification feature. This involves specifying the SQS queue’s ARN and selecting the events (like object creation or deletion) that will trigger notifications.
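If you prefer to script this step rather than use the console, the notification settings can be expressed as a small data structure. The sketch below only builds that structure. The helper name build_notification_config is hypothetical, and the default of object-creation events is an assumption chosen for illustration; select the events that suit your use case.

```python
import json


def build_notification_config(queue_arn, events=("s3:ObjectCreated:*",)):
    """Return a bucket notification configuration targeting one SQS queue.

    Assumption: only object-creation events are selected by default.
    """
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": list(events)}
        ]
    }


if __name__ == "__main__":
    # Placeholder ARN, following the pattern used in the access policy.
    config = build_notification_config(
        "arn:aws:sqs:us-west-2:123456789012:my-queue"
    )
    # With boto3 installed, a dict of this shape could be passed as
    # NotificationConfiguration to put_bucket_notification_configuration.
    print(json.dumps(config, indent=2))
```

S3 validates that it can send to the queue when the configuration is saved, so the SQS access policy should be in place first.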
Configure IAM for the Edge Delta Agent
You must configure the necessary AWS resources to read logs from an AWS S3 bucket:
- Create an IAM user or role to access the AWS S3 bucket. To learn how to create an IAM user, review this document from AWS.
- Attach the appropriate policies to the newly created IAM user or role. The policy should grant the necessary permissions for reading from S3 and receiving notifications via SQS.
The custom policy lists 4 permissions:
- DeleteMessage (for SQS)
- DeleteMessageBatch (for SQS)
- ReceiveMessage (for SQS)
- GetObject (for S3)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-number>:role/<role-name>"
      },
      "Action": [
        "sqs:DeleteMessage",
        "sqs:DeleteMessageBatch",
        "sqs:ReceiveMessage",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name/*",
        "arn:aws:sqs:<region>:<account-number>:<sqs-queue-name>"
      ]
    }
  ]
}
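Before attaching a policy, it can help to confirm that it grants all four permissions listed above. The following Python sketch is one hedged way to do that. The helper missing_actions is hypothetical, it compares action names literally (ignoring case) and does not expand wildcards such as sqs:*, and it does not check the Resource entries.

```python
REQUIRED_ACTIONS = (
    "sqs:DeleteMessage",
    "sqs:DeleteMessageBatch",
    "sqs:ReceiveMessage",
    "s3:GetObject",
)


def missing_actions(policy):
    """Return the required actions that no Allow statement grants."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may hold a single statement instead of a list.
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Compare case-insensitively; wildcards are not expanded here.
        granted.update(action.lower() for action in actions)
    return sorted(a for a in REQUIRED_ACTIONS if a.lower() not in granted)


if __name__ == "__main__":
    partial = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["sqs:ReceiveMessage", "s3:GetObject"]}
        ],
    }
    print(missing_actions(partial))
```

An empty list means every permission named in this section appears in at least one Allow statement.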
Example Edge Delta Pipeline configuration
Simple version
nodes:
  - name: my_s3_input
    type: s3_input
    sqs_url: https://sqs.example-queue-123.amazonaws.com
    region: us-example-1
Advanced version
nodes:
  - name: my_s3_input
    type: s3_input
    sqs_url: https://sqs.example-queue-123.amazonaws.com
    region: us-example-1
    aws_key_id: EXAMPLEAWSKEYID1234
    aws_sec_key: exampleAwsSecKey9876
    role_arn: arn:aws:iam::example-account-123:role/example-role
    external_id: example-external-id-5678
Required Parameters
name
A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node elsewhere in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a - and a space, followed by the string. It is a required parameter for all nodes.
nodes:
  - name: <node name>
    type: <node type>
type: s3_input
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.
nodes:
  - name: <node name>
    type: <node type>
sqs_url
The sqs_url parameter specifies the URL of the SQS queue that receives the S3 event notifications. It is specified as a string and is required.
nodes:
  - name: <node name>
    type: s3_input
    sqs_url: <sqs to subscribe>
    region: <aws region>
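To make the role of the queue concrete, the sketch below parses the body of a standard S3 event notification as it would arrive on the SQS queue and extracts the bucket and object key. It illustrates the message format only and is not a description of how the Edge Delta agent processes messages internally. The function name objects_from_notification and the sample message are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_from_notification(body):
    """Return (bucket, key) pairs named in one S3 event notification body."""
    message = json.loads(body)
    pairs = []
    # Test events sent when notifications are first configured carry
    # no "Records" list, so they yield an empty result.
    for record in message.get("Records", []):
        s3_info = record.get("s3")
        if not s3_info:
            continue
        bucket = s3_info["bucket"]["name"]
        # Object keys are URL-encoded in the notification payload.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


if __name__ == "__main__":
    sample = json.dumps({
        "Records": [{
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "logs/2024/app+1.log"},
            },
        }]
    })
    print(objects_from_notification(sample))
```

Each pair identifies an object that a reader holding the GetObject permission could then fetch from the bucket.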
region
The region parameter specifies the AWS region where the S3 bucket and the SQS queue are located. It is specified as a string and is required.
nodes:
  - name: <node name>
    type: s3_input
    sqs_url: <sqs to subscribe>
    region: <aws region>
Optional Parameters
aws_key_id
The aws_key_id parameter is the ID of an AWS access key that has all four IAM permissions listed above. It is used together with aws_sec_key. It is specified as a string and is optional.
nodes:
  - name: <node name>
    type: s3_input
    sqs_url: <sqs to subscribe>
    region: <aws region>
    aws_key_id: <key>
    aws_sec_key: <secure key>
aws_sec_key
The aws_sec_key parameter is the AWS secret access key that belongs to the access key with all four IAM permissions listed above. It is used together with aws_key_id. It is specified as a string and is optional.
nodes:
  - name: <node name>
    type: s3_input
    sqs_url: <sqs to subscribe>
    region: <aws region>
    aws_key_id: <key>
    aws_sec_key: <secure key>
compression
The compression parameter defines the compression type of the incoming logs. You can specify gzip, zstd, snappy, or uncompressed. It is specified as a string. It is optional and the default is uncompressed.
nodes:
  - name: s3_input
    type: s3_input
    region: us-west-2
    sqs_url: <REDACTED>
    compression: gzip
role_arn
The role_arn parameter is used if authentication and authorization are performed using an assumed AWS IAM role. It includes the account ID and role name, in the form arn:aws:iam::<account-number>:role/<role-name>. A role_arn is optional for this source, depending on the access configuration.
nodes:
  - name: <node name>
    type: s3_input
    sqs_url: <sqs to subscribe>
    region: <aws region>
    role_arn: <role ARN>
external_id
The external_id parameter is a unique identifier that helps prevent a confused deputy attack. It is specified as a string and is optional. When external_id is configured, it must be used with role_arn.
nodes:
  - name: <node name>
    type: s3_input
    sqs_url: <sqs to subscribe>
    region: <aws region>
    external_id: <ID>
    role_arn: <role ARN>
For advanced authentication options, see AWS IAM Role Authentication.
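The parameter rules described on this page can be collected into a small pre-flight check. The Python sketch below is an illustration, not an Edge Delta utility. The function validate_s3_input is hypothetical, it operates on a node already loaded into a dict, and it treats aws_key_id and aws_sec_key as a pair on the assumption that each is only useful with the other.

```python
REQUIRED = ("name", "type", "sqs_url", "region")
COMPRESSION_TYPES = {"gzip", "zstd", "snappy", "uncompressed"}


def validate_s3_input(node):
    """Return a list of problems found in an s3_input node dict."""
    problems = []
    for field in REQUIRED:
        if not node.get(field):
            problems.append(f"missing required parameter: {field}")
    if node.get("type") not in (None, "s3_input"):
        problems.append("type must be s3_input")
    # Assumption: the key ID and secret key are only valid as a pair.
    if bool(node.get("aws_key_id")) != bool(node.get("aws_sec_key")):
        problems.append("aws_key_id and aws_sec_key must be set together")
    # external_id must be used with role_arn when configured.
    if node.get("external_id") and not node.get("role_arn"):
        problems.append("external_id requires role_arn")
    # compression defaults to uncompressed when omitted.
    compression = node.get("compression", "uncompressed")
    if compression not in COMPRESSION_TYPES:
        problems.append(f"unsupported compression: {compression}")
    return problems


if __name__ == "__main__":
    node = {
        "name": "my_s3_input",
        "type": "s3_input",
        "sqs_url": "https://sqs.example-queue-123.amazonaws.com",
        "region": "us-example-1",
        "external_id": "example-external-id-5678",
    }
    for problem in validate_s3_input(node):
        print(problem)
```

An empty result only means the node satisfies the rules stated on this page; the agent may apply further validation of its own.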