Edge Delta S3 Source

Read data from an S3 source.

Overview

The S3 source node allows the Edge Delta agent to read data from an S3 bucket. This node is essential for ingesting log data stored in S3 and processing it within the Edge Delta ecosystem.

  • outgoing_data_types: log

Configure SQS

Set up an Amazon Simple Queue Service (SQS) queue to facilitate communication between Amazon S3 and the Edge Delta agent:

Create an Amazon SQS Standard Queue

  1. Open the Amazon SQS console.
  2. Create a queue and use the default Standard queue type.
  3. Provide a name for your queue.
  4. Optionally, configure additional parameters such as visibility timeout, message retention period, delivery delay, and maximum message size according to your requirements. Default values are provided by the console.

Define an Access Policy

Configure who can send messages to and receive messages from the queue. Add the following policy statement to allow Amazon S3 to send event notifications to the SQS queue:

{
    "Sid": "s3_send_statement",
    "Effect": "Allow",
    "Principal": {
        "Service": "s3.amazonaws.com"
    },
    "Action": [
        "SQS:SendMessage"
    ],
    "Resource": "arn:aws:sqs:AWS_REGION:AWS_ACCOUNT_ID:SQS_NAME",
    "Condition": {
        "ArnLike": {
            "aws:SourceArn": "arn:aws:s3:*:*:S3_BUCKET_NAME"
        },
        "StringEquals": {
            "aws:SourceAccount": "AWS_ACCOUNT_ID"
        }
    }
}
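The JSON above is a single statement. A complete SQS access policy wraps statements in a document with "Version" and "Statement" keys. The following Python sketch fills in the placeholders and produces the full document. The function names and the sample region, account ID, queue name, and bucket name are hypothetical and used only for illustration.

```python
import json


def s3_send_statement(region: str, account_id: str, queue_name: str, bucket_name: str) -> dict:
    """Build the statement that lets S3 send event notifications to one queue."""
    return {
        "Sid": "s3_send_statement",
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": ["SQS:SendMessage"],
        "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
        "Condition": {
            # Only notifications that originate from this bucket and account are accepted.
            "ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket_name}"},
            "StringEquals": {"aws:SourceAccount": account_id},
        },
    }


def queue_access_policy(statement: dict) -> str:
    """Wrap the statement in a complete policy document and serialize it as JSON."""
    policy = {"Version": "2012-10-17", "Statement": [statement]}
    return json.dumps(policy, indent=4)


if __name__ == "__main__":
    # All four values are made-up placeholders, not real resources.
    stmt = s3_send_statement("us-west-2", "123456789012", "my-log-queue", "my-log-bucket")
    print(queue_access_policy(stmt))
```

The printed document can be pasted into the access policy editor of the queue. If the queue already has a policy, add the statement to its existing "Statement" list instead of replacing the whole document.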

Create S3 Event Notification

Configure your S3 bucket to send event notifications to the SQS queue using the S3 bucket’s event notification feature. This involves specifying the SQS queue’s ARN and selecting the events (like object creation or deletion) that will trigger notifications.
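As a rough illustration, the notification settings can also be expressed as the JSON structure that the S3 PutBucketNotificationConfiguration API accepts. The Python sketch below only builds that structure. The function name, the sample ARN, and the choice to subscribe to object creation events only are assumptions made for this example.

```python
import json


def queue_notification_config(queue_arn: str, prefix: str = "") -> dict:
    """Build a notification configuration that reports new objects to one SQS queue."""
    queue_config = {
        "QueueArn": queue_arn,
        # Assumption: ingestion only needs to know about newly created objects.
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optional: only notify for object keys under this prefix.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


if __name__ == "__main__":
    # Placeholder ARN and prefix, not real resources.
    config = queue_notification_config("arn:aws:sqs:us-west-2:123456789012:my-log-queue", "logs/")
    print(json.dumps(config, indent=4))
```

The resulting JSON has the shape expected by tools that call this API. Note that S3 validates the destination when the configuration is saved, so the SQS access policy must already allow S3 to send messages to the queue.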

Configure IAM for the Edge Delta Agent

You must configure the necessary AWS resources to read logs from an AWS S3 bucket:

  1. Create an IAM user or role to access the AWS S3 bucket. To learn how to create an IAM user, see the AWS IAM documentation.
  2. Attach the appropriate policies to the newly created IAM user or role. The policy should grant the necessary permissions for reading from S3 and receiving notifications via SQS.

The custom policy grants four permissions:

  • DeleteMessage (for SQS)
  • DeleteMessageBatch (for SQS)
  • ReceiveMessage (for SQS)
  • GetObject (for S3)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:DeleteMessageBatch",
                "sqs:ReceiveMessage",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name/*",
                "arn:aws:sqs:<region>:<account-number>:<sqs-queue-name>"
            ]
        }
    ]
}
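Before attaching a policy, it can help to check that it actually grants the four permissions listed above. The following Python sketch does a simple check on a policy document that has been loaded as a dictionary. The names REQUIRED_ACTIONS and missing_actions are hypothetical, and the check is deliberately simplified: it compares action names literally and does not expand wildcards such as sqs:* or evaluate resources and conditions.

```python
# The four actions named in the custom policy above.
REQUIRED_ACTIONS = {
    "sqs:DeleteMessage",
    "sqs:DeleteMessageBatch",
    "sqs:ReceiveMessage",
    "s3:GetObject",
}


def missing_actions(policy: dict) -> set:
    """Return the required actions that no Allow statement in the policy names."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM allows a single statement to be written as an object instead of a list.
        statements = [statements]

    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Action names are case-insensitive, so compare in lowercase.
        granted.update(action.lower() for action in actions)

    return {action for action in REQUIRED_ACTIONS if action.lower() not in granted}
```

An empty result means all four actions are named in an Allow statement. It does not prove that the resources in the policy match your bucket and queue.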

Example Edge Delta Pipeline configuration

Simple version

nodes:
- name: my_s3_input
  type: s3_input
  sqs_url: https://sqs.example-queue-123.amazonaws.com
  region: us-example-1

Advanced version

nodes:
- name: my_s3_input
  type: s3_input
  sqs_url: https://sqs.example-queue-123.amazonaws.com
  region: us-example-1
  aws_key_id: EXAMPLEAWSKEYID1234
  aws_sec_key: exampleAwsSecKey9876
  role_arn: arn:aws:iam::example-account-123:role/example-role
  external_id: example-external-id-5678

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a - and a space, followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: s3_input

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

sqs_url

The sqs_url parameter specifies the URL of the SQS queue that receives the S3 event notifications. It is specified as a string and is required.

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
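To make the role of the queue more concrete, the Python sketch below shows how the body of an S3 event notification message can be read to find the objects it refers to. It illustrates the S3 event message format only and is not the Edge Delta agent's implementation. The function name objects_in_notification is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def objects_in_notification(message_body: str) -> list:
    """Return (bucket, key) pairs for the created objects named in one message body."""
    event = json.loads(message_body)
    objects = []
    # The s3:TestEvent message that S3 sends when notifications are enabled has no "Records".
    for record in event.get("Records", []):
        # Skip anything that is not an object creation event, such as deletions.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3_info = record["s3"]
        # Object keys arrive URL-encoded, for example a space is sent as "+".
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((s3_info["bucket"]["name"], key))
    return objects
```

Each returned pair identifies an object that a reader with the GetObject permission could then download. After the objects are processed, the message is deleted from the queue, which is why the DeleteMessage permissions are needed.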

region

The region parameter specifies the AWS region where the S3 bucket and SQS queue are located. It is specified as a string and is required.

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
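A mismatch between sqs_url and region is an easy mistake to make. The Python sketch below checks the two against each other, assuming the queue URL uses the standard form https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>. Legacy or custom endpoints are not handled. The function names and the sample URL are hypothetical.

```python
from urllib.parse import urlparse


def parse_sqs_url(sqs_url: str) -> dict:
    """Split a standard SQS queue URL into region, account ID, and queue name."""
    parsed = urlparse(sqs_url)
    host_parts = (parsed.hostname or "").split(".")
    path_parts = parsed.path.strip("/").split("/")
    # Expect a host like sqs.us-west-2.amazonaws.com and a path like /123456789012/my-queue.
    if len(host_parts) < 4 or host_parts[0] != "sqs" or len(path_parts) != 2:
        raise ValueError(f"unexpected SQS queue URL: {sqs_url}")
    return {
        "region": host_parts[1],
        "account_id": path_parts[0],
        "queue_name": path_parts[1],
    }


def region_matches(sqs_url: str, region: str) -> bool:
    """Return True when the region embedded in the queue URL equals the given region."""
    return parse_sqs_url(sqs_url)["region"] == region


if __name__ == "__main__":
    url = "https://sqs.us-west-2.amazonaws.com/123456789012/my-log-queue"  # placeholder
    print(parse_sqs_url(url))
    print(region_matches(url, "us-west-2"))
```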

Optional Parameters

aws_key_id

The aws_key_id parameter is the AWS access key ID for credentials that have the four IAM permissions listed above. It is used with aws_sec_key. It is specified as a string and is optional.

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  aws_key_id: <key>
  aws_sec_key: <secure key>

aws_sec_key

The aws_sec_key parameter is the AWS secret access key for credentials that have the four IAM permissions listed above. It is used with aws_key_id. It is specified as a string and is optional.

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  aws_key_id: <key>
  aws_sec_key: <secure key>

compression

The compression parameter is used to define the compression type for incoming logs. You can specify gzip, zstd, snappy, or uncompressed. It is specified as a string. It is optional and the default is uncompressed.

nodes:
- name: s3_input
  type: s3_input
  region: us-west-2
  sqs_url: <sqs to subscribe>
  compression: gzip

role_arn

The role_arn parameter is used if authentication and authorization are performed using an assumed AWS IAM role. It should consist of the account ID and role name. It is specified as a string and is optional for this source, depending on the access configuration.

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  role_arn: <role ARN>

external_id

The external_id parameter is a unique identifier that helps prevent a confused deputy attack. It is specified as a string and is optional. When configured, it must be used with role_arn.

nodes:
- name: <node name>
  type: s3_input
  sqs_url: <sqs to subscribe>
  region: <aws region>
  external_id: <ID>
  role_arn: <role ARN>
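The parameter rules described in this section can be summarized as a small pre-flight check. The Python sketch below validates a node that has already been loaded from YAML into a dictionary. It encodes only the rules stated on this page and is not Edge Delta's own validation logic. The function name validate_s3_input is hypothetical.

```python
REQUIRED = ("name", "type", "sqs_url", "region")
COMPRESSIONS = {"gzip", "zstd", "snappy", "uncompressed"}


def validate_s3_input(node: dict) -> list:
    """Return a list of problems found in one s3_input node (empty when none are found)."""
    problems = [f"missing required parameter: {p}" for p in REQUIRED if not node.get(p)]

    if node.get("type") not in (None, "s3_input"):
        problems.append("type must be s3_input")

    # compression is optional and defaults to uncompressed.
    compression = node.get("compression", "uncompressed")
    if compression not in COMPRESSIONS:
        problems.append(f"unsupported compression: {compression}")

    # aws_key_id and aws_sec_key are used together.
    if ("aws_key_id" in node) != ("aws_sec_key" in node):
        problems.append("aws_key_id and aws_sec_key must be set together")

    # external_id is only meaningful when a role is assumed.
    if "external_id" in node and "role_arn" not in node:
        problems.append("external_id requires role_arn")

    return problems


if __name__ == "__main__":
    node = {
        "name": "my_s3_input",
        "type": "s3_input",
        "sqs_url": "https://sqs.us-west-2.amazonaws.com/123456789012/my-log-queue",  # placeholder
        "region": "us-west-2",
        "external_id": "example-external-id",
    }
    print(validate_s3_input(node))
```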

For advanced authentication options, see AWS IAM Role Authentication.