Use Edge Delta to Ingest from an S3 Source

Prepare to read data from an S3 source.

Overview

The S3 source node allows the Edge Delta agent to read data from an S3 bucket. Use it to ingest log data stored in S3 and process it in an Edge Delta pipeline.

Configure SQS

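Set up an Amazon Simple Queue Service (SQS) queue to carry event notifications from Amazon S3 to the Edge Delta agent. S3 sends a notification to the queue when an object changes, and the agent receives those messages to learn which objects to read: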

Create an Amazon SQS Standard Queue

  1. Open the Amazon SQS console.
  2. Create a queue and use the default Standard queue type.
  3. Provide a name for your queue.
  4. Optionally, configure additional parameters such as visibility timeout, message retention period, delivery delay, and maximum message size according to your requirements. Default values are provided by the console.
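The optional parameters in step 4 correspond to SQS queue attributes. As a minimal sketch, assuming you want to sanity-check values before entering them in the console or passing them to the CreateQueue API, the hypothetical helper below validates three of them against the documented SQS limits. The function name and the default values are assumptions for illustration; the defaults are intended to mirror the usual SQS defaults.

```python
# Documented SQS limits, in seconds, for three queue attributes.
LIMITS = {
    "VisibilityTimeout": (0, 43200),          # 0 seconds to 12 hours
    "MessageRetentionPeriod": (60, 1209600),  # 1 minute to 14 days
    "DelaySeconds": (0, 900),                 # 0 seconds to 15 minutes
}


def build_queue_attributes(visibility_timeout=30, retention_period=345600, delay=0):
    """Return queue attributes as strings, the form the CreateQueue API expects.

    Hypothetical helper: raises ValueError if a value is outside its limit.
    """
    values = {
        "VisibilityTimeout": visibility_timeout,
        "MessageRetentionPeriod": retention_period,
        "DelaySeconds": delay,
    }
    for name, value in values.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return {name: str(value) for name, value in values.items()}


if __name__ == "__main__":
    print(build_queue_attributes(visibility_timeout=60))
```

A longer visibility timeout gives a consumer more time to read the referenced object before the message becomes visible to other consumers again.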

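Define an Access Policy

The access policy controls who can send messages to and receive messages from the queue. Add the following statement to the queue's access policy to allow Amazon S3 to send event notifications to the SQS queue. Replace AWS_REGION, AWS_ACCOUNT_ID, SQS_NAME, and S3_BUCKET_NAME with your own values: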

{
    "Sid": "s3_send_statement",
    "Effect": "Allow",
    "Principal": {
        "Service": "s3.amazonaws.com"
    },
    "Action": [
        "SQS:SendMessage"
    ],
    "Resource": "arn:aws:sqs:AWS_REGION:AWS_ACCOUNT_ID:SQS_NAME",
    "Condition": {
        "ArnLike": {
            "aws:SourceArn": "arn:aws:s3:*:*:S3_BUCKET_NAME"
        },
        "StringEquals": {
            "aws:SourceAccount": "AWS_ACCOUNT_ID"
        }
    }
}
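The statement above is a single entry; a queue access policy wraps statements in a document with `Version` and `Statement` keys. The following sketch, assuming you want to generate the complete document rather than edit it by hand, fills in the placeholders and prints JSON that can be pasted into the queue's access policy editor. The function name `build_queue_policy` and the sample values are hypothetical.

```python
import json


def build_queue_policy(region, account_id, queue_name, bucket_name):
    """Return a full SQS access policy that lets S3 send messages to the queue."""
    statement = {
        "Sid": "s3_send_statement",
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": ["SQS:SendMessage"],
        "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
        "Condition": {
            # Only notifications originating from this bucket...
            "ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket_name}"},
            # ...owned by this account are accepted.
            "StringEquals": {"aws:SourceAccount": account_id},
        },
    }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    policy = build_queue_policy("us-east-1", "123456789012", "example-queue", "example-bucket")
    print(json.dumps(policy, indent=4))
```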

Create S3 Event Notification

Configure your S3 bucket to send event notifications to the SQS queue. In the bucket’s event notification settings, specify the SQS queue’s ARN as the destination and select the events (such as object creation or deletion) that trigger notifications.
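For reference, the notification described above can also be expressed as the configuration document that the S3 PutBucketNotificationConfiguration API accepts. The sketch below only builds that document; it does not call AWS. The helper name `build_notification_config`, the choice of `s3:ObjectCreated:*` as the default event, and the optional key prefix filter are assumptions for illustration.

```python
import json


def build_notification_config(queue_arn, events=("s3:ObjectCreated:*",), prefix=None):
    """Return an S3 notification configuration that targets one SQS queue."""
    queue_config = {
        "QueueArn": queue_arn,
        "Events": list(events),
    }
    if prefix is not None:
        # Restrict notifications to object keys that start with the prefix.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


if __name__ == "__main__":
    # The ARN and prefix are placeholders, not real resources.
    config = build_notification_config(
        "arn:aws:sqs:us-east-1:123456789012:example-queue", prefix="logs/"
    )
    print(json.dumps(config, indent=4))
```

The queue's access policy must already allow S3 to send messages; otherwise S3 rejects the notification configuration when it validates the destination.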

Configure IAM for the Edge Delta Agent

You must configure the necessary AWS resources to read logs from an AWS S3 bucket:

  1. Create an IAM user or role to access the AWS S3 bucket. To learn how to create an IAM user, see the AWS IAM documentation.
  2. Attach the appropriate policies to the newly created IAM user or role. The policy should grant the necessary permissions for reading from S3 and receiving notifications via SQS.

The custom policy grants four permissions:

  • DeleteMessage (for SQS)
  • DeleteMessageBatch (for SQS)
  • ReceiveMessage (for SQS)
  • GetObject (for S3)

Replace the placeholders in the following policy with your own values:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:DeleteMessageBatch",
                "sqs:ReceiveMessage",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket-name>/*",
                "arn:aws:sqs:<region>:<account-number>:<sqs-queue-name>"
            ]
        }
    ]
}
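As an alternative layout, the same four permissions can be split into one statement per service so that each action is paired only with the resource type it applies to. This is a sketch under that assumption, not the policy Edge Delta prescribes; the function name `build_agent_policy`, the `Sid` values, and the sample arguments are hypothetical.

```python
import json


def build_agent_policy(region, account_id, queue_name, bucket_name):
    """Return an identity policy for the agent's IAM user or role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects from the bucket.
                "Sid": "ReadBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
            {
                # Receive notifications from the queue and delete them once handled.
                "Sid": "ConsumeQueueMessages",
                "Effect": "Allow",
                "Action": [
                    "sqs:DeleteMessage",
                    "sqs:DeleteMessageBatch",
                    "sqs:ReceiveMessage",
                ],
                "Resource": [f"arn:aws:sqs:{region}:{account_id}:{queue_name}"],
            },
        ],
    }


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    policy = build_agent_policy("us-east-1", "123456789012", "example-queue", "example-bucket")
    print(json.dumps(policy, indent=4))
```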

Next, configure a pipeline with an S3 source node.