Edge Delta Amazon S3 Output

Archive data in Amazon S3.


Overview

This output type sends logs to an AWS S3 endpoint.

Create an IAM User and Attach a Custom Policy

Before you configure your Edge Delta account to send logs to an AWS S3 endpoint, you must first access the AWS console to:

  1. Create an IAM user to access the AWS S3 bucket. To learn how to create an IAM user, review this document from AWS.
  2. Attach the custom policy to the newly created IAM user. To learn how to create and add a custom policy, review this document from AWS.

The custom policy lists 3 permissions:

  • PutObject
  • GetObject
  • ListBucket

If you want to create an S3 archive for rehydration purposes only, then at a minimum, your custom policy must include GetObject.

All other permissions are only required for archiving purposes. As a result, if you prefer, you can create 2 different S3 archive integrations with different custom policies.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
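
For a rehydration-only integration, a smaller policy is enough. The following is a sketch of such a minimal policy, assuming bucket-name is a placeholder for your own bucket; as noted above, GetObject is the minimum required permission:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}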

Example

outputs:
    archives:
        - name: my-s3
          type: s3
          aws_key_id: '{{ Env "AWS_KEY_ID" }}'
          aws_sec_key: '{{ Env "AWS_SECRET_KEY" }}'
          bucket: testbucket
          region: us-east-2
        - name: my-s3-assumes-role
          type: s3
          role_arn: "arn:aws:iam::1234567890:role/ed-s3-archiver-role"
          external_id: "053cf606-8e80-47bf-b849-8cd1cc826cfc"
          bucket: testbucket
          region: us-east-2
        - name: my-s3-archiver
          type: s3
          aws_key_id: '{{ Env "AWS_KEY_ID" }}'
          aws_sec_key: '{{ Env "AWS_SECRET_KEY" }}'
          bucket: testbucket
          region: us-east-2
          disable_metadata_ingestion: true
          path_prefix:
            order:
              - Year
              - Month
              - Day
              - Hour
              - 5 Minute
              - OtherTags.role
            format: year=%s/month=%s/day=%s/hour=%s/minute=%s/role=%s/

Parameters

name

Required

Enter a descriptive name for the output or integration.

For outputs, this name will be used to map this destination to a workflow.

name: s3

integration_name

Optional

This parameter refers to the organization-level integration created in the Integrations page.

If you need to add multiple instances of the same integration into the config, then you can add a custom name to each instance via the name parameter. In this situation, the name should be used to refer to the specific instance of the destination in the workflows.

integration_name: orgs-aws-s3

type: s3

Required

Enter s3.

type: s3

bucket

Required

Enter the target S3 bucket to send the archived logs.

bucket: "testbucket"

region

Required

Enter the specified S3 bucket’s region.

region: "us-east-2"

path_prefix

The path_prefix parameter is used to override the default path structure of <Year>/<Month>/<Day>/<Hour>/<Tag>/. The following tags can be used:

  • “Year”
  • “Month”
  • “Day”
  • “<any number that can divide 60> Minute”
  • “Hour”
  • “Tag”
  • “Host”
  • “OtherTags.<tag name>”
  • “LogTags.<field name>”

Amazon Elastic Container Service:

  • “ECSCluster”
  • “ECSContainerName”
  • “ECSTaskFamily”
  • “ECSTaskVersion”

Kubernetes:

  • “K8sNamespace”, “K8sControllerKind”, “K8sControllerLogicalName”, “K8sPodName”, “K8sContainerName” and “K8sContainerImage”

Docker:

  • “DockerContainerName”
  • “DockerImageName”

The order child parameter is used to define the path structure.

The format child parameter must contain exactly as many “%s” placeholders as there are entries in order; the placeholders are filled with the order entries, in order.

Curly braces are prohibited. This format is not supported in rehydrations, so the source for a rehydration cannot be an integration that uses a custom path_prefix format. For big data applications such as BigQuery and AWS Athena, use a format like the following: format: year=%s/month=%s/day=%s/hour=%s/minute=%s/role=%s/

outputs:
    archives:
        - name: <archive name>
          type: s3
          aws_key_id: '{{ Env "AWS_KEY_ID" }}'
          aws_sec_key: '{{ Env "AWS_SECRET_KEY" }}'
          bucket: <bucket name>
          region: <region>
          disable_metadata_ingestion: true|false
          path_prefix:
            order:
              - Year
              - Month
              - Day
              - Hour
              - 5 Minute
              - OtherTags.role
            format: year=%s/month=%s/day=%s/hour=%s/minute=%s/role=%s/

aws_key_id

Optional

Enter the AWS key ID that has the PutObject permission for the target bucket. If you use role-based AWS authentication where keys are not provided, leave this field empty; however, you must still attach the custom policy.

aws_key_id: '{{ Env "TEST_AWS_KEY_ID" }}'

aws_sec_key

Optional

Enter the AWS secret access key that has the PutObject permission for the target bucket. If you use role-based AWS authentication where keys are not provided, leave this field empty; however, you must still attach the custom policy.

aws_sec_key: "awssecret123"

role_arn

Optional

Enter the ARN of the IAM role that the agent should assume to access the target bucket.

role_arn: "arn:aws:iam::1234567890:role/ed-s3-archiver-role"

external_id

Optional

Enter the external ID associated with the desired IAM role.

external_id: "053cf606-8e80-47bf-b849-8cd1cc826cfc"
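
The role_arn and external_id parameters are typically used together in place of static keys. The following sketch, assuming placeholder ARN and external ID values, shows a role-based configuration:

outputs:
    archives:
        - name: my-s3-assumes-role
          type: s3
          role_arn: "arn:aws:iam::1234567890:role/ed-s3-archiver-role"
          external_id: "053cf606-8e80-47bf-b849-8cd1cc826cfc"
          bucket: testbucket
          region: us-east-2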

compression

Optional

Enter a compression type for archiving purposes.

You can enter gzip, zstd, snappy, or uncompressed.

compression: gzip

encoding

Optional

Enter an encoding type for archiving purposes.

You can enter json or parquet.

encoding: parquet

use_native_compression

Optional

Enter true or false to compress parquet-encoded data.

This option will not compress metadata.

This option can be useful with big data cloud applications, such as AWS Athena and Google BigQuery.

To use this parameter, you must set the encoding parameter to parquet.

use_native_compression: true
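
The archive-format options above can be combined. The following sketch, assuming a testbucket archive, writes Parquet-encoded data with native compression enabled; the chosen values are illustrative, not defaults:

outputs:
    archives:
        - name: my-s3-parquet
          type: s3
          bucket: testbucket
          region: us-east-2
          encoding: parquet
          compression: gzip
          use_native_compression: true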

buffer_ttl

Optional

Enter a length of time to retry failed streaming data.

After this length of time is reached, the failed streaming data will no longer be retried.

buffer_ttl: 2h

buffer_path

Optional

Enter a folder path to temporarily store failed streaming data.

The failed streaming data will be retried until the data reaches its destinations or until the Buffer TTL value is reached.

If you enter a path that does not exist, then the agent will create directories, as needed.

buffer_path: /var/log/edgedelta/pushbuffer/

buffer_max_bytesize

Optional

Enter the maximum size of failed streaming data that you want to retry.

If the failed streaming data is larger than this size, then the failed streaming data will not be retried.

buffer_max_bytesize: 100MB
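
The three buffer parameters work together: failed data is written under buffer_path and retried until either buffer_ttl expires or the data exceeds buffer_max_bytesize. A sketch combining them, using the example values from above:

outputs:
    archives:
        - name: my-s3
          type: s3
          bucket: testbucket
          region: us-east-2
          buffer_ttl: 2h
          buffer_path: /var/log/edgedelta/pushbuffer/
          buffer_max_bytesize: 100MB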

disable_metadata_ingestion

Optional

Enter true or false to disable metadata file ingestion.

Typically, metadata is used for rehydration analysis.

disable_metadata_ingestion: true