Edge Delta S3 Output

Send items to an S3 destination.

Overview

The S3 Output sends items to an S3 destination. These items are raw archive bytes that are buffered with the archive buffer processor.

  • incoming_data_types: log

Configure S3

Before you configure your Edge Delta account to send logs to an AWS S3 endpoint, you must configure S3:

  1. Create an IAM user to access the AWS S3 bucket. To learn how to create an IAM user, review this document from AWS.
  2. Attach the following custom policy to the newly created IAM user. To learn how to create and add a custom policy, review this document from AWS.

The custom policy lists 3 permissions:

  • PutObject
  • GetObject
  • ListBucket

If you want to create an S3 archive for rehydration purposes only, then at a minimum, your custom policy must include GetObject. All other permissions are only required for archiving purposes. As a result, if you prefer, you can create 2 different S3 archive integrations with different custom policies.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account-number>:role/<role-name>"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
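Before attaching the policy, you can sanity-check that it grants the three permissions listed above. A minimal sketch using only the Python standard library (the inline policy document is illustrative; substitute your own bucket name):

```python
import json

# Illustrative policy document; substitute your own bucket name.
policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::bucket-name", "arn:aws:s3:::bucket-name/*"]
        }
    ]
}
""")

REQUIRED = {"s3:PutObject", "s3:GetObject", "s3:ListBucket"}

def granted_actions(doc):
    """Collect the actions granted by all Allow statements in a policy."""
    actions = set()
    for stmt in doc.get("Statement", []):
        if stmt.get("Effect") == "Allow":
            acts = stmt.get("Action", [])
            actions.update([acts] if isinstance(acts, str) else acts)
    return actions

missing = REQUIRED - granted_actions(policy)
print("missing permissions:", sorted(missing))
```

If you split the setup into two integrations, the rehydration-only policy would leave s3:PutObject and s3:ListBucket out of REQUIRED.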

Example Edge Delta Pipeline configuration

nodes:
  - name: my_s3
    type: s3_output
    bucket: <REDACTED>
    region: <REDACTED>
    aws_key_id: <REDACTED>
    aws_sec_key: <REDACTED>
    compression: zstd
    encoding: parquet
    use_native_compression: true
    path_prefix:
      order:
      - Year
      - Month
      - Day
      - Hour
      - 2 Minute
      - tag
      - host
      format: ver=parquet/year=%s/month=%s/day=%s/hour=%s/min=%s/tag=%s/host=%s/

Required Parameters

name

A descriptive name for the node. This name appears in Visual Pipelines, and you can use it to reference the node elsewhere in the yaml. It must be unique across all nodes. Because each node is a yaml list element, the name begins with a - and a space followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: s3_output

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

bucket

The bucket parameter defines the target bucket to use. It is specified as a string and is required.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>

region

The region parameter specifies the AWS region where the target bucket is located. It is specified as a string and is required.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>

Optional parameters

aws_key_id

The aws_key_id parameter is the AWS access key ID that has PutObject permission on the target bucket. It is used with aws_sec_key. It is specified as a string and is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    aws_key_id: <key>
    aws_sec_key: <secure key>

aws_sec_key

The aws_sec_key parameter is the AWS secret access key that has PutObject permission on the target bucket. It is used with aws_key_id. It is specified as a string and is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    aws_key_id: <key>
    aws_sec_key: <secure key>

buffer_max_bytesize

The buffer_max_bytesize parameter configures the maximum total byte size of unsuccessful items held for retry. When the limit is reached, new items are discarded until buffer space becomes available. It is specified as a datasize.Size, has a default of 0 indicating no size limit, and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    buffer_max_bytesize: 2048
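The discard behavior can be illustrated with a toy buffer. This is a conceptual sketch of the documented semantics, not Edge Delta's implementation; the 2048-byte limit matches the example above:

```python
class RetryBuffer:
    """Toy model of a bounded retry buffer: once the byte limit is
    reached, new unsuccessful items are dropped until space frees up."""

    def __init__(self, max_bytesize=2048):
        self.max_bytesize = max_bytesize  # 0 would mean no limit
        self.items = []
        self.size = 0

    def add(self, item: bytes) -> bool:
        """Buffer an unsuccessful item, or discard it if over the limit."""
        if self.max_bytesize and self.size + len(item) > self.max_bytesize:
            return False  # discarded
        self.items.append(item)
        self.size += len(item)
        return True

    def drain(self):
        """Retry succeeded: release buffered items and free the space."""
        drained, self.items, self.size = self.items, [], 0
        return drained

buf = RetryBuffer(max_bytesize=2048)
accepted = buf.add(b"x" * 1500)  # fits within the 2048-byte limit
rejected = buf.add(b"y" * 1000)  # would exceed the limit -> discarded
```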

buffer_path

The buffer_path parameter configures the path where unsuccessful items are stored so they can be retried (exactly-once delivery). It is specified as a string and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    buffer_path: <path to unsuccessful items folder>

buffer_ttl

The buffer_ttl parameter configures the time-to-live for unsuccessful items, after which they are discarded. It is specified as a duration, has a default of 10m, and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    buffer_ttl: 20m

compression

The compression parameter specifies the compression format. It can be gzip, zstd, snappy or uncompressed. It is specified as a string, has a default of gzip, and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    compression: gzip | zstd | snappy | uncompressed

disable_compaction

This parameter configures whether to disable compaction by the compactor agent for data from this node before it is sent to the data destination. It is specified as a boolean, the default is false and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    disable_compaction: true

endpoint

The endpoint parameter specifies a custom S3-compatible server endpoint. It is specified as a string and is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    endpoint: <server endpoint>

encoding

The encoding parameter specifies the encoding format. It can be json or parquet. It is specified as a string, has a default of json, and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    encoding: json | parquet

external_id

The external_id parameter is a unique identifier used to avoid a confused deputy attack when assuming an IAM role. It is specified as a string and is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    external_id: <ID> 

features

The features parameter defines which data types the agent sends to a streaming destination. You can specify one or more of the following; if features is not set, the types marked (default) are sent:

  • metric (default)
  • edac (default)
  • cluster (default)
  • cluster_pattern
  • cluster_sample
  • log (default)
  • topk (default)

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    features: metric

flush_interval

The flush_interval parameter specifies the interval at which data, including buffered data, is flushed (forced) to the destination. It is specified as a duration and is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    flush_interval: 10m

max_byte_limit

The max_byte_limit parameter specifies the maximum number of bytes buffered before raw data is flushed to the archive destination. It is specified as a data size and is optional. If it is not specified for this node, the value from the agent settings is used.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    max_byte_limit: 32MB

path_prefix

The path_prefix parameter configures the path prefix using order and format child parameters. It is optional.

The order child parameter lists the formatting items that will define the path prefix:

  • You can refer to Year, Month, Day, <any number that can divide 60> Minute, Hour, tag, host, OtherTags.<item related tags> and LogFields.<log related tags>.
  • For ECS, ecs_cluster, ecs_container_name, ecs_task_family and ecs_task_version are available.
  • For K8s, k8s_namespace, k8s_controller_kind, k8s_controller_logical_name, k8s_pod_name, k8s_container_name and k8s_container_image are available.
  • For Docker, docker_container_name and docker_image_name are available.

The format child parameter specifies a format string with a %s placeholder for each item in order.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    path_prefix:
      order:
      - Year
      - Month
      - Day
      - Hour
      - 2 Minute
      - tag
      - host
      format: ver=parquet/year=%s/month=%s/day=%s/hour=%s/min=%s/tag=%s/host=%s/
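The way order and format combine can be sketched in plain Python. The minute bucket rounds down to the nearest multiple of the chosen divisor (2 in the example above); the tag and host values come from the agent's context and are illustrative assumptions here:

```python
from datetime import datetime, timezone

def build_path_prefix(fmt, ts, tag, host, minute_divisor=2):
    """Fill the format string's %s placeholders in the example order:
    Year, Month, Day, Hour, <divisor> Minute, tag, host."""
    bucket_minute = ts.minute - ts.minute % minute_divisor
    values = (
        f"{ts.year:04d}", f"{ts.month:02d}", f"{ts.day:02d}",
        f"{ts.hour:02d}", f"{bucket_minute:02d}", tag, host,
    )
    return fmt % values

fmt = "ver=parquet/year=%s/month=%s/day=%s/hour=%s/min=%s/tag=%s/host=%s/"
ts = datetime(2024, 5, 7, 13, 37, tzinfo=timezone.utc)
prefix = build_path_prefix(fmt, ts, tag="prod", host="web-1")
# 13:37 with a divisor of 2 falls into the min=36 bucket
```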

role_arn

The role_arn parameter is used when authentication and authorization are performed using an assumed AWS IAM role. It consists of the account ID and role name. A role_arn is optional for a data destination, depending on the access configuration.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    role_arn: <role ARN>
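When access is granted through an assumed role, the role's trust policy on the AWS side must allow the agent's principal to call sts:AssumeRole; pairing it with the external_id parameter described above guards against the confused deputy problem. A sketch of such a trust policy (the account number, user name, and external ID are placeholders you must supply):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account-number>:user/<iam-user>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": { "sts:ExternalId": "<external ID>" }
            }
        }
    ]
}
```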

use_native_compression

The use_native_compression parameter configures whether, for parquet encoding, to compress only the data segments of each archive file rather than the whole file. It is specified as a Boolean, has a default of false, and it is optional.

nodes:
  - name: <node name>
    type: s3_output
    bucket: <bucket to target>
    region: <s3 region>
    use_native_compression: true