Edge Delta OpenSearch Output

Stream data to OpenSearch.


Version 2 is no longer supported. This section is for historical reference only.

OpenSearch is an open source toolset for search, analytics, and observability applications. Amazon OpenSearch Service is a managed service that runs OpenSearch clusters in the AWS Cloud. You can configure an OpenSearch cluster as a streaming destination for Edge Delta.

Step 1: Create a Lifecycle Policy

Index lifecycle policies manage indices based on your performance, resiliency, and retention requirements. Edge Delta provides a simple lifecycle policy that rolls over to a new index every day (or sooner, when the current index reaches 5 GB) and deletes each index 15 days after it is created.
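To see how the two limits interact, the following Python sketch models the decision that Index State Management (ISM) makes for one index under the policy below. It is a simplified model, not the plugin's implementation: it assumes the policy's thresholds (roll over when the index is 1 day old or its primary shards reach 5 GB, whichever comes first; delete 15 days after creation) and ignores the interval at which ISM's background job runs.

```python
from dataclasses import dataclass

# Thresholds copied from the example policy; change them if you edit the policy.
ROLLOVER_MIN_SIZE_GB = 5.0
ROLLOVER_MIN_AGE_DAYS = 1.0
DELETE_MIN_AGE_DAYS = 15.0


@dataclass
class ManagedIndex:
    age_days: float           # time since the index was created
    size_gb: float            # total size of the primary shards
    rolled_over: bool = False


def next_action(index: ManagedIndex) -> str:
    """Return the next action for this index under the example policy (simplified)."""
    if not index.rolled_over:
        # "hot" state: the rollover action must finish before transitions are checked.
        # Rollover happens as soon as ANY of its conditions is met.
        if index.size_gb >= ROLLOVER_MIN_SIZE_GB or index.age_days >= ROLLOVER_MIN_AGE_DAYS:
            return "rollover"
        return "wait"
    # Transition from "hot" to "delete" once the index is old enough.
    if index.age_days >= DELETE_MIN_AGE_DAYS:
        return "delete"
    return "wait"
```

For example, `next_action(ManagedIndex(age_days=0.5, size_gb=6.0))` returns `"rollover"` because the size condition is met before the age condition.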

The following index lifecycle policy has pre-populated settings, but you can change fields such as the retention period:

{
    "policy": {
        "description": "A simple default policy that rolls over the index and deletes it after 15 days.",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "rollover": {
                            "min_size": "5gb",
                            "min_index_age": "1d"
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "15d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
        "ism_template": [
            {
                "index_patterns": ["ed-agent-log-*"],
                "priority": 100
            }
        ]
    }
}
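The JSON above is the body of an ISM create-policy request, `PUT _plugins/_ism/policies/<policy_id>`. The following Python sketch builds that request with the standard library without sending it. The endpoint and the policy ID `ed-agent-log-policy` are placeholders, and authentication (basic auth, or AWS Signature Version 4 signing for Amazon OpenSearch Service) is intentionally left out.

```python
import json
import urllib.request


def build_put_policy_request(endpoint: str, policy_id: str, policy: dict) -> urllib.request.Request:
    """Build, but do not send, the request that creates an ISM policy."""
    url = f"{endpoint.rstrip('/')}/_plugins/_ism/policies/{policy_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Load the policy above into a dictionary (for example with `json.load`), pass it in, and send the result with `urllib.request.urlopen` after you add authentication. Because the policy contains an `ism_template` for `ed-agent-log-*`, ISM attaches it automatically to new indices whose names match that pattern.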

Step 2: Create an Index Template

An index template configures settings and field mappings for OpenSearch indices before the indices are created.

Although the Edge Delta agent can be configured to stream various types of observations to the OpenSearch destination, we recommend that you create the target index with the recommended index template.

Create an index template such as the following example. It defines field mappings for indices that match ed-agent-log-*, the same pattern that the lifecycle policy from Step 1 applies to:

{
	"index_patterns": [
		"ed-agent-log-*"
	],
	"template": {
		"settings": {
			"index": {
				"number_of_shards": "1",
				"number_of_replicas": "1"
			},
			"plugins.index_state_management.rollover_alias": "ed-agent-log"
		},
		"mappings": {
			"properties": {
				"msg": {
					"type": "text"
				},
				"alert_def_id": {
					"type": "keyword"
				},
				"k8s_namespace": {
					"type": "keyword"
				},
				"merge_level": {
					"type": "keyword"
				},
				"ecs_task_family": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": true,
					"store": false,
					"type": "keyword",
					"index_options": "docs",
					"split_queries_on_whitespace": false,
					"doc_values": true
				},
				"k8s_controller_kind": {
					"type": "keyword"
				},
				"k8s_container_image": {
					"type": "keyword"
				},
				"title": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": true,
					"store": false,
					"type": "keyword",
					"index_options": "docs",
					"split_queries_on_whitespace": false,
					"doc_values": false
				},
				"type": {
					"type": "keyword"
				},
				"src_name": {
					"type": "keyword"
				},
				"k8s_container_name": {
					"type": "keyword"
				},
				"score": {
					"type": "double"
				},
				"sub_type": {
					"type": "keyword"
				},
				"host": {
					"type": "keyword"
				},
				"capture_flush_mode": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": false,
					"store": false,
					"type": "keyword",
					"split_queries_on_whitespace": false,
					"doc_values": false
				},
				"tag": {
					"type": "keyword"
				},
				"k8s_controller_logical_name": {
					"type": "keyword"
				},
				"timestamp_end": {
					"type": "date"
				},
				"value": {
					"type": "double"
				},
				"timestamp": {
					"index": true,
					"ignore_malformed": false,
					"store": false,
					"type": "date",
					"doc_values": true
				},
				"app": {
					"type": "keyword"
				},
				"capture_size": {
					"coerce": true,
					"index": false,
					"ignore_malformed": false,
					"store": false,
					"type": "long",
					"doc_values": false
				},
				"ecs_task_version": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": true,
					"store": false,
					"type": "keyword",
					"split_queries_on_whitespace": false,
					"index_options": "docs",
					"doc_values": true
				},
				"stat_type": {
					"type": "keyword"
				},
				"docker_container_name": {
					"type": "keyword"
				},
				"conf_id": {
					"type": "keyword"
				},
				"edac_id": {
					"type": "keyword"
				},
				"ip": {
					"type": "ip"
				},
				"k8s_pod_name": {
					"type": "keyword"
				},
				"logical_source": {
					"type": "keyword"
				},
				"environment": {
					"type": "keyword"
				},
				"event_id": {
					"type": "keyword"
				},
				"capture_duration": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": false,
					"store": false,
					"type": "keyword",
					"split_queries_on_whitespace": false,
					"doc_values": false
				},
				"ecs_container": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": true,
					"store": false,
					"type": "keyword",
					"index_options": "docs",
					"split_queries_on_whitespace": false,
					"doc_values": true
				},
				"capture_bytesize": {
					"coerce": true,
					"index": false,
					"ignore_malformed": false,
					"store": false,
					"type": "long",
					"doc_values": false
				},
				"group_id": {
					"type": "keyword"
				},
				"org_id": {
					"type": "keyword"
				},
				"name": {
					"type": "keyword"
				},
				"alert_def_name": {
					"type": "keyword"
				},
				"ecs_cluster": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": true,
					"store": false,
					"type": "keyword",
					"index_options": "docs",
					"split_queries_on_whitespace": false,
					"doc_values": true
				},
				"threshold_description": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": false,
					"store": false,
					"type": "keyword",
					"split_queries_on_whitespace": false,
					"doc_values": false
				},
				"threshold_type": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": false,
					"store": false,
					"type": "keyword",
					"split_queries_on_whitespace": false,
					"doc_values": false
				},
				"src_type": {
					"type": "keyword"
				},
				"region": {
					"type": "keyword"
				},
				"properties": {
					"eager_global_ordinals": false,
					"norms": false,
					"index": false,
					"store": false,
					"type": "keyword",
					"split_queries_on_whitespace": false,
					"doc_values": false
				},
				"docker_image": {
					"type": "keyword"
				}
			}
		},
		"aliases": {
			"ed-agent-log": {}
		}
	},
	"composed_of": []
}
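A composable template such as this one is created with `PUT _index_template/<template_name>`, and it applies only to indices created afterward whose names match `index_patterns`. The following Python sketch checks that matching rule for candidate index names; `fnmatch.fnmatchcase` stands in for OpenSearch's own wildcard matching, which behaves the same way for a pattern that only uses `*`.

```python
import fnmatch

# Copied from "index_patterns" in the template above.
INDEX_PATTERNS = ("ed-agent-log-*",)


def template_applies(index_name: str, patterns=INDEX_PATTERNS) -> bool:
    """Return True if an index created with this name would use the template."""
    return any(fnmatch.fnmatchcase(index_name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ["ed-agent-log-000001", "ed-agent-log", "ed-agent-metrics-000001"]:
        print(f"{name}: {template_applies(name)}")
```

Note that the bare alias name `ed-agent-log` does not match `ed-agent-log-*`, so the first index needs a suffixed name.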

Step 3: Create the First Index

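Rollover needs an existing index to roll over from, so you must create the first index yourself. Give it a name that matches the template's ed-agent-log-* pattern so that it inherits the field mappings from the template; the lifecycle policy is attached through the ism_template section of the policy from Step 1.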
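One common approach, following the ISM rollover pattern in the OpenSearch documentation, is to create an index with a numeric suffix and make it the write index of the alias. The following Python sketch builds the body for `PUT ed-agent-log-000001` and models rollover's naming rule. The index name is an assumption, the alias `ed-agent-log` is taken from the template above, and the sketch assumes the agent's `index` parameter points at that alias. ISM's rollover action reads the alias from the `plugins.index_state_management.rollover_alias` setting, and every index it rolls over needs that setting, so it also belongs in the index template's settings.

```python
import re

ALIAS = "ed-agent-log"  # assumed write alias; point the agent's `index` parameter at it


def first_index_name(alias: str = ALIAS) -> str:
    # A trailing "-<digits>" suffix lets rollover derive the next index name.
    return f"{alias}-000001"


def first_index_body(alias: str = ALIAS) -> dict:
    """Request body for PUT /<first index name>."""
    return {
        "settings": {
            # ISM's rollover action looks up the alias to roll over here.
            "plugins.index_state_management.rollover_alias": alias,
        },
        "aliases": {
            # Documents written to the alias go to this index until it rolls over.
            alias: {"is_write_index": True},
        },
    }


def next_rollover_name(index_name: str) -> str:
    """Model of rollover naming: increment the suffix, zero-padded to six digits."""
    match = re.fullmatch(r"(.+)-(\d+)", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} has no '-<number>' suffix to increment")
    prefix, number = match.groups()
    return f"{prefix}-{int(number) + 1:06d}"
```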

Step 4: Create a Security API Configuration (Optional)

The OpenSearch security API can be used to manage access control to your search resources. You can use it to define role permissions for Edge Delta by granting cluster permissions and index permissions to the ed-agent role. The following example is a configuration that you can customize:

{
  "ed-agent" : {
    "reserved" : false,
    "hidden" : false,
    "cluster_permissions" : [
      "indices:data/write/bulk",
      "indices:data/write/bulk*"
    ],
    "index_permissions" : [
      {
        "index_patterns" : [
          "ed-agent-log"
        ],
        "dls" : "",
        "fls" : [ ],
        "masked_fields" : [ ],
        "allowed_actions" : [
          "indices:data/write/bulk",
          "indices:data/write/bulk*",
          "indices:data/write/index"
        ]
      }
    ],
    "tenant_permissions" : [ ],
    "static" : false
  }
}
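A role has no effect until it is mapped to a user or a backend role. The following Python sketch builds the body for `PUT _plugins/_security/api/rolesmapping/ed-agent`. Which identity you map depends on how the agent authenticates: an internal user for the `user` and `password` parameters, or the IAM role from `role_arn` as a backend role on Amazon OpenSearch Service with fine-grained access control. The identities shown are placeholders.

```python
import json

ROLE_NAME = "ed-agent"  # the role defined above


def role_mapping_path(role_name: str = ROLE_NAME) -> str:
    return f"/_plugins/_security/api/rolesmapping/{role_name}"


def role_mapping_body(users=(), backend_roles=()) -> dict:
    """Body for a PUT to role_mapping_path(); requires at least one identity."""
    users, backend_roles = list(users), list(backend_roles)
    if not users and not backend_roles:
        raise ValueError("map the role to at least one user or backend role")
    return {"users": users, "backend_roles": backend_roles, "hosts": []}


if __name__ == "__main__":
    # Placeholder IAM role ARN, matching the role_arn value in the agent config.
    body = role_mapping_body(backend_roles=["arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"])
    print("PUT", role_mapping_path())
    print(json.dumps(body, indent=2))
```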

Step 5: Configure the Edge Delta Agent

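Finally, deploy an agent and configure a streaming output that points to the new index. The following example illustrates three OpenSearch data destination configurations: a basic destination, a destination that assumes an IAM role and adds custom tags, and a destination that sends JSON logs as is: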

outputs:
  streams:  
    - name: elastic-opensearch
      type: elastic
      index: "index name"
      region: "us-west-2"
      address:
        - opensearch_domain_endpoint 
    - name: elastic-opensearch-with-rolearn
      type: elastic
      index: "index name" 
      region: "us-west-2" 
      worker_count: 3
      role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>" 
      external_id: "external_id" 
      address:
        - opensearch_domain_endpoint
      custom_tags:
        "app": "test"  
        "region": "us-west-2"
        "File Path": "{{.FileGlobPath}}"
        "K8s PodName": "{{.K8sPodName}}"
        "K8s Namespace": "{{.K8sNamespace}}"
        "K8s ControllerKind": "{{.K8sControllerKind}}"
        "K8s ContainerName": "{{.K8sContainerName}}"
        "K8s ContainerImage": "{{.K8sContainerImage}}"
        "K8s ControllerLogicalName": "{{.K8sControllerLogicalName}}"
        "ECSCluster": "{{.ECSCluster}}"
        "ECSContainerName": "{{.ECSContainerName}}"
        "ECSTaskVersion": "{{.ECSTaskVersion}}"
        "ECSTaskFamily": "{{.ECSTaskFamily}}"
        "DockerContainerName": "{{.DockerContainerName}}"
        "ConfigID": "{{.ConfigID}}"
        "Host": "{{.Host}}"
        "Source": "{{.Source}}"
        "SourceType": "{{.SourceType}}"
        "Tag": "{{.Tag}}"
        "logical_source": '{{ index .CustomTags "logicalSource" }}' 
        "url": '{{ index .ObservationTags "url" }}' 
        "cluster": '{{ index .ObservationTags "cluster" }}' 
        "level": '{{ index .ObservationTags "level" }}' 
    - name: elastic-send-as-is-with-options
      type: elastic
      index: "index name"
      region: "us-west-2"
      role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
      external_id: "external_id"
      address:
        - opensearch_domain_endpoint
      features: log
      send_as_is: true
      send_as_is_options: 
        nest_under: msg
        include_ed_metadata: true
        on_failure_options:
          sub_field_name: "nested_field"

Required Parameters

name

The name parameter specifies a name for the data destination. You refer to this name in other places, for example to refer to a specific destination in a workflow. Names must be unique within the outputs section. Because it is a YAML list element, it begins with a hyphen and a space, followed by the string. A name is required for a data destination.

outputs:
  streams:
    - name: <data destination name>

type: elastic

The type parameter specifies a vendor or technology for the streaming data destination. It is a closed list element, so it must be one of the supported types. See the list of supported types. A type is required for a streaming data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <destination type>

For OpenSearch you use the elastic type.

index

The index parameter specifies which index to send data to in Elastic or OpenSearch. It is written as a string. An index is required for an Elastic or OpenSearch data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: "<index name>"

region

The region parameter specifies the region where the cluster is hosted. It is specified as a string. A region is required for a managed OpenSearch data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: "<index name>"
      region: "<region name>"

address

The address parameter specifies the endpoint for an Elastic data destination: either an Elastic node or an OpenSearch domain endpoint. An address is a required parameter for an elastic type data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: "<index name>"
      address: 
        - <opensearch domain endpoint | elastic node>
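The rules in this section can be checked before you deploy. The following Python sketch validates stream definitions that have already been loaded into dictionaries (for example with a YAML parser of your choice). It encodes only the rules stated above and is not Edge Delta's own validation; the `managed_opensearch` flag is a hypothetical input that decides whether `region` is required.

```python
REQUIRED = ("name", "type", "index", "address")


def validate_streams(streams, managed_opensearch: bool = True) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    seen_names = set()
    for position, stream in enumerate(streams):
        label = stream.get("name") or f"streams[{position}]"
        for key in REQUIRED:
            if not stream.get(key):
                problems.append(f"{label}: missing required parameter '{key}'")
        if stream.get("type") and stream["type"] != "elastic":
            problems.append(f"{label}: OpenSearch destinations use type 'elastic'")
        address = stream.get("address")
        if address and not isinstance(address, list):
            problems.append(f"{label}: 'address' must be a list of endpoints")
        if managed_opensearch and not stream.get("region"):
            problems.append(f"{label}: 'region' is required for a managed OpenSearch domain")
        # Names must be unique within the outputs section.
        name = stream.get("name")
        if name in seen_names:
            problems.append(f"{label}: duplicate name")
        elif name:
            seen_names.add(name)
    return problems
```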

Optional Parameters

custom_tags

The custom_tags parameter specifies custom tags to add to the output. They are written as key: value pairs with the key being the custom tag name and the value being the source of data for the tag. The source can be explicitly defined such as "region": "us-west-2" or it can be a variable such as "File Path": "{{.FileGlobPath}}". Custom tags are optional for a data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      custom_tags: 
        "<custom tag>": "<custom tag data source>"
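Values that contain `{{ ... }}` are templates that the agent fills in per record, while plain values such as `"us-west-2"` are sent as written. The following Python sketch imitates only the simplest placeholder form, `{{.Field}}`, against a hypothetical dictionary of field values. It does not implement the agent's full template language, so expressions such as `{{ index .ObservationTags "url" }}` are left unchanged, and rendering unknown fields as empty strings is a choice of this sketch rather than documented agent behavior.

```python
import re

# Matches the simple "{{.FieldName}}" placeholder, allowing spaces inside the braces.
_SIMPLE_FIELD = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_custom_tags(custom_tags: dict, fields: dict) -> dict:
    """Fill "{{.Field}}" placeholders from `fields`; other text is left unchanged."""
    return {
        tag: _SIMPLE_FIELD.sub(lambda m: str(fields.get(m.group(1), "")), value)
        for tag, value in custom_tags.items()
    }
```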

external_id

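The external_id parameter specifies the external ID that the agent passes when it assumes the IAM role set in role_arn. Requiring an external ID in the role's trust policy protects against confused deputy attacks. It is written as a string. An external_id is optional for a data destination.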

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      external_id: "<external ID>"
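The external ID only adds protection if the IAM role requires it. The following Python sketch builds a role trust policy with the standard `sts:ExternalId` condition key; the trusted principal ARN and the external ID are placeholders for the identity the agent runs as and the value you set in `external_id`.

```python
import json


def trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """IAM trust policy that allows sts:AssumeRole only with the matching external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy("arn:aws:iam::<AGENT_ACCOUNT_ID>:root", "external_id"), indent=2))
```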

features

The features parameter specifies which types of data collected or generated by the agent to send to the output. It is written as a comma-separated list. All streaming destinations support a features field, but not all of them support the full list of datasets; for example, some destinations only support metrics. See the list of features you can include. The features parameter is optional for a data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      features: <feature 1>, <feature 2>

password

The password parameter is used with the user parameter to authenticate and authorize access to the streaming destination, depending on how access has been configured. It should refer to a secret environment variable. A password is optional for a data destination, depending on the access configuration.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      user: <user name>
      password: '{{ Env "ELASTIC_PWD" }}'
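If the destination authenticates internal users with HTTP basic authentication, the usual method for OpenSearch's internal user database, the user and password are combined into an `Authorization` header. The following Python sketch shows that encoding and reads the password from the `ELASTIC_PWD` environment variable used in the example, which keeps the secret out of the configuration file. It illustrates basic authentication in general, not how the agent sends credentials internally.

```python
import base64
import os


def password_from_env(var: str = "ELASTIC_PWD") -> str:
    """Read the secret from the environment variable that the config refers to."""
    value = os.environ.get(var)
    if not value:
        raise KeyError(f"environment variable {var} is not set")
    return value


def basic_auth_header(user: str, password: str) -> str:
    """Value of the HTTP Authorization header for basic authentication (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```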

role_arn

The role_arn parameter is used when authentication and authorization are performed with an assumed AWS IAM role. Specify the full role ARN, which contains the account ID and the role name. A role_arn is optional for a data destination, depending on the access configuration.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
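A malformed ARN is an easy mistake to make. The following Python sketch checks the general shape of an IAM role ARN before it goes into the configuration. The pattern is a simplified assumption (standard `aws` partition, 12-digit account ID, optional role path), not a complete implementation of AWS naming rules.

```python
import re

# Simplified: "aws" partition only, 12-digit account ID, role name with an optional path.
ROLE_ARN = re.compile(r"arn:aws:iam::(?P<account_id>\d{12}):role/(?P<path_and_name>[\w+=,.@/-]+)")


def parse_role_arn(arn: str) -> tuple:
    """Return (account_id, role_name), or raise ValueError if the ARN looks wrong."""
    match = ROLE_ARN.fullmatch(arn)
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    role_name = match.group("path_and_name").rsplit("/", 1)[-1]
    return match.group("account_id"), role_name
```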

send_as_is

The send_as_is parameter is a Boolean that controls how JSON-formatted log messages are sent. If send_as_is is enabled, you can use the following send_as_is_options child parameters:

nest_under nests all the content of a JSON log under a custom field. For example, if it is set to msg, a log message such as {"pid": 1223, "pname": "os_stat_check"} is sent to OpenSearch as {tag: "prod", src_type: "File", ..., msg.pid: 1223, msg.pname: "os_stat_check"}. The top-level fields are Edge Delta metadata fields, and msg.* contains the log JSON.
include_ed_metadata is a Boolean that sends all Edge Delta metadata fields as top-level fields in the JSON document. The default is false.
on_failure_options handles incoming raw data that is not in JSON format. Its sub_field_name child parameter specifies the field under which a JSON object is created and populated with the raw message.

A send_as_is parameter is optional for a data destination.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      send_as_is: true
      send_as_is_options:
        nest_under: msg
        include_ed_metadata: true
        on_failure_options:
          sub_field_name: "<nested field name>"
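The following Python sketch models the send_as_is behavior described above for a single log line: it parses JSON, nests it under `nest_under`, optionally adds metadata at the top level, and falls back to `sub_field_name` for non-JSON input. It is not the agent's code; the metadata keys are examples, and nesting the fallback object under `nest_under` is an assumption. OpenSearch displays the nested result as dotted fields such as `msg.pid`.

```python
import json


def build_document(raw_line: str, metadata: dict, nest_under: str = "msg",
                   include_ed_metadata: bool = True,
                   sub_field_name: str = "nested_field") -> dict:
    """Simplified model of send_as_is for one log line."""
    try:
        payload = json.loads(raw_line)
        if not isinstance(payload, dict):
            raise ValueError("not a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # on_failure_options: wrap non-JSON input in an object under sub_field_name.
        payload = {sub_field_name: raw_line}
    # include_ed_metadata: metadata fields go at the top level of the document.
    document = dict(metadata) if include_ed_metadata else {}
    # nest_under: the original log fields become msg.pid, msg.pname, and so on.
    document[nest_under] = payload
    return document
```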

user

The user parameter is used with the password parameter to authenticate and authorize access to the streaming destination, depending on how access has been configured. It is written as a string. A user is optional for a data destination depending on the access configuration.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      user: <user name>
      password: '{{ Env "ELASTIC_PWD" }}'

worker_count

The worker_count parameter specifies the number of workers used to process traffic for this destination. It is written as an integer. A worker_count is optional, and the default is 2.

outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      worker_count: 3

Troubleshooting OpenSearch

See Troubleshooting Elastic.