Edge Delta OpenSearch Output
OpenSearch is an open source toolset for search, analytics, and observability applications. Amazon OpenSearch Service is a managed service for running OpenSearch clusters in the AWS Cloud. You can configure OpenSearch as a streaming destination for Edge Delta.
Step 1: Create a Lifecycle Policy
Index lifecycle policies manage indices based on your performance, resiliency, and retention requirements. Edge Delta provides a simple lifecycle policy, which creates a new index every day and maintains data from the last 15 days.
The following index lifecycle policy has pre-populated settings, but you can change fields such as the retention period:
{
"policy": {
"description": "A simple default policy that rolls over the index and deletes it after 15 days.",
"default_state": "hot",
"states": [
{
"name": "hot",
"actions": [
{
"rollover": {
"min_size": "5gb",
"min_index_age": "1d"
}
}
],
"transitions": [
{
"state_name": "delete",
"conditions": {
"min_index_age": "15d"
}
}
]
},
{
"name": "delete",
"actions": [
{
"delete": {}
}
],
"transitions": []
}
],
"ism_template": [{
"index_patterns": ["ed-agent-log-*"],
"priority": 100
}]
}
}
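As a rough illustration of changing the retention period, the following sketch rebuilds the same policy body in Python with the retention as a parameter. The helper name `build_policy` and its arguments are hypothetical and are not part of Edge Delta or OpenSearch. The endpoint in the comment is the OpenSearch Index State Management policy API as commonly documented, so check it against your cluster version.

```python
import json


def build_policy(retention_days=15, index_pattern="ed-agent-log-*"):
    """Return an ISM policy body shaped like the example above (hypothetical helper)."""
    return {
        "policy": {
            "description": f"Roll over the index and delete it after {retention_days} days.",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_size": "5gb", "min_index_age": "1d"}}],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{retention_days}d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            "ism_template": [{"index_patterns": [index_pattern], "priority": 100}],
        }
    }


if __name__ == "__main__":
    # The printed JSON could be sent as the body of
    # PUT _plugins/_ism/policies/<policy_id>, where the policy ID is your choice.
    print(json.dumps(build_policy(retention_days=30), indent=2))
```

Only the `min_index_age` condition on the transition to the delete state changes with the retention period; the rollover settings stay as in the example.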
Step 2: Create an Index Template
An index template configures OpenSearch indices before they are created.
While the Edge Delta agent can be configured to stream various types of observations to the OpenSearch destination, we recommend that you create the target index with the recommended index template.
Create an index template such as the following example. It defines the field mappings and uses the same index pattern as the lifecycle policy you created in step 1:
{
"index_patterns": [
"ed-agent-log-*"
],
"template": {
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
}
},
"mappings": {
"properties": {
"msg": {
"type": "text"
},
"alert_def_id": {
"type": "keyword"
},
"k8s_namespace": {
"type": "keyword"
},
"merge_level": {
"type": "keyword"
},
"ecs_task_family": {
"eager_global_ordinals": false,
"norms": false,
"index": true,
"store": false,
"type": "keyword",
"index_options": "docs",
"split_queries_on_whitespace": false,
"doc_values": true
},
"k8s_controller_kind": {
"type": "keyword"
},
"k8s_container_image": {
"type": "keyword"
},
"title": {
"eager_global_ordinals": false,
"norms": false,
"index": true,
"store": false,
"type": "keyword",
"index_options": "docs",
"split_queries_on_whitespace": false,
"doc_values": false
},
"type": {
"type": "keyword"
},
"src_name": {
"type": "keyword"
},
"k8s_container_name": {
"type": "keyword"
},
"score": {
"type": "double"
},
"sub_type": {
"type": "keyword"
},
"host": {
"type": "keyword"
},
"capture_flush_mode": {
"eager_global_ordinals": false,
"norms": false,
"index": false,
"store": false,
"type": "keyword",
"split_queries_on_whitespace": false,
"doc_values": false
},
"tag": {
"type": "keyword"
},
"k8s_controller_logical_name": {
"type": "keyword"
},
"timestamp_end": {
"type": "date"
},
"value": {
"type": "double"
},
"timestamp": {
"index": true,
"ignore_malformed": false,
"store": false,
"type": "date",
"doc_values": true
},
"app": {
"type": "keyword"
},
"capture_size": {
"coerce": true,
"index": false,
"ignore_malformed": false,
"store": false,
"type": "long",
"doc_values": false
},
"ecs_task_version": {
"eager_global_ordinals": false,
"norms": false,
"index": true,
"store": false,
"type": "keyword",
"split_queries_on_whitespace": false,
"index_options": "docs",
"doc_values": true
},
"stat_type": {
"type": "keyword"
},
"docker_container_name": {
"type": "keyword"
},
"conf_id": {
"type": "keyword"
},
"edac_id": {
"type": "keyword"
},
"ip": {
"type": "ip"
},
"k8s_pod_name": {
"type": "keyword"
},
"logical_source": {
"type": "keyword"
},
"environment": {
"type": "keyword"
},
"event_id": {
"type": "keyword"
},
"capture_duration": {
"eager_global_ordinals": false,
"norms": false,
"index": false,
"store": false,
"type": "keyword",
"split_queries_on_whitespace": false,
"doc_values": false
},
"ecs_container": {
"eager_global_ordinals": false,
"norms": false,
"index": true,
"store": false,
"type": "keyword",
"index_options": "docs",
"split_queries_on_whitespace": false,
"doc_values": true
},
"capture_bytesize": {
"coerce": true,
"index": false,
"ignore_malformed": false,
"store": false,
"type": "long",
"doc_values": false
},
"group_id": {
"type": "keyword"
},
"org_id": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"alert_def_name": {
"type": "keyword"
},
"ecs_cluster": {
"eager_global_ordinals": false,
"norms": false,
"index": true,
"store": false,
"type": "keyword",
"index_options": "docs",
"split_queries_on_whitespace": false,
"doc_values": true
},
"threshold_description": {
"eager_global_ordinals": false,
"norms": false,
"index": false,
"store": false,
"type": "keyword",
"split_queries_on_whitespace": false,
"doc_values": false
},
"threshold_type": {
"eager_global_ordinals": false,
"norms": false,
"index": false,
"store": false,
"type": "keyword",
"split_queries_on_whitespace": false,
"doc_values": false
},
"src_type": {
"type": "keyword"
},
"region": {
"type": "keyword"
},
"properties": {
"eager_global_ordinals": false,
"norms": false,
"index": false,
"store": false,
"type": "keyword",
"split_queries_on_whitespace": false,
"doc_values": false
},
"docker_image": {
"type": "keyword"
}
}
},
"aliases": {
"ed-agent-log": {}
}
},
"composed_of": []
}
Step 3: Create the First Index
To generate a daily index, you must create the first index. This first index will inherit field mappings and policies from the template.
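This guide does not show the request for the first index, so the following is only a sketch of one possible form. The index name `ed-agent-log-000001`, the `is_write_index` flag, and the helper `first_index_request` are assumptions made for illustration. The only facts taken from the steps above are the `ed-agent-log-*` pattern and the `ed-agent-log` alias.

```python
import fnmatch
import json

TEMPLATE_PATTERN = "ed-agent-log-*"  # index pattern used in steps 1 and 2
FIRST_INDEX = "ed-agent-log-000001"  # assumed name; not given in this guide
WRITE_ALIAS = "ed-agent-log"         # alias defined in the index template


def first_index_request(index_name=FIRST_INDEX, alias=WRITE_ALIAS):
    """Return (method, path, body) for creating the first index (sketch only)."""
    # The index inherits mappings and the policy only if its name matches
    # the pattern shared by the template and the lifecycle policy.
    if not fnmatch.fnmatchcase(index_name, TEMPLATE_PATTERN):
        raise ValueError(f"{index_name} does not match {TEMPLATE_PATTERN}")
    # Assumption: the alias is flagged as the write index so that writes
    # through the alias land on this index.
    body = {"aliases": {alias: {"is_write_index": True}}}
    return "PUT", f"/{index_name}", body


if __name__ == "__main__":
    method, path, body = first_index_request()
    print(method, path)
    print(json.dumps(body, indent=2))
```

The pattern check shows why the name matters: an index whose name does not match `ed-agent-log-*` would not pick up the template or the policy.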
Step 4: Create a Security API Configuration (Optional)
The OpenSearch security API can be used to manage access control to your search resources. You can use it to define role permissions for Edge Delta by providing cluster permissions and index permissions to the ed-agent role. The following example is a configuration that you can customize:
{
"ed-agent" : {
"reserved" : false,
"hidden" : false,
"cluster_permissions" : [
"indices:data/write/bulk",
"indices:data/write/bulk*"
],
"index_permissions" : [
{
"index_patterns" : [
"ed-agent-log"
],
"dls" : "",
"fls" : [ ],
"masked_fields" : [ ],
"allowed_actions" : [
"indices:data/write/bulk",
"indices:data/write/bulk*",
"indices:data/write/index"
]
}
],
"tenant_permissions" : [ ],
"static" : false
}
}
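The configuration above wraps the role definition in the role name and includes the `reserved`, `hidden`, and `static` flags. The following sketch splits such a configuration into a request path and body. It assumes the security plugin's roles endpoint (`_plugins/_security/api/roles/<role>`) and assumes that those three flags are managed by the server and should be left out of the request. Verify both assumptions for your OpenSearch version. The helper `role_request` is hypothetical.

```python
import json

# Assumption: these flags are server-managed and not part of a role request.
SERVER_MANAGED_FIELDS = {"reserved", "hidden", "static"}


def role_request(config):
    """Split {role_name: definition} into (path, body) for the roles endpoint."""
    if len(config) != 1:
        raise ValueError("expected exactly one role definition")
    role_name, definition = next(iter(config.items()))
    body = {k: v for k, v in definition.items() if k not in SERVER_MANAGED_FIELDS}
    return f"/_plugins/_security/api/roles/{role_name}", body


if __name__ == "__main__":
    example = {
        "ed-agent": {
            "reserved": False,
            "hidden": False,
            "cluster_permissions": ["indices:data/write/bulk*"],
            "index_permissions": [
                {
                    "index_patterns": ["ed-agent-log"],
                    "allowed_actions": ["indices:data/write/index"],
                }
            ],
            "static": False,
        }
    }
    path, body = role_request(example)
    print("PUT", path)
    print(json.dumps(body, indent=2))
```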
Step 5: Configure the Edge Delta Agent
Finally, you deploy an agent and configure a streaming output that points to the new index. The following example illustrates three OpenSearch data destination configurations:
outputs:
  streams:
    - name: elastic-opensearch
      type: elastic
      index: "index name"
      region: "us-west-2"
      address:
        - opensearch_domain_endpoint
    - name: elastic-opensearch-with-rolearn
      type: elastic
      index: "index name"
      region: "us-west-2"
      worker_count: 3
      role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
      external_id: "external_id"
      address:
        - opensearch_domain_endpoint
      custom_tags:
        "app": "test"
        "region": "us-west-2"
        "File Path": "{{.FileGlobPath}}"
        "K8s PodName": "{{.K8sPodName}}"
        "K8s Namespace": "{{.K8sNamespace}}"
        "K8s ControllerKind": "{{.K8sControllerKind}}"
        "K8s ContainerName": "{{.K8sContainerName}}"
        "K8s ContainerImage": "{{.K8sContainerImage}}"
        "K8s ControllerLogicalName": "{{.K8sControllerLogicalName}}"
        "ECSCluster": "{{.ECSCluster}}"
        "ECSContainerName": "{{.ECSContainerName}}"
        "ECSTaskVersion": "{{.ECSTaskVersion}}"
        "ECSTaskFamily": "{{.ECSTaskFamily}}"
        "DockerContainerName": "{{.DockerContainerName}}"
        "ConfigID": "{{.ConfigID}}"
        "Host": "{{.Host}}"
        "Source": "{{.Source}}"
        "SourceType": "{{.SourceType}}"
        "Tag": "{{.Tag}}"
        "logical_source": '{{ index .CustomTags "logicalSource" }}'
        "url": '{{ index .ObservationTags "url" }}'
        "cluster": '{{ index .ObservationTags "cluster" }}'
        "level": '{{ index .ObservationTags "level" }}'
    - name: elastic-send-as-is-with-options
      type: elastic
      index: "index name"
      region: "us-west-2"
      role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
      external_id: "external_id"
      address:
        - opensearch_domain_endpoint
      features: log
      send_as_is: true
      send_as_is_options:
        nest_under: msg
        include_ed_metadata: true
        on_failure_options:
          sub_field_name: "nested_field"
Required Parameters
name
The name parameter specifies a name for the data destination. You refer to this name in other places, for example to refer to a specific destination in a workflow. Names must be unique within the outputs section. It is a YAML list element, so it begins with a - and a space followed by the string. A name is required for a data destination.
outputs:
  streams:
    - name: <data destination name>
type: elastic
The type parameter specifies a vendor or technology for the streaming data destination. It is a closed list element that requires one of the options. See the list of supported types in the Edge Delta documentation. A type is required for a streaming data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <destination type>
For OpenSearch you use the elastic type.
index
The index parameter specifies which index to send data to in Elastic or OpenSearch. It is written as a string. An index is required for an Elastic or OpenSearch data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: "<index name>"
region
The region parameter specifies the region where the cluster is hosted. It is specified as a string. A region is required for a managed OpenSearch data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: "<index name>"
      region: "<region name>"
address
The address parameter specifies the endpoint for an Elastic data destination: either an Elastic node or an OpenSearch domain endpoint. An address is a required parameter for an elastic type data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: "<index name>"
      address:
        - <opensearch domain endpoint | elastic node>
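The required parameters above can be checked before an agent is deployed. The following sketch is one way to do that and is not an Edge Delta tool. It works on streams that have already been parsed into Python dictionaries, and the function name `validate_streams` and the `managed_opensearch` flag are hypothetical.

```python
REQUIRED_KEYS = ("name", "type", "index", "address")


def validate_streams(streams, managed_opensearch=False):
    """Return a list of problems found; an empty list means none were found."""
    # region is required only for a managed OpenSearch destination.
    keys = REQUIRED_KEYS + (("region",) if managed_opensearch else ())
    problems = []
    seen_names = set()
    for position, stream in enumerate(streams):
        label = stream.get("name") or f"<stream #{position}>"
        for key in keys:
            if not stream.get(key):
                problems.append(f"{label}: missing required parameter '{key}'")
        name = stream.get("name")
        if name:
            # Names must be unique within the outputs section.
            if name in seen_names:
                problems.append(f"{label}: duplicate name")
            seen_names.add(name)
    return problems


if __name__ == "__main__":
    streams = [
        {
            "name": "elastic-opensearch",
            "type": "elastic",
            "index": "index name",
            "address": ["opensearch_domain_endpoint"],
        }
    ]
    for problem in validate_streams(streams, managed_opensearch=True):
        print(problem)
```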
Optional Parameters
custom_tags
The custom_tags parameter specifies custom tags to add to the output. They are written as key: value pairs with the key being the custom tag name and the value being the source of data for the tag. The source can be explicitly defined such as "region": "us-west-2" or it can be a variable such as "File Path": "{{.FileGlobPath}}". Custom tags are optional for a data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      custom_tags:
        "<custom tag>": "<custom tag data source>"
external_id
The external_id parameter specifies a unique identifier for authentication to avoid confused deputy attacks. It is written as a string. An external_id is optional for a data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      external_id: "<external ID>"
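On the AWS side, an external ID is usually enforced in the trust policy of the role being assumed, through the `sts:ExternalId` condition key. The following sketch builds such a trust policy. The helper `trust_policy` is hypothetical, and the principal is left as a placeholder because this guide does not say which AWS principal assumes the role.

```python
import json


def trust_policy(principal_arn, external_id):
    """Build an IAM trust policy that only allows AssumeRole with a matching external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                # The caller must pass this same value as external_id.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace them with the real principal and external ID.
    print(json.dumps(trust_policy("<TRUSTED_PRINCIPAL_ARN>", "external_id"), indent=2))
```

With a condition like this, a request to assume the role that does not carry the expected external ID is rejected, which is the protection against confused deputy attacks that the parameter refers to.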
features
The features parameter specifies which types of data collected or generated by the agent to send to the output. It is written as a comma-separated list. All streaming destinations support a features field, but not all of them support the full list of datasets. For example, some destinations only support metrics. The features you can include are listed in the Edge Delta documentation. The features parameter is optional for a data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      features: <feature 1>, <feature 2>
password
The password parameter is used with the user parameter to authenticate and authorize access to the streaming destination, depending on how access has been configured. It should refer to a secret environment variable. A password is optional for a data destination, depending on the access configuration.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      user: <user name>
      password: '{{ Env "ELASTIC_PWD" }}'
role_arn
The role_arn parameter is used if authentication and authorization are performed using an assumed AWS IAM role. It should consist of the account ID and role name. A role_arn is optional for a data destination, depending on the access configuration.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
send_as_is
The send_as_is parameter is used to configure child options for JSON message formats. It is a Boolean parameter. If send_as_is is enabled, you can use the following send_as_is_options child parameters:
nest_under can be used to nest all the content of a JSON log under a custom field. For example, if set to msg, a log message like this: {pid: 1223, pname: os_stat_check} would be sent to Elastic like this: {tag: "prod", src_type: "File",..., msg.pid: 1223, msg.pname: os_stat_check}. Top-level fields are ED metadata fields and msg.* contains the log JSON.
include_ed_metadata is a Boolean used to send all ED metadata fields as top-level fields in the JSON. The default is false.
on_failure_options is used to handle incoming raw data that is not in JSON format. It specifies a sub_field_name under which a JSON object is created and populated with the raw message.
The send_as_is parameter is optional for a data destination.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      send_as_is: true
      send_as_is_options:
        nest_under: msg
        include_ed_metadata: true
        on_failure_options:
          sub_field_name: "<nested field name>"
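To make the effect of these options concrete, the following sketch reproduces the described behavior on a single log line. It is an illustration only and not the agent's implementation. The function `shape_document` is hypothetical, and placing the raw message under the nest_under field when parsing fails is an assumption about how the two options combine.

```python
import json


def shape_document(raw, ed_metadata, nest_under=None,
                   include_ed_metadata=False, sub_field_name="nested_field"):
    """Illustrate the described send_as_is options on one log line."""
    try:
        payload = json.loads(raw)
        if not isinstance(payload, dict):
            raise ValueError("log line is not a JSON object")
    except ValueError:
        # on_failure_options: wrap raw, non-JSON data in an object
        # under sub_field_name (assumed placement).
        payload = {sub_field_name: raw}
    # include_ed_metadata: ED metadata fields become top-level fields.
    doc = dict(ed_metadata) if include_ed_metadata else {}
    if nest_under:
        # nest_under: the whole log JSON goes under one custom field.
        doc[nest_under] = payload
    else:
        doc.update(payload)
    return doc


if __name__ == "__main__":
    metadata = {"tag": "prod", "src_type": "File"}
    line = '{"pid": 1223, "pname": "os_stat_check"}'
    print(shape_document(line, metadata, nest_under="msg", include_ed_metadata=True))
    print(shape_document("plain text line", metadata, nest_under="msg"))
```

The nested object under msg corresponds to the msg.pid and msg.pname fields in the example above, which uses dotted paths for the same structure.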
user
The user parameter is used with the password parameter to authenticate and authorize access to the streaming destination, depending on how access has been configured. It is written as a string. A user is optional for a data destination, depending on the access configuration.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      user: <user name>
      password: '{{ Env "ELASTIC_PWD" }}'
worker_count
The worker_count parameter is used to specify the number of worker nodes to use for processing traffic. It is written as an integer. A worker_count is optional and the default is 2.
outputs:
  streams:
    - name: <data destination name>
      type: <data destination type>
      index: <index name>
      worker_count: 3