Edge Delta Google Cloud Storage Output
Overview
The GCS output archives items to a Google Cloud Storage (GCS) bucket. These items are raw archive bytes that have been buffered by the archive buffer processor.

Configuring GCS
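Before you can create an output, you need a GCS HMAC access key for a service account. The steps below create a service account with the Storage HMAC Key Admin role, create an HMAC key for it, and grant it the Storage Admin role on the target bucket.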
Step 1: Create a Service Account
- In the Google Cloud Console, expand the left-side navigation.
- Hover over IAM & Admin, and then click Service Accounts.
- In the top bar, click Create Service Account.
- Under Service account details, complete the empty fields, and then click Create and Continue. Copy the name for this service account. You will need this information for a later step.
- Under Grant this service account access to project, in the drop-down menu, use the search filter to locate and select Storage HMAC Key Admin, then click Continue.
- Click Done.
Step 2: Create a GCS HMAC Key
- In the Google Cloud Console, expand the left-side navigation.
- Under Storage, locate and hover over Cloud Storage, and then click Settings.
- Under Settings, click Interoperability.
- Click Create a Key for a Service Account.
- Select the newly created service account, and click Create Key.
- Copy and store the Access Key and Secret key, and then click Close.
- On the left-side navigation, click Buckets.
- Locate and select the desired bucket.
- Click Permissions.
- In the table that appears, click Grant Access.
- In the right-side window that appears, under Add principals, enter the name of the newly created service account.
- In Select role, use the search filter to locate and select Storage Admin.
- Click Save.
For more information, see Google's documentation on managing HMAC keys.
Step 3: Configure the Edge Delta Agent
Next, configure the Edge Delta agent by adding a gcs_output node that references your bucket and the HMAC access key and secret from Step 2.
Example Configuration
nodes:
  - name: my_gcs
    type: gcs_output
    bucket: <REDACTED>
    hmac_access_key: <REDACTED>
    hmac_secret: <REDACTED>
    compression: zstd
    encoding: parquet
    use_native_compression: true
    path_prefix:
      order:
        - Year
        - Month
        - Day
        - Hour
        - 2 Minute
        - tag
        - host
      format: ver=parquet/year=%s/month=%s/day=%s/hour=%s/min=%s/tag=%s/host=%s/
Required Parameters
name
A descriptive name for the node. This name appears in Visual Pipelines, and you use it to reference the node in the YAML. It must be unique across all nodes. Because it is a YAML list element, it begins with a hyphen and a space (- ) followed by the string. It is a required parameter for all nodes.
nodes:
  - name: <node name>
    type: <node type>
type: gcs_output
The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types; for this output, the value is gcs_output. It is a required parameter.
nodes:
  - name: <node name>
    type: <node type>
bucket
The bucket parameter specifies the name of the target GCS bucket. It is specified as a string and is required.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
hmac_access_key
The hmac_access_key parameter is the GCS HMAC access key with permission to upload files to the bucket. It is used together with hmac_secret. It is specified as a string and is required.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    hmac_access_key: <access key>
    hmac_secret: <key secret>
hmac_secret
The hmac_secret parameter is the GCS HMAC secret associated with the access key. It is used together with hmac_access_key. It is specified as a string and is required.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    hmac_access_key: <access key>
    hmac_secret: <key secret>
Optional Parameters
archiver_enabled
The archiver_enabled parameter configures whether archiver agents are used to send archive bytes. It is specified as a Boolean, has a default of false, and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    archiver_enabled: true
buffer_max_bytesize
The buffer_max_bytesize parameter sets the maximum total size, in bytes, of unsuccessful items held in the buffer. When the limit is reached, further items are discarded until buffer space becomes available. It is specified as a datasize.Size, has a default of 0 (no size limit), and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    buffer_max_bytesize: 2048
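The discard-when-full behavior can be pictured as a bounded buffer. The Python sketch below is a hypothetical illustration of the semantics described above (0 means no limit, and an item that would push the total past the limit is dropped), not Edge Delta's implementation; the class BoundedRetryBuffer and its methods are invented for this example.

```python
class BoundedRetryBuffer:
    """Hypothetical sketch (not Edge Delta code): holds failed items up to a byte limit."""

    def __init__(self, max_bytesize: int = 0) -> None:
        self.max_bytesize = max_bytesize  # 0 means no size limit
        self.items: list[bytes] = []
        self.used = 0  # total bytes currently buffered

    def add(self, item: bytes) -> bool:
        """Store a failed item; return False if it is discarded for lack of space."""
        if self.max_bytesize and self.used + len(item) > self.max_bytesize:
            return False  # limit reached: discard until space frees up
        self.items.append(item)
        self.used += len(item)
        return True

    def pop(self) -> bytes:
        """Remove the oldest item (for example, after a successful retry)."""
        item = self.items.pop(0)
        self.used -= len(item)  # frees space for new items
        return item


buf = BoundedRetryBuffer(max_bytesize=2048)
print(buf.add(b"x" * 1500))  # True
print(buf.add(b"y" * 1000))  # False: 2500 bytes would exceed 2048
buf.pop()                    # a retry succeeds and frees 1500 bytes
print(buf.add(b"y" * 1000))  # True
```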
buffer_path
The buffer_path parameter sets the path where unsuccessful items are stored so that they can be retried later (exactly-once delivery). It is specified as a string and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    buffer_path: <path to unsuccessful items folder>
buffer_ttl
The buffer_ttl parameter sets the time to live (TTL) for unsuccessful items, after which they are discarded. It is specified as a duration, has a default of 10m, and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    buffer_ttl: 20m
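A minimal sketch of TTL-based expiry, assuming each buffered item records the time at which it was stored. The helpers parse_duration and expire are hypothetical, and parse_duration only handles a single number followed by s, m, or h; the agent's own duration parsing may accept more forms (such as 1h30m).

```python
import time


def parse_duration(text: str) -> float:
    """Hypothetical helper: convert '10m', '30s', or '1h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return float(text[:-1]) * units[text[-1]]


def expire(items: list[tuple[float, bytes]], ttl: str, now: float) -> list[tuple[float, bytes]]:
    """Keep only (stored_at, payload) pairs that are younger than the TTL."""
    limit = parse_duration(ttl)
    return [(ts, data) for ts, data in items if now - ts < limit]


now = time.time()
buffered = [(now - 25 * 60, b"old"), (now - 5 * 60, b"recent")]
print([data for _, data in expire(buffered, "20m", now)])  # [b'recent']
```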
compression
The compression parameter specifies the compression format. It can be gzip, zstd, snappy, or uncompressed. It is specified as a string, has a default of gzip, and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    compression: gzip  # or zstd, snappy, uncompressed
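To get a feel for what compression saves on archive payloads, the sketch below gzip-compresses a batch of newline-delimited JSON records using only Python's standard library. The records are made up, and the sketch compares only gzip with uncompressed output; zstd and snappy require third-party packages and are not shown.

```python
import gzip
import json

# Hypothetical batch of log items, serialized as newline-delimited JSON.
records = [{"host": "web-1", "tag": "prod", "msg": f"request {i} ok"} for i in range(1000)]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")

compressed = gzip.compress(raw)
print(f"uncompressed: {len(raw)} bytes, gzip: {len(compressed)} bytes")

# Decompressing restores the original bytes exactly.
assert gzip.decompress(compressed) == raw
```

Repetitive log fields such as host and tag are what make this kind of data compress well.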
encoding
The encoding parameter specifies the encoding format. It can be json or parquet. It is specified as a string, has a default of json, and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    encoding: json  # or parquet
path_prefix
The path_prefix parameter configures the path prefix using the order and format child parameters. It is optional.
The order child parameter lists the formatting items that define the path prefix:
- General items: Year, Month, Day, Hour, <N> Minute (where N is any number that divides 60 evenly, such as 2 Minute), tag, host, OtherTags.<item related tags>, and LogFields.<log related tags>.
- For ECS: ecs_cluster, ecs_container_name, ecs_task_family, and ecs_task_version.
- For K8s: k8s_namespace, k8s_controller_kind, k8s_controller_logical_name, k8s_pod_name, k8s_container_name, and k8s_container_image.
- For Docker: docker_container_name and docker_image_name.
The format child parameter specifies a format string containing one %s placeholder for each order item, filled in the same order as the items are listed.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    path_prefix:
      order:
        - Year
        - Month
        - Day
        - Hour
        - 2 Minute
        - tag
        - host
      format: ver=parquet/year=%s/month=%s/day=%s/hour=%s/min=%s/tag=%s/host=%s/
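The way order and format combine can be sketched as filling each %s with the value of the matching order item. In this hypothetical Python helper, build_prefix, the zero-padding of date parts, the use of the timestamp as given (UTC here), and the flooring of N Minute to the start of its N-minute window are all assumptions about the agent's behavior, chosen to match the example above.

```python
from datetime import datetime, timezone


def build_prefix(order: list[str], fmt: str, ts: datetime, tags: dict[str, str]) -> str:
    """Hypothetical helper: fill each %s in fmt with the value of the matching order item."""
    values = []
    for item in order:
        if item == "Year":
            values.append(f"{ts.year:04d}")
        elif item == "Month":
            values.append(f"{ts.month:02d}")
        elif item == "Day":
            values.append(f"{ts.day:02d}")
        elif item == "Hour":
            values.append(f"{ts.hour:02d}")
        elif item.endswith(" Minute"):
            step = int(item.split()[0])  # "2 Minute" -> 2
            if 60 % step != 0:
                raise ValueError(f"{step} does not divide 60")
            values.append(f"{ts.minute - ts.minute % step:02d}")  # floor to the window start
        else:
            values.append(tags[item])  # tag, host, k8s_pod_name, and so on
    return fmt % tuple(values)


order = ["Year", "Month", "Day", "Hour", "2 Minute", "tag", "host"]
fmt = "ver=parquet/year=%s/month=%s/day=%s/hour=%s/min=%s/tag=%s/host=%s/"
ts = datetime(2024, 3, 7, 14, 5, tzinfo=timezone.utc)
print(build_prefix(order, fmt, ts, {"tag": "prod", "host": "web-1"}))
# ver=parquet/year=2024/month=03/day=07/hour=14/min=04/tag=prod/host=web-1/
```

With 2 Minute, items stamped 14:04 and 14:05 land under the same min=04 prefix, which keeps the number of distinct paths bounded.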
use_native_compression
The use_native_compression parameter configures whether, with parquet encoding, only the data segments of each archive file are compressed rather than the whole file. It is specified as a Boolean, has a default of false, and is optional.
nodes:
  - name: <node name>
    type: gcs_output
    bucket: <target bucket>
    use_native_compression: true
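The difference between compressing the whole file and compressing only its data segments can be illustrated with a toy container. This is not the Parquet format: the byte strings stand in for Parquet data segments, and the length-prefixed layout and the helper names are invented for the example. The point is that when each segment is compressed on its own, a reader can locate and decompress a single segment without decompressing the rest of the file.

```python
import gzip
import struct

# Hypothetical data segments (stand-ins for Parquet data segments).
segments = [b"host=web-1\n" * 200, b"status=200\n" * 200]


def whole_file(segs: list[bytes]) -> bytes:
    """Concatenate segments, then compress the entire file as one gzip stream."""
    return gzip.compress(b"".join(segs))


def per_segment(segs: list[bytes]) -> bytes:
    """Compress each segment separately and prefix it with its 4-byte length."""
    out = bytearray()
    for seg in segs:
        packed = gzip.compress(seg)
        out += struct.pack(">I", len(packed)) + packed
    return bytes(out)


def read_segment(blob: bytes, index: int) -> bytes:
    """Skip ahead using the length prefixes and decompress only the requested segment."""
    offset = 0
    for _ in range(index):
        (size,) = struct.unpack_from(">I", blob, offset)
        offset += 4 + size
    (size,) = struct.unpack_from(">I", blob, offset)
    return gzip.decompress(blob[offset + 4 : offset + 4 + size])


native = per_segment(segments)
print(read_segment(native, 1) == segments[1])  # True
print(len(whole_file(segments)), len(native))
```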