Edge Delta Google Cloud BigQuery Destination

Configure the Google Cloud BigQuery destination node to send logs, metrics, and custom data to BigQuery tables for analytics and long-term storage.

Overview

The Google Cloud BigQuery destination node sends data to BigQuery tables for analytics, reporting, and long-term storage. BigQuery is Google Cloud’s serverless, highly scalable data warehouse that enables fast SQL queries using the processing power of Google’s infrastructure.

This node streams data directly into BigQuery tables using the BigQuery Storage Write API, making it ideal for real-time analytics on logs, metrics, and custom telemetry data.

Note: This node is currently in beta and is available for Enterprise tier accounts.

This node requires Edge Delta agent version v2.7.0 or higher.

Example Configuration

This configuration sends log data to a BigQuery table. The logs are streamed to the application_logs table within the logs_dataset dataset in the my-gcp-project GCP project. The node authenticates using a service account credentials file and uses 5 parallel workers for optimal throughput.

If your data requires transformation before writing to BigQuery (such as adding fields, filtering attributes, or reformatting data structure), route the data through a custom processor before the BigQuery destination. The custom processor supports standard OTTL statements to modify data items to match your BigQuery table schema.

nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-gcp-project
    dataset: logs_dataset
    table: application_logs
    credentials_path: /etc/credentials/gcp-bigquery.json
    parallel_worker_count: 5
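
As an illustration of the transformation note above, the statements below use standard OTTL functions (set and delete_key) to add an attribute and drop one that the table schema does not define. The statements themselves are ordinary OTTL; the attribute names and the way they are attached to a custom processor node are assumptions for this sketch, not part of the BigQuery destination configuration.

# Hedged sketch: OTTL statements a custom processor might apply before this destination.
# Attribute names are illustrative only.
- set(attributes["environment"], "production")     # add a field expected by the table schema
- delete_key(attributes, "internal_debug_info")    # remove a field the table schema does not define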

Required Parameters

name

A descriptive name for the node. This name appears in the pipeline builder, and it is used to reference the node elsewhere in the YAML. It must be unique across all nodes. It is a YAML list element, so it begins with a - and a space followed by the string. It is a required parameter for all nodes.

nodes:
  - name: <node name>
    type: <node type>

type: google_cloud_big_query_output

The type parameter specifies the type of node being configured. It is specified as a string from a closed list of node types. It is a required parameter.

nodes:
  - name: <node name>
    type: <node type>

project_id

The project_id parameter specifies the Google Cloud Platform project ID where your BigQuery dataset resides. This must match the project containing your target dataset and table. It is specified as a string and is required.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: my-gcp-project
  dataset: <dataset name>
  table: <table name>

dataset

The dataset parameter specifies the BigQuery dataset name that contains the target table. The dataset must already exist in the specified project before streaming data. It is specified as a string and is required.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: logs_dataset
  table: <table name>

table

The table parameter specifies the name of the BigQuery table to write data to. The table must already exist within the specified dataset with a schema compatible with the incoming data structure. It is specified as a string and is required.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: application_logs

Optional Parameters

credentials_path

Path to a Google Cloud service account JSON credentials file. This file contains the authentication credentials for accessing BigQuery. If omitted, the node will attempt to use GKE Workload Identity or other ambient credentials (such as Compute Engine default service account).

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: <table name>
  credentials_path: /etc/credentials/bigquery-sa.json

Security Note: Ensure the credentials file is securely mounted and has restricted file permissions (e.g., chmod 600).
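
In Kubernetes, one common way to satisfy this is to mount the key from a Secret as a read-only volume with restrictive file permissions. The manifest below is a minimal sketch; the Secret name (bigquery-sa), key file name, pod name, and image reference are assumptions, and in practice the volume is usually added to the agent's DaemonSet or Helm values rather than a standalone Pod.

# Hedged sketch of mounting a service account key with restricted permissions.
apiVersion: v1
kind: Pod
metadata:
  name: edgedelta-agent-example        # hypothetical; normally part of a DaemonSet or Helm chart
spec:
  containers:
    - name: edgedelta-agent
      image: edgedelta/agent:latest    # illustrative image reference
      volumeMounts:
        - name: bigquery-credentials
          mountPath: /etc/credentials
          readOnly: true
  volumes:
    - name: bigquery-credentials
      secret:
        secretName: bigquery-sa        # assumed Secret containing gcp-bigquery.json
        defaultMode: 0400              # owner read-only, matching the restricted-permission guidance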

parallel_worker_count

The parallel_worker_count parameter sets the number of workers writing data to BigQuery in parallel. Increasing this value can improve throughput for high-volume data streams. It is specified as an integer, has a default of 5, and is optional.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: <table name>
  parallel_worker_count: 10

buffer_max_bytesize

The buffer_max_bytesize parameter configures the maximum total byte size of unsuccessful items held for retry. When this limit is reached, additional unsuccessful items are discarded until buffer space becomes available. It is specified as a datasize.Size, has a default of 0 (no size limit), and is optional.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: <table name>
  buffer_max_bytesize: 10485760  # 10MB

buffer_path

The buffer_path parameter configures the directory where unsuccessful items are stored so they can be retried (exactly-once delivery). It is specified as a string and is optional.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: <table name>
  buffer_path: /var/lib/edgedelta/buffers/bigquery

buffer_ttl

The buffer_ttl parameter configures the time-to-live for unsuccessful items; items that cannot be delivered within this window are discarded. It is specified as a duration, has a default of 10m, and is optional.

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: <table name>
  buffer_ttl: 30m
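
The three buffer parameters are typically used together so that failed writes are persisted, capped in size, and eventually expired. A combined sketch, reusing the values from the examples above:

- name: <node name>
  type: google_cloud_big_query_output
  project_id: <project id>
  dataset: <dataset name>
  table: <table name>
  buffer_path: /var/lib/edgedelta/buffers/bigquery   # persist unsuccessful items for retry
  buffer_max_bytesize: 10485760                      # cap retained items at 10MB
  buffer_ttl: 30m                                    # discard items not delivered within 30 minutes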

Authentication

The BigQuery destination supports three authentication methods:

  1. JSON Credentials File - Service account credentials file (default method)
  2. GKE Workload Identity - Keyless authentication for GKE clusters (recommended for Kubernetes)
  3. Compute Engine Default Service Account - Instance-level authentication for GCE

Required IAM Roles:

  • roles/bigquery.dataEditor - Write data to tables
  • roles/bigquery.jobUser - Create streaming insert jobs

For detailed authentication setup, see Send Data to Google Cloud BigQuery.
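
For the GKE Workload Identity option, the agent's Kubernetes ServiceAccount is bound to a GCP service account that holds the roles above. A minimal sketch of the Kubernetes side of that binding; the ServiceAccount name, namespace, and GCP service account email are assumptions for this example:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: edgedelta                      # assumed agent ServiceAccount name
  namespace: edgedelta                 # assumed namespace
  annotations:
    # Assumed GCP service account; grant it roles/bigquery.dataEditor and roles/bigquery.jobUser
    iam.gke.io/gcp-service-account: ed-bigquery-writer@my-gcp-project.iam.gserviceaccount.com

On the GCP side, the GCP service account also needs an IAM binding that allows this Kubernetes ServiceAccount to impersonate it (roles/iam.workloadIdentityUser); see the linked guide for the full setup.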

BigQuery Table Schema

The target BigQuery table must have a schema that matches the structure of incoming data items. Edge Delta automatically maps data item fields to table columns based on the data item structure.

Common Field Mappings:

Data Item Field   BigQuery Column Type            Description
timestamp         TIMESTAMP                       Event timestamp
body              STRING                          Log message body
attributes.*      STRING, INT64, FLOAT64, BOOL    Custom attributes
severity          STRING                          Log severity level
resource.*        STRING                          Resource metadata

Example Table Schema:

CREATE TABLE `project.dataset.logs_table` (
  timestamp TIMESTAMP NOT NULL,
  body STRING,
  severity STRING,
  attributes JSON,
  resource JSON
);

For metrics, the schema typically includes:

CREATE TABLE `project.dataset.metrics_table` (
  timestamp TIMESTAMP NOT NULL,
  metric_name STRING NOT NULL,
  metric_value FLOAT64 NOT NULL,
  attributes JSON,
  resource JSON
);

Use Cases

Long-Term Log Storage and Analytics

Stream application logs to BigQuery for long-term retention and ad-hoc SQL analysis. Query logs across months or years to identify trends, investigate historical incidents, or generate compliance reports.

- name: bigquery_log_archive
  type: google_cloud_big_query_output
  project_id: my-project
  dataset: log_archive
  table: application_logs_2024
  credentials_path: /etc/credentials/bq-archiver.json

Metrics Warehousing

Aggregate and store metrics data in BigQuery for custom reporting, dashboarding with Looker/Data Studio, or machine learning model training.

- name: bigquery_metrics
  type: google_cloud_big_query_output
  project_id: my-project
  dataset: metrics
  table: system_metrics
  parallel_worker_count: 10

Security Event Analysis

Store security-related logs and events in BigQuery for threat hunting, compliance auditing, and security analytics using SQL queries.

- name: bigquery_security_events
  type: google_cloud_big_query_output
  project_id: security-project
  dataset: security_logs
  table: auth_events
  credentials_path: /etc/credentials/security-bq.json

Troubleshooting

For comprehensive troubleshooting guidance including permission errors, schema mismatches, agent restart loops, and performance issues, see Troubleshooting Google Cloud BigQuery Destination.

See Also