Send Data to Google Cloud BigQuery

Configure Edge Delta to stream logs, metrics, and telemetry data to Google Cloud BigQuery for analytics and long-term storage.

Overview

You can send logs, metrics, and custom telemetry data from an Edge Delta pipeline to Google BigQuery by adding a Google Cloud BigQuery destination node. BigQuery is Google Cloud’s serverless, highly scalable data warehouse that enables fast SQL queries for analytics, reporting, and long-term storage.

See Google Cloud BigQuery Destination Node for complete configuration details.

Note: This node is currently in beta and is available to Enterprise tier accounts. It requires Edge Delta agent version v2.7.0 or higher.

Prerequisites

Before configuring the BigQuery destination, ensure you have:

  1. Google Cloud Project with BigQuery API enabled
  2. BigQuery Dataset created in your project
  3. BigQuery Table with schema matching your data structure
  4. Service Account with appropriate IAM roles, or GKE Workload Identity configured

Configure BigQuery

1. Enable BigQuery API

Ensure the BigQuery API is enabled for your Google Cloud project:

gcloud services enable bigquery.googleapis.com --project=YOUR_PROJECT_ID

Or enable it via the GCP API Library.
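
To confirm the API is already enabled, you can list the project's active services (a quick check, assuming the gcloud CLI is installed and authenticated):

gcloud services list --enabled --project=YOUR_PROJECT_ID | grep bigquery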

2. Create Dataset and Table

Create a BigQuery dataset and table to store your data:

-- Create dataset
CREATE SCHEMA IF NOT EXISTS `my-project.logs_dataset`
OPTIONS(
  location="US"
);

-- Create table for logs
CREATE TABLE `my-project.logs_dataset.application_logs` (
  timestamp TIMESTAMP NOT NULL,
  body STRING,
  severity STRING,
  attributes JSON,
  resource JSON
);

For metrics data, create a separate metrics_dataset in the same way, then add a metrics table:

CREATE TABLE `my-project.metrics_dataset.system_metrics` (
  timestamp TIMESTAMP NOT NULL,
  metric_name STRING NOT NULL,
  metric_value FLOAT64 NOT NULL,
  attributes JSON,
  resource JSON
);

Schema Design Tips:

  • Use JSON columns for flexible attribute storage
  • Include TIMESTAMP column for time-series analysis
  • Add partitioning by timestamp for improved query performance
  • Consider clustering by frequently queried fields (see the example below)
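
For example, the logs table above could instead be created with daily partitioning and clustering on severity. This is a sketch; choose clustering fields that match your own query patterns:

-- Partitioned and clustered logs table
CREATE TABLE `my-project.logs_dataset.application_logs` (
  timestamp TIMESTAMP NOT NULL,
  body STRING,
  severity STRING,
  attributes JSON,
  resource JSON
)
PARTITION BY DATE(timestamp)
CLUSTER BY severity;

Partitioning lets BigQuery prune partitions by date, and clustering co-locates rows with the same severity values, which typically reduces both scan cost and query time.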

3. Configure Authentication

Choose one of the following authentication methods:

Option A: Service Account with JSON Credentials (Default)

  1. Create a Service Account:

    gcloud iam service-accounts create bigquery-writer \
      --display-name="BigQuery Writer for Edge Delta" \
      --project=YOUR_PROJECT_ID
    
  2. Grant Required IAM Roles:

    # Allow writing data to BigQuery tables
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataEditor"
    
    # Allow creating BigQuery jobs for streaming inserts
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/bigquery.jobUser"
    
  3. Create and Download JSON Key:

    gcloud iam service-accounts keys create bigquery-sa-key.json \
      --iam-account=bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com
    
  4. Store Credentials Securely:

    # Set restrictive permissions
    chmod 600 bigquery-sa-key.json
    
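Optionally, verify the key can write to the table before deploying the agent. A minimal sketch, assuming the gcloud and bq CLIs are installed and the application_logs table from step 2 exists:

# Authenticate as the service account
gcloud auth activate-service-account --key-file=bigquery-sa-key.json

# Stream a single test row (newline-delimited JSON)
echo '{"timestamp":"2024-01-01T00:00:00Z","body":"edgedelta test","severity":"INFO"}' > test_row.json
bq insert my-project:logs_dataset.application_logs test_row.json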

Option B: GKE Workload Identity

If running Edge Delta in Google Kubernetes Engine (GKE), use Workload Identity for secure, keyless authentication:

  1. Enable Workload Identity on your GKE cluster (if not already enabled):

    gcloud container clusters update CLUSTER_NAME \
      --workload-pool=PROJECT_ID.svc.id.goog \
      --region=REGION
    
  2. Create IAM Service Account (if not exists):

    gcloud iam service-accounts create bigquery-writer \
      --project=YOUR_PROJECT_ID
    
  3. Grant BigQuery Permissions:

    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataEditor"
    
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
      --role="roles/bigquery.jobUser"
    
  4. Bind Kubernetes Service Account to IAM Service Account:

    gcloud iam service-accounts add-iam-policy-binding \
      bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com \
      --role=roles/iam.workloadIdentityUser \
      --member="serviceAccount:YOUR_PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
    
  5. Annotate Kubernetes Service Account:

    kubectl annotate serviceaccount KSA_NAME \
      --namespace NAMESPACE \
      iam.gke.io/gcp-service-account=bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com
    

See GKE Workload Identity documentation for detailed setup.
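
To confirm the binding, you can ask the GKE metadata server which identity a pod running under the annotated service account receives. A quick check, assuming the pod's image includes curl:

kubectl exec -n NAMESPACE POD_NAME -- curl -s \
  -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email

The command should print bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com.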

Option C: Compute Engine Default Service Account

When running on GCP Compute Engine instances, the instance’s default service account can be used if it has the required BigQuery permissions. No credentials file is needed.
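
If you use this option, the default service account (typically PROJECT_NUMBER-compute@developer.gserviceaccount.com) still needs the same roles granted above, and the instance must run with an access scope that permits BigQuery (for example, cloud-platform). A sketch of granting the roles:

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"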

Configure Kubernetes (If Using JSON Credentials)

If Edge Delta agents run in Kubernetes and you’re using JSON credentials:

  1. Create Kubernetes Secret:

    kubectl create secret generic bigquery-credentials \
      --from-file=bigquery-sa-key.json \
      --namespace=edgedelta
    
  2. Mount Secret in Edge Delta Deployment:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: edgedelta
      namespace: edgedelta
    spec:
      template:
        spec:
          containers:
          - name: edgedelta
            volumeMounts:
            - name: bigquery-creds
              mountPath: /etc/credentials
              readOnly: true
          volumes:
          - name: bigquery-creds
            secret:
              secretName: bigquery-credentials
    

Edge Delta Configuration

Configure the BigQuery destination node in your pipeline. See Google Cloud BigQuery Destination Node for all configuration parameters.

With JSON Credentials

nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-gcp-project
    dataset: logs_dataset
    table: application_logs
    credentials_path: /etc/credentials/bigquery-sa-key.json
    parallel_worker_count: 5

Using Workload Identity (GKE)

nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-gcp-project
    dataset: logs_dataset
    table: application_logs
    # credentials_path: Not specified for Workload Identity
    parallel_worker_count: 5

With Data Transformation

If your data requires transformation before writing to BigQuery, route it through a custom processor that supports OTTL statements:

nodes:
  - name: transform_for_bigquery
    type: sequence
    processors:
      - type: ottl_transform
        statements: |
          set(attributes["project"], "production")
          set(attributes["environment"], "prod")          

  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-gcp-project
    dataset: logs_dataset
    table: application_logs
    credentials_path: /etc/credentials/bigquery-sa-key.json
    parallel_worker_count: 10

workflows:
  - from: my_source
    to: transform_for_bigquery
  - from: transform_for_bigquery
    to: google_cloud_big_query

Data Schema Mapping

Edge Delta automatically maps data item fields to BigQuery table columns. Ensure your BigQuery table schema matches the structure of incoming data:

Common Field Mappings

Data Item Field    BigQuery Column Type                  Description
timestamp          TIMESTAMP                             Event timestamp
body               STRING                                Log message body or raw content
attributes.*       STRING, INT64, FLOAT64, BOOL, JSON    Custom attributes and metadata
severity           STRING                                Log severity level (INFO, WARN, ERROR)
resource.*         JSON                                  Resource metadata (host, pod, container)
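
To compare these fields against an existing table, you can dump its schema with the bq CLI (assuming it is installed and authenticated):

bq show --schema --format=prettyjson my-project:logs_dataset.application_logs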

Flexible Schema with JSON

For dynamic or evolving schemas, use JSON columns:

CREATE TABLE `my-project.logs_dataset.flexible_logs` (
  timestamp TIMESTAMP NOT NULL,
  body STRING,
  attributes JSON,
  resource JSON
) PARTITION BY DATE(timestamp);

Query JSON fields in BigQuery:

SELECT
  timestamp,
  JSON_VALUE(attributes, '$.user_id') as user_id,
  JSON_VALUE(attributes, '$.action') as action
FROM `my-project.logs_dataset.flexible_logs`
WHERE DATE(timestamp) = CURRENT_DATE()

Performance Optimization

Parallel Workers

Increase parallel_worker_count for high-volume data streams:

parallel_worker_count: 10  # Default is 5

Buffering Configuration

Configure buffering for reliability during transient failures:

buffer_max_bytesize: 10485760  # 10MB
buffer_path: /var/lib/edgedelta/buffers/bigquery
buffer_ttl: 30m

BigQuery Table Optimization

  • Partitioning: Partition by timestamp for faster queries and cost reduction
  • Clustering: Cluster by frequently filtered fields
  • Streaming Buffer: Recently streamed rows can remain in the streaming buffer for up to 90 minutes; wait for them to be flushed before running MERGE/UPDATE operations on that data

Verification

After configuring the destination, verify data is flowing to BigQuery:

1. Check Edge Delta Logs

# For Kubernetes
kubectl logs -n edgedelta -l app=edgedelta | grep bigquery

# For Docker
docker logs edgedelta-agent | grep bigquery

Look for successful write confirmations or error messages.

2. Query BigQuery Table

-- Check recent data
SELECT *
FROM `my-project.logs_dataset.application_logs`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
ORDER BY timestamp DESC
LIMIT 10;

-- Count records by hour
SELECT
  TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
  COUNT(*) as record_count
FROM `my-project.logs_dataset.application_logs`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY hour
ORDER BY hour DESC;

3. Monitor BigQuery Streaming

View streaming statistics in the BigQuery console:

  • Navigate to your dataset and table
  • Check “Table Details” for streaming buffer information (or use the CLI command below)
  • Monitor “Storage” tab for data volume
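
You can also inspect the streaming buffer from the command line. A quick check, assuming the bq CLI is installed; the streamingBuffer section (estimatedRows, estimatedBytes, oldestEntryTime) appears only while rows are still buffered:

# Look for the "streamingBuffer" section in the output
bq show --format=prettyjson my-project:logs_dataset.application_logs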

Use Cases

Long-Term Log Storage and Analytics

Store application logs in BigQuery for long-term retention and ad-hoc SQL analysis:

- name: log_archive_bigquery
  type: google_cloud_big_query_output
  project_id: my-project
  dataset: log_archive
  table: application_logs_2024
  credentials_path: /etc/credentials/bq-archiver.json

Query historical logs for trend analysis, compliance reporting, or incident investigation across months or years of data.
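
To keep retention costs predictable, you can also set a partition expiration on the archive table. A minimal sketch, assuming the table is partitioned by DATE(timestamp):

ALTER TABLE `my-project.log_archive.application_logs_2024`
SET OPTIONS (
  partition_expiration_days = 730  -- adjust to match your retention policy
);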

Metrics Warehousing

Aggregate and store metrics data for custom reporting and dashboarding:

- name: metrics_bigquery
  type: google_cloud_big_query_output
  project_id: my-project
  dataset: metrics
  table: system_metrics
  parallel_worker_count: 10

Integrate with Looker, Data Studio, or Tableau for visualization and business intelligence.
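
As a starting point for such a dashboard, an hourly roll-up over the system_metrics schema shown earlier might look like the following (adjust the dataset name to match your configuration):

SELECT
  TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
  metric_name,
  AVG(metric_value) AS avg_value,
  MAX(metric_value) AS max_value
FROM `my-project.metrics.system_metrics`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY hour, metric_name
ORDER BY hour DESC, metric_name;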

Security Event Analysis

Store security-related logs and events for threat hunting and compliance:

- name: security_events_bigquery
  type: google_cloud_big_query_output
  project_id: security-project
  dataset: security_logs
  table: auth_events
  credentials_path: /etc/credentials/security-bq.json

Run SQL queries to detect anomalies, track authentication patterns, and generate security reports.
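
For example, a query along these lines could flag unusually active accounts; the attribute paths used here (user_id, action) are hypothetical and should be replaced with the fields your auth events actually carry:

SELECT
  JSON_VALUE(attributes, '$.user_id') AS user_id,
  JSON_VALUE(attributes, '$.action') AS action,
  COUNT(*) AS event_count
FROM `security-project.security_logs.auth_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY user_id, action
HAVING event_count > 100
ORDER BY event_count DESC;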

Next Steps

For additional help, visit Edge Delta Support.