Send Data to Google Cloud BigQuery
Overview
You can send logs, metrics, and custom telemetry data from an Edge Delta pipeline to Google BigQuery by adding a Google Cloud BigQuery destination node. BigQuery is Google Cloud’s serverless, highly scalable data warehouse that enables fast SQL queries for analytics, reporting, and long-term storage.
See Google Cloud BigQuery Destination Node for complete configuration details.
Note: This node is currently in beta and available to Enterprise tier accounts. It requires Edge Delta agent version v2.7.0 or higher.
Prerequisites
Before configuring the BigQuery destination, ensure you have:
- Google Cloud Project with BigQuery API enabled
- BigQuery Dataset created in your project
- BigQuery Table with schema matching your data structure
- Service Account with appropriate IAM roles, or GKE Workload Identity configured
Configure BigQuery
1. Enable BigQuery API
Ensure the BigQuery API is enabled for your Google Cloud project:
gcloud services enable bigquery.googleapis.com --project=YOUR_PROJECT_ID
Or enable it via the GCP API Library.
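To confirm the API is active, you can list the project's enabled services. This is a quick check, assuming the gcloud CLI is installed and authenticated:
# The BigQuery API should appear among the enabled services
gcloud services list --enabled --project=YOUR_PROJECT_ID | grep bigquery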
2. Create Dataset and Table
Create a BigQuery dataset and table to store your data:
-- Create dataset
CREATE SCHEMA IF NOT EXISTS `my-project.logs_dataset`
OPTIONS(
location="US"
);
-- Create table for logs
CREATE TABLE `my-project.logs_dataset.application_logs` (
timestamp TIMESTAMP NOT NULL,
body STRING,
severity STRING,
attributes JSON,
resource JSON
);
For metrics data:
CREATE TABLE `my-project.metrics_dataset.system_metrics` (
timestamp TIMESTAMP NOT NULL,
metric_name STRING NOT NULL,
metric_value FLOAT64 NOT NULL,
attributes JSON,
resource JSON
);
Schema Design Tips:
- Use `JSON` columns for flexible attribute storage
- Include a `TIMESTAMP` column for time-series analysis
- Add partitioning by timestamp for improved query performance (see the example after this list)
- Consider clustering by frequently queried fields
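For reference, here is a minimal sketch of a partitioned and clustered log table, reusing the illustrative project, dataset, and column names from above:
-- Partition by day and cluster by a commonly filtered field
CREATE TABLE `my-project.logs_dataset.application_logs_optimized` (
timestamp TIMESTAMP NOT NULL,
body STRING,
severity STRING,
attributes JSON,
resource JSON
)
PARTITION BY DATE(timestamp)
CLUSTER BY severity;
Queries that filter on DATE(timestamp) and severity then scan only the relevant partitions and blocks, which reduces both latency and cost.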
3. Configure Authentication
Choose one of the following authentication methods:
Option A: Service Account with JSON Credentials (Default)
Create a Service Account:
gcloud iam service-accounts create bigquery-writer \
  --display-name="BigQuery Writer for Edge Delta" \
  --project=YOUR_PROJECT_ID
Grant Required IAM Roles:
# Allow writing data to BigQuery tables
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
# Allow creating BigQuery jobs for streaming inserts
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
Create and Download JSON Key:
gcloud iam service-accounts keys create bigquery-sa-key.json \
  --iam-account=bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com
Store Credentials Securely:
# Set restrictive permissions
chmod 600 bigquery-sa-key.json
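As an optional sanity check, you can authenticate with the downloaded key and list the target dataset. This sketch assumes the bq command-line tool from the Google Cloud SDK is installed:
# Authenticate as the service account and confirm the dataset is reachable
gcloud auth activate-service-account --key-file=bigquery-sa-key.json
bq --project_id=YOUR_PROJECT_ID ls logs_dataset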
Option B: GKE Workload Identity (Recommended for GKE)
If running Edge Delta in Google Kubernetes Engine (GKE), use Workload Identity for secure, keyless authentication:
Enable Workload Identity on your GKE cluster (if not already enabled):
gcloud container clusters update CLUSTER_NAME \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --region=REGION
Create IAM Service Account (if not exists):
gcloud iam service-accounts create bigquery-writer \
  --project=YOUR_PROJECT_ID
Grant BigQuery Permissions:
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
Bind Kubernetes Service Account to IAM Service Account:
gcloud iam service-accounts add-iam-policy-binding \
  bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:YOUR_PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
Annotate Kubernetes Service Account:
kubectl annotate serviceaccount KSA_NAME \
  --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com
See GKE Workload Identity documentation for detailed setup.
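To verify the binding, you can run a short-lived pod under the annotated Kubernetes service account and check which identity it resolves to. The pod name and image below are illustrative:
# gcloud auth list should report bigquery-writer@YOUR_PROJECT_ID.iam.gserviceaccount.com
kubectl run wi-test --rm -it --restart=Never \
  --image=google/cloud-sdk:slim \
  --namespace=NAMESPACE \
  --overrides='{"spec": {"serviceAccountName": "KSA_NAME"}}' \
  -- gcloud auth list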
Option C: Compute Engine Default Service Account
When running on GCP Compute Engine instances, the instance’s default service account can be used if it has the required BigQuery permissions. No credentials file is needed.
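You can confirm the default service account holds the required roles before relying on it. This sketch assumes the standard PROJECT_NUMBER-compute@developer.gserviceaccount.com naming for the default account:
# List roles granted to the Compute Engine default service account
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --format="table(bindings.role)"
Look for roles/bigquery.dataEditor and roles/bigquery.jobUser in the output.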
Configure Kubernetes (If Using JSON Credentials)
If Edge Delta agents run in Kubernetes and you’re using JSON credentials:
Create Kubernetes Secret:
kubectl create secret generic bigquery-credentials \
  --from-file=bigquery-sa-key.json \
  --namespace=edgedelta
Mount Secret in Edge Delta Deployment:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: edgedelta
  namespace: edgedelta
spec:
  template:
    spec:
      containers:
        - name: edgedelta
          volumeMounts:
            - name: bigquery-creds
              mountPath: /etc/credentials
              readOnly: true
      volumes:
        - name: bigquery-creds
          secret:
            secretName: bigquery-credentials
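After applying the manifest, you can confirm the secret exists and is mounted inside the agent container; the pod name placeholder below is illustrative:
# Confirm the secret exists
kubectl get secret bigquery-credentials -n edgedelta
# Confirm the key file is visible inside the agent container
kubectl exec -n edgedelta EDGEDELTA_POD_NAME -- ls /etc/credentials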
Edge Delta Configuration
Configure the BigQuery destination node in your pipeline. See Google Cloud BigQuery Destination Node for all configuration parameters.
With JSON Credentials
nodes:
- name: google_cloud_big_query
type: google_cloud_big_query_output
project_id: my-gcp-project
dataset: logs_dataset
table: application_logs
credentials_path: /etc/credentials/bigquery-sa-key.json
parallel_worker_count: 5
Using Workload Identity (GKE)
nodes:
- name: google_cloud_big_query
type: google_cloud_big_query_output
project_id: my-gcp-project
dataset: logs_dataset
table: application_logs
# credentials_path: Not specified for Workload Identity
parallel_worker_count: 5
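With Workload Identity, the agent obtains credentials from the GKE metadata server rather than a key file. As a rough check, you can query the metadata endpoint from inside the agent pod; this sketch assumes curl is available in the container image and uses an illustrative pod name:
# Should print the bound IAM service account email
kubectl exec -n edgedelta EDGEDELTA_POD_NAME -- \
  curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email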
With Data Transformation
If your data requires transformation before writing to BigQuery, route it through a custom processor that supports OTTL statements:
nodes:
- name: transform_for_bigquery
type: sequence
processors:
- type: ottl_transform
statements: |
set(attributes["project"], "production")
set(attributes["environment"], "prod")
- name: google_cloud_big_query
type: google_cloud_big_query_output
project_id: my-gcp-project
dataset: logs_dataset
table: application_logs
credentials_path: /etc/credentials/bigquery-sa-key.json
parallel_worker_count: 10
workflows:
- from: my_source
to: transform_for_bigquery
- from: transform_for_bigquery
to: google_cloud_big_query
Data Schema Mapping
Edge Delta automatically maps data item fields to BigQuery table columns. Ensure your BigQuery table schema matches the structure of incoming data:
Common Field Mappings
| Data Item Field | BigQuery Column Type | Description |
|---|---|---|
| `timestamp` | TIMESTAMP | Event timestamp |
| `body` | STRING | Log message body or raw content |
| `attributes.*` | STRING, INT64, FLOAT64, BOOL, JSON | Custom attributes and metadata |
| `severity` | STRING | Log severity level (INFO, WARN, ERROR) |
| `resource.*` | JSON | Resource metadata (host, pod, container) |
Flexible Schema with JSON
For dynamic or evolving schemas, use JSON columns:
CREATE TABLE `my-project.logs_dataset.flexible_logs` (
timestamp TIMESTAMP NOT NULL,
body STRING,
attributes JSON,
resource JSON
) PARTITION BY DATE(timestamp);
Query JSON fields in BigQuery:
SELECT
timestamp,
JSON_VALUE(attributes, '$.user_id') as user_id,
JSON_VALUE(attributes, '$.action') as action
FROM `my-project.logs_dataset.flexible_logs`
WHERE DATE(timestamp) = CURRENT_DATE()
Performance Optimization
Parallel Workers
Increase parallel_worker_count for high-volume data streams:
parallel_worker_count: 10 # Default is 5
Buffering Configuration
Configure buffering for reliability during transient failures:
buffer_max_bytesize: 10485760 # 10MB
buffer_path: /var/lib/edgedelta/buffers/bigquery
buffer_ttl: 30m
BigQuery Table Optimization
- Partitioning: Partition by timestamp for faster queries and cost reduction (see the retention example after this list)
- Clustering: Cluster by frequently filtered fields
- Streaming Buffer: Allow up to 90 minutes for streamed data to leave the streaming buffer before running MERGE or UPDATE operations
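For example, assuming the table is partitioned by DATE(timestamp) as recommended above, a partition expiration keeps storage costs predictable; the 90-day value is illustrative:
-- Expire partitions older than 90 days
ALTER TABLE `my-project.logs_dataset.application_logs`
SET OPTIONS (partition_expiration_days = 90);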
Verification
After configuring the destination, verify data is flowing to BigQuery:
1. Check Edge Delta Logs
# For Kubernetes
kubectl logs -n edgedelta -l app=edgedelta | grep bigquery
# For Docker
docker logs edgedelta-agent | grep bigquery
Look for successful write confirmations or error messages.
2. Query BigQuery Table
-- Check recent data
SELECT *
FROM `my-project.logs_dataset.application_logs`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
ORDER BY timestamp DESC
LIMIT 10;
-- Count records by hour
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
COUNT(*) as record_count
FROM `my-project.logs_dataset.application_logs`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY hour
ORDER BY hour DESC;
3. Monitor BigQuery Streaming
View streaming statistics in the BigQuery console:
- Navigate to your dataset and table
- Check “Table Details” for streaming buffer information
- Monitor “Storage” tab for data volume
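As an alternative to the console, recent ingestion activity can be inspected with SQL. Depending on how rows are written, activity appears in the STREAMING_TIMELINE_BY_TABLE view (legacy streaming inserts) or the WRITE_API_TIMELINE view (Storage Write API); the query below is a sketch using the former with a US regional qualifier:
-- Recent streaming activity for the dataset
SELECT
  start_timestamp,
  table_id,
  total_rows,
  total_input_bytes,
  error_code
FROM `my-project.region-us.INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_TABLE`
WHERE dataset_id = 'logs_dataset'
ORDER BY start_timestamp DESC
LIMIT 20;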
Use Cases
Long-Term Log Storage and Analytics
Store application logs in BigQuery for long-term retention and ad-hoc SQL analysis:
- name: log_archive_bigquery
type: google_cloud_big_query_output
project_id: my-project
dataset: log_archive
table: application_logs_2024
credentials_path: /etc/credentials/bq-archiver.json
Query historical logs for trend analysis, compliance reporting, or incident investigation across months or years of data.
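For example, a simple trend query over the archived logs, assuming the severity column from the schema shown earlier:
-- Daily error volume over the past 30 days
SELECT
  DATE(timestamp) AS day,
  COUNT(*) AS error_count
FROM `my-project.log_archive.application_logs_2024`
WHERE severity = 'ERROR'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day
ORDER BY day;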
Metrics Warehousing
Aggregate and store metrics data for custom reporting and dashboarding:
- name: metrics_bigquery
type: google_cloud_big_query_output
project_id: my-project
dataset: metrics
table: system_metrics
parallel_worker_count: 10
Integrate with Looker, Data Studio, or Tableau for visualization and business intelligence.
Security Event Analysis
Store security-related logs and events for threat hunting and compliance:
- name: security_events_bigquery
type: google_cloud_big_query_output
project_id: security-project
dataset: security_logs
table: auth_events
credentials_path: /etc/credentials/security-bq.json
Run SQL queries to detect anomalies, track authentication patterns, and generate security reports.
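As a sketch, a query like the following could surface brute-force login patterns; the attribute keys ($.user, $.outcome) are illustrative and depend on the shape of your events:
-- Users with more than 10 failed logins in the last 24 hours
SELECT
  JSON_VALUE(attributes, '$.user') AS user,
  COUNT(*) AS failed_attempts
FROM `security-project.security_logs.auth_events`
WHERE JSON_VALUE(attributes, '$.outcome') = 'failure'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY user
HAVING COUNT(*) > 10
ORDER BY failed_attempts DESC;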
Next Steps
- Google Cloud BigQuery Destination Node Documentation
- Troubleshooting BigQuery Destination
- Custom Processor for Data Transformation
- BigQuery Best Practices
For additional help, visit Edge Delta Support.