Troubleshooting Google Cloud BigQuery Destination
Overview
This guide helps diagnose and resolve common issues when using the Google Cloud BigQuery destination node in Edge Delta pipelines. BigQuery is a serverless data warehouse that requires proper authentication, permissions, and configuration for reliable data streaming.
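For reference, the troubleshooting steps below assume a destination node configured roughly like the following minimal sketch; the node name, project, dataset, table, and credentials path are illustrative and should match your own environment:
nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-project                                     # GCP project that owns the dataset
    dataset: logs_dataset                                      # Existing BigQuery dataset
    table: application_logs                                    # Existing BigQuery table
    credentials_path: /etc/credentials/bigquery-sa-key.json    # Service account key file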
Permission and Authentication Issues
Symptoms
- "permission denied" errors in agent logs
- "BigQuery API has not been used in project" errors
- 403 Forbidden HTTP errors
- Data not appearing in BigQuery tables
Root Causes and Solutions
1. Missing IAM Roles
Problem: Service account lacks required BigQuery permissions.
Solution:
Verify and grant the required IAM roles:
# Check current roles
gcloud projects get-iam-policy YOUR_PROJECT_ID \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com"
# Grant required roles
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.jobUser"
Required IAM Roles:
- roles/bigquery.dataEditor - Write data to tables
- roles/bigquery.jobUser - Create streaming insert jobs
Verification:
# Test authentication
gcloud auth activate-service-account --key-file=/path/to/credentials.json
gcloud projects list
2. BigQuery API Not Enabled
Problem: BigQuery API is not enabled for the project.
Solution:
Enable the BigQuery API:
gcloud services enable bigquery.googleapis.com --project=YOUR_PROJECT_ID
Or enable via the GCP API Library.
Verification:
gcloud services list --enabled --project=YOUR_PROJECT_ID | grep bigquery
3. Invalid Credentials File
Problem: Credentials file path is incorrect, file is corrupted, or has wrong permissions.
Solution:
Verify credentials file:
# Check file exists and is readable
ls -la /etc/credentials/bigquery-sa-key.json
# Validate JSON format
jq . /etc/credentials/bigquery-sa-key.json
# Test credentials
gcloud auth activate-service-account --key-file=/etc/credentials/bigquery-sa-key.json
Ensure file permissions are restrictive:
chmod 600 /etc/credentials/bigquery-sa-key.json
For Kubernetes deployments, verify the secret exists and is mounted:
# Check secret exists
kubectl get secret bigquery-credentials -n edgedelta
# Check pod volume mounts
kubectl describe pod -n edgedelta -l app=edgedelta | grep -A 10 "Mounts:"
4. Workload Identity Misconfiguration
Problem: GKE Workload Identity binding is incorrect or missing.
Solution:
Verify Workload Identity setup:
# Check Kubernetes service account annotation
kubectl get serviceaccount KSA_NAME -n NAMESPACE -o yaml | grep iam.gke.io
# Verify IAM policy binding
gcloud iam service-accounts get-iam-policy \
YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com \
--project=YOUR_PROJECT_ID
# Test authentication from pod
kubectl run -it --rm debug --image=google/cloud-sdk:slim \
--serviceaccount=KSA_NAME --namespace=NAMESPACE \
-- gcloud auth list
Ensure the binding exists:
gcloud iam service-accounts add-iam-policy-binding \
YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com \
--role=roles/iam.workloadIdentityUser \
--member="serviceAccount:YOUR_PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
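If the Kubernetes service account is missing the iam.gke.io/gcp-service-account annotation checked above, a minimal sketch of adding it (KSA_NAME, NAMESPACE, and the service account email follow the placeholders used in this section):
# Annotate the Kubernetes service account so GKE maps it to the GCP service account
kubectl annotate serviceaccount KSA_NAME -n NAMESPACE \
  iam.gke.io/gcp-service-account=YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com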
Dataset and Table Issues
Symptoms
- "Not found: Dataset" errors
- "Not found: Table" errors
- 404 Not Found HTTP errors
- Agent fails to start or immediately crashes
Root Causes and Solutions
1. Dataset or Table Does Not Exist
Problem: Dataset or table specified in configuration doesn’t exist in BigQuery.
Solution:
Create the dataset and table before starting the agent:
-- Create dataset
CREATE SCHEMA IF NOT EXISTS `my-project.logs_dataset`
OPTIONS(
location="US"
);
-- Create table
CREATE TABLE `my-project.logs_dataset.application_logs` (
timestamp TIMESTAMP NOT NULL,
body STRING,
severity STRING,
attributes JSON,
resource JSON
);
Verification:
# List datasets
bq ls --project_id=YOUR_PROJECT_ID
# List tables in dataset
bq ls --project_id=YOUR_PROJECT_ID logs_dataset
# Show table schema
bq show --project_id=YOUR_PROJECT_ID logs_dataset.application_logs
2. Case-Sensitive Naming
Problem: project_id, dataset, or table parameters have incorrect case.
Solution:
BigQuery resource names are case-sensitive. Verify exact names:
nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-project        # Must match exactly
    dataset: logs_dataset         # Must match exactly
    table: application_logs       # Must match exactly
Verification:
# Get exact names
bq ls --project_id=YOUR_PROJECT_ID
bq ls --project_id=YOUR_PROJECT_ID YOUR_DATASET
3. Service Account Lacks Dataset Access
Problem: Service account doesn’t have access to the specific dataset.
Solution:
Grant dataset-level permissions:
# Export the current dataset access policy
bq show --format=prettyjson YOUR_PROJECT_ID:YOUR_DATASET > dataset.json
# Edit dataset.json and add the service account to the "access" list, for example:
#   {"role": "WRITER", "userByEmail": "YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com"}
# Apply the updated access policy
bq update --source dataset.json YOUR_PROJECT_ID:YOUR_DATASET
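To confirm the service account now appears in the dataset's access list (this check assumes jq is installed):
bq show --format=prettyjson YOUR_PROJECT_ID:YOUR_DATASET | jq '.access'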
Agent Restart and Crash Issues
Symptoms
- Agent continuously restarts
- "panic: runtime error: invalid memory address or nil pointer dereference" in logs
- FATAL errors during BigQuery node initialization
- Container/pod crash loops
Root Causes and Solutions
1. Missing Required Parameters
Problem: Required parameters (project_id, dataset, table) are not specified or are empty.
Solution:
Ensure all required parameters are present and have values:
nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-project        # Required - must not be empty
    dataset: logs_dataset         # Required - must not be empty
    table: application_logs       # Required - must not be empty
    credentials_path: /etc/credentials/bigquery-sa-key.json
Verification:
Check agent logs for initialization errors:
# Kubernetes
kubectl logs -n edgedelta -l app=edgedelta --tail=100 | grep -i "bigquery\|fatal\|panic"
# Docker
docker logs edgedelta-agent 2>&1 | grep -i "bigquery\|fatal\|panic"
2. Nil Pointer Dereference Bug
Problem: Agent versions prior to v2.7.0 crash with a nil pointer dereference when service account permissions are missing.
Solution:
- Upgrade to Edge Delta agent v2.7.0 or later, where this bug is fixed (a Helm-based upgrade sketch follows this list):
# Check current version
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].image}'
# Or for Docker
docker inspect edgedelta-agent | grep -i version
- Verify service account permissions are correctly configured (see Permission Issues section above).
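A rough sketch of upgrading a Helm-based Kubernetes deployment; the release name, namespace, and chart repository here are assumptions and should match how the agent was originally installed:
# Refresh the Edge Delta chart repo and upgrade the existing release (names are illustrative)
helm repo add edgedelta https://edgedelta.github.io/charts
helm repo update
helm upgrade edgedelta edgedelta/edgedelta -n edgedelta --reuse-values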
3. Credentials File Missing or Inaccessible
Problem: credentials_path points to a non-existent file or a file the agent cannot read.
Solution:
Verify file accessibility from within the agent container/pod:
# Kubernetes
kubectl exec -n edgedelta POD_NAME -- ls -la /etc/credentials/
kubectl exec -n edgedelta POD_NAME -- cat /etc/credentials/bigquery-sa-key.json | jq .
# Docker
docker exec edgedelta-agent ls -la /etc/credentials/
docker exec edgedelta-agent cat /etc/credentials/bigquery-sa-key.json | jq .
Ensure the Kubernetes secret is properly mounted:
volumes:
  - name: bigquery-creds
    secret:
      secretName: bigquery-credentials
      items:
        - key: bigquery-sa-key.json
          path: bigquery-sa-key.json
          mode: 0400
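If the secret does not exist at all, a minimal sketch of creating it from a downloaded key file (the local path is illustrative) and restarting the agent pods:
kubectl create secret generic bigquery-credentials -n edgedelta \
  --from-file=bigquery-sa-key.json=/path/to/downloaded-key.json
# Restart the agent pods so they pick up the mounted secret
kubectl delete pod -n edgedelta -l app=edgedelta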
Schema Mismatch Issues
Symptoms
- "No such field" errors in logs
- "Invalid schema update" errors
- Data appears in BigQuery but with missing or incorrect fields
- Type mismatch errors
Root Causes and Solutions
1. Incompatible Field Types
Problem: Data item fields don’t match BigQuery column types.
Solution:
Use JSON columns for flexible schema:
-- Flexible schema with JSON
CREATE TABLE `my-project.logs_dataset.flexible_logs` (
timestamp TIMESTAMP NOT NULL,
body STRING,
attributes JSON, -- Flexible attributes
resource JSON -- Flexible resource metadata
);
Query JSON fields:
SELECT
timestamp,
JSON_VALUE(attributes, '$.user_id') as user_id,
JSON_VALUE(attributes, '$.level') as level
FROM `my-project.logs_dataset.flexible_logs`
2. Missing Required Columns
Problem: BigQuery table has NOT NULL columns that incoming data doesn’t populate.
Solution:
Make columns nullable or provide default values:
-- Add nullable columns
ALTER TABLE `my-project.logs_dataset.application_logs`
ADD COLUMN IF NOT EXISTS severity STRING;
-- Or drop NOT NULL constraint
ALTER TABLE `my-project.logs_dataset.application_logs`
ALTER COLUMN severity DROP NOT NULL;
Alternatively, use a custom processor to add missing fields:
- name: add_defaults
  type: sequence
  processors:
    - type: ottl_transform
      statements: |
        set(attributes["severity"], "INFO") where attributes["severity"] == nil
        set(attributes["timestamp"], Now()) where attributes["timestamp"] == nil
3. Field Name Case Mismatch
Problem: BigQuery column names don’t match data item field names (BigQuery is case-sensitive).
Solution:
Align naming conventions or transform field names:
- name: normalize_fields
  type: sequence
  processors:
    - type: ottl_transform
      statements: |
        set(attributes["timestamp"], attributes["Timestamp"])
        set(attributes["message"], attributes["Message"])
Verification:
Review sample data item structure and compare with BigQuery schema:
# View BigQuery table schema
bq show --schema --format=prettyjson YOUR_PROJECT:YOUR_DATASET.YOUR_TABLE
# Check incoming data structure in Edge Delta logs (if debug logging enabled)
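With debug logging enabled, one way to inspect what the agent is sending is to grep its logs around the BigQuery node; the exact log wording varies by agent version, so treat the pattern below as a starting point:
kubectl logs -n edgedelta -l app=edgedelta --tail=200 | grep -i "big_query\|bigquery"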
Performance and Latency Issues
Symptoms
- Data appears in BigQuery with significant delay (>5 minutes)
- High latency between data generation and availability in BigQuery
- "context deadline exceeded" errors
- Slow agent performance or high CPU usage
Root Causes and Solutions
1. Insufficient Parallel Workers
Problem: Default worker count (5) is too low for high-volume data streams.
Solution:
Increase parallel_worker_count:
- name: google_cloud_big_query
  type: google_cloud_big_query_output
  project_id: my-project
  dataset: logs_dataset
  table: application_logs
  credentials_path: /etc/credentials/bigquery-sa-key.json
  parallel_worker_count: 10    # Increased from default 5
Recommendation: Start with 10 workers for high-volume streams, monitor and adjust.
2. Network Connectivity Issues
Problem: Poor network connectivity between agent and BigQuery API endpoints.
Solution:
Test network connectivity to BigQuery:
# Test DNS resolution
nslookup bigquery.googleapis.com
# Test HTTPS connectivity
curl -I https://bigquery.googleapis.com
# Check latency
ping -c 10 bigquery.googleapis.com
For Kubernetes, check egress network policies:
kubectl get networkpolicies -n edgedelta
3. BigQuery Quota Limits
Problem: Hitting BigQuery API rate limits or quotas.
Solution:
Check quotas in GCP Console:
- Navigate to IAM & Admin > Quotas
- Filter by “BigQuery API”
- Look for “Streaming inserts per second” and “Streaming inserts per day”
Request quota increases if needed, or reduce data volume:
# Add rate limiting in source nodes if needed
- name: my_source
  type: kubernetes_logs
  rate_limit:
    max_items_per_second: 1000
4. Streaming Buffer Delay
Problem: Normal BigQuery streaming buffer delay (data takes time to become available for querying).
Solution:
This is expected BigQuery behavior:
- Data is immediately written to streaming buffer
- Typically available for querying within seconds to a few minutes
- Allow up to 90 minutes for data to fully commit and leave the streaming buffer
Verification:
Check how many rows have committed to the table:
-- Committed row count and size (rows still in the streaming buffer are not reflected here)
SELECT
*
FROM
`my-project.logs_dataset.__TABLES__`
WHERE
table_id = 'application_logs';
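The streaming buffer itself shows up in the table metadata; while rows remain buffered, the output of bq show includes a streamingBuffer section:
# Table metadata; a "streamingBuffer" entry appears while rows are still buffered
bq show --format=prettyjson my-project:logs_dataset.application_logs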
Data Not Appearing
Symptoms
- No errors in logs but data not in BigQuery
- Zero rows in BigQuery table
- Agent reports success but queries return no results
Root Causes and Solutions
1. Wrong Environment/Dataset
Problem: Data is going to a different dataset or project than expected.
Solution:
Verify configuration matches where you’re querying:
# Double-check these match your query
project_id: my-project
dataset: logs_dataset
table: application_logs
Query the correct location:
SELECT COUNT(*)
FROM `my-project.logs_dataset.application_logs`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
2. Filtering Dropping Data
Problem: Pipeline filters upstream of BigQuery destination are dropping all data.
Solution:
Review pipeline workflows and filters:
# Check agent logs for filter activity
kubectl logs -n edgedelta POD_NAME | grep -i "filter\|drop\|discard"
Temporarily remove filters to test:
workflows:
  - from: my_source
    to: google_cloud_big_query   # Direct connection, no filters
3. No Data Flowing from Source
Problem: Source node isn’t generating or receiving data.
Solution:
Verify source node is active:
# Check for source activity in logs
kubectl logs -n edgedelta POD_NAME | grep -i "source\|input"
# Check metrics if available
kubectl port-forward -n edgedelta POD_NAME 6060:6060
curl http://localhost:6060/metrics | grep source
Test with a simple demo input:
- name: test_input
  type: demo_input
  events_per_sec: 1
  log_type: apache_combined

workflows:
  - from: test_input
    to: google_cloud_big_query
Still Having Issues?
If problems persist after trying these solutions:
Enable Debug Logging:
logging:
  level: debug
Collect Diagnostic Information:
# Agent logs
kubectl logs -n edgedelta POD_NAME --previous > agent-logs.txt
# Configuration
kubectl get configmap -n edgedelta -o yaml > config.yaml
# BigQuery permissions
gcloud projects get-iam-policy YOUR_PROJECT_ID > iam-policy.txt
Contact Edge Delta Support at /contact-support with:
- Agent version
- Error messages from logs
- Configuration (sanitized)
- BigQuery project/dataset/table names
- Authentication method used