Troubleshooting Google Cloud BigQuery Destination

Comprehensive troubleshooting guide for resolving common issues with the Google Cloud BigQuery destination node.

Overview

This guide helps diagnose and resolve common issues when using the Google Cloud BigQuery destination node in Edge Delta pipelines. BigQuery is a serverless data warehouse that requires proper authentication, permissions, and configuration for reliable data streaming.

Permission and Authentication Issues

Symptoms

  • permission denied errors in agent logs
  • BigQuery API has not been used in project errors
  • 403 Forbidden HTTP errors
  • Data not appearing in BigQuery tables

Root Causes and Solutions

1. Missing IAM Roles

Problem: Service account lacks required BigQuery permissions.

Solution:

Verify and grant the required IAM roles:

# Check current roles
gcloud projects get-iam-policy YOUR_PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com"

# Grant required roles
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

Required IAM Roles:

  • roles/bigquery.dataEditor - Write data to tables
  • roles/bigquery.jobUser - Create streaming insert jobs

Verification:

# Test authentication
gcloud auth activate-service-account --key-file=/path/to/credentials.json
gcloud projects list
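
To confirm the roles take effect for BigQuery specifically, run a trivial query as the activated service account; it exercises roles/bigquery.jobUser because it must create a query job (assumes the bq CLI is installed, which shares credentials with gcloud):

bq query --project_id=YOUR_PROJECT_ID --use_legacy_sql=false 'SELECT 1'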

2. BigQuery API Not Enabled

Problem: BigQuery API is not enabled for the project.

Solution:

Enable the BigQuery API:

gcloud services enable bigquery.googleapis.com --project=YOUR_PROJECT_ID

Or enable via the GCP API Library.

Verification:

gcloud services list --enabled --project=YOUR_PROJECT_ID | grep bigquery

3. Invalid Credentials File

Problem: Credentials file path is incorrect, file is corrupted, or has wrong permissions.

Solution:

Verify credentials file:

# Check file exists and is readable
ls -la /etc/credentials/bigquery-sa-key.json

# Validate JSON format
jq . /etc/credentials/bigquery-sa-key.json

# Test credentials
gcloud auth activate-service-account --key-file=/etc/credentials/bigquery-sa-key.json

Ensure file permissions are restrictive:

chmod 600 /etc/credentials/bigquery-sa-key.json
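
If the file is corrupted or the key has been revoked, generate a replacement key (assumes you have permission to manage keys for this service account):

gcloud iam service-accounts keys create /etc/credentials/bigquery-sa-key.json \
  --iam-account=YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com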

For Kubernetes deployments, verify the secret exists and is mounted:

# Check secret exists
kubectl get secret bigquery-credentials -n edgedelta

# Check pod volume mounts
kubectl describe pod -n edgedelta -l app=edgedelta | grep -A 10 "Mounts:"
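
If the secret is missing, create it from the key file (the secret name, key, and namespace here follow the examples above; adjust them to your deployment):

kubectl create secret generic bigquery-credentials \
  --from-file=bigquery-sa-key.json=/etc/credentials/bigquery-sa-key.json \
  -n edgedelta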

4. Workload Identity Misconfiguration

Problem: GKE Workload Identity binding is incorrect or missing.

Solution:

Verify Workload Identity setup:

# Check Kubernetes service account annotation
kubectl get serviceaccount KSA_NAME -n NAMESPACE -o yaml | grep iam.gke.io

# Verify IAM policy binding
gcloud iam service-accounts get-iam-policy \
  YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project=YOUR_PROJECT_ID

# Test authentication from pod
kubectl run -it --rm debug --image=google/cloud-sdk:slim \
  --namespace=NAMESPACE \
  --overrides='{"spec": {"serviceAccountName": "KSA_NAME"}}' \
  -- gcloud auth list

Ensure the binding exists:

gcloud iam service-accounts add-iam-policy-binding \
  YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:YOUR_PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

Dataset and Table Issues

Symptoms

  • Not found: Dataset errors
  • Not found: Table errors
  • 404 Not Found HTTP errors
  • Agent fails to start or immediately crashes

Root Causes and Solutions

1. Dataset or Table Does Not Exist

Problem: Dataset or table specified in configuration doesn’t exist in BigQuery.

Solution:

Create the dataset and table before starting the agent:

-- Create dataset
CREATE SCHEMA IF NOT EXISTS `my-project.logs_dataset`
OPTIONS(
  location="US"
);

-- Create table
CREATE TABLE `my-project.logs_dataset.application_logs` (
  timestamp TIMESTAMP NOT NULL,
  body STRING,
  severity STRING,
  attributes JSON,
  resource JSON
);
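
Alternatively, the same dataset and table can be created with the bq CLI (a sketch; the inline schema mirrors the SQL above but does not carry the NOT NULL constraint):

bq --location=US mk --dataset YOUR_PROJECT_ID:logs_dataset
bq mk --table YOUR_PROJECT_ID:logs_dataset.application_logs \
  timestamp:TIMESTAMP,body:STRING,severity:STRING,attributes:JSON,resource:JSON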

Verification:

# List datasets
bq ls --project_id=YOUR_PROJECT_ID

# List tables in dataset
bq ls --project_id=YOUR_PROJECT_ID logs_dataset

# Show table schema
bq show --project_id=YOUR_PROJECT_ID logs_dataset.application_logs

2. Case-Sensitive Naming

Problem: project_id, dataset, or table parameters have incorrect case.

Solution:

BigQuery resource names are case-sensitive. Verify exact names:

nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-project        # Must match exactly
    dataset: logs_dataset          # Must match exactly
    table: application_logs        # Must match exactly

Verification:

# Get exact names
bq ls --project_id=YOUR_PROJECT_ID
bq ls --project_id=YOUR_PROJECT_ID YOUR_DATASET

3. Service Account Lacks Dataset Access

Problem: Service account doesn’t have access to the specific dataset.

Solution:

Grant dataset-level permissions:

# Export the current dataset access policy
bq show --format=prettyjson YOUR_PROJECT_ID:YOUR_DATASET > dataset.json

# Add an entry for the service account to the "access" array in dataset.json, for example:
#   {"role": "WRITER", "userByEmail": "YOUR_SERVICE_ACCOUNT@YOUR_PROJECT_ID.iam.gserviceaccount.com"}

# Apply the updated access policy
bq update --source dataset.json YOUR_PROJECT_ID:YOUR_DATASET
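
To verify, list the dataset's access entries and confirm the service account appears (assumes jq is installed):

bq show --format=prettyjson YOUR_PROJECT_ID:YOUR_DATASET | jq '.access'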

Agent Restart and Crash Issues

Symptoms

  • Agent continuously restarts
  • panic: runtime error: invalid memory address or nil pointer dereference in logs
  • FATAL errors during BigQuery node initialization
  • Container/pod crash loops

Root Causes and Solutions

1. Missing Required Parameters

Problem: Required parameters (project_id, dataset, table) are not specified or are empty.

Solution:

Ensure all required parameters are present and have values:

nodes:
  - name: google_cloud_big_query
    type: google_cloud_big_query_output
    project_id: my-project    # Required - must not be empty
    dataset: logs_dataset      # Required - must not be empty
    table: application_logs    # Required - must not be empty
    credentials_path: /etc/credentials/bigquery-sa-key.json

Verification:

Check agent logs for initialization errors:

# Kubernetes
kubectl logs -n edgedelta -l app=edgedelta --tail=100 | grep -i "bigquery\|fatal\|panic"

# Docker
docker logs edgedelta-agent 2>&1 | grep -i "bigquery\|fatal\|panic"

2. Nil Pointer Dereference Bug

Problem: Agent versions prior to v2.7.0 can crash with a nil pointer dereference when the service account lacks the required BigQuery permissions.

Solution:

  1. Upgrade to Edge Delta agent v2.7.0 or later, where this bug is fixed:

# Check current version
kubectl get pods -n edgedelta -o jsonpath='{.items[0].spec.containers[0].image}'

# Or for Docker
docker inspect edgedelta-agent | grep -i version

  2. Verify that service account permissions are correctly configured (see the Permission and Authentication Issues section above).

3. Credentials File Missing or Inaccessible

Problem: credentials_path points to a non-existent file or file the agent cannot read.

Solution:

Verify file accessibility from within the agent container/pod:

# Kubernetes
kubectl exec -n edgedelta POD_NAME -- ls -la /etc/credentials/
kubectl exec -n edgedelta POD_NAME -- cat /etc/credentials/bigquery-sa-key.json | jq .

# Docker
docker exec edgedelta-agent ls -la /etc/credentials/
docker exec edgedelta-agent cat /etc/credentials/bigquery-sa-key.json | jq .

Ensure the Kubernetes secret is properly mounted:

volumes:
- name: bigquery-creds
  secret:
    secretName: bigquery-credentials
    items:
    - key: bigquery-sa-key.json
      path: bigquery-sa-key.json
      mode: 0400

Schema Mismatch Issues

Symptoms

  • No such field errors in logs
  • Invalid schema update errors
  • Data appears in BigQuery but with missing or incorrect fields
  • Type mismatch errors

Root Causes and Solutions

1. Incompatible Field Types

Problem: Data item fields don’t match BigQuery column types.

Solution:

Use JSON columns for flexible schema:

-- Flexible schema with JSON
CREATE TABLE `my-project.logs_dataset.flexible_logs` (
  timestamp TIMESTAMP NOT NULL,
  body STRING,
  attributes JSON,        -- Flexible attributes
  resource JSON           -- Flexible resource metadata
);

Query JSON fields:

SELECT
  timestamp,
  JSON_VALUE(attributes, '$.user_id') as user_id,
  JSON_VALUE(attributes, '$.level') as level
FROM `my-project.logs_dataset.flexible_logs`

2. Missing Required Columns

Problem: BigQuery table has NOT NULL columns that incoming data doesn’t populate.

Solution:

Make columns nullable or provide default values:

-- Add nullable columns
ALTER TABLE `my-project.logs_dataset.application_logs`
ADD COLUMN IF NOT EXISTS severity STRING;

-- Or drop NOT NULL constraint
ALTER TABLE `my-project.logs_dataset.application_logs`
ALTER COLUMN severity DROP NOT NULL;

Alternatively, use a custom processor to add missing fields:

- name: add_defaults
  type: sequence
  processors:
    - type: ottl_transform
      statements: |
        set(attributes["severity"], "INFO") where attributes["severity"] == nil
        set(attributes["timestamp"], Now()) where attributes["timestamp"] == nil        

3. Field Name Case Mismatch

Problem: BigQuery column names don’t match data item field names (BigQuery is case-sensitive).

Solution:

Align naming conventions or transform field names:

- name: normalize_fields
  type: sequence
  processors:
    - type: ottl_transform
      statements: |
        set(attributes["timestamp"], attributes["Timestamp"])
        set(attributes["message"], attributes["Message"])        

Verification:

Review sample data item structure and compare with BigQuery schema:

# View BigQuery table schema
bq show --schema --format=prettyjson YOUR_PROJECT:YOUR_DATASET.YOUR_TABLE

# Check incoming data structure in Edge Delta logs (if debug logging enabled)
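# For example (pod label assumed from the earlier examples):
kubectl logs -n edgedelta -l app=edgedelta --tail=200 | grep -i bigquery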

Performance and Latency Issues

Symptoms

  • Data appears in BigQuery with significant delay (>5 minutes)
  • High latency between data generation and availability in BigQuery
  • context deadline exceeded errors
  • Slow agent performance or high CPU usage

Root Causes and Solutions

1. Insufficient Parallel Workers

Problem: Default worker count (5) is too low for high-volume data streams.

Solution:

Increase parallel_worker_count:

- name: google_cloud_big_query
  type: google_cloud_big_query_output
  project_id: my-project
  dataset: logs_dataset
  table: application_logs
  credentials_path: /etc/credentials/bigquery-sa-key.json
  parallel_worker_count: 10  # Increased from default 5

Recommendation: Start with 10 workers for high-volume streams, monitor and adjust.

2. Network Connectivity Issues

Problem: Poor network connectivity between agent and BigQuery API endpoints.

Solution:

Test network connectivity to BigQuery:

# Test DNS resolution
nslookup bigquery.googleapis.com

# Test HTTPS connectivity
curl -I https://bigquery.googleapis.com

# Check latency
ping -c 10 bigquery.googleapis.com

For Kubernetes, check egress network policies:

kubectl get networkpolicies -n edgedelta
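
Connectivity can also be tested from inside the agent pod itself (pod name assumed; requires curl in the container image):

kubectl exec -n edgedelta POD_NAME -- curl -sI https://bigquery.googleapis.com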

3. BigQuery Quota Limits

Problem: Hitting BigQuery API rate limits or quotas.

Solution:

Check quotas in GCP Console:

  • Navigate to IAM & Admin > Quotas
  • Filter by “BigQuery API”
  • Look for “Streaming inserts per second” and “Streaming inserts per day”
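
Quota errors can also be confirmed from the agent side (pod label assumed from the earlier examples):

kubectl logs -n edgedelta -l app=edgedelta --tail=500 | grep -iE "quota|rateLimitExceeded"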

Request quota increases if needed, or reduce data volume:

# Add rate limiting in source nodes if needed
- name: my_source
  type: kubernetes_logs
  rate_limit:
    max_items_per_second: 1000

4. Streaming Buffer Delay

Problem: Normal BigQuery streaming buffer delay (data takes time to become available for querying).

Solution:

This is expected BigQuery behavior:

  • Data is immediately written to streaming buffer
  • Typically available for querying within seconds to a few minutes
  • Allow up to 90 minutes for data to fully commit and leave streaming buffer

Verification:

Check streaming buffer status:

-- View streaming buffer info
SELECT
  *
FROM
  `my-project.logs_dataset.__TABLES__`
WHERE
  table_id = 'application_logs';
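
The bq CLI also reports a table's streaming buffer statistics directly (assumes jq is installed; the field is absent when the buffer is empty):

bq show --format=prettyjson my-project:logs_dataset.application_logs | jq '.streamingBuffer'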

Data Not Appearing

Symptoms

  • No errors in logs but data not in BigQuery
  • Zero rows in BigQuery table
  • Agent reports success but queries return no results

Root Causes and Solutions

1. Wrong Environment/Dataset

Problem: Data is going to a different dataset or project than expected.

Solution:

Verify configuration matches where you’re querying:

# Double-check these match your query
project_id: my-project
dataset: logs_dataset
table: application_logs

Query the correct location:

SELECT COUNT(*)
FROM `my-project.logs_dataset.application_logs`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

2. Filtering Dropping Data

Problem: Pipeline filters upstream of BigQuery destination are dropping all data.

Solution:

Review pipeline workflows and filters:

# Check agent logs for filter activity
kubectl logs -n edgedelta POD_NAME | grep -i "filter\|drop\|discard"

Temporarily remove filters to test:

workflows:
  - from: my_source
    to: google_cloud_big_query  # Direct connection, no filters

3. No Data Flowing from Source

Problem: Source node isn’t generating or receiving data.

Solution:

Verify source node is active:

# Check for source activity in logs
kubectl logs -n edgedelta POD_NAME | grep -i "source\|input"

# Check metrics if available
kubectl port-forward -n edgedelta POD_NAME 6060:6060
curl http://localhost:6060/metrics | grep source

Test with a simple demo input:

- name: test_input
  type: demo_input
  events_per_sec: 1
  log_type: apache_combined

workflows:
  - from: test_input
    to: google_cloud_big_query
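
After a few minutes, confirm the test events are reaching the table (project, dataset, and table names assumed from the earlier examples):

bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS recent_rows
   FROM `my-project.logs_dataset.application_logs`
   WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)'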

Still Having Issues?

If problems persist after trying these solutions:

  1. Enable Debug Logging:

    logging:
      level: debug
    
  2. Collect Diagnostic Information:

    # Agent logs
    kubectl logs -n edgedelta POD_NAME --previous > agent-logs.txt
    
    # Configuration
    kubectl get configmap -n edgedelta -o yaml > config.yaml
    
    # BigQuery permissions
    gcloud projects get-iam-policy YOUR_PROJECT_ID > iam-policy.txt
    
  3. Contact Edge Delta Support at /contact-support with:

    • Agent version
    • Error messages from logs
    • Configuration (sanitized)
    • BigQuery project/dataset/table names
    • Authentication method used

See Also