Troubleshooting Apache Kudu Destination

Comprehensive troubleshooting guide for resolving common issues with the Apache Kudu destination node.

Overview

This guide helps diagnose and resolve common issues when using the Apache Kudu destination node in Edge Delta pipelines. Apache Kudu is a distributed columnar storage system that requires proper configuration for optimal performance and reliability.

Connection Issues

Symptoms

Pipeline fails to start with connection timeout errors
Intermittent connection drops during data transmission
“Unable to connect to Kudu master” errors in logs

Root Causes and Solutions

1. Incorrect Master Server Addresses

Problem: The hosts parameter contains incorrect addresses or ports.

Solution:

```
# Verify your Kudu master addresses
nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  hosts:
    # Ensure these match your actual Kudu master servers
    - master1.example.com:7051  # Default Kudu master port
    - master2.example.com:7051
    - master3.example.com:7051
```

Verification Steps:

Test connectivity to each master server:
```
telnet master1.example.com 7051
```
Verify Kudu service status on master nodes:
```
sudo systemctl status kudu-master
```

2. Network Connectivity Issues

Problem: Firewall rules or network policies blocking connections.

Solution:

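Ensure port 7051 (default Kudu master RPC port) is open from the agent to every master
Ensure port 7050 (default Kudu tablet server RPC port) is open to every tablet server; the Kudu client writes data directly to tablet servers, so reaching the masters is not enough
Check for any network policies in Kubernetes environments
Verify security groups in cloud environments (AWS, Azure, GCP)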
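To check every master (7051) and tablet server (7050) port in one pass, the following Python sketch performs the same TCP handshake test as telnet. It assumes Python 3 is available on the agent host. parse_endpoint and check_endpoints are hypothetical helper names, not part of Edge Delta or Kudu. A successful handshake shows only that the port is reachable, not that Kudu is healthy.

```python
import socket


def parse_endpoint(endpoint: str, default_port: int = 7051) -> tuple:
    """Split 'host:port' into (host, port); without a port, use the default master port."""
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        return endpoint, default_port
    # IPv6 literals must be bracketed, e.g. [::1]:7051; strip the brackets
    return host.strip("[]"), int(port)


def check_endpoints(endpoints: list, timeout: float = 3.0) -> dict:
    """Attempt a TCP handshake with each endpoint; report 'ok' or the error."""
    results = {}
    for endpoint in endpoints:
        host, port = parse_endpoint(endpoint)
        try:
            # Resolves the name, then connects; the timeout bounds the connect step
            with socket.create_connection((host, port), timeout=timeout):
                results[endpoint] = "ok"
        except OSError as exc:
            results[endpoint] = f"failed: {exc}"
    return results
```

For example, check_endpoints(["master1.example.com:7051", "tserver1.example.com:7050"]) returns a dict that maps each endpoint to "ok" or to the error text, such as a refused or timed-out connection.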

3. DNS Resolution Problems

Problem: Hostname resolution failures.

Solution:

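Use IP addresses instead of hostnames to test whether name resolution is the problem
Verify DNS configuration in your environment
Check /etc/hosts entries if using custom hostname mappings
Confirm that the agent can also resolve the tablet server hostnames: the masters return tablet server addresses to the client, so a resolution failure can appear when writes begin even if the masters are listed by IP
On Kerberos-secured clusters, use hostnames in the final configuration, since Kudu service principals include the server hostname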
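To compare resolution across all Kudu hosts quickly, ask the system resolver for each name. The system resolver normally consults /etc/hosts and DNS. This sketch assumes the Edge Delta agent resolves names the same way the host's resolver does, which may not hold in every deployment. resolve and check_resolution are hypothetical helper names.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the sorted, de-duplicated addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({info[4][0] for info in infos})


def check_resolution(hostnames: list) -> dict:
    """Map each hostname to its addresses, or to the resolver error."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except socket.gaierror as exc:
            results[name] = f"unresolved: {exc}"
    return results
```

Run it on the agent host (or inside the agent pod) with every master and tablet server hostname. Any "unresolved" entry points to a DNS or /etc/hosts gap.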

4. TLS Configuration Mismatch

Problem: TLS settings don’t match Kudu cluster configuration.

Solution:

```
nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  hosts:
    - localhost:7051
  tls:
    enabled: true
    ca_file: /path/to/ca-cert.pem
    crt_file: /path/to/client-cert.pem
    key_file: /path/to/client-key.pem
```
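Before restarting the pipeline, you can confirm locally that the files referenced by ca_file, crt_file, and key_file parse correctly and that the client certificate matches its key. This Python sketch uses only the standard ssl module and the hypothetical helper name check_tls_files. It assumes PEM files and an unencrypted private key; for an encrypted key, load_cert_chain would prompt for a passphrase. It does not check whether the CA actually signed the Kudu servers' certificates.

```python
import os
import ssl


def check_tls_files(ca_file: str, crt_file: str, key_file: str) -> list:
    """Return a list of problems; an empty list means all three files loaded."""
    missing = [f"missing: {p}" for p in (ca_file, crt_file, key_file) if not os.path.isfile(p)]
    if missing:
        return missing
    problems = []
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # Raises if the file holds no valid PEM certificates
        ctx.load_verify_locations(cafile=ca_file)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        problems.append(f"CA file {ca_file}: {exc}")
    try:
        # Raises if the certificate or key is malformed, or the key does not match the certificate
        ctx.load_cert_chain(certfile=crt_file, keyfile=key_file)
    except OSError as exc:
        problems.append(f"certificate/key pair: {exc}")
    return problems
```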

Kerberos Authentication Issues

Production Kudu clusters are often secured with Kerberos. If you cannot connect to a secured cluster, missing or incorrect authentication configuration is a common cause.

Symptoms

“Authentication failed” or “GSSAPI Error” messages
“Cannot connect to Kudu master” errors without clear network issues
“Unauthorized” or “Permission denied” responses
Connection timeouts despite network connectivity being confirmed

Root Causes and Solutions

1. Missing Kerberos Configuration

Problem: Attempting to connect to a Kerberos-protected Kudu cluster without authentication configuration.

Solution: Add the kudu_security block with Kerberos credentials:

```
nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  hosts:
    - kudu-master1.example.com:7051
  table_name: my_table
  kudu_security:
    auth_type: kerberos
    kerberos:
      principal: edgedelta-agent@EXAMPLE.COM
      keytab: /etc/security/keytabs/edgedelta.keytab
      realm: EXAMPLE.COM
      sasl_protocol_name: kudu
      krb5_conf_path: /etc/krb5.conf
    tls:
      ca_file: /etc/ssl/certs/kudu-ca.crt
  schema_mappings:
    # ... your mappings ...
```

See Kerberos Authentication for detailed setup instructions.

2. Clock Skew Between Agent and KDC

Problem: Kerberos requires the clocks of the client and the Key Distribution Center (KDC) to agree within a tolerance, which is 5 minutes by default.

Symptoms: Authentication fails with “Clock skew too great” or similar time-related errors.

Solution:

Verify time synchronization on the Edge Delta agent host:
```
timedatectl status
```

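Enable and start a time synchronization service. The service name varies by distribution (for example ntp, ntpd, or chronyd):

```
sudo systemctl enable ntp
sudo systemctl start ntp
```

On hosts that use systemd-timesyncd instead, run sudo timedatectl set-ntp true.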

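The default Kerberos tolerance is 300 seconds (5 minutes). If time cannot be synchronized right away, the clockskew setting in the [libdefaults] section of /etc/krb5.conf controls the client-side tolerance. Treat a larger value only as a temporary workaround, for two reasons. It weakens replay protection. The KDC also enforces its own tolerance, so raising the client value alone may not clear the error. Comments in krb5.conf must be on their own line:

```
[libdefaults]
    # Tolerance in seconds (300 = 5 minutes, the default)
    clockskew = 300
```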
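To measure how far the agent's clock is from a reference, a minimal SNTP query is enough. The sketch below assumes UDP port 123 is reachable from the agent host. It ignores network delay, so the offset is accurate only to roughly one round trip. It uses ntp.example.com as a placeholder server name, and the helper names are hypothetical.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch)
DEFAULT_CLOCKSKEW = 300        # seconds; the default Kerberos tolerance


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server: str, timeout: float = 2.0) -> float:
    """Return server_time - local_time in seconds (network delay is ignored)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(48)
    return parse_transmit_time(packet) - time.time()


def within_tolerance(offset: float, tolerance: float = DEFAULT_CLOCKSKEW) -> bool:
    return abs(offset) <= tolerance
```

For example, within_tolerance(query_offset("ntp.example.com")) returns False when the offset exceeds the 300-second default. Point it at the time source your KDC uses, if you know it.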

3. Keytab File Issues

Problem: The keytab file is missing, has incorrect permissions, or contains wrong credentials.

Solution:

Verify the keytab file exists and is readable:

```
ls -la /etc/security/keytabs/edgedelta.keytab
```

Check keytab contents:

```
klist -kt /etc/security/keytabs/edgedelta.keytab
```

Test authentication manually:

```
kinit -kt /etc/security/keytabs/edgedelta.keytab edgedelta-agent@EXAMPLE.COM
klist
```

Ensure the keytab is readable only by the user the Edge Delta agent runs as (replace edgedelta:edgedelta with that user and group):

```
chmod 400 /etc/security/keytabs/edgedelta.keytab
chown edgedelta:edgedelta /etc/security/keytabs/edgedelta.keytab
```
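The existence, permission, and readability checks above can be bundled into a small script. This sketch uses the hypothetical helper name check_keytab. Run it as the same user as the Edge Delta agent, because os.access answers for the current user. It treats any group or other permission bits as a problem, which matches the chmod 400 recommendation.

```python
import os
import stat


def check_keytab(path: str) -> list:
    """Return a list of problems with the keytab file; an empty list means OK."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"missing: {path}"]
    problems = []
    mode = stat.S_IMODE(st.st_mode)
    # Any read/write/execute bit for group or other exposes the keytab
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"group/other permission bits set: {oct(mode)} (chmod 400 recommended)")
    if not os.access(path, os.R_OK):
        problems.append("not readable by the current user")
    if st.st_size == 0:
        problems.append("file is empty")
    return problems
```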

4. Incorrect Principal or Realm

Problem: The principal name or realm in the configuration doesn’t match what’s registered in the KDC.

Solution:

Verify the exact principal name with your Kerberos administrator
Match the realm exactly, including case: Kerberos realms are case-sensitive and are conventionally uppercase
Check that the principal format matches: service/hostname@REALM or username@REALM
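A simple syntax check can catch realm typos before you involve the KDC. The sketch below, using the hypothetical helper check_principal, validates only the shape name@REALM or service/host@REALM. It also compares the realm with the value of the realm field in your configuration. It cannot tell whether the principal is actually registered; klist -kt and kinit confirm that.

```python
import re

# primary, optional /instance, then @REALM
PRINCIPAL_RE = re.compile(r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^@]+))?@(?P<realm>[^@]+)$")


def check_principal(principal: str, expected_realm: str) -> list:
    """Return a list of problems with the principal string; an empty list means OK."""
    match = PRINCIPAL_RE.match(principal)
    if not match:
        return [f"not of the form name@REALM or service/host@REALM: {principal!r}"]
    problems = []
    realm = match.group("realm")
    if realm != expected_realm:
        problems.append(f"realm {realm!r} does not match configured realm {expected_realm!r}")
    if realm != realm.upper():
        problems.append(f"realm {realm!r} is not uppercase (uppercase is the convention)")
    return problems
```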

5. KDC Connectivity Issues

Problem: The agent cannot reach the Key Distribution Center.

Solution:

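Test TCP connectivity to the KDC (default port 88):

```
nc -zv kdc.example.com 88
```

Kerberos clients may also send requests over UDP port 88, which nc -zv does not test. If UDP traffic to the KDC is blocked, set udp_preference_limit = 1 in the [libdefaults] section of krb5.conf so the client uses TCP.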

Verify krb5.conf has correct KDC addresses:

```
[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```
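When krb5.conf lists several realms or KDCs, it helps to extract the exact kdc entries for your realm and then test each one with nc. The parser below is a deliberately simplified sketch with the hypothetical name kdcs_for_realm. It assumes one "REALM = {" block per realm, ignores include directives, and skips comment lines.

```python
import re


def kdcs_for_realm(krb5_conf: str, realm: str) -> list:
    """Return the 'kdc = ...' values for realm from the [realms] section (simplified)."""
    kdcs = []
    section = None
    in_realm = False
    realm_open = re.compile(rf"{re.escape(realm)}\s*=\s*\{{")  # e.g. 'EXAMPLE.COM = {'
    for raw in krb5_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section, in_realm = header.group(1), False
            continue
        if section != "realms":
            continue
        if realm_open.fullmatch(line):
            in_realm = True
        elif line == "}":
            in_realm = False
        elif in_realm:
            key, _, value = line.partition("=")
            if key.strip() == "kdc":
                kdcs.append(value.strip())
    return kdcs
```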

6. TLS Certificate Issues

Problem: When using TLS with Kerberos, certificate validation fails.

Solution:

```
kudu_security:
  auth_type: kerberos
  kerberos:
    # ... kerberos config ...
  tls:
    ca_file: /etc/ssl/certs/kudu-ca.crt  # Ensure this file exists and is valid
```

Verify the CA certificate:

```
openssl x509 -in /etc/ssl/certs/kudu-ca.crt -text -noout
```

Kubernetes-Specific Kerberos Issues

When running in Kubernetes, ensure keytab and krb5.conf files are properly mounted:

```
# Helm values.yaml
agent:
  extraVolumes:
    - name: kerberos-keytab
      secret:
        secretName: edgedelta-kerberos
        defaultMode: 0400
    - name: krb5-config
      configMap:
        name: krb5-config
  extraVolumeMounts:
    - name: kerberos-keytab
      mountPath: /etc/security/keytabs
      readOnly: true
    - name: krb5-config
      mountPath: /etc/krb5.conf
      subPath: krb5.conf
      readOnly: true
```

Verify mounts inside the pod:

```
kubectl exec -it <pod-name> -n edgedelta -- ls -la /etc/security/keytabs/
kubectl exec -it <pod-name> -n edgedelta -- cat /etc/krb5.conf
```

Schema Mismatch Errors

Symptoms

“Column not found” errors
“Type mismatch” exceptions
Data insertion failures with schema-related messages

Root Causes and Solutions

1. Incorrect Column Type Mapping

Problem: Specified column types don’t match actual Kudu table schema.

Solution:

```
schema_mappings:
  # Verify these types match your Kudu table exactly
  - column_name: timestamp
    column_type: int64  # Must match Kudu table definition
    expression: attributes["timestamp"]
  - column_name: value
    column_type: double  # Not 'float' if table uses DOUBLE
    expression: attributes["value"]
```

Verification: Use Kudu CLI to verify table schema:

```
kudu table describe <master_addresses> <table_name>
```
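Comparing the mappings against the table by hand is error-prone for wide tables. This sketch assumes two inputs. The first is a dict of column names and types copied from the kudu table describe output; that output is not parsed here because its format may vary between Kudu versions. The second is your schema_mappings loaded as a list of dicts. compare_schema is a hypothetical helper.

```python
def compare_schema(table_columns: dict, mappings: list) -> list:
    """Compare schema_mappings entries with the table's {column name: type} map."""
    problems = []
    for m in mappings:
        name, wanted = m["column_name"], m["column_type"].lower()
        actual = table_columns.get(name)
        if actual is None:
            problems.append(f"column {name!r} not found in table schema")
        elif actual.lower() != wanted:  # Kudu prints types in uppercase, e.g. INT64
            problems.append(f"column {name!r}: mapping says {wanted}, table has {actual.lower()}")
    return problems
```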

2. Missing Required Columns

Problem: Required columns in Kudu table not mapped in configuration.

Solution:

Ensure all non-nullable columns have mappings
Provide default values for optional columns when needed

```
schema_mappings:
  - column_name: id
    column_type: string
    expression: attributes["id"]
    required: true  # Must be provided
  - column_name: status
    column_type: string
    expression: attributes["status"]
    default_value: "active"  # Fallback value
```
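Non-nullable columns are the ones most likely to cause rejected rows. They include every primary key column, since Kudu never allows a key column to be null. The sketch below flags any such column with no mapping. It also flags any mapped one that has neither required nor default_value set. That second rule is an assumption about how required and default_value interact, not documented Edge Delta behavior. unmapped_required is a hypothetical helper.

```python
def unmapped_required(non_nullable: set, mappings: list) -> list:
    """Flag NOT NULL columns that could end up without a value."""
    by_name = {m["column_name"]: m for m in mappings}
    problems = []
    for column in sorted(non_nullable):
        mapping = by_name.get(column)
        if mapping is None:
            problems.append(f"{column}: NOT NULL column has no mapping")
        elif not mapping.get("required") and "default_value" not in mapping:
            # Assumption: without required or default_value, a missing value is sent as null
            problems.append(f"{column}: mapped, but neither required nor default_value is set")
    return problems
```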

3. Key Column Mismatch

Problem: Primary key columns not properly identified.

Solution:

```
schema_mappings:
  # Primary key columns must be marked with is_key: true and listed first,
  # in the same order as the table's primary key
  - column_name: partition_key
    column_type: string
    expression: attributes["partition"]
    is_key: true
    required: true
  - column_name: sort_key
    column_type: int64
    expression: attributes["timestamp"]
    is_key: true
    required: true
```
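A check for key flags and ordering can be sketched as follows. It assumes the mapping must mirror Kudu's own rule that primary key columns come first in the schema, in key order. table_key is the list of primary key columns as shown in the table's schema. check_key_order is a hypothetical helper.

```python
def check_key_order(table_key: list, mappings: list) -> list:
    """Return problems with is_key flags or ordering; an empty list means OK."""
    problems = []
    flagged = [m["column_name"] for m in mappings if m.get("is_key")]
    if flagged != table_key:
        problems.append(f"is_key columns {flagged} differ from table primary key {table_key}")
    # Assumed rule: key columns come first, in primary key order
    leading = [m["column_name"] for m in mappings[: len(table_key)]]
    if leading != table_key:
        problems.append(f"first {len(table_key)} mappings {leading} are not the key columns in order")
    return problems
```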

Performance Issues

Symptoms

Slow data ingestion rates
High latency in pipeline processing
Memory/CPU usage spikes
Timeouts during batch writes

Root Causes and Solutions

1. Suboptimal Batch Configuration

Problem: A batch size that is too small causes excessive write operations.

Solution:

```
nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  batch_config:
    rows_limit: 1000        # Increase from default 100
    row_size_limit: "10MB"  # Adjust based on data size
    flush_interval: "30s"   # Balance between latency and throughput
    flush_mode: auto
```

Tuning Guidelines:

High Volume: Increase rows_limit to 5000-10000
Low Latency: Decrease flush_interval to “5s” or less
Large Records: Increase row_size_limit appropriately
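To see how rows_limit and flush_interval interact, the following simplified Python model buffers rows and flushes when either the row count or the interval is reached. It illustrates the trade-off and is not Edge Delta's implementation. row_size_limit is left out because its exact semantics are not described here. The clock is injectable so the behavior can be tested without waiting, and Batcher is a hypothetical name.

```python
import time


class Batcher:
    """Buffers rows; flushes on rows_limit or once flush_interval_s has elapsed."""

    def __init__(self, rows_limit, flush_interval_s, flush, clock=time.monotonic):
        self.rows_limit = rows_limit
        self.flush_interval_s = flush_interval_s
        self.flush = flush          # callback that receives the list of rows
        self.clock = clock
        self.rows = []
        self.last_flush = clock()

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.rows_limit:
            self._flush()

    def tick(self):
        """Call periodically; flushes a partial batch once the interval has elapsed."""
        if self.rows and self.clock() - self.last_flush >= self.flush_interval_s:
            self._flush()

    def _flush(self):
        self.flush(self.rows)
        self.rows = []
        self.last_flush = self.clock()
```

With rows_limit: 1000 and flush_interval: "30s", a stream of 10 rows per second never fills a batch, because 1,000 rows would take 100 seconds. The interval therefore governs, and each flush carries about 300 rows. At 100 rows per second, the row limit is reached every 10 seconds and governs instead.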

2. Insufficient Parallel Workers

Problem: Default worker count limiting throughput.

Solution:

```
nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  parallel_worker_count: 10  # Increase from default 2
  connection:
    max_connections: 20     # Should be >= parallel_worker_count
```

3. Connection Pool Exhaustion

Problem: Too few connections for workload.

Solution:

```
connection:
  max_connections: 30      # Increase for high concurrency
  timeout: "60s"           # Allow more time for busy clusters
  retry_attempts: 5        # More retries for transient issues
  retry_delay: "2s"        # Backoff between retries
```
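Here is a sketch of how retry_attempts and retry_delay could combine. It assumes retry_attempts counts total attempts and that the delay doubles after each failure. Both are assumptions; the original comment only says the delay provides backoff between retries. The sleep function is injectable so the schedule can be tested, and with_retries is a hypothetical name.

```python
import time


def with_retries(operation, retry_attempts=5, retry_delay_s=2.0, sleep=time.sleep):
    """Call operation(); on failure wait and retry, up to retry_attempts calls in total."""
    for attempt in range(1, retry_attempts + 1):
        try:
            return operation()
        except Exception:
            # A real client would retry only errors known to be transient
            if attempt == retry_attempts:
                raise
            # Assumed exponential backoff: 2s, 4s, 8s, ... for a 2s base delay
            sleep(retry_delay_s * 2 ** (attempt - 1))
```

With retry_attempts: 5 and retry_delay: "2s", a write that keeps failing waits 2 + 4 + 8 + 16 = 30 seconds in total before the last error is raised. Compare that figure against your timeout and flush_interval settings.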

4. Write Mode Inefficiency

Problem: Using upsert when insert would suffice.

Solution:

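```
# Use insert mode for append-only workloads
mode: insert  # Faster than upsert for new records
```

When to use each mode:

insert: New records only, best performance. A row whose primary key already exists is rejected with an "already present" error.
upsert: Inserts new rows and updates existing rows with the same primary key, so duplicates are handled.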
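The difference between the two modes can be modeled with a dict keyed by primary key. This sketch simplifies Kudu's behavior by treating an upsert as a full-row replacement. It uses a hypothetical AlreadyPresent exception in place of the error the Kudu client returns, and apply_write is likewise a hypothetical name.

```python
class AlreadyPresent(Exception):
    """Stand-in for the duplicate-key error Kudu reports on insert."""


def apply_write(table: dict, key: tuple, row: dict, mode: str) -> None:
    """Apply one write to a dict keyed by primary key (simplified model)."""
    if mode == "insert":
        if key in table:
            raise AlreadyPresent(f"key already present: {key!r}")
        table[key] = row
    elif mode == "upsert":
        # Simplification: replace the whole row whether or not the key exists
        table[key] = row
    else:
        raise ValueError(f"unknown mode: {mode!r}")
```

Replaying the same record twice shows why insert suits append-only data: the second insert fails, while the second upsert overwrites.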

Data Quality Issues

Symptoms

Data not appearing in Kudu tables
Incorrect values in columns
Missing or null values where data expected

Root Causes and Solutions

1. Expression Evaluation Failures

Problem: CEL/OTTL expressions not extracting data correctly.

Solution:

```
schema_mappings:
  # Test expressions carefully
  - column_name: user_id
    column_type: string
    # Ensure path exists in your data
    expression: attributes["user"]["id"]  # Nested access
  - column_name: timestamp
    column_type: int64
    # Convert to appropriate type
    expression: int(attributes["timestamp"])
```
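When a mapped column comes out null, the usual cause is a path that does not exist in the actual data. A small helper can pinpoint the first missing key in a sample record captured as JSON. It models the data as nested dicts and does not reproduce CEL or OTTL semantics. find_missing_segment is a hypothetical name.

```python
def find_missing_segment(record: dict, path: list):
    """Return None if every key in path exists, else the path up to the first missing key."""
    current = record
    for i, key in enumerate(path):
        # Stop if the current level is not a mapping or lacks the key
        if not isinstance(current, dict) or key not in current:
            return path[: i + 1]
        current = current[key]
    return None
```

For example, find_missing_segment(json.loads(sample), ["attributes", "user", "id"]) returns the failing prefix, or None if the full path exists.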

2. Type Conversion Issues

Problem: Data types not converting properly.

Common Conversions:

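```
# String to integer
expression: int(attributes["count"])

# String to boolean
expression: attributes["active"] == "true"

# Timestamp handling: convert the string first, then scale
# (assumes the attribute holds epoch seconds)
expression: int(attributes["timestamp"]) * 1000  # Seconds to milliseconds
```

Kudu's UNIXTIME_MICROS column type stores microseconds since the Unix epoch. When writing to such a column, multiply epoch seconds by 1000000 instead.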
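You can reproduce the same conversions in Python to work out the values you expect to see in Kudu for a sample record. This sketch mirrors the intent of the expressions above rather than CEL or OTTL semantics. It assumes timestamps arrive as epoch seconds in string form, and all function names are hypothetical.

```python
def to_bool(value: str) -> bool:
    # Mirrors attributes["active"] == "true": only the exact lowercase string is True
    return value == "true"


def seconds_to_millis(epoch_seconds: str) -> int:
    # Convert the string to an integer first, then scale
    return int(epoch_seconds) * 1000


def seconds_to_micros(epoch_seconds: str) -> int:
    # For Kudu UNIXTIME_MICROS columns, which store microseconds since the Unix epoch
    return int(epoch_seconds) * 1_000_000


def fractional_seconds_to_millis(epoch_seconds: str) -> int:
    # int("1700000000.25") raises ValueError, so parse as float and round
    return round(float(epoch_seconds) * 1000)
```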

Debugging Techniques

1. Enable Debug Logging

To capture detailed logs for troubleshooting, configure the Edge Delta agent’s logging level:

For agent-wide debug logging:

```
# In your agent configuration
agent:
  log_level: debug  # Options: trace, debug, info, warn, error
```

Monitor agent logs for Kudu-specific messages:

```
# View agent logs (location varies by deployment)
tail -f /var/log/edgedelta/edgedelta.log | grep -i kudu
```

2. Test with Small Batches

Start with minimal configuration to isolate issues:

```
batch_config:
  rows_limit: 10          # Small batch for testing
  flush_interval: "5s"    # Quick feedback
```

3. Monitor Kudu Metrics

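Check Kudu master and tablet server metrics. Each server's embedded web UI exposes them at /metrics (default web UI ports: 8051 on masters, 8050 on tablet servers):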

Write latency
Queue sizes
Error rates
Resource utilization
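To pull specific metrics programmatically, you can fetch the /metrics JSON from a master or tablet server. The sketch assumes the response is a list of entities, each with a type, an id, and a metrics list. Each item in that list is assumed to carry a name plus either a value (counters and gauges) or histogram fields such as percentile_99. Verify this layout against your Kudu version before relying on it. Metric names also vary between versions, so browse the raw output to choose them. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(base_url: str, timeout: float = 5.0) -> list:
    """GET <base_url>/metrics, e.g. base_url = "http://tserver1.example.com:8050" (placeholder host)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return json.load(resp)


def select_metrics(entities: list, names: set) -> dict:
    """Map (entity type, entity id, metric name) to the metric's value.

    Counters and gauges are assumed to carry a "value" field; for anything else
    (such as histograms) the whole metric dict is returned.
    """
    selected = {}
    for entity in entities:
        for metric in entity.get("metrics", []):
            name = metric.get("name")
            if name in names:
                key = (entity.get("type"), entity.get("id"), name)
                selected[key] = metric["value"] if "value" in metric else metric
    return selected
```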

4. Use Kudu CLI Tools

Verify table operations independently:

```
# List tables
kudu table list <master_addresses>

# Scan table
kudu table scan <master_addresses> <table_name>

# Check table statistics
kudu table statistics <master_addresses> <table_name>
```

Best Practices

1. Schema Design

Keep primary keys simple and efficient
Use appropriate column types (avoid unnecessary precision)
Consider partitioning strategy for large tables

2. Resource Planning

Monitor Edge Delta agent resource usage
Scale Kudu cluster based on workload
Use appropriate instance types for Kudu nodes

3. Error Handling

Configure retry_attempts and retry_delay so transient errors are retried
Monitor failed writes
Set up alerts for persistent failures

4. Testing Strategy

Start with a test Kudu table
Validate schema mappings with sample data
Gradually increase load to production levels
Monitor performance metrics throughout

Common Error Messages

| Error Message | Cause | Solution |
|---|---|---|
| “Unable to connect to leader master” | Network or configuration issue | Verify master addresses and connectivity |
| “Column ‘X’ not found in table schema” | Schema mismatch | Update schema_mappings to match table |
| “Invalid type for column ‘X’” | Type mismatch | Correct column_type in configuration |
| “Row too large” | Exceeds size limit | Increase row_size_limit or split data |
| “Timed out waiting for flush” | Slow writes | Increase timeout or optimize batch config |
| “Maximum number of attempts reached” | Persistent failures | Check Kudu cluster health and logs |
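To triage a large log file, you can turn the table into a lookup that matches known substrings. This Python sketch assumes the messages appear in the agent logs with the wording shown in the table. Actual wording may differ between Kudu and agent versions, so extend KNOWN_ERRORS with the text you actually see. KNOWN_ERRORS and classify are hypothetical names.

```python
KNOWN_ERRORS = [
    # (substring to match, cause, suggested fix), taken from the table above
    ("Unable to connect to leader master", "Network or configuration issue",
     "Verify master addresses and connectivity"),
    ("not found in table schema", "Schema mismatch",
     "Update schema_mappings to match table"),
    ("Invalid type for column", "Type mismatch",
     "Correct column_type in configuration"),
    ("Row too large", "Exceeds size limit",
     "Increase row_size_limit or split data"),
    ("Timed out waiting for flush", "Slow writes",
     "Increase timeout or optimize batch config"),
    ("Maximum number of attempts reached", "Persistent failures",
     "Check Kudu cluster health and logs"),
]


def classify(log_line: str):
    """Return (cause, fix) for the first known error found in log_line, else None."""
    lowered = log_line.lower()
    for pattern, cause, fix in KNOWN_ERRORS:
        if pattern.lower() in lowered:
            return cause, fix
    return None
```

For example, run classify on each line of the output of grep -i kudu /var/log/edgedelta/edgedelta.log and count the causes.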

Getting Help

If issues persist after following this guide:

Check Edge Delta agent logs for detailed error messages
Review Kudu master and tablet server logs
Contact Edge Delta support with:
- Configuration snippet
- Error messages
- Kudu cluster version and configuration
- Sample data structure
