Troubleshooting Apache Kudu Destination

Comprehensive troubleshooting guide for resolving common issues with the Apache Kudu destination node.

Overview

This guide helps you diagnose and resolve common issues with the Apache Kudu destination node in Edge Delta pipelines. Apache Kudu is a distributed columnar storage system, and most problems with the destination trace back to connection settings, schema mappings, or batch and concurrency tuning. The sections below cover each in turn.

Connection Issues

Symptoms

  • Pipeline fails to start with connection timeout errors
  • Intermittent connection drops during data transmission
  • “Unable to connect to Kudu master” errors in logs

Root Causes and Solutions

1. Incorrect Master Server Addresses

Problem: The hosts parameter contains incorrect addresses or ports.

Solution:

# Verify your Kudu master addresses
nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  hosts:
    # Ensure these match your actual Kudu master servers
    - master1.example.com:7051  # Default Kudu master port
    - master2.example.com:7051
    - master3.example.com:7051

Verification Steps:

  1. Test connectivity to each master server:
    telnet master1.example.com 7051
    
  2. Verify Kudu service status on master nodes:
    sudo systemctl status kudu-master
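
The connectivity test above can be repeated for every master in one pass. The following is a minimal sketch rather than part of the Edge Delta or Kudu tooling: check_master and check_all_masters are hypothetical helper names, MASTERS simply mirrors the example hosts list, and the probe assumes bash and the coreutils timeout command are available on the machine running the agent.

```shell
# Hypothetical helpers (not part of Edge Delta or Kudu).
# MASTERS mirrors the example hosts list; replace it with your own masters.
MASTERS="master1.example.com:7051 master2.example.com:7051 master3.example.com:7051"

# Try to open a TCP connection to one host:port entry, giving up after 3 seconds.
check_master() {
  local host="${1%:*}"   # text before the last colon
  local port="${1##*:}"  # text after the last colon
  # /dev/tcp is a bash feature, so the probe runs in an explicit bash process.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK   ${host}:${port}"
  else
    echo "FAIL ${host}:${port}"
    return 1
  fi
}

# Probe every master and return non-zero if any of them is unreachable.
check_all_masters() {
  local rc=0 m
  for m in $MASTERS; do
    check_master "$m" || rc=1
  done
  return "$rc"
}
```

Running check_all_masters from the agent host prints one OK or FAIL line per master. A FAIL only shows that the TCP connection could not be opened, so the firewall, DNS, and service checks in this section still apply.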
    

2. Network Connectivity Issues

Problem: Firewall rules or network policies blocking connections.

Solution:

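  • Ensure port 7051 (default Kudu master RPC port) is open, along with port 7050 (default tablet server RPC port), because Kudu clients write to tablet servers directly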
  • Check for any network policies in Kubernetes environments
  • Verify security groups in cloud environments (AWS, Azure, GCP)

3. DNS Resolution Problems

Problem: Hostname resolution failures.

Solution:

  • Use IP addresses instead of hostnames for testing
  • Verify DNS configuration in your environment
  • Check /etc/hosts entries if using custom hostname mappings
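
To see what a hostname resolves to on the agent host, a small wrapper around getent can help. This is a sketch under stated assumptions: resolve_host is a hypothetical helper name, and getent is assumed to be installed (it ships with glibc-based Linux distributions and consults /etc/hosts and DNS in the order the system is configured to use).

```shell
# Hypothetical helper: print the first address a hostname resolves to,
# or report it as unresolved and return non-zero.
resolve_host() {
  local host="$1" addr
  # getent hosts prints "<address> <name...>"; keep the address of the first line.
  addr="$(getent hosts "$host" | awk 'NR == 1 { print $1 }')"
  if [ -n "$addr" ]; then
    echo "${host} -> ${addr}"
  else
    echo "${host} -> unresolved"
    return 1
  fi
}
```

For example, resolve_host master1.example.com shows whether the name used in the hosts parameter resolves at all, and to which address, before the agent tries to connect.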

4. TLS Configuration Mismatch

Problem: TLS settings don’t match Kudu cluster configuration.

Solution:

nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  hosts:
    - localhost:7051
  tls:
    enabled: true
    ca_file: /path/to/ca-cert.pem
    crt_file: /path/to/client-cert.pem
    key_file: /path/to/client-key.pem
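
Before comparing TLS settings with the cluster, it is worth confirming that the files named in the configuration exist and are readable by the user the agent runs as. The helper below is a hypothetical sketch (check_tls_files is not an Edge Delta command), and the paths in the usage note are the placeholder paths from the example above.

```shell
# Hypothetical helper: report whether each given file exists and is readable
# by the current user. Returns non-zero if any file fails the check.
check_tls_files() {
  local rc=0 f
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "readable: $f"
    else
      echo "missing or unreadable: $f"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: check_tls_files /path/to/ca-cert.pem /path/to/client-cert.pem /path/to/client-key.pem. If OpenSSL is installed, openssl verify -CAfile /path/to/ca-cert.pem /path/to/client-cert.pem additionally checks that the client certificate chains to the CA.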

Schema Mismatch Errors

Symptoms

  • “Column not found” errors
  • “Type mismatch” exceptions
  • Data insertion failures with schema-related messages

Root Causes and Solutions

1. Incorrect Column Type Mapping

Problem: Specified column types don’t match actual Kudu table schema.

Solution:

schema_mappings:
  # Verify these types match your Kudu table exactly
  - column_name: timestamp
    column_type: int64  # Must match Kudu table definition
    expression: attributes["timestamp"]
  - column_name: value
    column_type: double  # Not 'float' if table uses DOUBLE
    expression: attributes["value"]

Verification: Use the Kudu CLI to check the table schema:

kudu table describe <master_addresses> <table_name>
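
To compare a single column against its column_type, the describe output can be filtered. This sketch assumes that the command prints one column per line in the form "name TYPE ...", which may differ between Kudu versions, so check the raw output first. column_type is a hypothetical helper name.

```shell
# Hypothetical helper: read `kudu table describe` output on stdin and print the
# lower-cased type of the named column. Returns non-zero if the column is absent.
# Assumes each column appears on its own line as "<name> <TYPE> ...".
column_type() {
  awk -v col="$1" '
    $1 == col {
      t = tolower($2)
      sub(/,$/, "", t)   # drop a trailing comma if the type ends the line
      print t
      found = 1
      exit
    }
    END { exit !found }
  '
}
```

Usage: kudu table describe <master_addresses> <table_name> | column_type value. If this prints double, the mapping for the value column should use column_type: double.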

2. Missing Required Columns

Problem: Required columns in Kudu table not mapped in configuration.

Solution:

  • Ensure all non-nullable columns have mappings
  • Provide default values for optional columns when needed

schema_mappings:
  - column_name: id
    column_type: string
    expression: attributes["id"]
    required: true  # Must be provided
  - column_name: status
    column_type: string
    expression: attributes["status"]
    default_value: "active"  # Fallback value

3. Key Column Mismatch

Problem: Primary key columns not properly identified.

Solution:

schema_mappings:
  # Primary key columns must be marked and ordered correctly
  - column_name: partition_key
    column_type: string
    expression: attributes["partition"]
    is_key: true
    required: true
  - column_name: sort_key
    column_type: int64
    expression: attributes["timestamp"]
    is_key: true
    required: true

Performance Issues

Symptoms

  • Slow data ingestion rates
  • High latency in pipeline processing
  • Memory/CPU usage spikes
  • Timeouts during batch writes

Root Causes and Solutions

1. Suboptimal Batch Configuration

Problem: Batch size too small causing excessive write operations.

Solution:

nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  batch_config:
    rows_limit: 1000        # Increase from default 100
    row_size_limit: "10MB"  # Adjust based on data size
    flush_interval: "30s"   # Balance between latency and throughput
    flush_mode: auto

Tuning Guidelines:

  • High Volume: Increase rows_limit to 5000-10000
  • Low Latency: Decrease flush_interval to “5s” or less
  • Large Records: Increase row_size_limit appropriately
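
A quick calculation shows how rows_limit and flush_interval interact at a given ingest rate. The sketch below assumes a batch is flushed when either limit is reached, whichever comes first. That is an assumption about the node's behavior, not something this guide confirms, and batch_fill_seconds and flush_trigger are hypothetical helper names.

```shell
# Seconds needed to fill one batch at a given ingest rate.
#   $1 = rows per second, $2 = rows_limit
batch_fill_seconds() {
  awk -v rate="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", limit / rate }'
}

# Which setting triggers the flush first, under the whichever-comes-first assumption.
#   $1 = rows per second, $2 = rows_limit, $3 = flush_interval in seconds
flush_trigger() {
  awk -v rate="$1" -v limit="$2" -v interval="$3" 'BEGIN {
    if (limit / rate < interval) print "rows_limit"
    else print "flush_interval"
  }'
}
```

At 2000 rows per second, batch_fill_seconds 2000 1000 gives 0.5, so a rows_limit of 1000 would flush twice per second, while raising it to 10000 stretches that to 5.0 seconds. At 10 rows per second, flush_trigger 10 1000 30 reports flush_interval, meaning the batch never fills and latency is governed by the interval instead.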

2. Insufficient Parallel Workers

Problem: Default worker count limiting throughput.

Solution:

nodes:
- name: my_apache_kudu
  type: apache_kudu_output
  parallel_worker_count: 10  # Increase from default 5
  connection:
    max_connections: 20     # Should be >= parallel_worker_count

3. Connection Pool Exhaustion

Problem: Too few connections for workload.

Solution:

connection:
  max_connections: 30      # Increase for high concurrency
  timeout: "60s"           # Allow more time for busy clusters
  retry_attempts: 5        # More retries for transient issues
  retry_delay: "2s"        # Backoff between retries
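
Raising timeout and retry_attempts together also raises how long a failing write can block before it is abandoned. The estimate below is a sketch that assumes one initial attempt plus retry_attempts retries, each allowed the full timeout, with a fixed retry_delay before every retry. The node's actual retry semantics may differ, and worst_case_seconds is a hypothetical helper name.

```shell
# Worst-case seconds before a write is given up on.
#   $1 = timeout in seconds, $2 = retry_attempts, $3 = retry_delay in seconds
# Assumes (retries + 1) attempts in total and a fixed delay before each retry.
worst_case_seconds() {
  local timeout_s="$1" retries="$2" delay_s="$3"
  echo $(( (retries + 1) * timeout_s + retries * delay_s ))
}
```

With the values above, worst_case_seconds 60 5 2 yields 370, so under these assumptions a single stuck write could occupy a connection for roughly six minutes.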

4. Write Mode Inefficiency

Problem: Using upsert when insert would suffice.

Solution:

# Use insert mode for append-only workloads
mode: insert  # Faster than upsert for new records

When to use each mode:

  • insert: New records only, best performance
  • upsert: Updates or inserts, handles duplicates

Data Quality Issues

Symptoms

  • Data not appearing in Kudu tables
  • Incorrect values in columns
  • Missing or null values where data expected

Root Causes and Solutions

1. Expression Evaluation Failures

Problem: CEL/OTTL expressions not extracting data correctly.

Solution:

schema_mappings:
  # Test expressions carefully
  - column_name: user_id
    column_type: string
    # Ensure path exists in your data
    expression: attributes["user"]["id"]  # Nested access
  - column_name: timestamp
    column_type: int64
    # Convert to appropriate type
    expression: int(attributes["timestamp"])

2. Type Conversion Issues

Problem: Data types not converting properly.

Common Conversions:

# String to integer
expression: int(attributes["count"])

# String to boolean
expression: attributes["active"] == "true"

# Timestamp handling
expression: int(attributes["timestamp"] * 1000)  # Seconds to milliseconds (assumes a numeric timestamp)

Debugging Techniques

1. Enable Debug Logging

To capture detailed logs for troubleshooting, set the Edge Delta agent’s log level to debug. The setting applies agent-wide:

# In your agent configuration
agent:
  log_level: debug  # Options: trace, debug, info, warn, error

Monitor agent logs for Kudu-specific messages:

# View agent logs (location varies by deployment)
tail -f /var/log/edgedelta/edgedelta.log | grep -i kudu
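
For a saved log file, counting Kudu-related error lines gives a quick sense of whether failures are occasional or persistent. This is a sketch: kudu_error_count is a hypothetical helper name, and it matches the substrings "kudu" and "error" case-insensitively rather than relying on any particular log format.

```shell
# Hypothetical helper: count lines in a log file that mention both "kudu"
# and "error" (case-insensitive). Prints 0 when nothing matches.
#   $1 = path to the log file
kudu_error_count() {
  grep -i 'kudu' "$1" | grep -ci 'error' || true
}
```

Usage: kudu_error_count /var/log/edgedelta/edgedelta.log, using the same log path as the tail command above (the location varies by deployment).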

2. Test with Small Batches

Start with minimal configuration to isolate issues:

batch_config:
  rows_limit: 10          # Small batch for testing
  flush_interval: "5s"    # Quick feedback

3. Monitor Kudu Metrics

Check Kudu master and tablet server metrics:

  • Write latency
  • Queue sizes
  • Error rates
  • Resource utilization

4. Use Kudu CLI Tools

Verify table operations independently:

# List tables
kudu table list <master_addresses>

# Scan table
kudu table scan <master_addresses> <table_name>

# Check table statistics
kudu table statistics <master_addresses> <table_name>
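
The <master_addresses> argument is a comma-separated list, whereas the hosts parameter in the node configuration is a YAML list. A small helper can build the CLI form from the same entries. join_masters is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: join host:port arguments with commas, the form the
# kudu CLI accepts for <master_addresses>.
join_masters() {
  local IFS=','
  echo "$*"   # "$*" joins the arguments using the first character of IFS
}
```

Usage: kudu table list "$(join_masters master1.example.com:7051 master2.example.com:7051 master3.example.com:7051)". Passing all masters rather than one lets the CLI locate the leader even when an individual master is down.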

Best Practices

1. Schema Design

  • Keep primary keys simple and efficient
  • Use appropriate column types (avoid unnecessary precision)
  • Consider partitioning strategy for large tables

2. Resource Planning

  • Monitor Edge Delta agent resource usage
  • Scale Kudu cluster based on workload
  • Use appropriate instance types for Kudu nodes

3. Error Handling

  • Configure retry_attempts and retry_delay so that transient failures are retried
  • Monitor failed writes
  • Set up alerts for persistent failures

4. Testing Strategy

  1. Start with a test Kudu table
  2. Validate schema mappings with sample data
  3. Gradually increase load to production levels
  4. Monitor performance metrics throughout

Common Error Messages

| Error Message | Cause | Solution |
|---|---|---|
| “Unable to connect to leader master” | Network or configuration issue | Verify master addresses and connectivity |
| “Column ‘X’ not found in table schema” | Schema mismatch | Update schema_mappings to match table |
| “Invalid type for column ‘X’” | Type mismatch | Correct column_type in configuration |
| “Row too large” | Exceeds size limit | Increase row_size_limit or split data |
| “Timed out waiting for flush” | Slow writes | Increase timeout or optimize batch config |
| “Maximum number of attempts reached” | Persistent failures | Check Kudu cluster health and logs |

Getting Help

If issues persist after following this guide:

  1. Check Edge Delta agent logs for detailed error messages
  2. Review Kudu master and tablet server logs
  3. Contact Edge Delta support with:
    • Configuration snippet
    • Error messages
    • Kudu cluster version and configuration
    • Sample data structure