Troubleshooting Apache Kudu Destination
Overview
This guide helps diagnose and resolve common issues when using the Apache Kudu destination node in Edge Delta pipelines. Apache Kudu is a distributed columnar storage system that requires proper configuration for optimal performance and reliability.
Connection Issues
Symptoms
- Pipeline fails to start with connection timeout errors
- Intermittent connection drops during data transmission
- “Unable to connect to Kudu master” errors in logs
Root Causes and Solutions
1. Incorrect Master Server Addresses
Problem: The hosts parameter contains incorrect addresses or ports.
Solution:
# Verify your Kudu master addresses
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    hosts:
      # Ensure these match your actual Kudu master servers
      - master1.example.com:7051 # Default Kudu master port
      - master2.example.com:7051
      - master3.example.com:7051
Verification Steps:
- Test connectivity to each master server:
telnet master1.example.com 7051
- Verify Kudu service status on master nodes:
sudo systemctl status kudu-master
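The telnet test can also be scripted so that every master is checked in one pass. The following is a minimal Python sketch and is not part of Edge Delta or Kudu tooling. The helper names parse_host_port, is_reachable, and check_masters are hypothetical, and the sketch handles plain host:port entries only (no IPv6 literals).

```python
import socket

DEFAULT_MASTER_PORT = 7051  # default Kudu master RPC port


def parse_host_port(entry, default_port=DEFAULT_MASTER_PORT):
    """Split a 'host:port' entry; fall back to the default port if none is given."""
    host, sep, port = entry.rpartition(":")
    if not sep:
        return entry, default_port
    return host, int(port)


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_masters(hosts, timeout=3.0):
    """Map each configured 'host:port' entry to True (reachable) or False."""
    return {
        entry: is_reachable(*parse_host_port(entry), timeout=timeout)
        for entry in hosts
    }
```

Calling check_masters with the entries from your hosts list returns a dictionary keyed by entry. Any entry mapped to False is worth checking for a wrong address, a closed port, or a stopped kudu-master service. A successful TCP connection only shows that the port is open, not that the master is healthy.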
2. Network Connectivity Issues
Problem: Firewall rules or network policies blocking connections.
Solution:
- Ensure port 7051 (default Kudu master port) is open
- Check for any network policies in Kubernetes environments
- Verify security groups in cloud environments (AWS, Azure, GCP)
3. DNS Resolution Problems
Problem: Hostname resolution failures.
Solution:
- Use IP addresses instead of hostnames for testing
- Verify DNS configuration in your environment
- Check /etc/hosts entries if using custom hostname mappings
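To separate DNS failures from Kudu failures, it can help to resolve each master hostname on the machine that runs the agent. This is a small Python sketch using the standard library resolver. The function name resolve is hypothetical, and the result reflects whatever resolver configuration (including /etc/hosts) is active on that machine.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for a hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

An empty list for a master hostname points at name resolution rather than at the Kudu service. If the hostname resolves but to an unexpected address, check for stale /etc/hosts entries.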
4. TLS Configuration Mismatch
Problem: TLS settings don’t match Kudu cluster configuration.
Solution:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    hosts:
      - localhost:7051
    tls:
      enabled: true
      ca_file: /path/to/ca-cert.pem
      crt_file: /path/to/client-cert.pem
      key_file: /path/to/client-key.pem
Schema Mismatch Errors
Symptoms
- “Column not found” errors
- “Type mismatch” exceptions
- Data insertion failures with schema-related messages
Root Causes and Solutions
1. Incorrect Column Type Mapping
Problem: Specified column types don’t match actual Kudu table schema.
Solution:
schema_mappings:
  # Verify these types match your Kudu table exactly
  - column_name: timestamp
    column_type: int64 # Must match Kudu table definition
    expression: attributes["timestamp"]
  - column_name: value
    column_type: double # Not 'float' if table uses DOUBLE
    expression: attributes["value"]
Verification: Use Kudu CLI to verify table schema:
kudu table describe <master_addresses> <table_name>
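Once you have the table schema, the comparison against schema_mappings can be done mechanically. The sketch below is a hypothetical helper, not an Edge Delta feature. It assumes you transcribe the column names and types from the describe output into a dictionary by hand, and that configured type names match Kudu type names case-insensitively.

```python
def find_type_mismatches(schema_mappings, table_schema):
    """Compare configured column types against the table's actual types.

    schema_mappings: list of dicts with 'column_name' and 'column_type'.
    table_schema: dict of column name -> Kudu type, transcribed by hand.
    Returns a list of human-readable problems (empty if everything matches).
    """
    problems = []
    for mapping in schema_mappings:
        name = mapping["column_name"]
        configured = mapping["column_type"].lower()
        if name not in table_schema:
            problems.append(f"{name}: not found in table schema")
        elif table_schema[name].lower() != configured:
            actual = table_schema[name].lower()
            problems.append(f"{name}: configured {configured}, table has {actual}")
    return problems


# Example: 'value' is configured as float but the table column is DOUBLE.
mappings = [
    {"column_name": "timestamp", "column_type": "int64"},
    {"column_name": "value", "column_type": "float"},
]
table = {"timestamp": "INT64", "value": "DOUBLE"}
print(find_type_mismatches(mappings, table))
```

Each reported problem corresponds to either a “Column not found” or a “Type mismatch” style error at write time.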
2. Missing Required Columns
Problem: Required columns in Kudu table not mapped in configuration.
Solution:
- Ensure all non-nullable columns have mappings
- Provide default values for optional columns when needed
schema_mappings:
  - column_name: id
    column_type: string
    expression: attributes["id"]
    required: true # Must be provided
  - column_name: status
    column_type: string
    expression: attributes["status"]
    default_value: "active" # Fallback value
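A quick way to spot unmapped columns is to diff the table's non-nullable columns against the configured mappings. This Python sketch is illustrative only. The function name is hypothetical, and the list of non-nullable columns is assumed to be copied by hand from your table definition.

```python
def find_unmapped_required(non_nullable_columns, schema_mappings):
    """Return non-nullable table columns that have no entry in schema_mappings."""
    mapped = {m["column_name"] for m in schema_mappings}
    return [col for col in non_nullable_columns if col not in mapped]


# Example: the table requires 'id' and 'region', but only 'id' and 'status' are mapped.
mappings = [{"column_name": "id"}, {"column_name": "status"}]
print(find_unmapped_required(["id", "region"], mappings))
```

Any column returned here needs a mapping before writes to the table can succeed.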
3. Key Column Mismatch
Problem: Primary key columns not properly identified.
Solution:
schema_mappings:
  # Primary key columns must be marked and ordered correctly
  - column_name: partition_key
    column_type: string
    expression: attributes["partition"]
    is_key: true
    required: true
  - column_name: sort_key
    column_type: int64
    expression: attributes["timestamp"]
    is_key: true
    required: true
Performance Issues
Symptoms
- Slow data ingestion rates
- High latency in pipeline processing
- Memory/CPU usage spikes
- Timeouts during batch writes
Root Causes and Solutions
1. Suboptimal Batch Configuration
Problem: Batch size too small causing excessive write operations.
Solution:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    batch_config:
      rows_limit: 1000 # Increase from default 100
      row_size_limit: "10MB" # Adjust based on data size
      flush_interval: "30s" # Balance between latency and throughput
      flush_mode: auto
Tuning Guidelines:
- High Volume: Increase rows_limit to 5000-10000
- Low Latency: Decrease flush_interval to “5s” or less
- Large Records: Increase row_size_limit appropriately
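To reason about how rows_limit and flush_interval interact, a rough calculation helps. The Python sketch below assumes a batch is flushed as soon as either limit is reached, and it ignores row_size_limit. Both are simplifying assumptions for illustration and are not a statement of how the node is implemented. The function name is hypothetical.

```python
def first_flush_trigger(rows_per_second, rows_limit, flush_interval_s):
    """Return (limit_name, seconds) for whichever limit is reached first."""
    if rows_per_second <= 0:
        # No incoming rows: only the timer can trigger a flush.
        return ("flush_interval", flush_interval_s)
    seconds_to_fill = rows_limit / rows_per_second
    if seconds_to_fill < flush_interval_s:
        return ("rows_limit", seconds_to_fill)
    return ("flush_interval", flush_interval_s)


# 500 rows/s with rows_limit=1000 fills a batch in 2 s, well before a 30 s interval.
print(first_flush_trigger(500, 1000, 30))
# 10 rows/s would need 100 s to fill the batch, so the 30 s interval fires first.
print(first_flush_trigger(10, 1000, 30))
```

Under these assumptions, if the row limit is almost always the trigger, raising rows_limit reduces the number of write operations. If the interval is almost always the trigger, lowering flush_interval is what reduces latency.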
2. Insufficient Parallel Workers
Problem: Default worker count limiting throughput.
Solution:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    parallel_worker_count: 10 # Increase from default 5
    connection:
      max_connections: 20 # Should be >= parallel_worker_count
3. Connection Pool Exhaustion
Problem: Too few connections for workload.
Solution:
connection:
  max_connections: 30 # Increase for high concurrency
  timeout: "60s" # Allow more time for busy clusters
  retry_attempts: 5 # More retries for transient issues
  retry_delay: "2s" # Backoff between retries
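Before raising timeouts and retries, it is worth estimating how long a single failing write could block. The following Python sketch assumes one initial attempt plus retry_attempts retries, each allowed to run for the full timeout, with a fixed retry_delay between attempts. The actual retry behavior of the node may differ (for example, increasing delays), so treat the result as a rough bound. The function name is hypothetical.

```python
def worst_case_wait_s(timeout_s, retry_attempts, retry_delay_s):
    """Estimate the longest time one write can take before it is reported as failed."""
    attempts = 1 + retry_attempts  # initial attempt plus retries (assumption)
    return attempts * timeout_s + retry_attempts * retry_delay_s


# Values from the configuration above: timeout 60s, 5 retries, 2s delay.
print(worst_case_wait_s(60, 5, 2))
```

With the values above this comes to (1 + 5) * 60 + 5 * 2 = 370 seconds, which is useful context when choosing alert thresholds for stalled writes.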
4. Write Mode Inefficiency
Problem: Using upsert when insert would suffice.
Solution:
# Use insert mode for append-only workloads
mode: insert # Faster than upsert for new records
When to use each mode:
- insert: New records only, best performance
- upsert: Updates or inserts, handles duplicates
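The difference between the two modes comes down to what happens when a primary key already exists. The toy Python model below uses a dictionary as a stand-in for a Kudu table to illustrate the semantics. It is not the Kudu client API, and the names are hypothetical.

```python
class KeyAlreadyPresent(Exception):
    """Raised when an insert targets a primary key that already exists."""


def insert(table, key, row):
    """Insert semantics: reject the row if the primary key is already present."""
    if key in table:
        raise KeyAlreadyPresent(key)
    table[key] = row


def upsert(table, key, row):
    """Upsert semantics: create the row, or overwrite it if the key exists."""
    table[key] = row


table = {}
insert(table, "a", {"status": "new"})
upsert(table, "a", {"status": "updated"})  # overwrites the existing row
try:
    insert(table, "a", {"status": "again"})  # duplicate key is rejected
except KeyAlreadyPresent as exc:
    print(f"insert rejected for key {exc}")
```

If your pipeline can deliver the same record twice (for example, after a retry), insert mode will surface duplicate-key errors, while upsert mode will silently overwrite.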
Data Quality Issues
Symptoms
- Data not appearing in Kudu tables
- Incorrect values in columns
- Missing or null values where data expected
Root Causes and Solutions
1. Expression Evaluation Failures
Problem: CEL/OTTL expressions not extracting data correctly.
Solution:
schema_mappings:
  # Test expressions carefully
  - column_name: user_id
    column_type: string
    # Ensure path exists in your data
    expression: attributes["user"]["id"] # Nested access
  - column_name: timestamp
    column_type: int64
    # Convert to appropriate type
    expression: int(attributes["timestamp"])
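When a nested path such as user.id is missing from some records, the column ends up null or the write fails. One way to check sample data before deploying is a small path lookup like the Python sketch below. It models attributes as nested dictionaries, which is an assumption about your data shape, and the helper name get_nested is hypothetical.

```python
def get_nested(data, path, default=None):
    """Walk a list of keys through nested dicts; return default if any step is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


sample = {"user": {"id": "u-123"}, "timestamp": "1700000000"}
print(get_nested(sample, ["user", "id"]))     # present in this record
print(get_nested(sample, ["user", "email"]))  # missing in this record
```

Running this over a handful of representative records shows which paths are reliably present and which need a default value.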
2. Type Conversion Issues
Problem: Data types not converting properly.
Common Conversions:
# String to integer
expression: int(attributes["count"])
# String to boolean
expression: attributes["active"] == "true"
# Timestamp handling (assumes the source value is numeric and in seconds)
expression: int(attributes["timestamp"] * 1000) # Convert seconds to milliseconds
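The same conversions can be checked against sample values outside the pipeline. This Python sketch mirrors the intent of the expressions above but is not the expression language itself. The function names are hypothetical, and to_bool is deliberately more lenient about case and whitespace than a strict comparison with "true".

```python
def to_epoch_millis(seconds):
    """Convert a timestamp in seconds (number or numeric string) to integer milliseconds."""
    return int(round(float(seconds) * 1000))


def to_bool(value):
    """Treat 'true' (any case, surrounding whitespace ignored) as True; everything else as False."""
    return str(value).strip().lower() == "true"


print(to_epoch_millis("1700000000"))
print(to_bool("true"), to_bool("False"))
```

If a timestamp is already in milliseconds, multiplying by 1000 again produces values far in the future, so confirm the source unit before applying the conversion.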
Debugging Techniques
1. Enable Debug Logging
To capture detailed logs for troubleshooting, raise the Edge Delta agent’s log level. For agent-wide debug logging:
# In your agent configuration
agent:
  log_level: debug # Options: trace, debug, info, warn, error
Monitor agent logs for Kudu-specific messages:
# View agent logs (location varies by deployment)
tail -f /var/log/edgedelta/edgedelta.log | grep -i kudu
2. Test with Small Batches
Start with minimal configuration to isolate issues:
batch_config:
  rows_limit: 10 # Small batch for testing
  flush_interval: "5s" # Quick feedback
3. Monitor Kudu Metrics
Check Kudu master and tablet server metrics:
- Write latency
- Queue sizes
- Error rates
- Resource utilization
4. Use Kudu CLI Tools
Verify table operations independently:
# List tables
kudu table list <master_addresses>
# Scan table
kudu table scan <master_addresses> <table_name>
# Check table statistics
kudu table statistics <master_addresses> <table_name>
Best Practices
1. Schema Design
- Keep primary keys simple and efficient
- Use appropriate column types (avoid unnecessary precision)
- Consider partitioning strategy for large tables
2. Resource Planning
- Monitor Edge Delta agent resource usage
- Scale Kudu cluster based on workload
- Use appropriate instance types for Kudu nodes
3. Error Handling
- Implement proper retry logic
- Monitor failed writes
- Set up alerts for persistent failures
4. Testing Strategy
- Start with a test Kudu table
- Validate schema mappings with sample data
- Gradually increase load to production levels
- Monitor performance metrics throughout
Common Error Messages
Error Message | Cause | Solution |
---|---|---|
“Unable to connect to leader master” | Network or configuration issue | Verify master addresses and connectivity |
“Column ‘X’ not found in table schema” | Schema mismatch | Update schema_mappings to match table |
“Invalid type for column ‘X’” | Type mismatch | Correct column_type in configuration |
“Row too large” | Exceeds size limit | Increase row_size_limit or split data |
“Timed out waiting for flush” | Slow writes | Increase timeout or optimize batch config |
“Maximum number of attempts reached” | Persistent failures | Check Kudu cluster health and logs |
Getting Help
If issues persist after following this guide:
- Check Edge Delta agent logs for detailed error messages
- Review Kudu master and tablet server logs
- Contact Edge Delta support with:
  - Configuration snippet
  - Error messages
  - Kudu cluster version and configuration
  - Sample data structure