Troubleshooting Apache Kudu Destination
Overview
This guide helps diagnose and resolve common issues when using the Apache Kudu destination node in Edge Delta pipelines. Apache Kudu is a distributed columnar storage system that requires proper configuration for optimal performance and reliability.
Connection Issues
Symptoms
- Pipeline fails to start with connection timeout errors
- Intermittent connection drops during data transmission
- “Unable to connect to Kudu master” errors in logs
Root Causes and Solutions
1. Incorrect Master Server Addresses
Problem: The hosts parameter contains incorrect addresses or ports.
Solution:
# Verify your Kudu master addresses
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    hosts:
      # Ensure these match your actual Kudu master servers
      - master1.example.com:7051 # Default Kudu master port
      - master2.example.com:7051
      - master3.example.com:7051
Verification Steps:
- Test connectivity to each master server:
  telnet master1.example.com 7051
- Verify Kudu service status on master nodes:
  sudo systemctl status kudu-master
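The connectivity test above can be repeated across every configured master in one pass. The following is a minimal sketch, assuming `nc` (netcat) is installed and supports the `-z` and `-w` options; `split_host_port` and `check_masters` are hypothetical helper names, not part of Edge Delta or Kudu. An address without a port falls back to 7051.

```shell
# Split "host:port" into "host port"; default to 7051 when no port is given.
split_host_port() {
  case "$1" in
    *:*) printf '%s %s\n' "${1%:*}" "${1##*:}" ;;
    *)   printf '%s 7051\n' "$1" ;;
  esac
}

# Probe each master over TCP with a 5-second timeout and report the result.
# Returns non-zero if any master is unreachable.
check_masters() {
  failed=0
  for addr in "$@"; do
    hp=$(split_host_port "$addr")
    host=${hp% *}
    port=${hp#* }
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
      echo "OK          $host:$port"
    else
      echo "UNREACHABLE $host:$port"
      failed=1
    fi
  done
  return "$failed"
}

# Example, using the hosts from the configuration above:
# check_masters master1.example.com:7051 master2.example.com:7051 master3.example.com:7051
```

Run the check from the host or pod where the Edge Delta agent runs, since firewall rules may differ between machines.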
2. Network Connectivity Issues
Problem: Firewall rules or network policies blocking connections.
Solution:
- Ensure port 7051 (default Kudu master port) is open
- Check for any network policies in Kubernetes environments
- Verify security groups in cloud environments (AWS, Azure, GCP)
3. DNS Resolution Problems
Problem: Hostname resolution failures.
Solution:
- Use IP addresses instead of hostnames for testing
- Verify DNS configuration in your environment
- Check /etc/hosts entries if using custom hostname mappings
4. TLS Configuration Mismatch
Problem: TLS settings don’t match Kudu cluster configuration.
Solution:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    hosts:
      - localhost:7051
    tls:
      enabled: true
      ca_file: /path/to/ca-cert.pem
      crt_file: /path/to/client-cert.pem
      key_file: /path/to/client-key.pem
Kerberos Authentication Issues
Production Apache Kudu clusters are often secured with Kerberos authentication. If the network checks above pass but you still cannot connect to your Kudu cluster, authentication configuration is a likely cause.
Symptoms
- “Authentication failed” or “GSSAPI Error” messages
- “Cannot connect to Kudu master” errors without clear network issues
- “Unauthorized” or “Permission denied” responses
- Connection timeouts despite network connectivity being confirmed
Root Causes and Solutions
1. Missing Kerberos Configuration
Problem: Attempting to connect to a Kerberos-protected Kudu cluster without authentication configuration.
Solution: Add the kudu_security block with Kerberos credentials:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    hosts:
      - kudu-master1.example.com:7051
    table_name: my_table
    kudu_security:
      auth_type: kerberos
      kerberos:
        principal: edgedelta-agent@EXAMPLE.COM
        keytab: /etc/security/keytabs/edgedelta.keytab
        realm: EXAMPLE.COM
        sasl_protocol_name: kudu
        krb5_conf_path: /etc/krb5.conf
      tls:
        ca_file: /etc/ssl/certs/kudu-ca.crt
    schema_mappings:
      # ... your mappings ...
See Kerberos Authentication for detailed setup instructions.
2. Clock Skew Between Agent and KDC
Problem: Kerberos requires synchronized time between clients and the Key Distribution Center (typically within 5 minutes).
Symptoms: Authentication fails with “Clock skew too great” or similar time-related errors.
Solution:
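- Verify time synchronization on the Edge Delta agent host:
  timedatectl status
- Enable and start NTP (the service name varies by distribution, for example ntp, ntpd, or chronyd):
  sudo systemctl enable ntp
  sudo systemctl start ntp
- If time cannot be perfectly synchronized, adjust clockskew in /etc/krb5.conf. The value is in seconds, and 300 (5 minutes) is the MIT Kerberos default. Keep comments on their own lines, because krb5.conf does not support trailing comments:
  [libdefaults]
      clockskew = 300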
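The tolerance check itself is simple arithmetic on two epoch timestamps. The sketch below is illustrative only: `skew_within_tolerance` is a hypothetical helper, and how you obtain the KDC's time (for example over SSH) is an assumption about your environment.

```shell
# Succeed when two epoch timestamps (in seconds) differ by no more than the
# tolerance. The tolerance defaults to 300 seconds (5 minutes).
skew_within_tolerance() {
  local_epoch=$1
  remote_epoch=$2
  tolerance=${3:-300}
  skew=$(( local_epoch - remote_epoch ))
  # Take the absolute value so the direction of the drift does not matter.
  if [ "$skew" -lt 0 ]; then
    skew=$(( 0 - skew ))
  fi
  [ "$skew" -le "$tolerance" ]
}

# Example (assumes SSH access to the KDC host, which may not be available):
# skew_within_tolerance "$(date +%s)" "$(ssh kdc.example.com date +%s)" \
#   || echo "Clock skew exceeds tolerance"
```

A difference of exactly 300 seconds passes with the default tolerance; 301 seconds fails.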
3. Keytab File Issues
Problem: The keytab file is missing, has incorrect permissions, or contains wrong credentials.
Solution:
- Verify the keytab file exists and is readable:
  ls -la /etc/security/keytabs/edgedelta.keytab
- Check keytab contents:
  klist -kt /etc/security/keytabs/edgedelta.keytab
- Test authentication manually:
  kinit -kt /etc/security/keytabs/edgedelta.keytab edgedelta-agent@EXAMPLE.COM
  klist
- Ensure proper permissions (readable by the Edge Delta agent process):
  chmod 400 /etc/security/keytabs/edgedelta.keytab
  chown edgedelta:edgedelta /etc/security/keytabs/edgedelta.keytab
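The file-level checks above can be combined into a single preflight function. This is a sketch under stated assumptions: `keytab_ok` is a hypothetical helper, and it only inspects the file on disk. It does not validate the credentials inside the keytab, which still requires `klist -kt` and `kinit -kt`.

```shell
# Succeed only if the keytab exists, is non-empty, is readable by the current
# user, and grants no permissions to group or others.
keytab_ok() {
  keytab=$1
  [ -f "$keytab" ] || { echo "missing: $keytab"; return 1; }
  [ -s "$keytab" ] || { echo "empty: $keytab"; return 1; }
  [ -r "$keytab" ] || { echo "not readable by current user: $keytab"; return 1; }
  # Characters 5-10 of the ls mode string are the group and other bits.
  group_other=$(ls -ld -- "$keytab" | cut -c5-10)
  [ "$group_other" = "------" ] || { echo "too permissive: $keytab"; return 1; }
  echo "ok: $keytab"
}

# Example:
# keytab_ok /etc/security/keytabs/edgedelta.keytab
```

Run it as the same user as the Edge Delta agent process, because the readability test applies to whoever executes the function.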
4. Incorrect Principal or Realm
Problem: The principal name or realm in the configuration doesn’t match what’s registered in the KDC.
Solution:
- Verify the exact principal name with your Kerberos administrator
- Ensure the realm is uppercase (Kerberos realms are case-sensitive)
- Check that the principal format matches service/hostname@REALM or username@REALM
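A quick shape check can catch a lowercase realm or a missing `@REALM` suffix before the agent starts. The sketch below is a hypothetical helper, `valid_principal`, that enforces the uppercase-realm convention described above with a deliberately conservative character set. It validates format only and cannot tell whether the KDC actually knows the principal.

```shell
# Succeed when the argument looks like primary@REALM or primary/instance@REALM
# with an uppercase realm. LC_ALL=C keeps the A-Z range strictly uppercase.
valid_principal() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)?@[A-Z0-9][A-Z0-9.-]*$'
}

# Example:
# valid_principal edgedelta-agent@EXAMPLE.COM || echo "principal looks malformed"
```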
5. KDC Connectivity Issues
Problem: The agent cannot reach the Key Distribution Center.
Solution:
- Test connectivity to the KDC (default port 88):
  nc -zv kdc.example.com 88
- Verify krb5.conf has correct KDC addresses:
  [realms]
      EXAMPLE.COM = {
          kdc = kdc.example.com
          admin_server = kdc.example.com
      }
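To test exactly the KDCs the agent will use, the addresses can be read straight from krb5.conf. The following sketch assumes each `kdc = ...` entry sits on its own line, as in the layout above; `list_kdcs` is a hypothetical helper name.

```shell
# Print the host (and optional port) of every "kdc = ..." entry in a krb5.conf.
# Defaults to /etc/krb5.conf when no path is given.
list_kdcs() {
  awk -F= '/^[ \t]*kdc[ \t]*=/ {
    gsub(/[ \t\r]/, "", $2)   # strip whitespace around the value
    print $2
  }' "${1:-/etc/krb5.conf}"
}

# Example: probe each KDC over TCP, using port 88 when none is specified.
# for kdc in $(list_kdcs /etc/krb5.conf); do
#   host=${kdc%:*}; port=88
#   case "$kdc" in *:*) port=${kdc##*:} ;; esac
#   nc -zv -w 5 "$host" "$port"
# done
```

Note that `nc -z` tests TCP only. Kerberos clients may also use UDP on port 88, so a firewall that blocks UDP can still cause failures after this check passes.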
6. TLS Certificate Issues
Problem: When using TLS with Kerberos, certificate validation fails.
Solution:
kudu_security:
  auth_type: kerberos
  kerberos:
    # ... kerberos config ...
  tls:
    ca_file: /etc/ssl/certs/kudu-ca.crt # Ensure this file exists and is valid
Verify the CA certificate:
openssl x509 -in /etc/ssl/certs/kudu-ca.crt -text -noout
Kubernetes-Specific Kerberos Issues
When running in Kubernetes, ensure keytab and krb5.conf files are properly mounted:
# Helm values.yaml
agent:
  extraVolumes:
    - name: kerberos-keytab
      secret:
        secretName: edgedelta-kerberos
        defaultMode: 0400
    - name: krb5-config
      configMap:
        name: krb5-config
  extraVolumeMounts:
    - name: kerberos-keytab
      mountPath: /etc/security/keytabs
      readOnly: true
    - name: krb5-config
      mountPath: /etc/krb5.conf
      subPath: krb5.conf
      readOnly: true
Verify mounts inside the pod:
kubectl exec -it <pod-name> -n edgedelta -- ls -la /etc/security/keytabs/
kubectl exec -it <pod-name> -n edgedelta -- cat /etc/krb5.conf
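The Helm values above reference a Secret named edgedelta-kerberos and a ConfigMap named krb5-config, both of which must exist before the pod starts. A minimal sketch of creating them with kubectl follows. The function name and the local file paths are hypothetical, and the key names (`edgedelta.keytab`, `krb5.conf`) are chosen here to line up with the mount paths used in this guide.

```shell
# Create the Secret and ConfigMap that the Helm values mount into the agent pod.
# Arguments: namespace, local keytab path, local krb5.conf path.
create_kerberos_resources() {
  namespace=$1
  keytab_path=$2
  krb5_conf_path=$3

  # The key name becomes the file name under /etc/security/keytabs.
  kubectl create secret generic edgedelta-kerberos \
    --namespace "$namespace" \
    --from-file=edgedelta.keytab="$keytab_path" &&
  # The key name must match the subPath (krb5.conf) in extraVolumeMounts.
  kubectl create configmap krb5-config \
    --namespace "$namespace" \
    --from-file=krb5.conf="$krb5_conf_path"
}

# Example:
# create_kerberos_resources edgedelta ./edgedelta.keytab ./krb5.conf
```

If either resource already exists, `kubectl create` fails. Delete the old resource first or manage it through your usual deployment tooling.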
Schema Mismatch Errors
Symptoms
- “Column not found” errors
- “Type mismatch” exceptions
- Data insertion failures with schema-related messages
Root Causes and Solutions
1. Incorrect Column Type Mapping
Problem: Specified column types don’t match actual Kudu table schema.
Solution:
schema_mappings:
  # Verify these types match your Kudu table exactly
  - column_name: timestamp
    column_type: int64 # Must match Kudu table definition
    expression: attributes["timestamp"]
  - column_name: value
    column_type: double # Not 'float' if table uses DOUBLE
    expression: attributes["value"]
Verification: Use Kudu CLI to verify table schema:
kudu table describe <master_addresses> <table_name>
2. Missing Required Columns
Problem: Required columns in Kudu table not mapped in configuration.
Solution:
- Ensure all non-nullable columns have mappings
- Provide default values for optional columns when needed
schema_mappings:
  - column_name: id
    column_type: string
    expression: attributes["id"]
    required: true # Must be provided
  - column_name: status
    column_type: string
    expression: attributes["status"]
    default_value: "active" # Fallback value
3. Key Column Mismatch
Problem: Primary key columns not properly identified.
Solution:
schema_mappings:
  # Primary key columns must be marked and ordered correctly
  - column_name: partition_key
    column_type: string
    expression: attributes["partition"]
    is_key: true
    required: true
  - column_name: sort_key
    column_type: int64
    expression: attributes["timestamp"]
    is_key: true
    required: true
Performance Issues
Symptoms
- Slow data ingestion rates
- High latency in pipeline processing
- Memory/CPU usage spikes
- Timeouts during batch writes
Root Causes and Solutions
1. Suboptimal Batch Configuration
Problem: Batch size too small causing excessive write operations.
Solution:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    batch_config:
      rows_limit: 1000 # Increase from default 100
      row_size_limit: "10MB" # Adjust based on data size
      flush_interval: "30s" # Balance between latency and throughput
      flush_mode: auto
Tuning Guidelines:
- High Volume: Increase rows_limit to 5000-10000
- Low Latency: Decrease flush_interval to “5s” or less
- Large Records: Increase row_size_limit appropriately
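A rough estimate of write frequency helps when choosing these values. The sketch below assumes a batch is flushed when either rows_limit is reached or flush_interval elapses, whichever comes first; that flushing model is an assumption made for illustration, not documented agent behavior. `flushes_per_minute` is a hypothetical helper.

```shell
# Estimate flushes per minute for a steady ingest rate.
# Arguments: rows per second, rows_limit, flush_interval in seconds.
flushes_per_minute() {
  rate=$1
  rows_limit=$2
  interval=$3
  # Flushes triggered by the batch filling up (rounded up).
  by_size=$(( (rate * 60 + rows_limit - 1) / rows_limit ))
  # Flushes triggered by the timer alone (rounded up).
  by_time=$(( (60 + interval - 1) / interval ))
  # Whichever trigger fires more often determines the write frequency.
  if [ "$by_size" -gt "$by_time" ]; then
    echo "$by_size"
  else
    echo "$by_time"
  fi
}

# Example: 2000 rows/s with a 30-second interval.
# flushes_per_minute 2000 100 30     # rows_limit 100
# flushes_per_minute 2000 10000 30   # rows_limit 10000
```

At 2000 rows per second, raising rows_limit from 100 to 10000 reduces the estimate from 1200 writes per minute to 12, which is why larger batches help high-volume pipelines.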
2. Insufficient Parallel Workers
Problem: Default worker count limiting throughput.
Solution:
nodes:
  - name: my_apache_kudu
    type: apache_kudu_output
    parallel_worker_count: 10 # Increase from default 2
    connection:
      max_connections: 20 # Should be >= parallel_worker_count
3. Connection Pool Exhaustion
Problem: Too few connections for workload.
Solution:
connection:
  max_connections: 30 # Increase for high concurrency
  timeout: "60s" # Allow more time for busy clusters
  retry_attempts: 5 # More retries for transient issues
  retry_delay: "2s" # Backoff between retries
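When reproducing a transient failure by hand, the same attempt-count-and-delay pattern can be applied to Kudu CLI calls. The sketch below is a generic shell wrapper with a fixed delay; `retry` is a hypothetical helper and does not represent how the agent implements retry_attempts or retry_delay internally.

```shell
# Run a command up to max_attempts times, sleeping delay seconds after each
# failure. Returns 0 on the first success, 1 once the attempts are exhausted.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Example: 5 attempts, 2 seconds apart, mirroring the settings above.
# retry 5 2 kudu table list master1.example.com:7051
```

If the command succeeds only after several attempts, the cluster is likely overloaded or a master election is in progress, and raising the timeout may help more than adding retries.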
4. Write Mode Inefficiency
Problem: Using upsert when insert would suffice.
Solution:
# Use insert mode for append-only workloads
mode: insert # Faster than upsert for new records
When to use each mode:
- insert: New records only, best performance
- upsert: Updates or inserts, handles duplicates
Data Quality Issues
Symptoms
- Data not appearing in Kudu tables
- Incorrect values in columns
- Missing or null values where data expected
Root Causes and Solutions
1. Expression Evaluation Failures
Problem: CEL/OTTL expressions not extracting data correctly.
Solution:
schema_mappings:
  # Test expressions carefully
  - column_name: user_id
    column_type: string
    # Ensure path exists in your data
    expression: attributes["user"]["id"] # Nested access
  - column_name: timestamp
    column_type: int64
    # Convert to appropriate type
    expression: int(attributes["timestamp"])
2. Type Conversion Issues
Problem: Data types not converting properly.
Common Conversions:
# String to integer
expression: int(attributes["count"])
# String to boolean
expression: attributes["active"] == "true"
# Timestamp handling
expression: int(attributes["timestamp"] * 1000) # Convert to milliseconds
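Before choosing a multiplier, confirm which unit the source timestamps actually use, because multiplying a value that is already in milliseconds produces dates far in the future. The sketch below guesses the unit from the digit count, a heuristic that holds for dates between 2001 and 2286. `to_epoch_millis` is a hypothetical helper for checking sample values by hand, not an Edge Delta function.

```shell
# Normalize an integer epoch timestamp to milliseconds, inferring the unit
# from the number of digits.
to_epoch_millis() {
  ts=$1
  # Reject empty or non-numeric input before doing arithmetic.
  case "$ts" in
    ''|*[!0-9]*) echo "not an integer timestamp: $ts" >&2; return 1 ;;
  esac
  case ${#ts} in
    10) echo $(( ts * 1000 )) ;;      # seconds
    13) echo "$ts" ;;                 # already milliseconds
    16) echo $(( ts / 1000 )) ;;      # microseconds
    19) echo $(( ts / 1000000 )) ;;   # nanoseconds
    *)  echo "unrecognized timestamp length: $ts" >&2; return 1 ;;
  esac
}

# Example:
# to_epoch_millis 1700000000
```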
Debugging Techniques
1. Enable Debug Logging
To capture detailed logs for troubleshooting, configure the Edge Delta agent’s logging level:
For agent-wide debug logging:
# In your agent configuration
agent:
  log_level: debug # Options: trace, debug, info, warn, error
Monitor agent logs for Kudu-specific messages:
# View agent logs (location varies by deployment)
tail -f /var/log/edgedelta/edgedelta.log | grep -i kudu
2. Test with Small Batches
Start with minimal configuration to isolate issues:
batch_config:
  rows_limit: 10 # Small batch for testing
  flush_interval: "5s" # Quick feedback
3. Monitor Kudu Metrics
Check Kudu master and tablet server metrics:
- Write latency
- Queue sizes
- Error rates
- Resource utilization
4. Use Kudu CLI Tools
Verify table operations independently:
# List tables
kudu table list <master_addresses>
# Scan table
kudu table scan <master_addresses> <table_name>
# Check table statistics
kudu table statistics <master_addresses> <table_name>
Best Practices
1. Schema Design
- Keep primary keys simple and efficient
- Use appropriate column types (avoid unnecessary precision)
- Consider partitioning strategy for large tables
2. Resource Planning
- Monitor Edge Delta agent resource usage
- Scale Kudu cluster based on workload
- Use appropriate instance types for Kudu nodes
3. Error Handling
- Implement proper retry logic
- Monitor failed writes
- Set up alerts for persistent failures
4. Testing Strategy
- Start with a test Kudu table
- Validate schema mappings with sample data
- Gradually increase load to production levels
- Monitor performance metrics throughout
Common Error Messages
| Error Message | Cause | Solution |
|---|---|---|
| “Unable to connect to leader master” | Network or configuration issue | Verify master addresses and connectivity |
| “Column ‘X’ not found in table schema” | Schema mismatch | Update schema_mappings to match table |
| “Invalid type for column ‘X’” | Type mismatch | Correct column_type in configuration |
| “Row too large” | Exceeds size limit | Increase row_size_limit or split data |
| “Timed out waiting for flush” | Slow writes | Increase timeout or optimize batch config |
| “Maximum number of attempts reached” | Persistent failures | Check Kudu cluster health and logs |
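To see which of these errors dominate, the agent log can be tallied by message. This sketch assumes the log contains the message text from the table verbatim, which may not hold for every agent version; `summarize_kudu_errors` is a hypothetical helper.

```shell
# Count occurrences of each known Kudu error message in a log file and print
# the non-zero counts.
summarize_kudu_errors() {
  log_file=$1
  [ -r "$log_file" ] || { echo "cannot read: $log_file" >&2; return 1; }
  for message in \
    "Unable to connect to leader master" \
    "not found in table schema" \
    "Invalid type for column" \
    "Row too large" \
    "Timed out waiting for flush" \
    "Maximum number of attempts reached"
  do
    # -F matches the text literally; -c prints the number of matching lines.
    count=$(grep -c -F -- "$message" "$log_file")
    if [ "$count" -gt 0 ]; then
      printf '%6d  %s\n' "$count" "$message"
    fi
  done
  return 0
}

# Example (log location varies by deployment):
# summarize_kudu_errors /var/log/edgedelta/edgedelta.log
```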
Getting Help
If issues persist after following this guide:
- Check Edge Delta agent logs for detailed error messages
- Review Kudu master and tablet server logs
- Contact Edge Delta support with:
  - Configuration snippet
  - Error messages
  - Kudu cluster version and configuration
  - Sample data structure