Troubleshooting Splunk Integrations

Comprehensive troubleshooting guide for all Edge Delta Splunk integrations, including TCP, HEC, and source nodes.

Overview

This guide covers troubleshooting for all Edge Delta Splunk integrations: the Splunk TCP (S2S) destination, the Splunk HEC destination, and the Splunk TCP source node.

Quick Diagnostic Checklist

Before diving into specific issues, verify:

  • Correct node type for your use case (source vs destination)
  • Network connectivity between Edge Delta and Splunk (a quick check script follows this list)
  • Proper authentication (tokens for HEC, certificates for TCP)
  • Firewall rules allowing required ports
  • Splunk service status and configuration
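
The connectivity items can be checked quickly from the Edge Delta host. This is a minimal sketch; the hostnames and port are placeholders for your own environment:

# S2S port reachability (replace host and port with yours)
nc -zv splunk-indexer.example.com 9997

# HEC health endpoint (replace host)
curl -sk https://splunk-hec.example.com:8088/services/collector/health

# Confirm the Edge Delta agent process is running
pgrep -a edgedelta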

Connection Issues

Splunk TCP Destination (S2S)

Symptoms

  • “Connection refused” errors on port 9997
  • “Unable to establish TCP connection”
  • Intermittent connection drops

Diagnosis and Solutions

1. Test Network Connectivity

Test the connection to your Splunk indexer using standard networking tools:

telnet splunk-indexer.example.com 9997
nc -zv splunk-indexer.example.com 9997

2. Verify Splunk Configuration

On the Splunk indexer, check the inputs.conf file to ensure the TCP input is properly configured:

grep -A 5 "splunktcp" $SPLUNK_HOME/etc/system/local/inputs.conf

The expected configuration should show:

[splunktcp://9997]
disabled = 0
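
If the stanza is missing or shows disabled = 1, one way to enable the receiving port is the Splunk CLI, followed by a restart so the change takes effect:

$SPLUNK_HOME/bin/splunk enable listen 9997
$SPLUNK_HOME/bin/splunk restart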

3. Check TLS Configuration

Ensure your Edge Delta TLS configuration is correct. Verify that certificate files exist, have proper permissions, and are not expired:

nodes:
- name: splunk_tcp
  type: splunk_tcp_output
  host: splunk.example.com
  port: 9997
  tls:
    enabled: true
    ca_file: /path/to/ca.pem
    crt_file: /path/to/client.pem
    key_file: /path/to/client.key

4. Validate Certificate

Check certificate expiration dates:

openssl x509 -in /path/to/client.pem -noout -dates

Verify the certificate chain is valid:

openssl verify -CAfile /path/to/ca.pem /path/to/client.pem
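
You can also inspect the certificate the indexer actually presents at runtime, which catches mismatches between the files on disk and what is being served:

openssl s_client -connect splunk.example.com:9997 -CAfile /path/to/ca.pem </dev/null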

Splunk HEC Output

Symptoms

  • HTTP 400/401/403 errors
  • “Invalid token” messages
  • SSL/TLS handshake failures

Diagnosis and Solutions

1. Test HEC Endpoint

Test HEC connectivity by checking the health endpoint (replace with your values):

curl -k https://splunk-hec.example.com:8088/services/collector/health

Test data submission with your HEC token:

curl -k -H "Authorization: Splunk YOUR-TOKEN" \
  https://splunk-hec.example.com:8088/services/collector \
  -d '{"event": "test"}'

2. Verify Token Configuration

Ensure your Edge Delta configuration has a valid HEC token:

nodes:
- name: splunk_hec
  type: splunk_output
  hec_uri: https://splunk-hec.example.com:8088/services/collector
  token: your-hec-token

3. Check Splunk HEC Settings

  • Navigate to Settings → Data Inputs → HTTP Event Collector
  • Verify token is enabled and not expired (this can also be checked via the REST API, shown after this list)
  • Check source type and index settings
  • Ensure “Enable SSL” matches your configuration
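
The same token details are available from Splunk’s REST API, assuming you can reach the management port (8089 by default):

curl -k -u admin https://splunk-hec.example.com:8089/services/data/inputs/http?output_mode=json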

Splunk TCP Source (Receiving from Forwarders)

Symptoms

  • Edge Delta not receiving data from Universal Forwarders
  • “Bind: address already in use” errors
  • Authentication failures

Diagnosis and Solutions

1. Verify Port Availability

Check if the required port is available and not in use by another service:

sudo netstat -tulpn | grep 9997
sudo lsof -i :9997

2. Configure Universal Forwarder Output

On the Universal Forwarder, check the outputs.conf configuration:

cat $SPLUNK_HOME/etc/system/local/outputs.conf

The configuration should point to your Edge Delta agent:

[tcpout:edge_delta]
server = edge-delta-agent.example.com:9997
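
After editing outputs.conf, restart the forwarder so the new output takes effect:

$SPLUNK_HOME/bin/splunk restart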

3. Edge Delta Configuration

Configure Edge Delta to listen on all interfaces for incoming Splunk forwarder connections:

nodes:
- name: splunk_tcp_input
  type: splunk_tcp_input
  listen: 0.0.0.0
  port: 9997
  read_timeout: 1m

Data Flow Issues

Data Not Appearing in Splunk

Common Causes and Solutions

1. Index Configuration

Verify the index exists in Splunk:

index=your_index | head 1

Check index permissions:

| rest /services/data/indexes | table title

2. Source Type Mismatch

Ensure the source type in your Edge Delta configuration matches Splunk’s expectations:

nodes:
- name: splunk_output
  type: splunk_output
  index: main
  source_type: _json

3. Time Zone and Timestamp Issues

  • Verify timestamp format matches Splunk’s expectations
  • Check time zone settings on both Edge Delta and Splunk
  • Use timestamp extraction if needed (see the props.conf sketch after this list)
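
If Splunk is misparsing timestamps, a props.conf stanza on the indexer can pin the format explicitly. This is a sketch; the sourcetype name, prefix, and time format are placeholders for your own data:

[your_sourcetype]
TIME_PREFIX = "timestamp":"
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
TZ = UTC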

4. Data Format Problems

For JSON data, ensure events are valid JSON before they reach the output node; Edge Delta formats the data for Splunk automatically:

nodes:
- name: splunk_tcp
  type: splunk_tcp_output
  host: splunk.example.com
  port: 9997
  index: json_index

Partial Data or Missing Fields

1. Field Extraction Issues

  • Check Splunk props.conf for field extraction rules (a minimal example follows this list)
  • Verify JSON structure if using structured data
  • Test with sample data in Splunk’s search interface
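
For JSON events, automatic field extraction is controlled by KV_MODE in props.conf. A minimal stanza (the sourcetype name is a placeholder) looks like:

[your_sourcetype]
KV_MODE = json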

2. Data Truncation

Increase buffer sizes if you’re experiencing truncation with large events:

nodes:
- name: splunk_tcp
  type: splunk_tcp_output
  buffer_max_bytesize: "500MB"

Performance Optimization

Slow Data Transmission

TCP Destination Optimization

Optimize TCP destination performance by sending through a load balancer, increasing the worker count, and extending the buffer TTL and maximum size:

nodes:
- name: high_performance_splunk
  type: splunk_tcp_output
  host: splunk-lb.example.com
  port: 9997
  parallel_worker_count: 20
  buffer_ttl: "1h"
  buffer_max_bytesize: "1GB"

HEC Output Optimization

Optimize HEC output by batching events and adjusting timeouts based on your network conditions:

nodes:
- name: optimized_hec
  type: splunk_output
  hec_uri: https://splunk-hec.example.com:8088/services/collector
  token: your-token
  parallel_worker_count: 15
  batch_size: 1000
  timeout: "30s"

Resource Usage Issues

1. Monitor Edge Delta Agent Resources

Check CPU and memory usage of the Edge Delta agent:

top -p $(pgrep -d, edgedelta)
ps aux | grep edgedelta

2. Adjust Worker Counts

Balance worker counts with available resources. Start conservative and increase gradually based on performance metrics:

parallel_worker_count: 10

3. Implement Buffering Strategy

Enable persistent buffering so data is retained while Splunk is temporarily unreachable:

buffer_path: "/var/log/edgedelta/buffer"
buffer_max_bytesize: "500MB"
buffer_ttl: "30m"

Migration Issues

Universal Forwarder to Edge Delta

Common Migration Problems

1. Data Duplication During Transition

  • Use different indexes during migration
  • Implement phased rollout
  • Monitor for duplicate events (a sample search follows this list)
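
One way to spot duplicates is to hash each raw event and count repeats. This sample search assumes the migrated traffic lands in your_index:

index=your_index earliest=-1h
| eval event_hash=md5(_raw)
| stats count by event_hash
| where count > 1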

2. Configuration Differences

Map Universal Forwarder settings to Edge Delta configuration. The UF outputs.conf translates to Edge Delta’s splunk_tcp_output:

nodes:
- name: uf_replacement
  type: splunk_tcp_output
  host: same-as-uf-target.com
  port: 9997
  index: same_index

3. Authentication Migration

  • Convert from Splunk certificates to Edge Delta TLS config
  • Update firewall rules for new source IPs
  • Test authentication before full migration

Error Messages Reference

  • “Connection refused”: service not running or wrong port. Verify the Splunk service status and port configuration.
  • “Invalid HEC token”: token expired or incorrect. Regenerate the token in Splunk HEC settings.
  • “SSL certificate problem”: certificate mismatch or expired. Update the certificates and check the CA chain.
  • “Index not found”: index doesn’t exist or lacks permissions. Create the index or adjust permissions.
  • “Timeout waiting for response”: network latency or Splunk overloaded. Increase the timeout and check Splunk performance.
  • “Address already in use”: port conflict for input nodes. Change the port or stop the conflicting service.
  • “Authentication failed”: wrong credentials or method. Verify the authentication configuration.
  • “Buffer overflow”: data volume exceeds the buffer. Increase buffer_max_bytesize.

Advanced Debugging

Enable Debug Logging

1. Edge Delta Agent Debug Mode

agent:
  log_level: debug

2. Monitor Logs

Watch Edge Delta logs for Splunk-related messages:

tail -f /var/log/edgedelta/edgedelta.log | grep -i splunk

Filter logs for errors and failures:

grep -i "error\|fail\|refuse" /var/log/edgedelta/edgedelta.log

Packet Capture for Network Issues

Capture network traffic to Splunk for detailed analysis:

sudo tcpdump -i any -w splunk_traffic.pcap host splunk.example.com and port 9997

Analyze the captured traffic with Wireshark or tcpdump:

tcpdump -r splunk_traffic.pcap -nn

Test Data Flow

Create a test pipeline to verify data flow to Splunk:

nodes:
- name: test_generator
  type: memory_input

- name: test_splunk
  type: splunk_tcp_output
  host: splunk.example.com
  port: 9997
  index: test_index
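
Once the test pipeline is running, confirm the events arrive on the Splunk side:

index=test_index | head 10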

Best Practices

Connection Management

  1. Use connection pooling appropriately
  2. Implement retry logic with backoff
  3. Monitor connection health metrics
  4. Use load balancers for high availability

Data Integrity

  1. Enable buffering for reliability
  2. Monitor for data loss indicators
  3. Implement checksums if needed
  4. Validate data in Splunk regularly

Security

  1. Always use TLS/SSL in production
  2. Rotate tokens and certificates regularly
  3. Restrict network access with firewalls
  4. Audit authentication logs

Performance

  1. Start with conservative worker counts
  2. Monitor and adjust based on metrics
  3. Use appropriate batch sizes
  4. Consider data volume and velocity

Getting Help

If issues persist:

  1. Collect Diagnostic Information:

    • Edge Delta configuration (sanitized)
    • Error messages from logs
    • Splunk version and configuration
    • Network topology diagram
  2. Check Splunk Logs:

    Query Splunk’s internal logs for errors:

    index=_internal source=*splunkd.log* ERROR
    
  3. Contact Support with:

    • Diagnostic information
    • Steps to reproduce
    • Expected vs actual behavior
    • Any workarounds attempted