Ingest Data from a File

Learn how to ingest logs from files using the File Source Node in Edge Delta, including setup with Helm and pipeline configuration.

Overview

The Edge Delta agent begins to ingest logs from stdout out of the box. However, you can specify alternative or additional inputs. The File source node captures log input from specific files. It does this by tailing the configured files and capturing the changes made to them. The changes are sent as logs into the pipeline as they arrive in the file. The file source node is useful when dealing with system logs that are written to flat files on disk. It also serves as a valuable tool for testing and troubleshooting purposes.

AI Team: Configure this source using the File connector for streamlined setup in AI Team.

System logs are commonly written to flat files on disk. If the logs are not already being sent over the network, it is simpler to ingest them from the file than set up a new process to send them over TCP or HTTP.

Prerequisites

To use a file source node you need the following prerequisites:

  • An Edge Delta account with a pipeline configuration already created. This is the configuration in which you will create the File source node.
  • An in-cluster volume where the logs are being saved, and the cluster might need to be configured to mount external files depending on your architecture.

1. Create a Helm Override File

In this step you create a helm override values file to enable Edge Delta to access the in-cluster file locations.

  1. Create a YAML file and name it appropriately, such as custom-values.yaml.
  2. Add the following YAML to the file:
volumeProps:
  volumeMounts:
    - name: input-file
      mountPath: /mnt/inputfile/logs
      readOnly: true
  volumes:
    - name: input-file
      hostPath:
        path: /mnt/inputfile/logs
        type: DirectoryOrCreate

2. Install the Edge Delta Pipeline

Run the Helm deployment commands for the configuration:

  1. In the Edge Delta App, click Pipelines.
  2. Click the pipeline you want to deploy and select Add Agents.
  3. Select Helm.
  4. Copy and run the first command provided to create the edgedelta namespace in the cluster.
helm repo add edgedelta https://helm.edgedelta.com
  1. Copy and run the second command provided.
helm repo update
  1. Copy the third command provided in the UI, and append the override values file. The command will look similar to this but with a different Pipeline ID, note the -f custom-values.yaml:
helm upgrade edgedelta edgedelta/edgedelta -i --version v0.1.92 --set secretApiKey.value=123456789987654321 -n edgedelta --create-namespace -f custom-values.yaml

In this example the custom YAML file was called custom-values.yaml, replace it with the name of the file you created in step 1.

3. Create the File Source Node

  1. In the Edge Delta App, click Pipeline.
  2. Select the pipeline you want to add the file input to.
  3. Click Edit Mode
  4. Click Add Node, expand Collect and select File Source.
  5. Specify a Name for the file source node.
  6. Specify the path containing the files you want to tail for logs. The path should match the one you specified in the Edge Delta custom manifest. The file name can use wildcards to tail a number of files.
  7. Optionally, specify a filename pattern for files in that folder that should not be tailed for logs.
  8. Click Okay.
  9. Connect the file input to the first processor in the pipeline.

In this example, the files you want to tail for logs is captured with the following pattern: /mnt/inputfile/logs/*.*.

  1. Click Review Changes.
  2. Click Save Changes.

The Edge Delta pipeline will now ingest logs saved to any file in the /mnt/inputfile/logs location. By default, it creates one log per line.

Optional Parameters

rate_limit

The rate_limit parameter enables you to control data ingestion based on system resource usage. This advanced setting helps prevent source nodes from overwhelming the agent by automatically throttling or stopping data collection when CPU or memory thresholds are exceeded.

Use rate limiting to prevent runaway log collection from overwhelming the agent in high-volume sources, protect agent stability in resource-constrained environments with limited CPU/memory, automatically throttle during bursty traffic patterns, and ensure fair resource allocation across source nodes in multi-tenant deployments.

When rate limiting triggers, pull-based sources (File, S3, HTTP Pull) stop fetching new data, push-based sources (HTTP, TCP, UDP, OTLP) reject incoming data, and stream-based sources (Kafka, Pub/Sub) pause consumption. Rate limiting operates at the source node level, where each source with rate limiting enabled independently monitors and enforces its own thresholds.

Configuration Steps:

  1. Click Add New in the Rate Limit section
  2. Click Add New for Evaluation Policy
  3. Select Policy Type:
  • CPU Usage: Monitors CPU consumption and rate limits when usage exceeds defined thresholds. Use for CPU-intensive sources like file parsing or complex transformations.
  • Memory Usage: Monitors memory consumption and rate limits when usage exceeds defined thresholds. Use for memory-intensive sources like large message buffers or caching.
  • AND (composite): Combines multiple sub-policies with AND logic. All sub-policies must be true simultaneously to trigger rate limiting. Use when you want conservative rate limiting (both CPU and memory must be high).
  • OR (composite): Combines multiple sub-policies with OR logic. Any sub-policy can trigger rate limiting. Use when you want aggressive rate limiting (either CPU or memory being high triggers).
  1. Select Evaluation Mode. Choose how the policy behaves when thresholds are exceeded:
  • Enforce (default): Actively applies rate limiting when thresholds are met. Pull-based sources (File, S3, HTTP Pull) stop fetching new data, push-based sources (HTTP, TCP, UDP, OTLP) reject incoming data, and stream-based sources (Kafka, Pub/Sub) pause consumption. Use in production to protect agent resources.
  • Monitor: Logs when rate limiting would occur without actually limiting data flow. Use for testing thresholds before enforcing them in production.
  • Passthrough: Disables rate limiting entirely while keeping the configuration in place. Use to temporarily disable rate limiting without removing configuration.
  1. Set Absolute Limits and Relative Limits (for CPU Usage and Memory Usage policies)

Note: If you specify both absolute and relative limits, the system evaluates both conditions and rate limiting triggers when either condition is met (OR logic). For example, if you set absolute limit to 1.0 CPU cores and relative limit to 50%, rate limiting triggers when the source uses either 1 full core OR 50% of available CPU, whichever happens first.

  • For CPU Absolute Limits: Enter value in full core units:

    • 0.1 = one-tenth of a CPU core
    • 0.5 = half a CPU core
    • 1.0 = one full CPU core
    • 2.0 = two full CPU cores
  • For CPU Relative Limits: Enter percentage of total available CPU (0-100):

    • 50 = 50% of available CPU
    • 75 = 75% of available CPU
    • 85 = 85% of available CPU
  • For Memory Absolute Limits: Enter value in bytes

    • 104857600 = 100Mi (100 × 1024 × 1024)
    • 536870912 = 512Mi (512 × 1024 × 1024)
    • 1073741824 = 1Gi (1 × 1024 × 1024 × 1024)
  • For Memory Relative Limits: Enter percentage of total available memory (0-100)

    • 60 = 60% of available memory
    • 75 = 75% of available memory
    • 80 = 80% of available memory
  1. Set Refresh Interval (for CPU Usage and Memory Usage policies). Specify how frequently the system checks resource usage:
  • Recommended Values:
    • 10s to 30s for most use cases
    • 5s to 10s for high-volume sources requiring quick response
    • 1m or higher for stable, low-volume sources

The system fetches current CPU/memory usage at the specified refresh interval and uses that value for evaluation until the next refresh. Shorter intervals provide more responsive rate limiting but incur slightly higher overhead, while longer intervals are more efficient but slower to react to sudden resource spikes.

The GUI generates YAML as follows:

# Simple CPU-based rate limiting
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: cpu_usage
        evaluation_mode: enforce
        absolute_limit: 0.5  # Limit to half a CPU core
        refresh_interval: 10s
# Simple memory-based rate limiting
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: memory_usage
        evaluation_mode: enforce
        absolute_limit: 536870912  # 512Mi in bytes
        refresh_interval: 30s

Composite Policies (AND / OR)

When using AND or OR policy types, you define sub-policies instead of limits. Sub-policies must be siblings (at the same level)—do not nest sub-policies within other sub-policies. Each sub-policy is independently evaluated, and the parent policy’s evaluation mode applies to the composite result.

  • AND Logic: All sub-policies must evaluate to true at the same time to trigger rate limiting. Use when you want conservative rate limiting (limit only when CPU AND memory are both high).
  • OR Logic: Any sub-policy evaluating to true triggers rate limiting. Use when you want aggressive protection (limit when either CPU OR memory is high).

Configuration Steps:

  1. Select AND (composite) or OR (composite) as the Policy Type
  2. Choose the Evaluation Mode (typically Enforce)
  3. Click Add New under Sub-Policies to add the first condition
  4. Configure the first sub-policy by selecting policy type (CPU Usage or Memory Usage), selecting evaluation mode, setting absolute and/or relative limits, and setting refresh interval
  5. In the parent policy (not within the child), click Add New again to add a sibling sub-policy
  6. Configure additional sub-policies following the same pattern

The GUI generates YAML as follows:

# AND composite policy - both CPU AND memory must exceed limits
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: and
        evaluation_mode: enforce
        sub_policies:
          # First sub-policy (sibling)
          - policy_type: cpu_usage
            evaluation_mode: enforce
            absolute_limit: 0.75  # Limit to 75% of one core
            refresh_interval: 15s
          # Second sub-policy (sibling)
          - policy_type: memory_usage
            evaluation_mode: enforce
            absolute_limit: 1073741824  # 1Gi in bytes
            refresh_interval: 15s
# OR composite policy - either CPU OR memory can trigger
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: or
        evaluation_mode: enforce
        sub_policies:
          - policy_type: cpu_usage
            evaluation_mode: enforce
            relative_limit: 85  # 85% of available CPU
            refresh_interval: 20s
          - policy_type: memory_usage
            evaluation_mode: enforce
            relative_limit: 80  # 80% of available memory
            refresh_interval: 20s
# Monitor mode for testing thresholds
nodes:
  - name: <node name>
    type: <node type>
    rate_limit:
      evaluation_policy:
        policy_type: memory_usage
        evaluation_mode: monitor  # Only logs, doesn't limit
        relative_limit: 70  # Test at 70% before enforcing
        refresh_interval: 30s