Resource-Aware Gateway Load Balancing

Distribute ingest traffic across Edge Delta gateway pods by current resource load using ORCA reports and an Envoy load balancer.

Overview

This guide walks through enabling Open Request Cost Aggregation (ORCA) based load balancing in front of the Edge Delta gateway pipeline. An Envoy proxy distributes ingest traffic across gateway pods by current resource load instead of in-flight request count, shifting work away from gateways that are CPU-bound, memory-pressured, or backed up on their ingest buffers.

The following diagram shows the routing path from a node pipeline agent through the Envoy load balancer to the gateway pods:

flowchart LR
    A[Node pipeline agent] -->|OTLP export| B[Service: edgedelta-gateway]
    B --> C[Envoy LB pods<br/>CSWRR with<br/>enable_oob_load_report=true]
    C --> D[Service: edgedelta-gateway-backends<br/>headless, per-pod DNS]
    D --> E[Gateway pod 1<br/>ORCA producer]
    D --> F[Gateway pod 2<br/>ORCA producer]
    D --> G[Gateway pod N<br/>ORCA producer]

When to use this

Use ORCA-aware load balancing when:

  • Your gateway pool serves heterogeneous ingest rates and the default LEAST_REQUEST policy produces uneven CPU and memory across pods.
  • You have transient hot pods that should drain naturally rather than via outlier ejection.
  • You want backpressure signals (ingest channel fill) to influence routing before connections start failing with 503s.

If the gateway pool’s load is uniform across replicas, this adds operational complexity for no benefit. Stick with LEAST_REQUEST.

How it works

The Edge Delta gateway is an ORCA producer: when enabled, each pod periodically emits an OrcaLoadReport over a dedicated gRPC service on the ingest port, containing CPU utilization, memory utilization, and an ingest-buffer-fullness signal.

An Envoy proxy in front of the gateway runs the client_side_weighted_round_robin (CSWRR) load balancing policy. CSWRR opens a long-lived out-of-band (OOB) stream to each gateway pod’s ORCA service, computes a weight per pod from the metrics named in the policy config, and routes incoming requests proportionally.
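
Concretely, CSWRR derives each backend’s weight from the gRFC A58 formula (which Envoy’s implementation follows), approximately:

weight = qps / (utilization + (eps / qps) * error_utilization_penalty)

where qps is the report’s rps_fractional, utilization is the max across the metrics named in the policy config, and eps is the backend’s reported error rate. A pod reporting higher utilization or a higher error rate receives proportionally fewer new requests.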

The Envoy load balancer hop is a separate Deployment, not the agent’s or gateway’s Istio sidecar. Istio’s default proxyv2 Envoy build does not include the envoy.load_balancing_policies.* extension family required by CSWRR. Using upstream envoyproxy/envoy for the load balancer hop avoids modifying the Istio control plane or rebuilding sidecar images.

Prerequisites

  • Edge Delta gateway running v2.17.0 or newer (the version that ships the ORCA producer). Verify by checking the gateway pod’s image tag and the presence of the load_reporting config field in the agent’s configuration schema.
  • A way to run an additional Deployment in the cluster, typically in the same namespace as the gateway.
  • Optional: an Istio service mesh (or any other mTLS layer) terminating traffic between agents, the load balancer hop, and the gateway pods. The load balancer proxy doesn’t require Istio, but mTLS through a mesh is a common deployment.

The rest of this guide assumes:

  • Gateway pods live in namespace edgedelta-gateway with label edgedelta/agent-type: processor and listen on TCP port 4319.
  • DaemonSet agents (one per node) live in namespace edgedelta. They are the gRPC clients that dial the gateway Service.

Adjust names and ports for your environment.
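
To check those assumptions against your cluster, list the gateway pods and the Service the agents dial (names as assumed above):

kubectl get pods -n edgedelta-gateway -l edgedelta/agent-type=processor -o wide
kubectl get svc -n edgedelta-gateway edgedelta-gateway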

Step 1. Enable ORCA reporting on the gateway

On each ed_gateway_input source that should report load, add a load_reporting block:

nodes:
  - name: my-gateway-input
    type: ed_gateway_input
    port: 4319
    protocol: grpc
    load_reporting:
      enabled: true
      # Optional. The gateway sampler reads CPU/memory/buffer signals at
      # this interval. Defaults to 1s.
      sample_interval: 1s
      # Optional. Minimum interval at which the gRPC OOB stream pushes
      # a LoadReport to subscribers. Defaults to 30s; the gRPC library
      # silently raises lower values to 30s.
      min_reporting_interval: 30s

load_reporting is off by default. Setting enabled: true is the only required field; the intervals are tuned for typical deployments. Enabling it also registers gRPC server reflection on the ingest port, so debugging tools (grpcurl, evans) can discover the ORCA service.

The following table lists the fields reported in each OrcaLoadReport:

| OrcaLoadReport field | Source | Notes |
| --- | --- | --- |
| cpu_utilization | Process cumulative CPU seconds / cgroup CPU quota | May exceed 1.0 if the workload is oversubscribed (per the xDS ORCA spec). |
| mem_utilization | Process RSS / cgroup available memory | Clamped to [0, 1]. |
| named_metrics["buffer_fullness"] | max(len/cap) across the gateway’s ingest channels | Application-specific signal; [0, 1]. |
| rps_fractional | Ingest request rate over the last sample_interval window | Per-second. CSWRR uses this in the weight formula weight = rps / utilization (approximate). |

The HTTP variant of ed_gateway_input (with protocol: http) also stamps an endpoint-load-metrics response header on 202 and 204 responses in the same shape:

endpoint-load-metrics: TEXT cpu_utilization=0.45,mem_utilization=0.60,named_metrics.buffer_fullness=0.30,rps_fractional=120.5
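
To observe the header, send the HTTP-variant source any request it accepts and dump the response headers. The sketch below assumes an HTTP source listening on 4319 and an OTLP/HTTP-style /v1/logs path, both illustrative; substitute the port and path your source actually serves, and port-forward to the pod first as in the next subsection:

curl -si -X POST http://localhost:4319/v1/logs \
  -H 'Content-Type: application/json' -d '{}' \
  | grep -i '^endpoint-load-metrics'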

Verify the producer is emitting

Port-forward to one gateway pod and call the ORCA service with a locally installed grpcurl:

GATEWAY_POD=$(kubectl get pod -n edgedelta-gateway \
  -l edgedelta/agent-type=processor -o jsonpath='{.items[0].metadata.name}')

kubectl port-forward -n edgedelta-gateway $GATEWAY_POD 4319:4319 &

The gateway exposes gRPC server reflection while load_reporting is enabled, so grpcurl can discover the service:

grpcurl -plaintext localhost:4319 list
# Expect, among other services:
#   xds.service.orca.v3.OpenRcaService

Stream a load report:

grpcurl -plaintext \
  -d '{"report_interval":{"seconds":1}}' \
  localhost:4319 \
  xds.service.orca.v3.OpenRcaService/StreamCoreMetrics

A first LoadReport arrives immediately when the stream opens; further reports arrive every 30 seconds or so (gRPC’s hard floor on min_reporting_interval):

{
  "cpuUtilization": 0.07,
  "memUtilization": 0.12,
  "namedMetrics": { "buffer_fullness": 0 },
  "rpsFractional": 12.4
}

If cpuUtilization, memUtilization, and rpsFractional are present and non-zero, and namedMetrics contains buffer_fullness, the producer is working. rpsFractional corresponds to the proto’s rps_fractional field; CSWRR refuses to weight a backend whose report has rps_fractional == 0.

Step 2. Deploy a CSWRR Envoy in front of the gateway

The agents need a load balancer that consumes ORCA reports. Most Istio distributions don’t compile the envoy.load_balancing_policies.client_side_weighted_round_robin extension into their proxy image. Rather than rebuild Istio’s proxy, run a small upstream envoyproxy/envoy Deployment as the load balancer hop between agents and gateway pods.

The traffic path becomes:

flowchart LR
    A[Agent] --> B[Service: edgedelta-gateway]
    B --> C[Envoy LB pods]
    C --> D[Headless Service:<br/>edgedelta-gateway-backends]
    D --> E[Gateway pods]

Two Services are needed:

  • edgedelta-gateway: fronts the Envoy load balancer pods. Agents already dial this name; only the selector changes.
  • edgedelta-gateway-backends: a headless Service (clusterIP: None) selecting the actual gateway pods. The Envoy load balancer resolves it via DNS and gets one A record per pod, which CSWRR weights individually.

Step 2a. Apply the Services

# Frontend Service the agent dials. Selects the CSWRR Envoy pods.
apiVersion: v1
kind: Service
metadata:
  name: edgedelta-gateway
  namespace: edgedelta-gateway
spec:
  ports:
    - name: grpc-gateway
      port: 4319
      protocol: TCP
      targetPort: 4319
  selector:
    app: cswrr-lb
---
# Headless backends. Envoy's STRICT_DNS cluster sees one A record per
# gateway pod, which CSWRR needs to assign per-pod weights.
apiVersion: v1
kind: Service
metadata:
  name: edgedelta-gateway-backends
  namespace: edgedelta-gateway
spec:
  clusterIP: None
  ports:
    - name: grpc-gateway
      port: 4319
      protocol: TCP
      targetPort: 4319
  selector:
    edgedelta/agent-type: processor
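
Once these are applied, you can confirm the headless Service resolves to one A record per gateway pod, which is exactly what the STRICT_DNS cluster consumes; busybox is just a convenient throwaway image:

kubectl run dns-check -n edgedelta-gateway --rm -it --restart=Never \
  --image=busybox:1.36 -- \
  nslookup edgedelta-gateway-backends.edgedelta-gateway.svc.cluster.local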

If your existing edgedelta-gateway Service already points directly at the gateway pods, change its selector to app: cswrr-lb and add the new headless backends Service alongside it.
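
One way to repoint the existing Service without hand-editing YAML is a JSON-patch replace, which swaps out the whole selector map so stale keys don’t linger (names as assumed above):

kubectl patch svc edgedelta-gateway -n edgedelta-gateway --type=json \
  -p '[{"op": "replace", "path": "/spec/selector", "value": {"app": "cswrr-lb"}}]'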

Step 2b. Envoy bootstrap (ConfigMap)

apiVersion: v1
kind: ConfigMap
metadata:
  name: cswrr-lb-envoy-config
  namespace: edgedelta-gateway
data:
  envoy.yaml: |
    admin:
      address:
        socket_address: { address: 0.0.0.0, port_value: 9901 }

    static_resources:
      listeners:
        - name: gateway_listener
          address:
            socket_address: { address: 0.0.0.0, port_value: 4319 }
          filter_chains:
            - filters:
                - name: envoy.filters.network.http_connection_manager
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                    stat_prefix: ingress_grpc
                    codec_type: HTTP2
                    http2_protocol_options: {}
                    # Long-lived gRPC streams (including ORCA OOB):
                    # disable per-stream idle timeout.
                    stream_idle_timeout: 0s
                    route_config:
                      name: local_route
                      virtual_hosts:
                        - name: local
                          domains: ["*"]
                          routes:
                            - match: { prefix: "/" }
                              route:
                                cluster: gateway_backends
                                timeout: 0s
                    http_filters:
                      - name: envoy.filters.http.router
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

      clusters:
        - name: gateway_backends
          type: STRICT_DNS
          connect_timeout: 2s
          # Upstream gateway speaks plaintext h2c gRPC.
          typed_extension_protocol_options:
            envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
              "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
              explicit_http_config:
                http2_protocol_options: {}
          lb_policy: LOAD_BALANCING_POLICY_CONFIG
          load_balancing_policy:
            policies:
              - typed_extension_config:
                  name: envoy.load_balancing_policies.client_side_weighted_round_robin
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin
                    enable_oob_load_report: true
                    # Which fields of each backend's OrcaLoadReport
                    # drive its weight. CSWRR weights by max() across
                    # the listed metrics; see the Tuning section.
                    metric_names_for_computing_utilization:
                      - cpu_utilization
                      - mem_utilization
                    weight_update_period: 1s
                    blackout_period: 60s
                    # proto3 JSON Duration requires the "s" suffix.
                    # Use 180s, not "3m", or Envoy rejects the bootstrap.
                    weight_expiration_period: 180s
                    error_utilization_penalty: 1.0
          load_assignment:
            cluster_name: gateway_backends
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: edgedelta-gateway-backends.edgedelta-gateway.svc.cluster.local
                          port_value: 4319
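
You can validate the bootstrap before applying the ConfigMap: Envoy’s validate mode parses the full config, including the strict proto3 JSON durations, and exits without opening any sockets. Assuming the envoy.yaml above is saved to the current directory:

docker run --rm -v "$(pwd)/envoy.yaml:/tmp/envoy.yaml:ro" \
  envoyproxy/envoy:v1.34.0 --mode validate -c /tmp/envoy.yaml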

Step 2c. Envoy Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cswrr-lb
  namespace: edgedelta-gateway
  labels:
    app: cswrr-lb
spec:
  replicas: 2
  selector:
    matchLabels: { app: cswrr-lb }
  template:
    metadata:
      labels: { app: cswrr-lb }
    spec:
      containers:
        - name: envoy
          # Any upstream tag that includes CSWRR (v1.26 or newer) works.
          # Pin a specific minor in production.
          image: envoyproxy/envoy:v1.34.0
          args: ["-c", "/etc/envoy/envoy.yaml", "--log-level", "info"]
          ports:
            - name: grpc-gateway
              containerPort: 4319
            - name: admin
              containerPort: 9901
          resources:
            requests: { cpu: 200m, memory: 256Mi }
            limits:   { cpu: 1,    memory: 512Mi }
          readinessProbe:
            httpGet: { path: /ready, port: 9901 }
            initialDelaySeconds: 2
            periodSeconds: 5
          volumeMounts:
            - name: envoy-config
              mountPath: /etc/envoy
              readOnly: true
      volumes:
        - name: envoy-config
          configMap:
            name: cswrr-lb-envoy-config

Apply the manifests from Steps 2a–2c. After the pods become Ready, agents already dialing edgedelta-gateway.edgedelta-gateway.svc.cluster.local:4319 are routed through Envoy without any client-side change.
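
As a quick smoke test (names as assumed above), the headless Service should list one endpoint per gateway pod, and every Envoy replica should be Ready:

kubectl get endpoints -n edgedelta-gateway edgedelta-gateway-backends
kubectl get pods -n edgedelta-gateway -l app=cswrr-lb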

Step 3. Verify weights are flowing

Port-forward to one of the Envoy load balancer pods and inspect its admin endpoint:

LB_POD=$(kubectl get pod -n edgedelta-gateway -l app=cswrr-lb \
  -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n edgedelta-gateway $LB_POD 9901:9901 &

Check that the policy actually loaded (no CDS rejection):

curl -s 'localhost:9901/config_dump?resource=dynamic_active_clusters&name_regex=gateway_backends' \
  | jq '.configs[] | .cluster | {name, lb_policy, load_balancing_policy}'

You should see lb_policy: "LOAD_BALANCING_POLICY_CONFIG" and the CSWRR typed_config block. If the field is missing or shows LEAST_REQUEST (the cluster’s compiled-in default), the policy didn’t register and CDS rejected it. Check kubectl logs on the Envoy pod for the actual error.

Then look at per-endpoint weights:

curl -s 'localhost:9901/clusters?format=json' \
  | jq '.cluster_statuses[]
       | select(.name=="gateway_backends")
       | .host_statuses[]
       | {address: .address.socket_address.address, weight: .weight}'

Under uniform load, weights should be similar across all gateway pod IPs. Under asymmetric load (one pod doing significantly more work), its weight should drop.

Confirm traffic actually shifts under asymmetric load

Apply differential CPU pressure on one gateway pod by running stress-ng --cpu 2 (or any CPU burner) in a debug container alongside it, or scale a downstream output destination down to 0 replicas to make one pod’s ingest buffer fill.
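
A sketch of the first option using an ephemeral debug container; the alexeiled/stress-ng image is one public choice, not a requirement, and any image with stress-ng on its PATH works:

kubectl debug -n edgedelta-gateway $GATEWAY_POD \
  --image=alexeiled/stress-ng --container=cpu-burn -- \
  stress-ng --cpu 2 --timeout 300s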

Within 60 seconds of introducing the load asymmetry (one blackout_period), expect:

  • The loaded pod’s weight in /clusters?format=json drops noticeably (often by a factor of 2 or more).
  • The upstream_rq_total counters per host diverge in the same direction:
curl -s 'localhost:9901/stats?filter=cluster.gateway_backends.*upstream_rq_total'

When the load asymmetry is removed, weights converge again within one weight_expiration_period.

Tuning

Which metrics to weight by

metric_names_for_computing_utilization decides which fields of each backend’s OrcaLoadReport contribute to its weight. CSWRR weights each backend by the max across the listed metrics.

| Metric | When to include |
| --- | --- |
| cpu_utilization | Always. The most reliable saturation signal. Can exceed 1.0 when oversubscribed, which CSWRR uses to weight the pod down sharply. |
| mem_utilization | Recommended. Catches memory-pressured pods before they OOM. Clamped to [0, 1]. |

If you want the load balancer to react only to genuine resource saturation, cpu_utilization and mem_utilization are sufficient.
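
To let the buffer signal drive weights too, the ORCA convention addresses custom metrics with a named_metrics. prefix. Support for named metrics in this field varies by Envoy version, so treat the following as a sketch to verify against your build:

metric_names_for_computing_utilization:
  - cpu_utilization
  - mem_utilization
  - named_metrics.buffer_fullness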

Timing parameters

| Parameter | Default | Purpose |
| --- | --- | --- |
| weight_update_period | 1s | How often Envoy recomputes per-endpoint weights. Match to the agent’s sample_interval. |
| blackout_period | 60s | A backend must report for at least this long before its weight is trusted. Avoids early flapping. |
| weight_expiration_period | 180s | If a backend goes silent for this long, fall back to the default weight. Use seconds (180s, not 3m). |
| error_utilization_penalty | 1.0 | Multiplier applied to the error rate (eps/qps) when computing a backend’s effective utilization. |

The 60s blackout_period is chosen because the gRPC OOB stream is floored at 30 seconds. 60s covers two full reports comfortably.

Troubleshooting

| Symptom | Solution |
| --- | --- |
| Envoy logs didn’t find a registered load balancer factory implementation | The Envoy build in use doesn’t include the envoy.load_balancing_policies.* extension family. Verify you’re running upstream envoyproxy/envoy:v1.26 or newer. Don’t use Istio’s docker.io/istio/proxyv2 for the load balancer hop; that build typically strips those extensions. |
| Envoy logs Unable to parse JSON as proto: ... weight_expiration_period | Duration fields in the typed_config must be in seconds form (180s), not human-readable (3m). proto3 JSON Duration parsing is strict. |
| LoadReport shows cpu_utilization and mem_utilization as -1 or missing | The agent’s load_reporting.enabled flag isn’t true on this gateway, or the source config didn’t reach the pod. Confirm the rendered config with kubectl exec -n edgedelta-gateway $GATEWAY_POD -- cat /var/run/edgedelta/config.yaml. |
| Reports flowing, but Envoy’s /clusters shows weight: 1 for every host | Either fewer than blackout_period seconds have elapsed since Envoy first saw each backend (initial settle window), or all the metrics in metric_names_for_computing_utilization are 0 on every backend (the pool is genuinely uniform). |
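
For the first two symptoms, the quickest confirmation is grepping the Envoy pod logs for the rejection message (Deployment name as assumed above):

kubectl logs -n edgedelta-gateway deploy/cswrr-lb \
  | grep -iE 'load balancer factory|unable to parse json'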

See Also