Troubleshoot the Edge Delta Agent with kubectl

Troubleshooting Edge Delta using kubectl.


Overview

You can diagnose and resolve common issues with Edge Delta Fleets deployed using kubectl.

Note: Before editing or troubleshooting, confirm that your kubectl context points to the correct cluster. Verify the context with kubectl config current-context and kubectl config view before performing operations. Depending on your setup, you may need to adjust pod names, deployment names, namespaces, and image names or tags in the commands on this page.
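The context check can be scripted so that a mismatch stops you before any command runs. The following is a minimal sketch: the check_context helper and the my-cluster context name are hypothetical and not part of Edge Delta or kubectl.

```shell
# Compare the active kubectl context with the one you expect to be using.
# check_context is a hypothetical helper written for this example.
check_context() {
  expected="$1"
  actual="$2"
  if [ "$expected" = "$actual" ]; then
    echo "OK: using context '$actual'"
  else
    echo "STOP: current context is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Usage (requires kubectl and a configured kubeconfig; my-cluster is a placeholder):
#   check_context my-cluster "$(kubectl config current-context)"
#   kubectl config view --minify   # show only the settings for the active context
```

The helper takes the actual context as an argument rather than calling kubectl itself, so it can be reused in scripts that already know the context.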

1. Check Agent Status

Determine whether the Edge Delta DaemonSet is running the expected number of pods.

kubectl get daemonset -n edgedelta

When examining the output, check for the following:

DESIRED: The number of desired pods, which should match the number of nodes that the DaemonSet is configured to run on.
CURRENT: The number of current pods.
READY: The number of pods that are ready.
UP-TO-DATE: The number of pods that are running the updated version of the agent, if an update has been attempted.
AVAILABLE: The number of pods that are available to serve requests.
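Comparing the DESIRED and READY columns can be automated. This sketch assumes the default table output of kubectl get daemonset, where DESIRED is the second column and READY is the fourth; daemonsets_not_ready is a hypothetical helper name.

```shell
# Read `kubectl get daemonset` table output on stdin and report any
# DaemonSet whose READY count is lower than its DESIRED count.
# Default columns: NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonsets_not_ready() {
  # NR > 1 skips the header row; "+ 0" forces a numeric comparison.
  awk 'NR > 1 && $4 + 0 < $2 + 0 { printf "%s: %s of %s pods ready\n", $1, $4, $2 }'
}

# Usage:
#   kubectl get daemonset -n edgedelta | daemonsets_not_ready
```

No output means every DaemonSet in the namespace has as many ready pods as desired.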

2. Inspect the Pod’s Details

If the kubectl get daemonset command indicates an issue, list all pods in the edgedelta namespace:

kubectl get pods -n edgedelta

Look for pods that don’t have a status of Running or have a high number of restarts. Pods in other states, such as Pending, CrashLoopBackOff, or Error, can indicate configuration errors, scheduling issues, or runtime problems. Once you have identified a pod that seems problematic, get more detailed information about that specific pod:

kubectl describe pod <pod-name> -n edgedelta

Replace <pod-name> with the name of the pod you need to investigate. The output of this command includes the following:

Metadata: Includes labels, annotations, and other identifiers.
Spec: The desired state as defined in the pod’s manifest.
Status: Current status of the pod, including phase, conditions, and events which may point to issues during scheduling or running the pod.
Conditions: These include PodScheduled, ContainersReady, Initialized, and Ready. They provide insight into the lifecycle state of the pod.
Events: This is usually the most useful section for diagnosing problems. Events describe actions taken by the Kubernetes system (such as attempting to start a pod), as well as any issues it has encountered. Error messages here can point you toward problems with volume mounts, image pulls, resource shortages, and health check failures.
Resources: Requests and limits, which help you spot CPU and memory constraints.
Volumes: Volumes and mount points, which confirm data persistence and access to required files.
Environment: Environment variables configured for the Edge Delta agent.

See Debug the Installation of Edge Delta Components for more detailed steps.
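To narrow down which pods to describe, the pod list can be filtered for pods that are not Running or that have restarted often. This is a sketch only: suspect_pods is a hypothetical helper, the restart threshold of 3 is an arbitrary choice, and the column positions assume the default kubectl get pods table output.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names of
# pods that are not Running or that have restarted more than a threshold.
# Default columns: NAME READY STATUS RESTARTS AGE
suspect_pods() {
  max_restarts="${1:-3}"   # arbitrary default threshold for this example
  awk -v max="$max_restarts" '$3 != "Running" || $4 + 0 > max { print $1 }'
}

# Usage: describe every suspect pod in the namespace.
#   kubectl get pods -n edgedelta --no-headers | suspect_pods 3 |
#     while read -r pod; do kubectl describe pod "$pod" -n edgedelta; done
```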

3. Restart the Fleet

Consider a rolling restart of the DaemonSet if all agents need to be restarted, for example to pick up a refreshed ConfigMap or Secret.

kubectl rollout restart daemonset/edgedelta -n edgedelta

The rollout restart command restarts all pods managed by the DaemonSet without changing any of the current specifications. Use it to pick up a refreshed ConfigMap or Secret, or to replace pod instances that are affected by intermittent issues.

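RollingUpdate is the default update strategy for DaemonSets. With the default settings, pods are replaced one node at a time, and the rollout waits for each new pod to be Ready before moving on to the next one. This keeps the rest of the fleet running during the restart, although the agent on each node is briefly unavailable while its pod is replaced.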

Note: A rolling restart does not apply any changes to the DaemonSet’s configuration. To change the configuration, edit the DaemonSet manifest and apply it using kubectl apply. Use the rolling restart command only when the pods need to restart and pick up changes that do not require a modification to the DaemonSet’s spec, such as external updates or changes to mounted ConfigMaps and Secrets.
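After triggering a restart, kubectl rollout status can be used to wait until the rollout finishes. The sketch below wraps both commands; restart_fleet and the KUBECTL override are hypothetical conveniences, the 5m timeout is an arbitrary choice, and the DaemonSet name and namespace default to edgedelta as in the command above.

```shell
# Binary to run; override with KUBECTL=echo to preview the commands without
# touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Restart the DaemonSet, then block until the rollout completes or times out.
restart_fleet() {
  name="${1:-edgedelta}"
  namespace="${2:-edgedelta}"
  "$KUBECTL" rollout restart "daemonset/$name" -n "$namespace" &&
    "$KUBECTL" rollout status "daemonset/$name" -n "$namespace" --timeout=5m
}

# Usage:
#   restart_fleet edgedelta edgedelta
```

If the status command times out, the pod and event checks described on this page are a reasonable next step.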

4. Check for Cluster Events

Investigate events in the namespace for additional context.

kubectl get events -n edgedelta

Pod Scheduling Issues: FailedScheduling events indicate that a pod cannot be scheduled, often due to insufficient resources or configured affinity/anti-affinity rules.
Pod Lifecycle Events: These events document the creation, starting, killing, and scheduling of pods, which may reveal transient or recurring problems.
Image Pull Problems: ErrImagePull or ImagePullBackOff events suggest that there is a problem with pulling the container image from the registry.
Resource Evictions: Evicted events signal that a pod has been terminated due to resource scarcity, often tied to the node’s available CPU or memory.
Node Issues: Events indicating that a node is in a NotReady state or other problems affecting node health that could impact pod scheduling and operation.
Volume Attachment Issues: FailedMount or FailedAttachVolume events are critical if the pod relies on persistent volumes, indicating issues with volume binding or access rights.
Probe Failures: Unhealthy events related to liveness and readiness probes reveal that a container may not be running as expected.
Excessive Restarts or Failures: Frequent Restarting events or BackOff statuses point to issues with pod stability.
ConfigMap or Secret Issues: Problems with accessing ConfigMaps or Secrets required by the Edge Delta agent may cause the pod to fail to start or operate correctly.
Warnings: Kubernetes classifies events as either Normal or Warning. Investigate any Warning event promptly, as these typically denote significant issues that may impact service operations.

This step is essential to get a holistic view of what may be affecting the Edge Delta agent and could provide necessary clues when troubleshooting complex issues that span multiple Kubernetes resources. Pay attention to the timestamps of these events to help pinpoint when certain issues occurred and correlate them with other observed behaviors or log entries.
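When the namespace has many events, sorting them by time and summarizing the Warning events by reason makes recurring problems easier to see. This sketch assumes the default kubectl get events table output (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); warning_reasons is a hypothetical helper name.

```shell
# Read `kubectl get events --no-headers` output on stdin and count Warning
# events per reason, most frequent first (ties sorted by reason name).
warning_reasons() {
  awk '$2 == "Warning" { count[$3]++ } END { for (r in count) print count[r], r }' |
    sort -k1,1nr -k2,2
}

# Usage:
#   kubectl get events -n edgedelta --sort-by=.lastTimestamp         # oldest first
#   kubectl get events -n edgedelta --field-selector type=Warning    # warnings only
#   kubectl get events -n edgedelta --no-headers | warning_reasons   # summary
```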

5. Update Fleet Image

Roll out an updated version for the Edge Delta Fleet if needed. Follow the upgrade/deploy steps in the UI:

To view the deployment commands for an existing v3 pipeline configuration:

Click Pipelines and select the Fleet.
Click Add Agents.
Select the deployment type.

To view the deployment commands for an existing v2 pipeline configuration:

Click Pipelines and select Legacy Fleets.
Click the kebab (⋮) icon in the Actions column for the agent and select Add Agents.
Select the deployment method.
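Before and after running the deployment commands from the UI, it can help to confirm which image the DaemonSet is configured to run. The following sketch assumes the DaemonSet is named edgedelta in the edgedelta namespace, as in the earlier commands; fleet_images and the KUBECTL override are hypothetical conveniences.

```shell
# Binary to run; override with KUBECTL=echo to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Print the container image(s) defined in the DaemonSet's pod template.
fleet_images() {
  name="${1:-edgedelta}"
  namespace="${2:-edgedelta}"
  "$KUBECTL" get daemonset "$name" -n "$namespace" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Usage:
#   fleet_images edgedelta edgedelta
```

Comparing the output before and after the upgrade shows whether the new image tag was applied to the pod template.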

6. Troubleshoot Resource Limits

See Scale Edge Delta Deployments.