Troubleshoot the Edge Delta Agent with kubectl

Instructions for troubleshooting the Edge Delta Agent on Kubernetes using kubectl.

Overview

Troubleshooting the Edge Delta agent in a Kubernetes environment involves understanding how the agent operates within a Kubernetes pod. This document outlines steps to diagnose and resolve common issues with the Edge Delta agent in such a setup.

Note: Before troubleshooting, make sure your kubectl context points to the correct cluster and namespace. Verify it with kubectl config current-context and kubectl config view before running any commands. The commands in this document use the edgedelta namespace and DaemonSet name; depending on your setup, you may need to substitute your own pod names, deployment names, namespaces, and image names or tags.
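
The context check described in the note can be wrapped in a small helper. This is a minimal sketch: the function name show_kube_context is hypothetical, and it assumes a namespace may or may not be set on the active context.

```shell
# Hypothetical helper: print the active kubectl context and its namespace.
show_kube_context() {
  ctx=$(kubectl config current-context) || return 1
  # --minify limits the output to the active context, so this jsonpath
  # matches only that context's namespace (empty if none is set).
  ns=$(kubectl config view --minify --output 'jsonpath={..namespace}')
  echo "context=${ctx} namespace=${ns:-default}"
}

# Usage: show_kube_context
```

If the reported namespace is not edgedelta, keep passing -n edgedelta explicitly, as the commands in this document do.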

1. Check Agent Status

Determine if the Edge Delta agent pods in the DaemonSet are running correctly.

kubectl get daemonset -n edgedelta

When examining the output, check for the following:

  • DESIRED: The number of desired pods, which should match the number of nodes that the DaemonSet is configured to run on.
  • CURRENT: The number of pods that are currently scheduled and running.
  • READY: The number of pods that are running and passing their readiness checks.
  • UP-TO-DATE: The number of pods that are running the latest version of the pod template, if an update has been attempted.
  • AVAILABLE: The number of pods that have been ready for at least the DaemonSet’s minReadySeconds.

In a healthy DaemonSet, all of these values match DESIRED.
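
The DESIRED and READY comparison can also be scripted. The sketch below assumes the DaemonSet is named edgedelta (as in the restart command later in this document); the function name check_daemonset_ready is hypothetical. It reads the same counters from the DaemonSet status instead of parsing the table.

```shell
# Hypothetical helper: compare desired vs. ready pod counts for a DaemonSet.
# Arguments (both optional): DaemonSet name, namespace.
check_daemonset_ready() {
  ds=${1:-edgedelta}
  ns=${2:-edgedelta}
  counts=$(kubectl get daemonset "$ds" -n "$ns" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') || return 2
  desired=${counts% *}   # text before the space
  ready=${counts#* }     # text after the space
  if [ "$desired" = "$ready" ]; then
    echo "OK: ${ready} of ${desired} agent pods ready"
  else
    echo "DEGRADED: ${ready} of ${desired} agent pods ready"
    return 1
  fi
}

# Usage: check_daemonset_ready || echo "inspect the pods next"
```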

2. Inspect the Pod’s Details

If the kubectl get daemonset command indicates an issue, list all pods in the edgedelta namespace:

kubectl get pods -n edgedelta  

Look for pods that don’t have a status of Running or have a high number of restarts. Pods in other states, such as Pending, CrashLoopBackOff, or Error, can indicate configuration errors, scheduling issues, or runtime problems. Once you have identified a pod that seems problematic, get more detailed information about that specific pod:

kubectl describe pod <pod-name> -n edgedelta

Replace <pod-name> with the name of the pod you need to investigate. The output of this command includes the following information:

  • Metadata: Includes labels, annotations, and other identifiers.
  • Spec: The desired state as defined in the pod’s manifest.
  • Status: Current status of the pod, including phase, conditions, and events which may point to issues during scheduling or running the pod.
  • Conditions: These include PodScheduled, ContainersReady, Initialized, and Ready. They provide insight into the lifecycle state of the pod.
  • Events: This is usually the most useful section for diagnosing problems. Events describe actions taken by the Kubernetes system (like attempting to start a pod), as well as any issues it has encountered. Error messages here can point you toward problems with volume mounts, image pulls, resource shortages, and failed health checks.
  • Resource requests and limits: Use these to spot CPU and memory constraints.
  • Volumes and mount points: Use these to confirm data persistence and access to required files.
  • Environment variables: The variables configured for the Edge Delta agent.
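
To shortlist the pods worth describing, the pod listing can be filtered for anything that is not Running or that has restarted often. This is a sketch only: the function name list_problem_pods and the default threshold of 3 restarts are assumptions, and it relies on the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: list pods that are not Running or restart too often.
# Arguments (both optional): namespace, maximum acceptable restart count.
list_problem_pods() {
  ns=${1:-edgedelta}
  max=${2:-3}
  # Column 3 is STATUS and column 4 is RESTARTS in the default output.
  kubectl get pods -n "$ns" --no-headers |
    awk -v max="$max" '$3 != "Running" || $4 + 0 > max { print $1, $3, $4 }'
}

# Usage: describe every pod that the filter reports.
# list_problem_pods | while read -r pod _; do
#   kubectl describe pod "$pod" -n edgedelta
# done
```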

See Debug the Installation of Edge Delta Components for more detailed steps.

3. Restart the Agent

Consider performing a rolling restart of the DaemonSet if all agents need to restart to pick up a change.

kubectl rollout restart daemonset/edgedelta -n edgedelta

The rollout restart command restarts all pods managed by the DaemonSet without changing any of the current specifications. Use it to pick up a refreshed ConfigMap or Secret, or to replace pod instances that are affected by intermittent issues.

RollingUpdate is the default update strategy for DaemonSets. With the default settings, Kubernetes replaces pods one node at a time and waits for each new pod to become Ready before moving to the next node, so only one node’s agent is unavailable at any moment during the restart.

Note: A rolling restart will not apply any changes to the DaemonSet’s configuration. To change the configuration, edit the DaemonSet manifest and apply it using kubectl apply. Use the rolling restart command when the pods only need to restart to pick up changes that do not modify the DaemonSet’s spec, such as external updates or changes to mounted ConfigMaps and Secrets.
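
After triggering the restart, it helps to wait until the rollout has finished before checking the agents again. A minimal sketch, assuming the DaemonSet name edgedelta from the command above; the function name restart_agents and the 5 minute default timeout are hypothetical choices.

```shell
# Hypothetical helper: restart the agent DaemonSet and wait for completion.
# Arguments (both optional): namespace, timeout for the rollout.
restart_agents() {
  ns=${1:-edgedelta}
  timeout=${2:-5m}
  kubectl rollout restart daemonset/edgedelta -n "$ns" &&
    # Blocks until every pod has been replaced, or fails on timeout.
    kubectl rollout status daemonset/edgedelta -n "$ns" --timeout="$timeout"
}

# Usage: restart_agents edgedelta 10m
```

A non-zero exit status from the status command suggests that at least one replacement pod did not become Ready in time, which is a cue to return to the pod inspection step.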

4. Check for Cluster Events

Investigate events in the namespace for additional context.

kubectl get events -n edgedelta

Look for the following kinds of events:

  • Pod Scheduling Issues: FailedScheduling events indicate that a pod cannot be scheduled, often due to insufficient resources or configured affinity/anti-affinity rules.
  • Pod Lifecycle Events: These events document the creation, starting, killing, and scheduling of pods, which may reveal transient or recurring problems.
  • Image Pull Problems: ErrImagePull or ImagePullBackOff events suggest that there is a problem with pulling the container image from the registry.
  • Resource Evictions: Evicted events signal that a pod has been terminated due to resource scarcity, often tied to the node’s available CPU or memory.
  • Node Issues: Events indicating that a node is in a NotReady state or other problems affecting node health that could impact pod scheduling and operation.
  • Volume Attachment Issues: FailedMount or FailedAttachVolume events are critical if the pod relies on persistent volumes, indicating issues with volume binding or access rights.
  • Probe Failures: Unhealthy events related to liveness and readiness probes reveal that a container may not be running as expected.
  • Excessive Restarts or Failures: Frequent Restarting events or BackOff statuses point to issues with pod stability.
  • ConfigMap or Secret Issues: Problems with accessing ConfigMaps or Secrets required by the Edge Delta agent may cause the pod to fail to start or operate correctly.
  • Warnings: Kubernetes classifies events as either Normal or Warning. Investigate any Warning event promptly, as these typically denote significant issues that may impact service operations.

This step gives you a broader view of what may be affecting the Edge Delta agent and can provide clues when troubleshooting issues that span multiple Kubernetes resources. Pay attention to the timestamps of these events to pinpoint when issues occurred and to correlate them with other observed behaviors or log entries.
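
Because the event list can be long, it may help to show only Warning events in chronological order, optionally narrowed to one pod. This is a sketch: the function name agent_warning_events is hypothetical, and the pod name in the usage line is a placeholder.

```shell
# Hypothetical helper: show Warning events, oldest first.
# Arguments (both optional): namespace, pod name to narrow the results.
agent_warning_events() {
  ns=${1:-edgedelta}
  pod=${2:-}
  selector="type=Warning"
  if [ -n "$pod" ]; then
    # Restrict the results to events that refer to the given pod.
    selector="${selector},involvedObject.name=${pod}"
  fi
  kubectl get events -n "$ns" --field-selector "$selector" --sort-by=.lastTimestamp
}

# Usage: agent_warning_events edgedelta <pod-name>
```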

5. Update Agent Image

Roll out an updated version of the Edge Delta agent if needed. Follow the upgrade/deploy steps in the UI:

To view the deployment commands for an existing v3 pipeline configuration:

  1. Click Pipelines and select Pipelines.
  2. Click the kebab (⋮) icon and select Deploy Pipeline.
  3. Select the deployment method.

To view the deployment commands for an existing v2 pipeline configuration:

  1. Click Pipelines and select Legacy Pipelines.
  2. Click the kebab (⋮) icon in the Actions column for the agent and select Deploy/Upgrade.
  3. Select the deployment method.

The deployment commands for the agent using the selected method are listed.
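
Before and after running the listed deployment commands, you may want to confirm which agent image is actually in use. The sketch below is not part of the Edge Delta deployment commands; it assumes the DaemonSet is named edgedelta, and the function name agent_images is hypothetical.

```shell
# Hypothetical helper: print the image in the DaemonSet template,
# followed by the image each agent pod is currently running.
agent_images() {
  ns=${1:-edgedelta}
  kubectl get daemonset edgedelta -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
  echo   # jsonpath output has no trailing newline
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Usage: agent_images
```

A pod that still reports an older image than the template has probably not been replaced yet.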

6. Troubleshoot Resource Limits

See Scale Edge Delta Deployments.
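
As a quick first check before scaling, you can look for agent containers that were previously terminated for exceeding their memory limit. This is a sketch under the assumption that such containers report OOMKilled as their last termination reason; the function name find_oom_killed is hypothetical.

```shell
# Hypothetical helper: list pods whose containers were last terminated
# with the reason OOMKilled.
find_oom_killed() {
  ns=${1:-edgedelta}
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
    awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage: find_oom_killed
```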