Common Pod Troubleshooting
When deploying applications on Kubernetes, we often encounter abnormal situations such as a Pod staying in the Pending state for a long time or restarting repeatedly. Below we introduce the common abnormal states and how to troubleshoot each of them.
1. Common Errors
Status | Explanation | Troubleshooting |
---|---|---|
Error | An error occurred while the Pod was starting. | Usually caused by an incorrect container start command or arguments; contact the image maintainer |
NodeLost | The node hosting the Pod is lost. | Check the status of the node hosting the Pod |
Unknown | The node hosting the Pod is lost, or some other unknown exception occurred. | Check the status of the node hosting the Pod |
Pending | The Pod is waiting to be scheduled. | Usually caused by insufficient resources; view the Pod events with the `kubectl describe` command |
Terminating | The Pod is being destroyed. | Can be forcibly deleted by adding the `--force` parameter |
CrashLoopBackOff | The container exited and the kubelet is restarting it. | Usually caused by an incorrect container start command or arguments |
ErrImageNeverPull | The image pull policy forbids pulling the image. | `imagePullPolicy` is set to `Never` but the image is not present on the node; change the policy or pre-load the image |
ImagePullBackOff | The image pull failed and is being retried. | Check network connectivity between the image repository and the cluster, and verify that the `imagePullSecrets` are correct |
RegistryUnavailable | Unable to connect to the image repository. | Contact the repository administrator |
ErrImagePull | Failed to pull the image. | Contact the repository administrator, or verify that the image name is correct |
RunContainerError | Failed to start the container. | Check the container parameter configuration for errors |
PostStartHookError | The postStart hook command failed. | Fix the postStart command |
NetworkPluginNotReady | The network plugin is not fully started. | The CNI plugin is abnormal; check the CNI status |
2. Common Commands
When a Pod is in one of the states above, the following commands can help locate the problem quickly:
- Get the Pod status: `kubectl -n ${NAMESPACE} get pod -o wide`
- View the Pod's yaml configuration: `kubectl -n ${NAMESPACE} get pod ${POD_NAME} -o yaml`
- View the Pod events: `kubectl -n ${NAMESPACE} describe pod ${POD_NAME}`
- View the Pod logs: `kubectl -n ${NAMESPACE} logs ${POD_NAME} -c ${CONTAINER_NAME}`
- Open a shell in the Pod: `kubectl -n ${NAMESPACE} exec -it ${POD_NAME} -- /bin/bash`
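The commands above can be bundled into a small helper script. This is only a sketch, assuming `kubectl` is already configured for the target cluster; `pod_debug` is a hypothetical name:

```shell
#!/bin/sh
# pod_debug NAMESPACE POD: run the triage commands above in one pass.
pod_debug() {
  ns="$1"; pod="$2"
  kubectl -n "$ns" get pod "$pod" -o wide                  # current status and node
  kubectl -n "$ns" describe pod "$pod"                     # events: scheduling, pulls, OOM
  kubectl -n "$ns" logs "$pod" --all-containers --tail=50  # recent logs of all containers
}
```

Running `pod_debug ${NAMESPACE} ${POD_NAME}` prints the status, events, and recent logs in one go.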
3. Does UK8S limit the number of containers deployed on a Node? How to modify it?
To ensure that Pods run stably in a production environment, UK8S limits each node to 110 Pods. To change the limit, log in to the node, modify the `maxpods:110` setting in `/etc/kubernetes/kubelet.conf`, and then run `systemctl restart kubelet` to restart the kubelet.
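A sketch of the change, assuming the setting appears literally as `maxpods:110` in `/etc/kubernetes/kubelet.conf` (verify the exact key on your node first); `set_max_pods` is a hypothetical helper:

```shell
#!/bin/sh
# set_max_pods CONF_FILE NEW_LIMIT: rewrite the maxpods value in place.
set_max_pods() {
  conf="$1"; limit="$2"
  sed -i "s/maxpods:[0-9][0-9]*/maxpods:${limit}/" "$conf"
}

# On the node, then restart kubelet to pick up the change:
#   set_max_pods /etc/kubernetes/kubelet.conf 200
#   systemctl restart kubelet
```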
4. Why did my container exit as soon as it started?
- Check the container logs to find the cause of the abnormal exit.
- Verify that the Pod's startup command is set correctly; it can be specified when the image is built or in the Pod configuration.
- The startup command must keep running in the foreground; otherwise Kubernetes considers the container finished and restarts the Pod.
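As an illustration of the foreground rule, a minimal sketch (the Pod name and image are arbitrary): a command like `service nginx start` would daemonize and let the main process exit, so the container would be restarted; running nginx in the foreground keeps it alive:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-foreground   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    # Keep the main process in the foreground so PID 1 does not exit:
    command: ["nginx", "-g", "daemon off;"]
```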
5. How to adjust Docker's log level
- Edit the `/etc/docker/daemon.json` file and add the configuration line `"debug": true`.
- Reload the Docker configuration with `systemctl reload docker`, then view the logs.
- When you no longer need detailed logs, remove the `debug` entry and reload Docker again.
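For reference, a minimal `/etc/docker/daemon.json` with debug enabled might look like the fragment below; merge the key into your existing file rather than replacing it:

```json
{
  "debug": true
}
```

After `systemctl reload docker`, the daemon logs (for example via `journalctl -u docker`) will include debug-level entries.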
6. Why is the node abnormal, but the Pod is still in the Running state
- This is caused by Kubernetes' node status protection, which tends to occur when the cluster has few nodes or many abnormal nodes.
- You can view the documentation at https://kubernetes.io/zh/docs/concepts/architecture/nodes/#reliability
7. What if the node is down and the Pod is stuck in Terminating
- After a node has been down for a certain period (usually 5 minutes), Kubernetes tries to evict its Pods, which puts them into the Terminating state.
- Because the kubelet on that node cannot perform the series of operations needed to delete a Pod, the Pods stay stuck in Terminating.
- DaemonSet Pods are scheduled on every node by default, and Kubernetes does not evict them, so they need no special handling here.
- For Deployment and ReplicaSet Pods stuck in Terminating, the controller automatically brings up an equivalent number of replacement Pods.
- For StatefulSet Pods stuck in Terminating, the Pod names are fixed, so a new Pod will not be created until the previous one has been completely deleted.
- For Pods using a udisk PVC, the PVC cannot be detached, which causes the newly started Pod to fail to run; follow the PVC-related content in this article (#how to check the actual mount situation of the udisk corresponding to the pvc) to verify the relevant relationships.
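If you have confirmed the node is permanently gone, the stuck Pod can be removed from the API server so its controller can recreate it elsewhere. `force_delete_pod` is a hypothetical wrapper around the standard flags:

```shell
#!/bin/sh
# force_delete_pod NAMESPACE POD: remove a Pod stuck in Terminating.
# Only use this when the node is confirmed down; otherwise the container
# may keep running on the node while the API object disappears.
force_delete_pod() {
  kubectl -n "$1" delete pod "$2" --grace-period=0 --force
}
```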
8. What to do if the Pod exits abnormally?
- Run `kubectl describe pod ${POD_NAME} -n ${NAMESPACE}` to view the events and statuses of each container in the Pod, and determine whether the Pod exited on its own, was killed by OOM, or was evicted.
- If the Pod exited on its own, run `kubectl logs ${POD_NAME} -p -n ${NAMESPACE}` to view the previous container's exit logs and investigate the cause.
- If it was killed by OOM, readjust the Pod's request and limit settings based on business needs (the two should not differ too much), or check the application for memory leaks.
- If the Pod was evicted, the node is under too much pressure; check which Pods are using too many resources and adjust their request and limit settings.
- For exits not caused by the Pod itself, run `dmesg` to view the system logs and `journalctl -u kubelet` to view the kubelet-related logs.
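One quick way to confirm an OOM kill (a sketch, assuming `kubectl` access; `last_exit_reasons` is a hypothetical name) is to read each container's last termination reason:

```shell
#!/bin/sh
# last_exit_reasons NAMESPACE POD: print the lastState.terminated.reason of
# every container in the Pod; "OOMKilled" confirms an out-of-memory kill.
last_exit_reasons() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'
}
```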
9. Why does a container started directly with Docker on a UK8S node have no network?
- UK8S uses SurferCloud's own CNI plugin; containers started directly with Docker cannot use this plugin, so they have no network connectivity.
- For long-running tasks, do not start containers directly with Docker on a UK8S node; use Pods instead.
- If it is just a temporary test, you can add the `--network host` parameter to start the container in host network mode.
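The full invocation for such a temporary test might look like this sketch; `run_host_net_test` is a hypothetical helper and busybox is just an arbitrary image:

```shell
#!/bin/sh
# run_host_net_test IMAGE: start a throwaway container that shares the
# node's network namespace (host network), bypassing the CNI plugin.
run_host_net_test() {
  docker run --rm --network host "$1" true
}
# Interactive variant: docker run --rm -it --network host busybox sh
```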
10. Timezone Issues of Pod
The containers running in a Kubernetes cluster use Coordinated Universal Time (UTC) by default, not the local time of the host. If you need the container time to be consistent with the host time, you can use the hostPath method to mount the host's timezone file into the container.
Most Linux distributions configure the timezone through the `/etc/localtime` file; we can get the timezone information with the following command:
# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 32 Oct 15 2015 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai
The output above shows that the host's timezone is Asia/Shanghai. The Pod yaml example below sets the container's timezone to Asia/Shanghai so that it is consistent with the host.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: "IfNotPresent"
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    ports:
    - containerPort: 80
    volumeMounts:
    - name: timezone-config
      mountPath: /etc/localtime
  volumes:
  - name: timezone-config
    hostPath:
      path: /usr/share/zoneinfo/Asia/Shanghai
If the container was created earlier, you only need to add the `volumeMounts` and `volumes` entries to its yaml file and then update it with the `kubectl apply` command.
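Alternatively, many images honor the `TZ` environment variable, which avoids the hostPath mount; whether this works depends on the image having timezone data (tzdata) installed. The fragment below goes under the container entry in the Pod spec:

```yaml
    env:
    - name: TZ
      value: Asia/Shanghai
```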