Common Pod Troubleshooting
When deploying applications on Kubernetes, we often encounter abnormal situations such as a Pod staying in the Pending state for a long time or restarting repeatedly. Below we introduce the common abnormal states and how to troubleshoot each of them.
1. Common Errors
Status | Explanation | Troubleshooting |
---|---|---|
Error | An error occurred while the Pod was starting. | Usually caused by an incorrect container start command or arguments; contact the image maintainer |
NodeLost | The node hosting the Pod is lost. | Check the status of the node hosting the Pod |
Unknown | The node hosting the Pod is lost, or some other unknown exception occurred. | Check the status of the node hosting the Pod |
Pending | The Pod is waiting to be scheduled. | Usually caused by insufficient resources; view the Pod events with the `kubectl describe` command |
Terminating | The Pod is being destroyed. | Can be forcibly deleted by adding the `--force` parameter |
CrashLoopBackOff | The container exited and the kubelet is restarting it. | Usually caused by an incorrect container start command or arguments |
ErrImageNeverPull | The image pull policy forbids pulling the image. | `imagePullPolicy` is set to `Never` but the image is not present on the node; change the policy or pre-load the image |
ImagePullBackOff | The image pull failed and is being retried. | Check network connectivity between the image repository and the cluster, and verify that the `imagePullSecrets` are correct |
RegistryUnavailable | Unable to connect to the image repository. | Contact the repository administrator |
ErrImagePull | Failed to pull the image. | Contact the repository administrator, or verify that the image name is correct |
RunContainerError | Failed to start the container. | Check the container parameter configuration for errors |
PostStartHookError | The postStart hook command failed. | Fix the postStart command |
NetworkPluginNotReady | The network plugin is not fully started. | The CNI plugin is abnormal; check the CNI status |
2. Common Commands
When a Pod is in one of the states above, the following commands can help locate the problem quickly:
- Get the Pod status: `kubectl -n ${NAMESPACE} get pod -o wide`
- View the Pod's yaml configuration: `kubectl -n ${NAMESPACE} get pod ${POD_NAME} -o yaml`
- View the Pod events: `kubectl -n ${NAMESPACE} describe pod ${POD_NAME}`
- View the Pod logs: `kubectl -n ${NAMESPACE} logs ${POD_NAME} -c ${CONTAINER_NAME}`
- Open a shell in the Pod: `kubectl -n ${NAMESPACE} exec -it ${POD_NAME} -- /bin/bash`
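The commands above can be bundled into a small helper script. This is only a sketch, assuming `kubectl` is already configured for the target cluster; `pod_debug` is a hypothetical name:

```shell
#!/bin/sh
# pod_debug NAMESPACE POD: run the triage commands above in one pass.
pod_debug() {
  ns="$1"; pod="$2"
  kubectl -n "$ns" get pod "$pod" -o wide                  # current status and node
  kubectl -n "$ns" describe pod "$pod"                     # events: scheduling, pulls, OOM
  kubectl -n "$ns" logs "$pod" --all-containers --tail=50  # recent logs of all containers
}
```

Running `pod_debug ${NAMESPACE} ${POD_NAME}` prints the status, events, and recent logs in one go.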
3. Does UK8S limit the number of containers deployed on a Node? How to modify it?
To ensure that Pods run stably in a production environment, UK8S limits each node to 110 Pods. To change the limit, log in to the node, modify the `maxpods:110` setting in `/etc/kubernetes/kubelet.conf`, and then run `systemctl restart kubelet` to restart the kubelet.
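A sketch of the change, assuming the setting appears literally as `maxpods:110` in `/etc/kubernetes/kubelet.conf` (verify the exact key on your node first); `set_max_pods` is a hypothetical helper:

```shell
#!/bin/sh
# set_max_pods CONF_FILE NEW_LIMIT: rewrite the maxpods value in place.
set_max_pods() {
  conf="$1"; limit="$2"
  sed -i "s/maxpods:[0-9][0-9]*/maxpods:${limit}/" "$conf"
}

# On the node, then restart kubelet to pick up the change:
#   set_max_pods /etc/kubernetes/kubelet.conf 200
#   systemctl restart kubelet
```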
4. Why did my container exit as soon as it started?
- Check the container logs to find the cause of the abnormal exit.
- Verify that the Pod's startup command is set correctly; it can be specified when the image is built or in the Pod configuration.
- The startup command must keep running in the foreground; otherwise Kubernetes considers the container finished and restarts the Pod.
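As an illustration of the foreground rule, a minimal sketch (the Pod name and image are arbitrary): a command like `service nginx start` would daemonize and let the main process exit, so the container would be restarted; running nginx in the foreground keeps it alive:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-foreground   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    # Keep the main process in the foreground so PID 1 does not exit:
    command: ["nginx", "-g", "daemon off;"]
```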
5. How to adjust Docker's log level
- Edit the `/etc/docker/daemon.json` file and add the configuration line `"debug": true`.
- Reload the Docker configuration with `systemctl reload docker`, then view the logs.
- When you no longer need detailed logs, remove the `debug` entry and reload Docker again.
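For reference, a minimal `/etc/docker/daemon.json` with debug enabled might look like the fragment below; merge the key into your existing file rather than replacing it:

```json
{
  "debug": true
}
```

After `systemctl reload docker`, the daemon logs (for example via `journalctl -u docker`) will include debug-level entries.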
6. Why is the node abnormal, but the Pod is still in the Running state
- This is caused by Kubernetes' node status protection, which tends to occur when the cluster has few nodes or many abnormal nodes.
- You can view the documentation at https://kubernetes.io/zh/docs/concepts/architecture/nodes/#reliability
7. What if the node is down and the Pod is stuck in Terminating
- After a node has been down for a certain period (usually 5 minutes), Kubernetes tries to evict its Pods, which puts them into the Terminating state.
- Because the kubelet on that node cannot perform the series of operations needed to delete a Pod, the Pods stay stuck in Terminating.
- DaemonSet Pods are scheduled on every node by default, and Kubernetes does not evict them, so they need no special handling here.
- For Deployment and ReplicaSet Pods stuck in Terminating, the controller automatically brings up an equivalent number of replacement Pods.
- For StatefulSet Pods stuck in Terminating, the Pod names are fixed, so a new Pod will not be created until the previous one has been completely deleted.
- For Pods using a udisk PVC, the PVC cannot be detached, which causes the newly started Pod to fail to run; follow the PVC-related content in this article (#how to check the actual mount situation of the udisk corresponding to the pvc) to verify the relevant relationships.
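If you have confirmed the node is permanently gone, the stuck Pod can be removed from the API server so its controller can recreate it elsewhere. `force_delete_pod` is a hypothetical wrapper around the standard flags:

```shell
#!/bin/sh
# force_delete_pod NAMESPACE POD: remove a Pod stuck in Terminating.
# Only use this when the node is confirmed down; otherwise the container
# may keep running on the node while the API object disappears.
force_delete_pod() {
  kubectl -n "$1" delete pod "$2" --grace-period=0 --force
}
```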
8. What to do if the Pod exits abnormally?
- Run `kubectl describe pod ${POD_NAME} -n ${NAMESPACE}` to view the events and statuses of each container in the Pod, and determine whether the Pod exited on its own, was killed by OOM, or was evicted.
- If the Pod exited on its own, run `kubectl logs ${POD_NAME} -p -n ${NAMESPACE}` to view the previous container's exit logs and investigate the cause.
- If it was killed by OOM, readjust the Pod's request and limit settings based on business needs (the two should not differ too much), or check the application for memory leaks.
- If the Pod was evicted, the node is under too much pressure; check which Pods are using too many resources and adjust their request and limit settings.
- For exits not caused by the Pod itself, run `dmesg` to view the system logs and `journalctl -u kubelet` to view the kubelet-related logs.
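One quick way to confirm an OOM kill (a sketch, assuming `kubectl` access; `last_exit_reasons` is a hypothetical name) is to read each container's last termination reason:

```shell
#!/bin/sh
# last_exit_reasons NAMESPACE POD: print the lastState.terminated.reason of
# every container in the Pod; "OOMKilled" confirms an out-of-memory kill.
last_exit_reasons() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'
}
```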
9. Why does a container started directly with Docker on a UK8S node have no network?
- UK8S uses SurferCloud's own CNI plugin; containers started directly with Docker cannot use this plugin, so they have no network connectivity.
- For long-running tasks, do not start containers directly with Docker on a UK8S node; use Pods instead.
- If it is just a temporary test, you can add the `--network host` parameter to start the container in host network mode.
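The full invocation for such a temporary test might look like this sketch; `run_host_net_test` is a hypothetical helper and busybox is just an arbitrary image:

```shell
#!/bin/sh
# run_host_net_test IMAGE: start a throwaway container that shares the
# node's network namespace (host network), bypassing the CNI plugin.
run_host_net_test() {
  docker run --rm --network host "$1" true
}
# Interactive variant: docker run --rm -it --network host busybox sh
```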
10. Timezone Issues of Pod
The containers running in a Kubernetes cluster use Coordinated Universal Time (UTC) by default, not the local time of the host. If you need the container time to be consistent with the host time, you can use the hostPath method to mount the host's timezone file into the container.
Most Linux distributions configure the timezone through the `/etc/localtime` file; we can get the timezone information with the following command:
# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 32 Oct 15 2015 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai
The output above shows that the host's timezone is Asia/Shanghai. The Pod yaml example below sets the container's timezone to Asia/Shanghai so that it is consistent with the host.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: "IfNotPresent"
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    ports:
    - containerPort: 80
    volumeMounts:
    - name: timezone-config
      mountPath: /etc/localtime
  volumes:
  - name: timezone-config
    hostPath:
      path: /usr/share/zoneinfo/Asia/Shanghai
If the container was created earlier, you only need to add the `volumeMounts` and `volumes` entries to its yaml file and then update it with the `kubectl apply` command.
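Alternatively, many images honor the `TZ` environment variable, which avoids the hostPath mount; whether this works depends on the image having timezone data (tzdata) installed. The fragment below goes under the container entry in the Pod spec:

```yaml
    env:
    - name: TZ
      value: Asia/Shanghai
```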