Resource Modification for node-problem-detector
Because the default resource configuration of node-problem-detector is low, the following problems may occur:
- Zombie processes appear in the system
- High read I/O on the nodes
- The npd process reports an error:
Timeout when running plugin "./config/plugin/network_problem.sh": state -signal
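To check a node for the first symptom, a minimal sketch along the following lines may help. The helper name count_zombies is hypothetical and not part of node-problem-detector; it only counts process states that start with Z in ps output.

```shell
# Hypothetical helper: count zombie processes.
# Reads `ps -eo stat=` output (one process state per line) on stdin.
# A state starting with "Z" marks a zombie (defunct) process.
count_zombies() {
  # grep -c prints 0 and returns non-zero when nothing matches,
  # so "|| true" keeps the function's exit status at 0.
  grep -c '^ *Z' || true
}
```

On a node, it could be run as: ps -eo stat= | count_zombies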
To fix this, adjust the resource configuration of node-problem-detector. The steps are as follows:
1. Get the DaemonSet's name
kubectl get ds -n kube-system | grep node-problem-detector
node-problem-detector 1 1 1 1 1 <none> 236d
Find the name of the DaemonSet. It is usually node-problem-detector or ack-node-problem-detector-daemonset.
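If the name needs to be captured in a script, a small sketch like the one below could be used. The function name find_npd_ds is hypothetical; it assumes the input is in the format produced by kubectl get ds -o name (for example daemonset.apps/NAME).

```shell
# Hypothetical helper: pick the first DaemonSet whose name contains
# "node-problem-detector".
# Reads `kubectl get ds -n kube-system -o name` output on stdin,
# strips the "daemonset.apps/" style prefix, and prints the bare name.
find_npd_ds() {
  sed 's|^.*/||' | grep -m1 'node-problem-detector'
}
```

Against a live cluster, it could be used as: NPD_DS=$(kubectl get ds -n kube-system -o name | find_npd_ds)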
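2. Edit the DaemonSet and modify its resources configuration
Modify the resource configuration with the edit command. After you run the command below, the DaemonSet definition opens in an editor (vi on Linux unless KUBE_EDITOR or EDITOR is set). If your DaemonSet has a different name, substitute it in the command.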
kubectl edit ds node-problem-detector -n kube-system
Under resources, set cpu to 100m and memory to 100Mi for both limits and requests. Save and exit; the node-problem-detector pods will then restart automatically.
resources:
  limits:
    cpu: 100m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 100Mi
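As a non-interactive alternative to editing by hand, the same values can be applied with kubectl set resources. This is not the method described above, only a sketch: the function name npd_set_resources_cmd is hypothetical, the DaemonSet name is assumed to be node-problem-detector unless NPD_DS is set, and by default kubectl set resources changes every container in the pod template. The sketch only prints the command so it can be reviewed before it is run.

```shell
# Assumed DaemonSet name; override NPD_DS if yours differs.
NPD_DS="${NPD_DS:-node-problem-detector}"
CPU="100m"
MEM="100Mi"

# Hypothetical helper: print the kubectl command that sets requests and
# limits to the values from this article. Nothing is changed in the cluster.
npd_set_resources_cmd() {
  echo "kubectl set resources ds $1 -n kube-system --limits=cpu=$CPU,memory=$MEM --requests=cpu=$CPU,memory=$MEM"
}

# Print the command for review.
npd_set_resources_cmd "$NPD_DS"
```

After checking the printed command, it can be copied into the terminal and run as is.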
3. Verify that the pod is running normally
kubectl get pod -n kube-system | grep node-problem-detector
Make sure the STATUS of every node-problem-detector pod is Running.
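For a scripted check, the pod list can be piped through a small filter such as the sketch below. The function name npd_pods_ok is hypothetical; it assumes the default kubectl get pod column layout (NAME READY STATUS RESTARTS AGE) and looks only at the STATUS column, not at READY.

```shell
# Hypothetical helper: succeed only if at least one node-problem-detector
# pod is listed and every such pod has STATUS "Running".
# Reads `kubectl get pod -n kube-system` output on stdin.
npd_pods_ok() {
  awk '/node-problem-detector/ { n++; if ($3 != "Running") bad++ }
       END { exit ((n > 0 && bad + 0 == 0) ? 0 : 1) }'
}
```

Example use: kubectl get pod -n kube-system | npd_pods_ok && echo "NPD pods are running". To wait for the restart to finish first, kubectl rollout status ds/node-problem-detector -n kube-system can be run beforehand, again assuming that DaemonSet name.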
Frequently Asked Questions:
1. Will the modification affect our application service?
No. The role of node-problem-detector is to check the nodes for anomalies. Modifying node-problem-detector will not affect your application services.
2. Which clusters need to be modified?
This issue may occur if the cluster was created before March 12, 2022, and the Node-Problem-Detector node monitoring plugin is activated under Application Center -> NPD Node Monitoring for the cluster.
We have fixed this issue in recent releases, so it will not occur in newly created clusters.