Resource Modification for node-problem-detector

The default resource configuration of node-problem-detector is too low, which may cause the following problems:

Zombie processes appear on the node
High read I/O on the node
The npd process reports an error: Timeout when running plugin "./config/plugin/network_problem.sh": state -signal
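
To check whether a node shows the first symptom, you can filter the process list for zombie (defunct) processes, whose state code starts with Z. This is a minimal sketch, not part of the original procedure; the function name list_zombies is hypothetical, and it assumes a Linux node where ps accepts the -eo option.

```shell
# Hypothetical helper: keep only zombie processes from `ps` output.
# Expects the STAT column first; the header line is skipped.
list_zombies() {
  awk 'NR > 1 && $1 ~ /^Z/ { print }'
}
```

On a node, it could be used as: ps -eo stat,pid,ppid,comm | list_zombies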

To fix this, adjust the resource configuration of node-problem-detector as follows:

1. Get the DaemonSet’s name


kubectl get ds -n kube-system |grep node-problem-detector
node-problem-detector            1         1         1       1            1           <none>          236d

Note the name of the DaemonSet. It is usually node-problem-detector or ack-node-problem-detector-daemonset.
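
If you want to reuse the name in later commands, it can be captured in a script. The sketch below is one possible way, assuming the DaemonSet name contains node-problem-detector as described above; the function name npd_ds_name is hypothetical.

```shell
# Hypothetical helper: print the name of the node-problem-detector DaemonSet.
# `-o name` prints entries such as daemonset.apps/<name>; sed strips the prefix.
npd_ds_name() {
  kubectl get daemonset -n kube-system -o name \
    | grep node-problem-detector \
    | sed 's|.*/||'
}
```

For example, DS_NAME=$(npd_ds_name) stores the name for the following steps. If more than one DaemonSet matches, the function prints every match, so check the result before using it.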

2. Edit the DaemonSet and modify its resource configuration

Use the edit command to modify the resource configuration. Running the command below opens the DaemonSet manifest in a text editor, typically vim. If your DaemonSet has a different name, substitute it in the command.


kubectl edit ds node-problem-detector -n kube-system

Under resources, set cpu to 100m and memory to 100Mi for both limits and requests. Save and exit; the node-problem-detector pods restart automatically.


resources:
  limits:
    cpu: 100m
    memory: 100Mi
  requests:
    cpu: 100m
    memory: 100Mi
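
As a non-interactive alternative to the edit command, the same values can be applied with kubectl set resources. This is a sketch rather than the documented procedure: the function name npd_set_resources is hypothetical, and it assumes the DaemonSet lives in kube-system as in the steps above. Without the -c flag, the command applies to every container in the pod template, so check the manifest first if the DaemonSet runs more than one container.

```shell
# Hypothetical helper: set requests and limits to 100m CPU / 100Mi memory.
# Pass the DaemonSet name found in step 1; defaults to node-problem-detector.
npd_set_resources() {
  ds_name="${1:-node-problem-detector}"
  kubectl set resources daemonset "$ds_name" -n kube-system \
    --requests=cpu=100m,memory=100Mi \
    --limits=cpu=100m,memory=100Mi
}
```

For example: npd_set_resources ack-node-problem-detector-daemonset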

3. Verify that the pod is running normally


kubectl get pod -n kube-system |grep node-problem-detector

Make sure that every node-problem-detector pod shows the Running status.
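
The check can also be scripted. The sketch below waits for the DaemonSet rollout to finish and then prints the resources now set in the pod template. The function name npd_verify and the 300-second timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: wait for the rollout, then show the configured resources.
npd_verify() {
  ds_name="${1:-node-problem-detector}"
  # Block until all pods of the DaemonSet are updated and ready, or time out.
  kubectl rollout status daemonset "$ds_name" -n kube-system --timeout=300s || return 1
  # Print the resources section of each container in the pod template.
  kubectl get daemonset "$ds_name" -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
}
```

A non-zero exit status from npd_verify means the rollout did not complete within the timeout, in which case inspect the pods with kubectl describe.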

Frequently Asked Questions:

1. Will the modification affect our application service?

No. node-problem-detector only checks the nodes for anomalies, so modifying it does not affect application services.

2. Which clusters need to be modified?

If the cluster was created before March 12, 2022, and the Node-Problem-Detector node monitoring plugin is activated in Application Center -> NPD Node Monitoring of the cluster, this issue may occur.

We have fixed this issue in recent releases, so it does not occur in newly created clusters.