Frequently Asked Questions on Storage
1. What is the relationship between PV, PVC, StorageClass, and UDisk?
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: udisk-ssd-test
provisioner: udisk.csi.ucloud.cn # Storage provisioner, cannot be changed here.
---
apiVersion: v1
kind: PersistentVolumeClaim
spec:
  storageClassName: udisk-ssd-test # must match the StorageClass name above
Users only need to set up the StorageClass. When a PVC that uses it is created and consumed, the csi-udisk plugin automatically completes a series of operations such as creating and attaching the UDisk. The main process is as follows:
- The StorageClass sets the related parameters and binds to the CSI plugin.
- The PVC is bound to the StorageClass.
- K8S observes the newly created PVC that uses the StorageClass, automatically creates the PV, and hands it over to the CSI plugin to create the UDisk.
- The PV is bound to the PVC, and the CSI plugin completes the subsequent UDisk attach and mount operations.
- SurferCloud's CSI plugin pods can be viewed with
kubectl get pods -o wide -n kube-system |grep udisk
(one controller in total, plus one pod per node)
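For reference, a minimal Pod that consumes a PVC created from the StorageClass above might look like the following sketch (the PVC name `data-pvc` and the image are illustrative, not part of the original example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data          # the UDisk-backed volume is mounted here inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc       # illustrative PVC name
```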
1.1 Using PVC in StatefulSet
- The volumeClaimTemplates field in the StatefulSet controller lets the K8S cluster automatically create a PVC when the corresponding PVC does not exist, making the above process even more automated (both PVC and PV are created by UK8S). A minimal sketch follows below.
- The StatefulSet is responsible for creating PVCs but not deleting them, so leftover PVCs need to be deleted manually.
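A minimal StatefulSet sketch, assuming the udisk-ssd-test StorageClass above (all other names and the image are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:          # one PVC named data-web-<ordinal> is created per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: udisk-ssd-test
      resources:
        requests:
          storage: 20Gi
```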
2. The role of VolumeAttachment
VolumeAttachment objects are not created by users, so many users are unclear about their role; however, VolumeAttachment plays a very important part when PVCs are used.
- A VolumeAttachment represents the attach relationship between a PV and a certain Node in the K8S cluster. You can check it by executing
kubectl get volumeattachment |grep pv-name
- This attach relationship is usually consistent with the attach relationship between the UDisk and the cloud host, but sometimes they diverge.
- Inconsistencies typically occur when the UDisk has already been detached from the cloud host but the VolumeAttachment record still exists. Whether the UDisk is attached to the cloud host can be checked as described in Section 3, How to check the actual mounting situation of UDisk corresponding to PVC.
- For inconsistencies, you can manually delete the corresponding VolumeAttachment and create a new, identical one (after creating the new one, its ATTACHED state is false).
- If it cannot be deleted, you can check the csi-controller log to locate the reason with
kubectl logs csi-udisk-controller-0 -n kube-system csi-udisk
- Generally, if the VolumeAttachment cannot be deleted manually, the corresponding node may no longer exist. In that case, edit the VolumeAttachment directly and remove its finalizers field (see the patch sketch after the example output below).
[root@10-9-112-196 ~]# kubectl get volumeattachment |grep pvc-e51b694f-ffac-4d23-af5e-304a948a155a
NAME ATTACHER PV NODE ATTACHED AGE
csi-1d52d5a7b4c5c172de7cfc17df71c312059cf8a2d7800e05f46e04876a0eb50e udisk.csi.ucloud.cn pvc-e51b694f-ffac-4d23-af5e-304a948a155a 10.9.184.108 true 2d2h
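As a hedged illustration of removing the finalizers field mentioned above (using the VolumeAttachment name from the example output; substitute your own object):

```bash
# Clear the finalizers so a VolumeAttachment whose node no longer exists can be deleted
kubectl patch volumeattachment csi-1d52d5a7b4c5c172de7cfc17df71c312059cf8a2d7800e05f46e04876a0eb50e \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```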
2.1 VolumeAttachment file example
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  annotations:
    csi.alpha.kubernetes.io/node-id: 10.9.184.108 # Bound node IP
  finalizers:
  - external-attacher/udisk-csi-ucloud-cn
  name: csi-1d52d5a7b4c5c172de7cfc17df71c312059cf8a2d7800e05f46e04876a0eb50e # VolumeAttachment name
spec:
  attacher: udisk.csi.ucloud.cn
  nodeName: 10.9.184.108 # Bound node IP
  source:
    persistentVolumeName: pvc-e51b694f-ffac-4d23-af5e-304a948a155a # Name of the bound PV
3. How to check the actual mounting situation of UDisk corresponding to PVC
Correspondence table

UK8S resource type | Correspondence on the host
---|---
PV | The UDisk disk itself
VolumeAttachment | The attach relationship between the disk and the host (vdb, vdc block devices)
PVC | The mount path of the disk on the host
Pod | The process using the disk
- Run
kubectl get pvc -n ns pvc-name
and check the VOLUME field to find the PV bound to the PVC, generally named like pvc-e51b694f-ffac-4d23-af5e-304a948a155a.
- Run
kubectl get pv pv-name -o yaml
and look at the spec.csi.volumeHandle field to see which UDisk the PV is bound to (with the flexv plugin, the UDisk ID is the last few characters of the PV name). Check the status of that UDisk in the console to see whether it is attached to a host.
- Run
kubectl get volumeattachment |grep pv-name
to check the attach status recorded in the K8S cluster.
- SSH to the corresponding host and run
lsblk
to see the corresponding disk.
- Run
mount |grep pv-name
to check the actual mount points of the disk. There is one globalmount and one or more pod mount points (a consolidated command sketch follows after the example output below).
[root@10-9-184-108 ~]# mount |grep pvc-e51b694f-ffac-4d23-af5e-304a948a155a
/dev/vdc on /data/kubelet/plugins/kubernetes.io/csi/pv/pvc-e51b694f-ffac-4d23-af5e-304a948a155a/globalmount type ext4 (rw,relatime)
/dev/vdc on /data/kubelet/pods/587962f5-3009-4c53-a56e-a78f6636ce86/volumes/kubernetes.io~csi/pvc-e51b694f-ffac-4d23-af5e-304a948a155a/mount type ext4 (rw,relatime)
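The following is an illustrative way to string the checks above together; the namespace and PVC name are placeholders, and the last two commands must be run on the node the disk is attached to:

```bash
NS=<namespace>; PVC=<pvc-name>                                        # placeholders, fill in your own
PV=$(kubectl get pvc "$PVC" -n "$NS" -o jsonpath='{.spec.volumeName}')
kubectl get pv "$PV" -o jsonpath='{.spec.csi.volumeHandle}'; echo     # the UDisk ID bound to the PV
kubectl get volumeattachment | grep "$PV"                             # attach record in the K8S cluster
# On the node the disk is attached to:
lsblk                                                                 # the block device (vdb, vdc, ...)
mount | grep "$PV"                                                    # globalmount and pod mount points
```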
4. Error handling of disk mounting
- Because the disk attach and mount chain is long, when you encounter a problem it is recommended to first determine the current status as described in Section 3, How to check the actual mounting situation of UDisk corresponding to PVC.
- If the status recorded in UK8S is inconsistent with the actual status on the host, first clean up the inconsistent resources, then follow the normal process to recover.
4.1 What if PV and PVC are stuck in Terminating, or the disk fails to detach?
- Using Section 3, How to check the actual mounting situation of UDisk corresponding to PVC, determine the actual attach and mount status of the current PV and PVC.
- Handle it manually according to your needs. First clean up all pods that use this PV and PVC (if the PVC has already been deleted successfully, this step is not required).
- If the PVC deletion is stuck in Terminating, manually umount the corresponding mount path.
- If the VolumeAttachment deletion is stuck in Terminating, manually detach the disk in the console (if it is stuck in detaching, contact the host team).
- If the PV deletion is stuck in Terminating, manually delete the disk in the console (before deleting the PV, make sure the related VolumeAttachment has been deleted).
- After the corresponding resources have been manually released, you can use
kubectl edit
to delete the finalizers field of the resource, and the resource will be released. See the sketch after this list.
- After deleting a VolumeAttachment, if the pod reports a mount error, create a new VolumeAttachment with the same name based on the YAML provided in Section 2.1, VolumeAttachment file example.
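A hedged sketch of releasing stuck resources after the manual cleanup above (object names are placeholders; only do this once the underlying disk has really been unmounted and detached):

```bash
# Clear finalizers on a PVC stuck in Terminating
kubectl patch pvc <pvc-name> -n <namespace> --type=merge -p '{"metadata":{"finalizers":null}}'

# Clear finalizers on a PV stuck in Terminating
kubectl patch pv <pv-name> --type=merge -p '{"metadata":{"finalizers":null}}'
```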
4.2 What if the pod's PVC never gets mounted?
- Run
kubectl get pvc -n ns pvc-name
and check the VOLUME field to find the PV bound to the PVC, generally named like pvc-e51b694f-ffac-4d23-af5e-304a948a155a.
- Run
kubectl get pv pv-name -o yaml
and look at the spec.csi.volumeHandle field to see which UDisk the PV is bound to (with the flexv plugin, the UDisk ID is the last few characters of the PV name).
- After finding the UDisk, if the disk shows as available on the console page, or the host it is attached to is not the host where the pod runs, contact technical support to check the error logs of the UDisk attach and detach requests and to coordinate with the host team.
- If there is no UDisk-related error log, contact the UK8S staff on duty and provide the output of
kubectl logs csi-udisk-controller-0 -n kube-system csi-udisk
and the pod's events.
5. Precautions for using UDisk-PVC
- Because UDisk cannot cross availability zones, volumeBindingMode: WaitForFirstConsumer must be specified in the StorageClass.
- Because UDisk cannot be attached to multiple hosts, accessModes in the PVC must be set to ReadWriteOnce.
- Since UDisk cannot be multi-attached, multiple pods cannot share the same udisk PVC. If the previous pod's use of the udisk PVC has not been cleaned up, subsequent pods will fail to start; you can confirm this by checking the VolumeAttachment status. A sketch covering both settings follows below.
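A minimal sketch showing both required settings together (the class and PVC names are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: udisk-ssd-wffc            # illustrative name
provisioner: udisk.csi.ucloud.cn
volumeBindingMode: WaitForFirstConsumer   # required: UDisk cannot cross availability zones
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  storageClassName: udisk-ssd-wffc
  accessModes:
  - ReadWriteOnce                 # required: UDisk cannot be multi-attached
  resources:
    requests:
      storage: 20Gi
```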
6. Cloud disk detach problem when upgrading K8S from 1.17 to 1.18
We found that while upgrading a UK8S cluster from 1.17 to 1.18, some Pods with PVCs mounted hit IO errors. Checking the related logs shows the IO errors are caused by the attached disk being detached.
The community introduced a change in 1.18 to solve dangling attachments; see Recover CSI volumes from dangling attachments.
In the K8S implementation of disk attach and detach, each Node can be managed either by kubelet or by controller-manager. The change above, which fixes dangling attachments, introduced a new problem: disks on Nodes managed by kubelet are forcibly detached after controller-manager restarts.
To solve this, nodes whose attach and detach is handled by kubelet need to be switched to controller-manager. Nodes added by UK8S now default to controller-manager for attach and detach, so newly added nodes do not need manual changes.
6.1 Manually switching a node to controller-manager attach/detach
Check the kubelet configuration
Check /etc/kubernetes/kubelet.conf on the node. If the value of enableControllerAttachDetach is false, change it to true, then execute systemctl restart kubelet to restart kubelet.
Check the Node status
Execute kubectl get no $IP -o yaml and check whether the Node's status contains volumesAttached data and whether it is consistent with the volumesInUse data. The Node annotations should contain a record of volumes.kubernetes.io/controller-managed-attach-detach: "true".
Once you have confirmed that the data is consistent and the annotation is present, the upgrade can proceed normally. If there is a problem, please contact technical support. The check commands are sketched below.
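A hedged command sketch for the checks in this section (paths are the ones quoted above; $IP is the node address):

```bash
# Confirm kubelet has handed attach/detach over to controller-manager
grep enableControllerAttachDetach /etc/kubernetes/kubelet.conf
systemctl restart kubelet   # only needed if the value was just changed to true

# Confirm the node annotation and compare volumesAttached with volumesInUse
kubectl get node $IP -o yaml | grep controller-managed-attach-detach
kubectl get node $IP -o yaml | grep -A 3 -E "volumesAttached|volumesInUse"
```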
7. Flexv plugin causes pod deletion to fail
7.1 Symptom description
When a pod uses a PV that was automatically created and bound by the flexv plugin, deleting the pod can fail, leaving it stuck in the Terminating state.
- Kubernetes version: 1.13
- Plug-in version: Flexvolume-19.06.1
7.2 Cause of the problem
After kubelet restarts, it cannot find the Flexvolume plugin corresponding to the volume. When kubelet restarts and finds an orphan pod (a normal pod does not trigger this problem), it infers the plugin to use from the volume record; however, flexv prefixes the plugin name with flexvolume-, so the name kubelet infers does not match the name flexv registers. The kubelet log reports a "no volume plugin matched" error, which leaves the pod stuck in Terminating.
See the following issues for details:
- https://github.com/kubernetes/kubernetes/issues/80972
- https://github.com/kubernetes/kubernetes/pull/80973
7.3 Solution
Manually umount the paths the pod is currently using and clean up on kubelet's behalf.
Operate with caution: this procedure manually cleans up resources in place of kubelet. Please read all of the following steps before operating.
- Find the PV that cannot be umounted normally.
- Log in to the node and view the mount records:
mount | grep pv-name
- Record all paths matched in the previous step, then manually umount each path of the PV on the current node:
umount path
- Among the umounted paths there will be a directory starting with /var/lib/kubelet/pods. After umounting, you need to delete this directory manually.
- Delete the PVC. After the PVC is deleted, manually detach the corresponding UDisk from the console. The UDisk ID is the last few characters of the PV name; for example, if the PV name is pvc-58f9978e-3133-11ea-b4d6-5254000cee42-bsm-olx0uqti, the corresponding UDisk is bsm-olx0uqti. You can also get the diskId from the spec.flexVolume.options field via kubectl describe pv. A cleanup sketch follows below.
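An illustrative cleanup sequence for the steps above; the PV/UDisk names come from the example, and the paths are placeholders you must fill in from the mount output (double-check every path before umounting):

```bash
# List every mount of the stuck PV
mount | grep pvc-58f9978e-3133-11ea-b4d6-5254000cee42-bsm-olx0uqti

# umount each matched path, then remove the leftover pod volume directory
umount <matched-path>
rm -rf /var/lib/kubelet/pods/<pod-uid>/volumes/<flexvolume-dir>/<pv-name>

# Finally delete the PVC, then detach the UDisk (bsm-olx0uqti in this example) in the console
kubectl delete pvc <pvc-name> -n <namespace>
```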
8. Other Frequently Asked Storage Problems
8.1 Can a PVC be mounted on multiple pods?
UDisk does not support multi-point read/write. If you need multi-point read/write, please use UFS.
8.2 After the Pod is deleted, how to reuse the original cloud disk?
You can statically create a PV and bind it to the original cloud disk. For details, see Using an existing UDisk in UK8S.
8.3 How to set custom NFS mount options, which is not possible by default?
NFS mount options cannot be specified in the Pod spec. You can set mount options on the server side, or use /etc/nfsmount.conf on the client. NFS volumes can also be mounted through a PersistentVolume, which does allow setting mount options. For details, see the official Kubernetes documentation on NFS volumes. An example PV with mount options follows below.
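A minimal sketch of an NFS PersistentVolume with mount options (the server address, export path, and option values are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  mountOptions:          # options applied when the volume is mounted
  - nfsvers=4.1
  - hard
  nfs:
    server: 10.0.0.10    # illustrative NFS server address
    path: /share         # illustrative export path
```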
9. Scheduling problems of the Pod mounted with UDisk
⚠️ Mounting RSSD cloud disks involves RDMA and dynamic scheduling, which is more complex, so it is explained separately in the document RSSD Disk Mounting Problem. The following only covers static scheduling of non-RSSD disks.
Compared with ordinary Pods, scheduling a Pod that uses UDisk is complicated by UDisk's own attach rules. The specific restriction is as follows:
- Ordinary cloud disks and SSD cloud disks can only be attached to a cloud host in the same availability zone as the disk.
In actual UK8S usage, this UDisk attach restriction shows up in two places:
- When a PV is created automatically, how to decide which availability zone the cloud disk should be created in.
- When the Pod needs to be rescheduled, how to ensure the newly scheduled node satisfies the disk attach requirements.
UK8S provides the csi-udisk plugin, which relies on the CSI capabilities provided by K8S so that users need to intervene as little as possible. The following takes SSD UDisk as an example.
9.1 Automatically create UDisk when creating PVC
From the documentation above, you can see that when a PVC is created, CSI automatically creates the PV and the UDisk and binds them. But in which availability zone should the UDisk be created? If it were chosen at random, the cloud disk could not be attached after the Pod is scheduled.
For this, K8S provides the WaitForFirstConsumer mechanism. When the StorageClass specifies the volumeBindingMode: WaitForFirstConsumer parameter, CSI does not immediately create the PV and cloud disk. The workflow in WaitForFirstConsumer mode is as follows:
- Manually create the PVC.
- Create the Pod and bind the PVC defined in the previous step to it.
- Wait for the Pod to be scheduled. At this point K8S adds the annotation volume.kubernetes.io/selected-node to the PVC to record the Node the Pod is expected to schedule to. Note that the Pod's status is still Pending at this time.
- CSI queries the availability zone of that Node's cloud host, creates a cloud disk in the same availability zone, and creates the corresponding PV for binding.
- CSI updates the spec.csi.volumeHandle field of the PV to record the ID of the created UDisk.
- CSI updates the spec.nodeAffinity field of the PV to record information such as the availability zone where the cloud disk is located. An illustrative nodeAffinity snippet follows below.
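For illustration only, the nodeAffinity written into the PV spec looks roughly like the following; the topology key and zone value shown are assumptions, not the exact keys used by csi-udisk:

```yaml
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.udisk.csi.ucloud.cn/zone   # hypothetical topology key
          operator: In
          values:
          - cn-bj2-05                              # hypothetical availability zone
```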
Following the above logic, the cloud disk is guaranteed to attach successfully to the corresponding host after the Pod is scheduled.
There is one special case: RSSD disks can only be attached to Kuaijie instance types. If the Pod is first scheduled to a non-Kuaijie instance, the subsequent cloud disk creation will fail. Therefore, if you choose an RSSD disk, make sure the Pod is scheduled to a Kuaijie instance.
9.2 Pod rebuild and scheduling process
After the first run, if a service update or a node failure triggers Pod reconstruction, the Pod is rescheduled. The scheduling process is as follows:
- Clean up the old Pod, including unmounting and detaching the UDisk from the old node.
- Create the new Pod.
- The K8S scheduler filters nodes according to the spec.nodeAffinity field of the PV.
- If no node meets the disk scheduling requirements, an event of type "had volume node affinity conflict" is recorded on the Pod and the previous step is repeated (a quick check for this event follows below).
- Within the set of nodes that pass the filter, the K8S scheduler continues with the normal Pod scheduling process.
10. Working principle of CSI components
CSI is a container storage interface defined by K8S, through which cloud vendors' storage products can be integrated. SurferCloud currently implements UDisk and UFile/US3 CSI plugins.
The CSI components fall into two major categories: Controller and Daemonset. Currently all CSI component pods run in the kube-system namespace by default and can be viewed by executing kubectl get pods -n kube-system -o wide |grep csi.
If there is a storage mounting problem, first check whether the CSI Controller is working normally and whether there is a CSI Daemonset Pod on the node.
The following is a brief introduction to the CSI components.
10.1 CSI Controller
The CSI Controller is responsible for global resource management. It performs the corresponding operations by listing/watching the related resources in K8S. The UDisk CSI Controller is responsible for creating and deleting disks and for attaching disks to and detaching them from cloud hosts. The US3 CSI Controller only needs to verify some basic information in the StorageClass, because there is no attach operation to handle.
10.2 CSI Daemonset
The CSI Daemonset component is scheduled to every node and performs the work local to a single node. Unlike the Controller, the CSI Daemonset communicates with kubelet over a unix socket, receiving requests from kubelet and performing the corresponding operations. Generally, the unix socket address of CSI is /var/lib/kubelet/csi-plugins/csi-name/csi.sock.
The UDisk/US3 CSI Daemonset mainly handles the Mount and Umount operations of the storage.
10.3 Other functions
In addition to the basic storage management and mounting functions, CSI also provides a variety of other capabilities. Currently, CSI UDisk has implemented disk dynamic expansion (requires Controller and Daemonset) and disk Metrics information collection (requires CSI Daemonset).
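One hedged way to look at the collected volume metrics is through a node's kubelet metrics endpoint (requires permission to use the API server's node proxy; the node name is a placeholder):

```bash
# kubelet exposes per-volume stats such as kubelet_volume_stats_used_bytes
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_volume_stats
```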
11. CSI Common Problem Troubleshooting Process
This section takes UDisk-CSI as an example, analyzes the points where each step can go wrong after a PVC is created, and gives handling suggestions. The section only covers the Pod creation process and assumes that the Pod that last used this PVC has been destroyed and that all intermediate operations and resources have been cleaned up. If the resources used by the previous Pod have not been cleaned up, you can still use this document to work out a cleanup plan.
- Ensure that the csi Controller and Daemonset components are working normally via kubectl get pods -n kube-system -o wide.
- Confirm whether the PV was created successfully; if not, see Section 11.1.
- After the PV is created successfully, make sure the Pod is scheduled successfully. A Pod with a udisk has extra scheduling requirements on top of the ordinary scheduling rules; see Section 9, Scheduling problems of the Pod mounted with UDisk.
- If attaching the disk fails, see Section 11.2.
- Once the disk is confirmed attached to the target host, confirm that the mount succeeded. If the mount fails, see Section 11.3.
11.1 PV is not created successfully
If the PV was not created successfully, make sure a Pod is actually using that PVC; for the reason, see Section 9.1, Automatically create UDisk when creating PVC.
If a Pod is already using the PVC, use kubectl logs csi-udisk-controller-0 -n kube-system csi-udisk to view the controller logs and confirm whether UDisk creation failed.
Use kubectl get pv <pv-name> -o yaml to record the UDisk corresponding to the PV, and check that UDisk in the console.
Note that the automatically generated PV name has the format pvc-xxxxxxxxx, which is easy to confuse here.
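A quick, hedged way to read the UDisk ID out of the PV (the PV name is a placeholder):

```bash
# Print the UDisk ID recorded in the PV's volumeHandle
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'; echo
```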
11.2 Disk mounting fails
11.2.1 Ensure that the volumeattachment resource exists
To attach the disk successfully, first make sure the volumeattachment resource exists, and check the node information to confirm whether attach/detach is currently handled by kubelet or by controller-manager.
- The kubelet attach method has defects; k8s currently recommends letting controller-manager handle attach/detach. For how to check and switch, see Section 6.1, Manually switching a node to controller-manager attach/detach.
- If kubelet is responsible for attaching and the pod log shows that the volumeattachment resource does not exist, you can manually create a new VolumeAttachment with the same name based on the YAML in Section 2.1, VolumeAttachment file example.
- If controller-manager is responsible for attaching, confirm whether the k8s version is 1.17.1-1.17.7 or 1.18.1-1.18.4; in these versions, controller-manager attach/detach has a performance problem.
- To view controller-manager logs, log in to the three master nodes and execute journalctl -fu kube-controller-manager. Note that only one of the three masters' controller-manager is the leader and actually does the work.
- To view kubelet logs, log in to the target node and execute journalctl -fu kubelet.
11.2.2 Ensure that the disk is successfully mounted
- First, make sure the ATTACHED status of the volumeattachment is true.
- If it is not true, check the csi-controller logs for errors during the attach process.
- If it is true, confirm in the console that the udisk is indeed attached to the target host. If you find a discrepancy, contact technical support.
- Also confirm that there is only one corresponding volumeattachment, because udisk only allows single-point attachment; us3 allows multi-point mounting by nature and has no such restriction.
11.3 Disk mount problem
- First, confirm which device corresponds to the disk. Because of how udisk attachment is implemented, in some special cases the device letter shown on the console page may differ from the actual one on the host. The mapping can be checked via the /sys/block/vdx/serial file. udisk-csi already implements this logic, so the wrong disk will not be mounted, but you need to be aware of it when troubleshooting manually. See the sketch after this list.
- After confirming the device, you can view the mount paths with mount |grep pv-name.
- udisk implements the globalmount and pod mount paths according to the CSI standard, so under normal circumstances you will see two mount paths for a udisk: one ending with globalmount and one ending with mount.
- us3 only implements the pod mount path, so you will see only one mount path, and for us3 there is no device letter to confirm.
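A hedged one-liner for the device-to-disk mapping mentioned above (virtio disks expose the disk ID in their serial file):

```bash
# Print each virtio block device together with the disk ID recorded in its serial file
for d in /sys/block/vd*/serial; do echo "$d: $(cat "$d")"; done
```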
Slow disk mount caused by fsGroup
Many users run into slow mounts. First confirm whether fsGroup is set and whether the disk contains a large number of small files; if both are true, mounting may be slow, because kubelet recursively changes the ownership and permissions of every file in the volume.
11.4 Denied permissions of the mounted directory
- If the pod sets securityContext.fsGroup and the StorageClass does not set fsType (it is empty by default), kubelet cannot apply the permissions correctly, which results in a permission denied error. A hedged StorageClass sketch follows below.
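A minimal sketch of declaring the filesystem type in the StorageClass, assuming the udisk provisioner accepts the standard csi.storage.k8s.io/fstype parameter (verify against the plugin documentation; the class name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: udisk-ssd-ext4              # illustrative name
provisioner: udisk.csi.ucloud.cn
parameters:
  csi.storage.k8s.io/fstype: ext4   # standard CSI parameter; lets kubelet apply fsGroup correctly
volumeBindingMode: WaitForFirstConsumer
```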