Frequently Asked Questions on Storage
1. What is the relationship between PV, PVC, StorageClass, and UDisk?
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: udisk-ssd-test
provisioner: udisk.csi.ucloud.cn # Storage provisioner, cannot be changed here.
---
apiVersion: v1
kind: PersistentVolumeClaim
spec:
  storageClassName: udisk-ssd-test # must match the StorageClass name above
Users only need to set up the StorageClass. When a PVC that uses it is created and consumed, the csi-udisk plugin automatically completes a series of operations such as creating and attaching the UDisk. The main process is as follows:
- The StorageClass sets the related parameters and binds to the CSI plugin.
- The PVC is bound to the StorageClass.
- K8S observes the newly created PVC that uses the StorageClass, automatically creates the PV, and hands it over to the CSI plugin to create the UDisk.
- The PV is bound to the PVC, and the CSI plugin completes the subsequent UDisk attach and mount operations.
- SurferCloud's CSI plugin pods can be viewed with
kubectl get pods -o wide -n kube-system |grep udisk
(one controller in total, plus one pod per node)
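For reference, a minimal Pod that consumes a PVC created from the StorageClass above might look like the following sketch (the PVC name `data-pvc` and the image are illustrative, not part of the original example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data          # the UDisk-backed volume is mounted here inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc       # illustrative PVC name
```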
1.1 Using PVC in StatefulSet
- The volumeClaimTemplates field in the StatefulSet controller lets the K8S cluster automatically create a PVC when the corresponding PVC does not exist, making the above process even more automated (both PVC and PV are created by UK8S). A minimal sketch follows below.
- The StatefulSet is responsible for creating PVCs but not deleting them, so leftover PVCs need to be deleted manually.
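A minimal StatefulSet sketch, assuming the udisk-ssd-test StorageClass above (all other names and the image are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:          # one PVC named data-web-<ordinal> is created per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: udisk-ssd-test
      resources:
        requests:
          storage: 20Gi
```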
2. The role of VolumeAttachment
VolumeAttachment objects are not created by users, so many users are unclear about their role; however, VolumeAttachment plays a very important part when PVCs are used.
- A VolumeAttachment represents the attach relationship between a PV and a certain Node in the K8S cluster. You can check it by executing
kubectl get volumeattachment |grep pv-name
- This attach relationship is usually consistent with the attach relationship between the UDisk and the cloud host, but sometimes they diverge.
- Inconsistencies typically occur when the UDisk has already been detached from the cloud host but the VolumeAttachment record still exists. Whether the UDisk is attached to the cloud host can be checked as described in Section 3, How to check the actual mounting situation of UDisk corresponding to PVC.
- For inconsistencies, you can manually delete the corresponding VolumeAttachment and create a new, identical one (after creating the new one, its ATTACHED state is false).
- If it cannot be deleted, you can check the csi-controller log to locate the reason with
kubectl logs csi-udisk-controller-0 -n kube-system csi-udisk
- Generally, if the VolumeAttachment cannot be deleted manually, the corresponding node may no longer exist. In that case, edit the VolumeAttachment directly and remove its finalizers field (see the patch sketch after the example output below).
[root@10-9-112-196 ~]# kubectl get volumeattachment |grep pvc-e51b694f-ffac-4d23-af5e-304a948a155a
NAME ATTACHER PV NODE ATTACHED AGE
csi-1d52d5a7b4c5c172de7cfc17df71c312059cf8a2d7800e05f46e04876a0eb50e udisk.csi.ucloud.cn pvc-e51b694f-ffac-4d23-af5e-304a948a155a 10.9.184.108 true 2d2h
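As a hedged illustration of removing the finalizers field mentioned above (using the VolumeAttachment name from the example output; substitute your own object):

```bash
# Clear the finalizers so a VolumeAttachment whose node no longer exists can be deleted
kubectl patch volumeattachment csi-1d52d5a7b4c5c172de7cfc17df71c312059cf8a2d7800e05f46e04876a0eb50e \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```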
2.1 VolumeAttachment file example
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  annotations:
    csi.alpha.kubernetes.io/node-id: 10.9.184.108 # Bound node IP
  finalizers:
  - external-attacher/udisk-csi-ucloud-cn
  name: csi-1d52d5a7b4c5c172de7cfc17df71c312059cf8a2d7800e05f46e04876a0eb50e # VolumeAttachment name
spec:
  attacher: udisk.csi.ucloud.cn
  nodeName: 10.9.184.108 # Bound node IP
  source:
    persistentVolumeName: pvc-e51b694f-ffac-4d23-af5e-304a948a155a # Name of the bound PV
3. How to check the actual mounting situation of UDisk corresponding to PVC
Correspondence table

UK8S resource type | Correspondence on the host
---|---
PV | The UDisk disk itself
VolumeAttachment | The attach relationship between the disk and the host (vdb, vdc block devices)
PVC | The mount path of the disk on the host
Pod | The process using the disk
- Run
kubectl get pvc -n ns pvc-name
and check the VOLUME field to find the PV bound to the PVC, generally named like pvc-e51b694f-ffac-4d23-af5e-304a948a155a.
- Run
kubectl get pv pv-name -o yaml
and look at the spec.csi.volumeHandle field to see which UDisk the PV is bound to (with the flexv plugin, the UDisk ID is the last few characters of the PV name). Check the status of that UDisk in the console to see whether it is attached to a host.
- Run
kubectl get volumeattachment |grep pv-name
to check the attach status recorded in the K8S cluster.
- SSH to the corresponding host and run
lsblk
to see the corresponding disk.
- Run
mount |grep pv-name
to check the actual mount points of the disk. There is one globalmount and one or more pod mount points (a consolidated command sketch follows after the example output below).
[root@10-9-184-108 ~]# mount |grep pvc-e51b694f-ffac-4d23-af5e-304a948a155a
/dev/vdc on /data/kubelet/plugins/kubernetes.io/csi/pv/pvc-e51b694f-ffac-4d23-af5e-304a948a155a/globalmount type ext4 (rw,relatime)
/dev/vdc on /data/kubelet/pods/587962f5-3009-4c53-a56e-a78f6636ce86/volumes/kubernetes.io~csi/pvc-e51b694f-ffac-4d23-af5e-304a948a155a/mount type ext4 (rw,relatime)
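The following is an illustrative way to string the checks above together; the namespace and PVC name are placeholders, and the last two commands must be run on the node the disk is attached to:

```bash
NS=<namespace>; PVC=<pvc-name>                                        # placeholders, fill in your own
PV=$(kubectl get pvc "$PVC" -n "$NS" -o jsonpath='{.spec.volumeName}')
kubectl get pv "$PV" -o jsonpath='{.spec.csi.volumeHandle}'; echo     # the UDisk ID bound to the PV
kubectl get volumeattachment | grep "$PV"                             # attach record in the K8S cluster
# On the node the disk is attached to:
lsblk                                                                 # the block device (vdb, vdc, ...)
mount | grep "$PV"                                                    # globalmount and pod mount points
```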
4. Error handling of disk mounting
- Because the disk attach and mount chain is long, when you encounter a problem it is recommended to first determine the current status as described in Section 3, How to check the actual mounting situation of UDisk corresponding to PVC.
- If the status recorded in UK8S is inconsistent with the actual status on the host, first clean up the inconsistent resources, then follow the normal process to recover.
4.1 What if PV and PVC are stuck in Terminating, or the disk fails to detach?
- Using Section 3, How to check the actual mounting situation of UDisk corresponding to PVC, determine the actual attach and mount status of the current PV and PVC.
- Handle it manually according to your needs. First clean up all pods that use this PV and PVC (if the PVC has already been deleted successfully, this step is not required).
- If the PVC deletion is stuck in Terminating, manually umount the corresponding mount path.
- If the VolumeAttachment deletion is stuck in Terminating, manually detach the disk in the console (if it is stuck in detaching, contact the host team).
- If the PV deletion is stuck in Terminating, manually delete the disk in the console (before deleting the PV, make sure the related VolumeAttachment has been deleted).
- After the corresponding resources have been manually released, you can use
kubectl edit
to delete the finalizers field of the resource, and the resource will be released. See the sketch after this list.
- After deleting a VolumeAttachment, if the pod reports a mount error, create a new VolumeAttachment with the same name based on the YAML provided in Section 2.1, VolumeAttachment file example.
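A hedged sketch of releasing stuck resources after the manual cleanup above (object names are placeholders; only do this once the underlying disk has really been unmounted and detached):

```bash
# Clear finalizers on a PVC stuck in Terminating
kubectl patch pvc <pvc-name> -n <namespace> --type=merge -p '{"metadata":{"finalizers":null}}'

# Clear finalizers on a PV stuck in Terminating
kubectl patch pv <pv-name> --type=merge -p '{"metadata":{"finalizers":null}}'
```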
4.2 What if the pod's PVC never gets mounted?
- Run
kubectl get pvc -n ns pvc-name
and check the VOLUME field to find the PV bound to the PVC, generally named like pvc-e51b694f-ffac-4d23-af5e-304a948a155a.
- Run
kubectl get pv pv-name -o yaml
and look at the spec.csi.volumeHandle field to see which UDisk the PV is bound to (with the flexv plugin, the UDisk ID is the last few characters of the PV name).
- After finding the UDisk, if the disk shows as available on the console page, or the host it is attached to is not the host where the pod runs, contact technical support to check the error logs of the UDisk attach and detach requests and to coordinate with the host team.
- If there is no UDisk-related error log, contact the UK8S staff on duty and provide the output of
kubectl logs csi-udisk-controller-0 -n kube-system csi-udisk
and the pod's events.
5. Precautions for using UDisk-PVC
- Because UDisk cannot cross availability zones, volumeBindingMode: WaitForFirstConsumer must be specified in the StorageClass.
- Because UDisk cannot be attached to multiple hosts, accessModes in the PVC must be set to ReadWriteOnce.
- Since UDisk cannot be multi-attached, multiple pods cannot share the same udisk PVC. If the previous pod's use of the udisk PVC has not been cleaned up, subsequent pods will fail to start; you can confirm this by checking the VolumeAttachment status. A sketch covering both settings follows below.
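A minimal sketch showing both required settings together (the class and PVC names are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: udisk-ssd-wffc            # illustrative name
provisioner: udisk.csi.ucloud.cn
volumeBindingMode: WaitForFirstConsumer   # required: UDisk cannot cross availability zones
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  storageClassName: udisk-ssd-wffc
  accessModes:
  - ReadWriteOnce                 # required: UDisk cannot be multi-attached
  resources:
    requests:
      storage: 20Gi
```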
6. Cloud disk detach problem when upgrading K8S from 1.17 to 1.18
We found that while upgrading a UK8S cluster from 1.17 to 1.18, some Pods with PVCs mounted hit IO errors. Checking the related logs shows the IO errors are caused by the attached disk being detached.
The community introduced a change in 1.18 to solve dangling attachments; see Recover CSI volumes from dangling attachments.
In the K8S implementation of disk attach and detach, each Node can be managed either by kubelet or by controller-manager. The change above, which fixes dangling attachments, introduced a new problem: disks on Nodes managed by kubelet are forcibly detached after controller-manager restarts.
To solve this, nodes whose attach and detach is handled by kubelet need to be switched to controller-manager. Nodes added by UK8S now default to controller-manager for attach and detach, so newly added nodes do not need manual changes.
6.1 Manually switching a node to controller-manager attach/detach
Check the kubelet configuration
Check /etc/kubernetes/kubelet.conf on the node. If the value of enableControllerAttachDetach is false, change it to true, then execute systemctl restart kubelet to restart kubelet.
Check the Node status
Execute kubectl get no $IP -o yaml and check whether the Node's status contains volumesAttached data and whether it is consistent with the volumesInUse data. The Node annotations should contain a record of volumes.kubernetes.io/controller-managed-attach-detach: "true".
Once you have confirmed that the data is consistent and the annotation is present, the upgrade can proceed normally. If there is a problem, please contact technical support. The check commands are sketched below.
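A hedged command sketch for the checks in this section (paths are the ones quoted above; $IP is the node address):

```bash
# Confirm kubelet has handed attach/detach over to controller-manager
grep enableControllerAttachDetach /etc/kubernetes/kubelet.conf
systemctl restart kubelet   # only needed if the value was just changed to true

# Confirm the node annotation and compare volumesAttached with volumesInUse
kubectl get node $IP -o yaml | grep controller-managed-attach-detach
kubectl get node $IP -o yaml | grep -A 3 -E "volumesAttached|volumesInUse"
```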
7. Flexv plugin causes pod deletion to fail
7.1 Symptom description
When a pod uses a PV that was automatically created and bound by the flexv plugin, deleting the pod can fail, leaving it stuck in the Terminating state.
- Kubernetes version: 1.13
- Plug-in version: Flexvolume-19.06.1
7.2 Cause of the problem
After kubelet restarts, it cannot find the Flexvolume plugin corresponding to the volume. When kubelet restarts and finds an orphan pod (a normal pod does not trigger this problem), it infers the plugin to use from the volume record; however, flexv prefixes the plugin name with flexvolume-, so the name kubelet infers does not match the name flexv registers. The kubelet log reports a "no volume plugin matched" error, which leaves the pod stuck in Terminating.
See the following issues for details:
- https://github.com/kubernetes/kubernetes/issues/80972
- https://github.com/kubernetes/kubernetes/pull/80973
7.3 Solution
Manually umount the paths the pod is currently using and clean up on kubelet's behalf.
Operate with caution: this procedure manually cleans up resources in place of kubelet. Please read all of the following steps before operating.
- Find the PV that cannot be umounted normally.
- Log in to the node and view the mount records:
mount | grep pv-name
- Record all paths matched in the previous step, then manually umount each path of the PV on the current node:
umount path
- Among the umounted paths there will be a directory starting with /var/lib/kubelet/pods. After umounting, you need to delete this directory manually.
- Delete the PVC. After the PVC is deleted, manually detach the corresponding UDisk from the console. The UDisk ID is the last few characters of the PV name; for example, if the PV name is pvc-58f9978e-3133-11ea-b4d6-5254000cee42-bsm-olx0uqti, the corresponding UDisk is bsm-olx0uqti. You can also get the diskId from the spec.flexVolume.options field via kubectl describe pv. A cleanup sketch follows below.
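An illustrative cleanup sequence for the steps above; the PV/UDisk names come from the example, and the paths are placeholders you must fill in from the mount output (double-check every path before umounting):

```bash
# List every mount of the stuck PV
mount | grep pvc-58f9978e-3133-11ea-b4d6-5254000cee42-bsm-olx0uqti

# umount each matched path, then remove the leftover pod volume directory
umount <matched-path>
rm -rf /var/lib/kubelet/pods/<pod-uid>/volumes/<flexvolume-dir>/<pv-name>

# Finally delete the PVC, then detach the UDisk (bsm-olx0uqti in this example) in the console
kubectl delete pvc <pvc-name> -n <namespace>
```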
8. Other Frequently Asked Storage Problems
8.1 Can a PVC be mounted on multiple pods?
UDisk does not support multi-point read/write. If you need multi-point read/write, please use UFS.
8.2 After the Pod is deleted, how to reuse the original cloud disk?
You can statically create a PV and bind it to the original cloud disk. For details, see Using an existing UDisk in UK8S.
8.3 How to set custom NFS mount options, which is not possible by default?
NFS mount options cannot be specified in the Pod spec. You can set mount options on the server side, or use /etc/nfsmount.conf on the client. NFS volumes can also be mounted through a PersistentVolume, which does allow setting mount options. For details, see the official Kubernetes documentation on NFS volumes. An example PV with mount options follows below.
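A minimal sketch of an NFS PersistentVolume with mount options (the server address, export path, and option values are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  mountOptions:          # options applied when the volume is mounted
  - nfsvers=4.1
  - hard
  nfs:
    server: 10.0.0.10    # illustrative NFS server address
    path: /share         # illustrative export path
```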
9. Scheduling problems of the Pod mounted with UDisk
⚠️ Mounting RSSD cloud disks involves RDMA and dynamic scheduling, which is more complex, so it is explained separately in the document RSSD Disk Mounting Problem. The following only covers static scheduling of non-RSSD disks.
Compared with ordinary Pods, scheduling a Pod that uses UDisk is complicated by UDisk's own attach rules. The specific restriction is as follows:
- Ordinary cloud disks and SSD cloud disks can only be attached to a cloud host in the same availability zone as the disk.
In actual UK8S usage, this UDisk attach restriction shows up in two places:
- When a PV is created automatically, how to decide which availability zone the cloud disk should be created in.
- When the Pod needs to be rescheduled, how to ensure the newly scheduled node satisfies the disk attach requirements.
UK8S provides the csi-udisk plugin, which relies on the CSI capabilities provided by K8S so that users need to intervene as little as possible. The following takes SSD UDisk as an example.
9.1 Automatically create UDisk when creating PVC
From the documentation above, you can see that when a PVC is created, CSI automatically creates the PV and the UDisk and binds them. But in which availability zone should the UDisk be created? If it were chosen at random, the cloud disk could not be attached after the Pod is scheduled.
For this, K8S provides the WaitForFirstConsumer mechanism. When the StorageClass specifies the volumeBindingMode: WaitForFirstConsumer parameter, CSI does not immediately create the PV and cloud disk. The workflow in WaitForFirstConsumer mode is as follows:
- Manually create the PVC.
- Create the Pod and bind the PVC defined in the previous step to it.
- Wait for the Pod to be scheduled. At this point K8S adds the annotation volume.kubernetes.io/selected-node to the PVC to record the Node the Pod is expected to schedule to. Note that the Pod's status is still Pending at this time.
- CSI queries the availability zone of that Node's cloud host, creates a cloud disk in the same availability zone, and creates the corresponding PV for binding.
- CSI updates the spec.csi.volumeHandle field of the PV to record the ID of the created UDisk.
- CSI updates the spec.nodeAffinity field of the PV to record information such as the availability zone where the cloud disk is located. An illustrative nodeAffinity snippet follows below.
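For illustration only, the nodeAffinity written into the PV spec looks roughly like the following; the topology key and zone value shown are assumptions, not the exact keys used by csi-udisk:

```yaml
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.udisk.csi.ucloud.cn/zone   # hypothetical topology key
          operator: In
          values:
          - cn-bj2-05                              # hypothetical availability zone
```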
Following the above logic, the cloud disk is guaranteed to attach successfully to the corresponding host after the Pod is scheduled.
There is one special case: RSSD disks can only be attached to Kuaijie instance types. If the Pod is first scheduled to a non-Kuaijie instance, the subsequent cloud disk creation will fail. Therefore, if you choose an RSSD disk, make sure the Pod is scheduled to a Kuaijie instance.
9.2 Pod rebuild and scheduling process
After the first run, if a service update or a node failure triggers Pod reconstruction, the Pod is rescheduled. The scheduling process is as follows:
- Clean up the old Pod, including unmounting and detaching the UDisk from the old node.
- Create the new Pod.
- The K8S scheduler filters nodes according to the spec.nodeAffinity field of the PV.
- If no node meets the disk scheduling requirements, an event of type "had volume node affinity conflict" is recorded on the Pod and the previous step is repeated (a quick check for this event follows below).
- Within the set of nodes that pass the filter, the K8S scheduler continues with the normal Pod scheduling process.
10. Working principle of CSI components
CSI is a container storage interface defined by K8S, through which cloud vendors' storage products can be integrated. SurferCloud currently implements UDisk and UFile/US3 CSI plugins.
The CSI components fall into two major categories: Controller and Daemonset. Currently all CSI component pods run in the kube-system namespace by default and can be viewed by executing kubectl get pods -n kube-system -o wide |grep csi.
If there is a storage mounting problem, first check whether the CSI Controller is working normally and whether there is a CSI Daemonset Pod on the node.
The following is a brief introduction to the CSI components.
10.1 CSI Controller
The CSI Controller is responsible for global resource management. It performs the corresponding operations by listing/watching the related resources in K8S. The UDisk CSI Controller is responsible for creating and deleting disks and for attaching disks to and detaching them from cloud hosts. The US3 CSI Controller only needs to verify some basic information in the StorageClass, because there is no attach operation to handle.
10.2 CSI Daemonset
The CSI Daemonset component is scheduled to every node and performs the work local to a single node. Unlike the Controller, the CSI Daemonset communicates with kubelet over a unix socket, receiving requests from kubelet and performing the corresponding operations. Generally, the unix socket address of CSI is /var/lib/kubelet/csi-plugins/csi-name/csi.sock.
The UDisk/US3 CSI Daemonset mainly handles the Mount and Umount operations of the storage.
10.3 Other functions
In addition to the basic storage management and mounting functions, CSI also provides a variety of other capabilities. Currently, CSI UDisk has implemented disk dynamic expansion (requires Controller and Daemonset) and disk Metrics information collection (requires CSI Daemonset).
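One hedged way to look at the collected volume metrics is through a node's kubelet metrics endpoint (requires permission to use the API server's node proxy; the node name is a placeholder):

```bash
# kubelet exposes per-volume stats such as kubelet_volume_stats_used_bytes
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_volume_stats
```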
11. CSI Common Problem Troubleshooting Process
This section takes UDisk-CSI as an example, analyzes the points where each step can go wrong after a PVC is created, and gives handling suggestions. The section only covers the Pod creation process and assumes that the Pod that last used this PVC has been destroyed and that all intermediate operations and resources have been cleaned up. If the resources used by the previous Pod have not been cleaned up, you can still use this document to work out a cleanup plan.
- Ensure that the csi Controller and Daemonset components are working normally via kubectl get pods -n kube-system -o wide.
- Confirm whether the PV was created successfully; if not, see Section 11.1.
- After the PV is created successfully, make sure the Pod is scheduled successfully. A Pod with a udisk has extra scheduling requirements on top of the ordinary scheduling rules; see Section 9, Scheduling problems of the Pod mounted with UDisk.
- If attaching the disk fails, see Section 11.2.
- Once the disk is confirmed attached to the target host, confirm that the mount succeeded. If the mount fails, see Section 11.3.
11.1 PV is not created successfully
If the PV was not created successfully, make sure a Pod is actually using that PVC; for the reason, see Section 9.1, Automatically create UDisk when creating PVC.
If a Pod is already using the PVC, use kubectl logs csi-udisk-controller-0 -n kube-system csi-udisk to view the controller logs and confirm whether UDisk creation failed.
Use kubectl get pv <pv-name> -o yaml to record the UDisk corresponding to the PV, and check that UDisk in the console.
Note that the automatically generated PV name has the format pvc-xxxxxxxxx, which is easy to confuse here.
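A quick, hedged way to read the UDisk ID out of the PV (the PV name is a placeholder):

```bash
# Print the UDisk ID recorded in the PV's volumeHandle
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'; echo
```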
11.2 Disk mounting fails
11.2.1 Ensure that the volumeattachment resource exists
To attach the disk successfully, first make sure the volumeattachment resource exists, and check the node information to confirm whether attach/detach is currently handled by kubelet or by controller-manager.
- The kubelet attach method has defects; k8s currently recommends letting controller-manager handle attach/detach. For how to check and switch, see Section 6.1, Manually switching a node to controller-manager attach/detach.
- If kubelet is responsible for attaching and the pod log shows that the volumeattachment resource does not exist, you can manually create a new VolumeAttachment with the same name based on the YAML in Section 2.1, VolumeAttachment file example.
- If controller-manager is responsible for attaching, confirm whether the k8s version is 1.17.1-1.17.7 or 1.18.1-1.18.4; in these versions, controller-manager attach/detach has a performance problem.
- To view controller-manager logs, log in to the three master nodes and execute journalctl -fu kube-controller-manager. Note that only one of the three masters' controller-manager is the leader and actually does the work.
- To view kubelet logs, log in to the target node and execute journalctl -fu kubelet.
11.2.2 Ensure that the disk is successfully mounted
- First, make sure the ATTACHED status of the volumeattachment is true.
- If it is not true, check the csi-controller logs for errors during the attach process.
- If it is true, confirm in the console that the udisk is indeed attached to the target host. If you find a discrepancy, contact technical support.
- Also confirm that there is only one corresponding volumeattachment, because udisk only allows single-point attachment; us3 allows multi-point mounting by nature and has no such restriction.
11.3 Disk mount problem
- First, confirm which device corresponds to the disk. Because of how udisk attachment is implemented, in some special cases the device letter shown on the console page may differ from the actual one on the host. The mapping can be checked via the /sys/block/vdx/serial file. udisk-csi already implements this logic, so the wrong disk will not be mounted, but you need to be aware of it when troubleshooting manually. See the sketch after this list.
- After confirming the device, you can view the mount paths with mount |grep pv-name.
- udisk implements the globalmount and pod mount paths according to the CSI standard, so under normal circumstances you will see two mount paths for a udisk: one ending with globalmount and one ending with mount.
- us3 only implements the pod mount path, so you will see only one mount path, and for us3 there is no device letter to confirm.
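A hedged one-liner for the device-to-disk mapping mentioned above (virtio disks expose the disk ID in their serial file):

```bash
# Print each virtio block device together with the disk ID recorded in its serial file
for d in /sys/block/vd*/serial; do echo "$d: $(cat "$d")"; done
```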
Slow disk mount caused by fsGroup
Many users run into slow mounts. First confirm whether fsGroup is set and whether the disk contains a large number of small files; if both are true, mounting may be slow, because kubelet recursively changes the ownership and permissions of every file in the volume.
11.4 Denied permissions of the mounted directory
- If the pod sets securityContext.fsGroup and the StorageClass does not set fsType (it is empty by default), kubelet cannot apply the permissions correctly, which results in a permission denied error. A hedged StorageClass sketch follows below.
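A minimal sketch of declaring the filesystem type in the StorageClass, assuming the udisk provisioner accepts the standard csi.storage.k8s.io/fstype parameter (verify against the plugin documentation; the class name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: udisk-ssd-ext4              # illustrative name
provisioner: udisk.csi.ucloud.cn
parameters:
  csi.storage.k8s.io/fstype: ext4   # standard CSI parameter; lets kubelet apply fsGroup correctly
volumeBindingMode: WaitForFirstConsumer
```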