Multiple Pods Share a GPU
This solution deploys the GPU-Share plugin. Once the plugin is deployed, a GPU on a cluster node can be scheduled to multiple Pods at the same time. The plugin currently supports command-line installation only. The UK8S team plans to add it to the cluster plugins in a later release to enable one-click installation.
Install and Use the GPU Sharing Plugin
⚠️ Check the Kubernetes version before installing. The plugin requires Kubernetes >= 1.17.4.
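The version requirement can be checked from the command line before installing. The following is a minimal sketch, not part of the official procedure: the helper name version_ge is hypothetical, and it assumes a sort that supports version ordering (-V), as GNU coreutils does.

```shell
# Returns 0 (true) when version $1 is greater than or equal to version $2.
# A leading "v" (as in v1.20.6) is stripped before comparing.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
  have="${1#v}"
  need="${2#v}"
  lowest="$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)"
  [ "$lowest" = "$need" ]
}

# Possible usage against a live cluster (reads the kubelet version of the
# first node, used here as a stand-in for the cluster version):
#   ver="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
#   version_ge "$ver" 1.17.4 && echo "version OK" || echo "version too old"
```

Version sort is used instead of a plain string comparison because a string comparison would rank 1.17.10 below 1.17.4.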
1. Label the nodes that require GPU sharing
kubectl label node <nodeip> nodeShareGPU=true
2. Delete the original NVIDIA device plugin from the cluster using kubectl
kubectl delete ds -n kube-system nvidia-device-plugin-daemonset
3. Use kubectl to install the GPU-Share plugin
kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/gpu-share/1.1.0.yaml
Test GPU Sharing
Test Conditions:
- The cluster has only one GPU cloud host, and that host has a single GPU card.
- The cluster has completed the plugin installation following the above three steps.
- The plugin pod is now in a running state.
Next, run test-gpushare1 and test-gpushare2 one after the other.
# Run test-gpushare1
kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/gpu-share/test-gpushare1.yaml
# Run test-gpushare2
kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/gpu-share/test-gpushare2.yaml
Take test-gpushare1 as an example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-gpushare1
  labels:
    app: test-gpushare1
spec:
  selector:
    matchLabels:
      app: test-gpushare1
  template:
    metadata:
      labels:
        app: test-gpushare1
    spec:
      schedulerName: gpushare-scheduler
      containers:
        - name: test-gpushare1
          image: uhub.surfercloud.com/ucloud//gpu-player:share
          command:
            - python3
            - /app/main.py
          resources:
            limits:
              # GiB
              ucloud.cn/gpu-mem: 1
The limits section sets ucloud.cn/gpu-mem: 1, which requests 1 GiB of GPU memory. test-gpushare2 has the same setting. As a result, even though the cluster has only one node with a single GPU card, that GPU serves both Pods at the same time. You can confirm that both Pods are running:
kubectl get pod | grep test-gpushare
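The reasoning behind the result can be illustrated with simple arithmetic. The sketch below assumes that Pods are packed onto a card purely by their ucloud.cn/gpu-mem request, which this document does not confirm in detail. The helper name pods_per_gpu and the 16 GiB card size are hypothetical values chosen for illustration.

```shell
# Prints how many Pods fit on one card when every Pod requests the same
# amount of GPU memory. Both arguments are whole GiB values.
# Assumption: scheduling is limited only by the gpu-mem request.
pods_per_gpu() {
  total_gib=$1
  request_gib=$2
  if [ "$request_gib" -le 0 ]; then
    echo "request must be a positive integer" >&2
    return 1
  fi
  # Integer division: any leftover memory smaller than one request is unused.
  echo $(( total_gib / request_gib ))
}

# Hypothetical 16 GiB card with 1 GiB per Pod, as requested by test-gpushare1.
pods_per_gpu 16 1   # prints 16
```

Under that assumption, two Pods requesting 1 GiB each fit comfortably on any card with at least 2 GiB of memory.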
Monitor GPU Usage
You can monitor the resource usage of the GPU node, or log in to the GPU node and run nvidia-smi.
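For a scriptable view of GPU memory usage, nvidia-smi can emit CSV through its --query-gpu and --format options. The following is a minimal sketch: the helper name gpu_mem_percent is hypothetical, and it assumes the input has the "used, total" shape produced by the query shown in the comment.

```shell
# Reads "used, total" lines (values in MiB) on stdin, one line per GPU,
# and prints the percentage of memory in use for each GPU, rounded down.
# Lines that do not have exactly two fields are skipped.
gpu_mem_percent() {
  awk -F', *' 'NF == 2 && $2 > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Possible usage on the GPU node:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#     --format=csv,noheader,nounits | gpu_mem_percent
```

With both test Pods running, the reported usage should reflect the memory consumed by both workloads on the single card.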
Remove the GPU Sharing Plugin
Run the following commands on the master node. The first removes the GPU-Share plugin and the second restores the original NVIDIA device plugin:
kubectl delete -f https://docs.surfercloud.com/uk8s/yaml/gpu-share/1.1.0.yaml
kubectl apply -f /etc/kubernetes/yaml/nvidia-device-plugin.yaml