Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/12/01 01:55:16 UTC

[GitHub] [yunikorn-site] wilfred-s commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

wilfred-s commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1036618136


##########
docs/user_guide/workloads/run_nvidia.md:
##########
@@ -0,0 +1,346 @@
+---
+id: run_nvidia
+title: Run NVIDIA GPU Jobs
+description: How to run a generic example of GPU scheduling with Yunikorn.
+keywords:
+ - NVIDIA GPU
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Yunikorn with NVIDIA GPUs
+This guide gives an overview of how to set up the NVIDIA Device Plugin, which enables users to run GPU workloads with Yunikorn. For more details, please check [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#option-2-installing-kubernetes-using-kubeadm).
+
+### Prerequisite
+Before following the steps below, Yunikorn needs to be deployed on a [**Kubernetes cluster with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#install-kubernetes).
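+
+A minimal sketch of deploying Yunikorn with Helm, based on the standard Yunikorn release chart, is shown below (the release name and namespace are examples; adjust them to your environment):
+```
+# Sketch: install Yunikorn from its Helm chart (release name and namespace are examples)
+helm repo add yunikorn https://apache.github.io/yunikorn-release
+helm repo update
+kubectl create namespace yunikorn
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn
+```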
+
+### Install NVIDIA Device Plugin
+Add the nvidia-device-plugin Helm repository.
+```
+helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
+helm repo update
+helm repo list
+```
+
+Verify the latest release version of the plugin is available.
+```
+helm search repo nvdp --devel
+NAME                       CHART VERSION  APP VERSION  DESCRIPTION
+nvdp/nvidia-device-plugin  0.12.3         0.12.3       A Helm chart for ...
+```
+
+Deploy the device plugin.
+```
+kubectl create namespace nvidia
+helm install --generate-name nvdp/nvidia-device-plugin --namespace nvidia --version 0.12.3
+```
+
+Check the status of the pods to ensure the NVIDIA device plugin is running.
+```
+kubectl get pods -A
+
+NAMESPACE      NAME                                      READY   STATUS    RESTARTS      AGE
+kube-flannel   kube-flannel-ds-j24fx                     1/1     Running   1 (11h ago)   11h
+kube-system    coredns-78fcd69978-2x9l8                  1/1     Running   1 (11h ago)   11h
+kube-system    coredns-78fcd69978-gszrw                  1/1     Running   1 (11h ago)   11h
+kube-system    etcd-katlantyss-nzxt                      1/1     Running   3 (11h ago)   11h
+kube-system    kube-apiserver-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
+kube-system    kube-controller-manager-katlantyss-nzxt   1/1     Running   3 (11h ago)   11h
+kube-system    kube-proxy-4wz7r                          1/1     Running   1 (11h ago)   11h
+kube-system    kube-scheduler-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
+kube-system    nvidia-device-plugin-1659451060-c92sb     1/1     Running   1 (11h ago)   11h
+```
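+
+As an optional check, the GPU node should now advertise `nvidia.com/gpu` under its allocatable resources. One way to confirm this is sketched below (`<gpu-node-name>` is a placeholder for the actual node name):
+```
+# Optional: confirm the node exposes GPU resources (node name is a placeholder)
+kubectl describe node <gpu-node-name> | grep -A 8 Allocatable
+```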
+
+### Testing NVIDIA Device Plugin
+Create a GPU test YAML file.
+```
+# gpu-pod.yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-operator-test
+spec:
+  restartPolicy: OnFailure
+  containers:
+  - name: cuda-vector-add
+    image: "nvidia/samples:vectoradd-cuda10.2"
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+```
+Deploy the application.
+```
+kubectl apply -f gpu-pod.yaml
+```
+Check the pod status to ensure the test app completed successfully.
+```
+kubectl get pods gpu-operator-test
+
+NAME                READY   STATUS      RESTARTS   AGE
+gpu-operator-test   0/1     Completed   0          9d
+```
+Check the result.
+```
+kubectl logs gpu-operator-test
+
+[Vector addition of 50000 elements]
+Copy input data from the host memory to the CUDA device
+CUDA kernel launch with 196 blocks of 256 threads
+Copy output data from the CUDA device to the host memory
+Test PASSED
+Done
+```
+
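+To schedule the same test pod explicitly through Yunikorn, the pod spec can also carry Yunikorn application metadata. The following is a sketch only: the `applicationId` and `queue` label values are placeholders, and if the Yunikorn admission controller is installed it can inject the scheduler name automatically.
+```
+# gpu-pod-yunikorn.yaml (sketch; label values are placeholders)
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-operator-test
+  labels:
+    applicationId: "gpu-test-0001"
+    queue: "root.default"
+spec:
+  schedulerName: yunikorn
+  restartPolicy: OnFailure
+  containers:
+  - name: cuda-vector-add
+    image: "nvidia/samples:vectoradd-cuda10.2"
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+```
+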
+---
+## Enable GPU Time-Slicing(Optional)

Review Comment:
   nit: space between Slicing and the bracket.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org