Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/10/25 06:14:09 UTC

[GitHub] [yunikorn-site] KatLantyss opened a new pull request, #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

KatLantyss opened a new pull request, #200:
URL: https://github.com/apache/yunikorn-site/pull/200

   # Issue
   
   Add a generic example of GPU scheduling with Yunikorn.
   # Link
   
   https://issues.apache.org/jira/browse/YUNIKORN-1355?filter=-1
   # Solve
   
   Test the GPU scheduling.
   Update Yunikorn site.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-site] yuchaoran2011 commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
yuchaoran2011 commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1029676761


##########
docs/user_guide/workloads/workload_overview.md:
##########
@@ -53,6 +53,7 @@ omitted as it will be set automatically on newly created pods.
 
 Examples of more advanced use cases can be found here:
 
+* [Run NVIDIA GPU Scheduling Jobs](run_nvidia)

Review Comment:
   The title above is `Run NVIDIA GPU Jobs`; the link text here needs to be consistent with it.
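The overview hunk above says a field can be "omitted as it will be set automatically on newly created pods", which presumably refers to `schedulerName` being injected by YuniKorn's admission controller. Where that controller is not deployed, the GPU test pod from the guide can be handed to YuniKorn explicitly. A hedged sketch follows: `gpu_test_pod_manifest` is a hypothetical helper that is not part of the PR, and the `applicationId` and `queue` label values are placeholders to adapt to your own queue configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the guide's GPU test pod as a manifest that is
# explicitly scheduled by YuniKorn.
#   $1 = pod name, $2 = applicationId label, $3 = queue label (placeholders)
gpu_test_pod_manifest() {
  pod_name="$1"
  app_id="$2"
  queue="$3"
  # Unquoted EOF so the ${...} variables are expanded into the YAML.
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${pod_name}
  labels:
    applicationId: "${app_id}"
    queue: "${queue}"
spec:
  schedulerName: yunikorn
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvidia/samples:vectoradd-cuda10.2"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}
```

For example, `gpu_test_pod_manifest gpu-operator-test gpu-test-0001 root.sandbox | kubectl apply -f -` would create the pod, assuming a `root.sandbox` queue exists in the YuniKorn configuration.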



##########
package.json:
##########
@@ -12,7 +12,7 @@
   "dependencies": {
     "@docusaurus/core": "2.1.0",
     "@docusaurus/preset-classic": "2.1.0",
-    "@docusaurus/theme-search-algolia": "2.1.0",
+    "@docusaurus/theme-search-algolia": "^2.2.0",

Review Comment:
   Can we leave the version update to another PR?





[GitHub] [yunikorn-site] wilfred-s commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1028719177


##########
docs/user_guide/workloads/run_nvidia.md:
##########
@@ -0,0 +1,263 @@
+---
+id: run_nvidia
+title: Run NVIDIA GPU Scheduling Jobs
+description: How to run generic example of GPU scheduling with Yunikorn.
+keywords:
+ - NVIDIA GPU
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+To know how the GPU scheduling works, please refer to [**Time-Slicing GPUs in Kubernetes | Introduction**](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html#introduction). This page covers ways to enable GPU scheduling in Yunikorn using [**NVIDIA GPU Operator**](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator).

Review Comment:
   @KatLantyss please check this comment





[GitHub] [yunikorn-site] KatLantyss commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
KatLantyss commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1035600971


##########
package.json:
##########
@@ -12,7 +12,7 @@
   "dependencies": {
     "@docusaurus/core": "2.1.0",
     "@docusaurus/preset-classic": "2.1.0",
-    "@docusaurus/theme-search-algolia": "2.1.0",
+    "@docusaurus/theme-search-algolia": "^2.2.0",

Review Comment:
   @yuchaoran2011 Sure!





[GitHub] [yunikorn-site] wilfred-s commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1036618136


##########
docs/user_guide/workloads/run_nvidia.md:
##########
@@ -0,0 +1,346 @@
+---
+id: run_nvidia
+title: Run NVIDIA GPU Jobs
+description: How to run a generic example of GPU scheduling with Yunikorn.
+keywords:
+ - NVIDIA GPU
+---
+
+## Yunikorn with NVIDIA GPUs
+This guide gives an overview of how to set up the NVIDIA Device Plugin, which enables users to run GPU workloads with Yunikorn. For more details, please check [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#option-2-installing-kubernetes-using-kubeadm).
+
+### Prerequisite
+Before following the steps below, Yunikorn needs to be deployed on [**Kubernetes with GPUs**](https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html#install-kubernetes).
+
+### Install NVIDIA Device Plugin
+Add the nvidia-device-plugin helm repository.
+```
+helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
+helm repo update
+helm repo list
+```
+
+Verify the latest release version of the plugin is available.
+```
+helm search repo nvdp --devel
+NAME                     	  CHART VERSION  APP VERSION	   DESCRIPTION
+nvdp/nvidia-device-plugin	  0.12.3         0.12.3         A Helm chart for ...
+```
+
+Deploy the device plugin.
+```
+kubectl create namespace nvidia
+helm install --generate-name nvdp/nvidia-device-plugin --namespace nvidia --version 0.12.3
+```
+
+Check the status of the pods to ensure the NVIDIA device plugin is running.
+```
+kubectl get pods -A
+
+NAMESPACE      NAME                                      READY   STATUS    RESTARTS      AGE
+kube-flannel   kube-flannel-ds-j24fx                     1/1     Running   1 (11h ago)   11h
+kube-system    coredns-78fcd69978-2x9l8                  1/1     Running   1 (11h ago)   11h
+kube-system    coredns-78fcd69978-gszrw                  1/1     Running   1 (11h ago)   11h
+kube-system    etcd-katlantyss-nzxt                      1/1     Running   3 (11h ago)   11h
+kube-system    kube-apiserver-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
+kube-system    kube-controller-manager-katlantyss-nzxt   1/1     Running   3 (11h ago)   11h
+kube-system    kube-proxy-4wz7r                          1/1     Running   1 (11h ago)   11h
+kube-system    kube-scheduler-katlantyss-nzxt            1/1     Running   4 (11h ago)   11h
+kube-system    nvidia-device-plugin-1659451060-c92sb     1/1     Running   1 (11h ago)   11h
+```
+
+### Testing NVIDIA Device Plugin
+Create a GPU test YAML file.
+```
+# gpu-pod.yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: gpu-operator-test
+spec:
+  restartPolicy: OnFailure
+  containers:
+  - name: cuda-vector-add
+    image: "nvidia/samples:vectoradd-cuda10.2"
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+```
+Deploy the application.
+```
+kubectl apply -f gpu-pod.yaml
+```
+Check the pod status to ensure the app completed successfully.
+```
+kubectl get pods gpu-operator-test
+
+NAME                READY   STATUS      RESTARTS   AGE
+gpu-operator-test   0/1     Completed   0          9d
+```
+Check the result.
+```
+kubectl logs gpu-operator-test
+
+[Vector addition of 50000 elements]
+Copy input data from the host memory to the CUDA device
+CUDA kernel launch with 196 blocks of 256 threads
+Copy output data from the CUDA device to the host memory
+Test PASSED
+Done
+```
+
+---
+## Enable GPU Time-Slicing(Optional)

Review Comment:
   nit: space between Slicing and the bracket.
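The device-plugin steps quoted above can also be checked from a script. The following is a minimal POSIX shell sketch, not part of the PR: `plugin_pod_status` and `vectoradd_passed` are hypothetical helper names that filter the `kubectl get pods -A` and `kubectl logs gpu-operator-test` output shown in the guide. It matches the plugin pod by name across all namespaces because the sample output lists it under `kube-system`, even though the install command targets the `nvidia` namespace.

```shell
#!/bin/sh
# Read `kubectl get pods -A` output on stdin and print NAMESPACE, NAME and
# STATUS for every pod whose name starts with "nvidia-device-plugin".
# Columns: NAMESPACE NAME READY STATUS RESTARTS AGE, so STATUS is field 4;
# NR > 1 skips the header row.
plugin_pod_status() {
  awk 'NR > 1 && $2 ~ /^nvidia-device-plugin/ { print $1, $2, $4 }'
}

# Read the vector-add sample's log on stdin and succeed only if it
# printed the "Test PASSED" line.
vectoradd_passed() {
  grep -q '^Test PASSED'
}
```

For example, `kubectl get pods -A | plugin_pod_status` should list a plugin pod in the `Running` state, and `kubectl logs gpu-operator-test | vectoradd_passed && echo ok` prints `ok` only when the sample reported success.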





[GitHub] [yunikorn-site] wilfred-s commented on pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#issuecomment-1311072318

   > In other words, whether time slicing is enabled for GPUs shouldn't matter from a resource scheduling perspective, right?
   
   That is correct; this just brings the NVIDIA setup and the YuniKorn bits together.
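To illustrate why time slicing does not change the scheduling picture: the device plugin advertises `nvidia.com/gpu` as an allocatable node resource, and the scheduler only works with that count. With a time-slicing configuration the advertised count may exceed the number of physical GPUs, depending on the NVIDIA sharing settings. A small sketch, with `gpu_nodes` as a hypothetical helper name, that lists the nodes advertising GPUs from `kubectl` custom-columns output:

```shell
#!/bin/sh
# Print "NAME COUNT" for every node advertising at least one allocatable GPU.
# Expects on stdin the output of:
#   kubectl get nodes -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
# kubectl prints <none> for nodes without the resource; those are skipped,
# as are nodes reporting 0. NR > 1 skips the header row.
gpu_nodes() {
  awk 'NR > 1 && $2 != "<none>" && $2 + 0 > 0 { print $1, $2 }'
}
```

Piping the `kubectl get nodes` command from the comment into `gpu_nodes` shows the same kind of count whether or not time slicing is enabled; only the number may differ.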




[GitHub] [yunikorn-site] yuchaoran2011 commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
yuchaoran2011 commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1019912473


##########
docs/user_guide/workloads/run_nvidia.md:
##########
@@ -0,0 +1,263 @@
+---
+id: run_nvidia
+title: Run NVIDIA GPU Scheduling Jobs
+description: How to run generic example of GPU scheduling with Yunikorn.
+keywords:
+ - NVIDIA GPU
+---
+
+To know how the GPU scheduling works, please refer to [**Time-Slicing GPUs in Kubernetes | Introduction**](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html#introduction). This page covers ways to enable GPU scheduling in Yunikorn using [**NVIDIA GPU Operator**](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator).

Review Comment:
   Thanks @wilfred-s for the clarification. In that case, I think this part needs to be rephrased. The way it's written seems to suggest that enabling time slicing is a requirement for using GPUs, but in reality it's an optional feature. We should make that clear. I've heard of some production ML training use cases that explicitly turn off time slicing, since each work unit is heavy enough to fully use a single GPU. @KatLantyss 





[GitHub] [yunikorn-site] KatLantyss commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
KatLantyss commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1036714801


##########
package.json:
##########
@@ -12,7 +12,7 @@
   "dependencies": {
     "@docusaurus/core": "2.1.0",
     "@docusaurus/preset-classic": "2.1.0",
-    "@docusaurus/theme-search-algolia": "2.1.0",
+    "@docusaurus/theme-search-algolia": "^2.2.0",

Review Comment:
   @yuchaoran2011 Nailed it. Thanks!





[GitHub] [yunikorn-site] KatLantyss commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
KatLantyss commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1028899818


##########
docs/user_guide/workloads/run_nvidia.md:
##########
@@ -0,0 +1,263 @@
+---
+id: run_nvidia
+title: Run NVIDIA GPU Scheduling Jobs
+description: How to run generic example of GPU scheduling with Yunikorn.
+keywords:
+ - NVIDIA GPU
+---
+
+To know how the GPU scheduling works, please refer to [**Time-Slicing GPUs in Kubernetes | Introduction**](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html#introduction). This page covers ways to enable GPU scheduling in Yunikorn using [**NVIDIA GPU Operator**](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator).

Review Comment:
   @wilfred-s Sorry for the delay, I will change it soon, and thanks @yuchaoran2011 for the advice.





[GitHub] [yunikorn-site] yuchaoran2011 merged pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
yuchaoran2011 merged PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200




[GitHub] [yunikorn-site] wilfred-s commented on pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#issuecomment-1333040678

   Looks good now, one tiny nit.
   Checking with @yuchaoran2011 to see if anything is still left.




[GitHub] [yunikorn-site] yuchaoran2011 commented on a diff in pull request #200: [YUNIKORN-1355] Generic example of GPU scheduling with Yunikorn

Posted by GitBox <gi...@apache.org>.
yuchaoran2011 commented on code in PR #200:
URL: https://github.com/apache/yunikorn-site/pull/200#discussion_r1036663106


##########
package.json:
##########
@@ -12,7 +12,7 @@
   "dependencies": {
     "@docusaurus/core": "2.1.0",
     "@docusaurus/preset-classic": "2.1.0",
-    "@docusaurus/theme-search-algolia": "2.1.0",
+    "@docusaurus/theme-search-algolia": "^2.2.0",

Review Comment:
   Looks like this change hasn't been reverted back


