Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/10/14 01:38:40 UTC

[GitHub] [yunikorn-site] wusamzong opened a new pull request, #195: [YUNIKORN-1339] Adding time slicing GPU to Tensorflow example

wusamzong opened a new pull request, #195:
URL: https://github.com/apache/yunikorn-site/pull/195

   # What is this issue?
   Following the NVIDIA GPU Operator 1.11.0 release, Time-Slicing GPUs are supported in Kubernetes.
   I deployed the NVIDIA GPU Operator and YuniKorn on a Kubernetes cluster.
   I ran a TFJob, and the YuniKorn scheduler scheduled the TensorFlow pods to the node owning the GPU resource.
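
The scheduling result above depends on the node advertising its time-sliced GPUs under `nvidia.com/gpu`. The following is a minimal sketch, not part of this PR, of how that could be checked from saved `kubectl describe node` output. The helper name `gpu_counts` and the `SAMPLE` text are hypothetical, and the parsing assumes the two-space indented `Capacity:` / `Allocatable:` layout shown in the documentation change.

```python
import re

# Hypothetical helper for illustration; it is not part of YuniKorn,
# kubectl, or the NVIDIA GPU Operator.
def gpu_counts(describe_output: str, resource: str = "nvidia.com/gpu") -> dict:
    """Return the resource count found under Capacity and Allocatable."""
    counts = {}
    section = None
    for line in describe_output.splitlines():
        if not line.strip():
            continue
        # A non-indented line starts a new top-level section.
        if not line[0].isspace():
            section = line.split(":", 1)[0].strip()
            continue
        if section in ("Capacity", "Allocatable"):
            match = re.match(rf"{re.escape(resource)}:\s+(\d+)$", line.strip())
            if match:
                counts[section] = int(match.group(1))
    return counts

# Assumed sample, shaped like the node description in the documentation change.
SAMPLE = """\
Capacity:
  nvidia.com/gpu:     16
Allocatable:
  nvidia.com/gpu:     16
"""

if __name__ == "__main__":
    print(gpu_counts(SAMPLE))  # {'Capacity': 16, 'Allocatable': 16}
```

If the resource is missing from either section, the returned dictionary simply lacks that key, which a caller can treat as "time-slicing not applied yet".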
   
   # Issue type
   * [x] Improvement
   
   # Apache jira
   https://issues.apache.org/jira/browse/YUNIKORN-1339
   
   # How to test?
   ```
   yarn run start
   ```
   OS:
     Ubuntu 20.04
   
   ![Screenshot 2022-10-13 at 11-38-58 Run TensorFlow Jobs Apache YuniKorn](https://user-images.githubusercontent.com/48400525/195493696-c8f73292-1fbc-474a-a03d-59a98ceade62.png)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-site] wilfred-s closed pull request #195: [YUNIKORN-1339] Adding time slicing GPU to Tensorflow example

Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #195: [YUNIKORN-1339] Adding time slicing GPU to Tensorflow example
URL: https://github.com/apache/yunikorn-site/pull/195




[GitHub] [yunikorn-site] akhilpb001 commented on a diff in pull request #195: [YUNIKORN-1339] Adding time slicing GPU to Tensorflow example

Posted by GitBox <gi...@apache.org>.
akhilpb001 commented on code in PR #195:
URL: https://github.com/apache/yunikorn-site/pull/195#discussion_r995675126


##########
docs/user_guide/workloads/run_tensorflow.md:
##########
@@ -91,3 +91,142 @@ You can view the job info from YuniKorn UI. If you do not know how to access the
 please read the document [here](../../get_started/get_started.md#access-the-web-ui).
 
 ![tf-job-on-ui](../../assets/tf-job-on-ui.png)
+
+## Using Time-Slicing GPU
+
+### Prerequisite
+To use Time-Slicing GPUs, your cluster must be configured to use GPUs and time-slicing.
+- Nodes must have GPUs attached.
+- Kubernetes version 1.24
+- GPU drivers must be installed on the cluster
+- Use the [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html) to automatically set up and manage the NVIDIA software components on the worker nodes.
+- Configure [Time-Slicing GPUs in Kubernetes](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html)
+
+
+
+Once the GPU Operator is installed and Time-Slicing GPUs are configured, check the status of the pods to ensure all the containers are running and the validation is complete:
+```shell script
+kubectl get pod -n gpu-operator
+```
+```shell script
+NAME                                                          READY   STATUS      RESTARTS       AGE
+gpu-feature-discovery-fd5x4                                   2/2     Running     0              5d2h
+gpu-operator-569d9c8cb-kbn7s                                  1/1     Running     14 (39h ago)   5d2h
+gpu-operator-node-feature-discovery-master-84c7c7c6cf-f4sxz   1/1     Running     0              5d2h
+gpu-operator-node-feature-discovery-worker-p5plv              1/1     Running     8 (39h ago)    5d2h
+nvidia-container-toolkit-daemonset-zq766                      1/1     Running     0              5d2h
+nvidia-cuda-validator-5tldf                                   0/1     Completed   0              5d2h
+nvidia-dcgm-exporter-95vm8                                    1/1     Running     0              5d2h
+nvidia-device-plugin-daemonset-7nzvf                          2/2     Running     0              5d2h
+nvidia-device-plugin-validator-gj7nn                          0/1     Completed   0              5d2h
+nvidia-operator-validator-nz84d                               1/1     Running     0              5d2h
+```
+Verify that the time-slicing configuration is applied successfully :
+
+```shell script
+kubectl describe node
+```
+
+```shell script
+Capacity:
+  nvidia.com/gpu:     16
+...
+Allocatable:
+  nvidia.com/gpu:     16
+...
+```
+### Teasting TensorFlow job with GPUs

Review Comment:
   Typo: `Teasting`





[GitHub] [yunikorn-site] wusamzong commented on pull request #195: [YUNIKORN-1339] Adding time slicing GPU to Tensorflow example

Posted by GitBox <gi...@apache.org>.
wusamzong commented on PR #195:
URL: https://github.com/apache/yunikorn-site/pull/195#issuecomment-1279150806

   I corrected the typo.




[GitHub] [yunikorn-site] wusamzong commented on pull request #195: [YUNIKORN-1339] Adding time slicing GPU to Tensorflow example

Posted by GitBox <gi...@apache.org>.
wusamzong commented on PR #195:
URL: https://github.com/apache/yunikorn-site/pull/195#issuecomment-1278362897

   @wilfred-s
   I deleted the previous branch and created a new one with just my change.
   

