Posted to yarn-issues@hadoop.apache.org by "Sunil Govindan (JIRA)" <ji...@apache.org> on 2018/06/01 08:12:00 UTC

[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

    [ https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497708#comment-16497708 ] 

Sunil Govindan commented on YARN-8220:
--------------------------------------

Attaching the v1 patch. It mainly covers the scripts, examples, Dockerfiles, etc. needed to run Tensorflow on YARN (distributed or standalone).

Thank you very much [~leftnoteasy] for helping to integrate TF with YARN using GPU/Docker.

 

Details of this work:
 # A script that auto-generates the native service spec file for a Tensorflow job and submits the service to YARN automatically. This helps run TF jobs on YARN without extra complexity. A detailed example is available in the doc.
 # Support for running the latest Tensorflow 1.8 with CUDA 9 on YARN.
 # Distributed Tensorflow support. Users can enable it by passing the {{--distributed}} option to the script; multiple *worker* instances can then run on different nodes and leverage YARN's resources.
 # Dockerfiles are provided for various cases (GPU/CPU, different Tensorflow versions, etc.).
 # Various tests were run across TF versions, GPU/CPU, etc., and the results are published in the document included in the patch.
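
To illustrate the first item, here is a minimal sketch of what generating a native service spec for a TF job could look like. This is not the code in the patch: the function name {{build_service_spec}}, its parameters, the default launch command, and the CPU/memory values are all hypothetical, and the field layout is assumed to follow the YARN native services spec format (components with an artifact, a resource section, and GPUs requested through {{yarn.io/gpu}}).

```python
import json


def build_service_spec(job_name, docker_image, num_workers=1,
                       gpus_per_worker=1, launch_command="python train.py"):
    """Return a dict shaped like a YARN native service spec for a TF job.

    All names and default values here are illustrative assumptions,
    not taken from the attached patch.
    """
    worker = {
        "name": "worker",
        "number_of_containers": num_workers,
        "launch_command": launch_command,
        # Run each container from the given Docker image.
        "artifact": {"id": docker_image, "type": "DOCKER"},
        "resource": {
            "cpus": 1,
            "memory": "4096",
            # GPUs are requested as an additional resource type.
            "additional": {"yarn.io/gpu": {"value": gpus_per_worker}},
        },
    }
    return {"name": job_name, "version": "1.0.0", "components": [worker]}


if __name__ == "__main__":
    spec = build_service_spec("distributed-tf-gpu", "gpu.cuda_9.0.tf_1.8.0",
                              num_workers=2)
    # The resulting JSON could be written to a file and submitted to YARN.
    print(json.dumps(spec, indent=2))
```

Generating the spec from a few parameters keeps the per-job input small; the real script may expose different options and produce more components.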

Example:
{code:bash}
python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed --kerberos
{code}
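
For context on why the command takes {{--job_name}}, {{--user}} and {{--domain}} together with {{--distributed}}: a distributed Tensorflow job needs every task to know the addresses of all the others, usually through the {{TF_CONFIG}} environment variable. The sketch below shows one way those values could be combined into a cluster definition. It is not taken from the patch; the helper name {{build_tf_config}}, the port, the presence of a {{ps}} component, and the hostname pattern (component instance, then service name, user and domain) are assumptions.

```python
import json


def build_tf_config(job_name, user, domain, num_workers, num_ps,
                    task_type, task_index, port=8000):
    """Build a TF_CONFIG JSON string for one task of a distributed TF job.

    Hostnames are assumed to look like
    <component>-<index>.<job_name>.<user>.<domain>; the port is arbitrary.
    """
    def hosts(component, count):
        return [
            "{}-{}.{}.{}.{}:{}".format(component, i, job_name, user, domain, port)
            for i in range(count)
        ]

    config = {
        "cluster": {
            "ps": hosts("ps", num_ps),
            "worker": hosts("worker", num_workers),
        },
        # Identifies which member of the cluster this process is.
        "task": {"type": task_type, "index": task_index},
    }
    return json.dumps(config)


if __name__ == "__main__":
    print(build_tf_config("distributed-tf-gpu", "tf-user", "tensorflow.site",
                          num_workers=2, num_ps=1,
                          task_type="worker", task_index=0))
```

Each container would export the resulting string as {{TF_CONFIG}} before starting its training process, with {{task_type}} and {{task_index}} set per container.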
cc [~vinodkv] [~rohithsharma]

> Running Tensorflow on YARN with GPU and Docker - Examples
> ---------------------------------------------------------
>
>                 Key: YARN-8220
>                 URL: https://issues.apache.org/jira/browse/YARN-8220
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>            Reporter: Sunil Govindan
>            Assignee: Sunil Govindan
>            Priority: Critical
>         Attachments: YARN-8220.001.patch
>
>
> Tensorflow can run on YARN and leverage YARN's distributed features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.


