Posted to yarn-issues@hadoop.apache.org by "Sunil Govindan (JIRA)" <ji...@apache.org> on 2018/08/07 14:09:00 UTC

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

    [ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571719#comment-16571719 ] 

Sunil Govindan commented on YARN-8561:
--------------------------------------

Thanks [~leftnoteasy] for the effort. I have tried to look through the approach and the code.
A few comments, a mix of major and minor :)

1. I think we can use the same CLI model as the client, where the CLI extends Configured and implements Tool. This helps with tests. It also avoids the need for an abstract run method, since Tool already declares one.
2. We could also stop a job from the CLI, correct? In that case, do we need to do anything more than a simple yarn app -kill appId?
3. I think we can use UnitsConversionUtil for unit conversion in CliUtils#parseResourcesString.
4. In CapSchedConfig, for absolute resources, we used pattern-matching code:
{code}
public static final String PATTERN_FOR_ABSOLUTE_RESOURCE = "^\\[[\\w\\.,\\-_=\\ /]+\\]$";
private static final Pattern RESOURCE_PATTERN = Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);
{code}
Could we use the same in the CLI as well?
5. Maybe rename JobState to SubmarineJobState.
6. The command-line options look very clean and thorough. I think more CLI options will be added as we go forward, and it will become more complex. Could we load a profile into Submarine and then use the profile to get 80% of such config items? Given a profile, the user might only need to fill in 1 or 2 variable arguments.
7. DevelopperGuide.md ==> DeveloperGuide.md
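
To make item 4 a bit more concrete, here is a minimal sketch of how the quoted pattern could be reused for validating a resource string on the CLI side. The class name AbsoluteResourceCheck, the helper splitPairs, and the sample strings are all hypothetical and not part of the patch; only the pattern itself is copied from the snippet above. Unit handling (item 3) is deliberately left out, so values stay as raw strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the patch under review.
public class AbsoluteResourceCheck {
  // Pattern copied verbatim from the snippet quoted in item 4.
  public static final String PATTERN_FOR_ABSOLUTE_RESOURCE =
      "^\\[[\\w\\.,\\-_=\\ /]+\\]$";
  private static final Pattern RESOURCE_PATTERN =
      Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);

  /** Returns true if the value has the bracketed form, e.g. "[k=v,k=v]". */
  public static boolean isAbsoluteResource(String value) {
    return value != null && RESOURCE_PATTERN.matcher(value.trim()).matches();
  }

  /** Hypothetical helper: splits a validated string into raw key/value pairs. */
  public static Map<String, String> splitPairs(String value) {
    if (!isAbsoluteResource(value)) {
      throw new IllegalArgumentException("Not an absolute resource: " + value);
    }
    String trimmed = value.trim();
    // Drop the surrounding brackets before splitting on commas.
    String body = trimmed.substring(1, trimmed.length() - 1);
    Map<String, String> pairs = new LinkedHashMap<>();
    for (String kv : body.split(",")) {
      String[] parts = kv.split("=", 2);
      if (parts.length != 2 || parts[0].trim().isEmpty()) {
        throw new IllegalArgumentException("Malformed pair: " + kv);
      }
      // Values are kept as raw strings; unit conversion is out of scope here.
      pairs.put(parts[0].trim(), parts[1].trim());
    }
    return pairs;
  }

  public static void main(String[] args) {
    // Sample inputs are assumptions chosen for illustration only.
    System.out.println(isAbsoluteResource("[memory=10G,vcores=4]"));
    System.out.println(isAbsoluteResource("memory=10G,vcores=4"));
    System.out.println(splitPairs("[memory=10G,vcores=4]"));
  }
}
```

Whether the CLI should require the surrounding brackets is an open question; the sketch only shows that the existing pattern can be applied as-is before any parsing happens.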

> [Submarine] Add initial implementation: training job submission and job history retrieve.
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8561
>                 URL: https://issues.apache.org/jira/browse/YARN-8561
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major
>         Attachments: YARN-8561.001.patch
>
>
> Added the following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) TensorFlow training job submission, including training (single node and distributed). 
> - Support Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.


