Posted to dev@submarine.apache.org by GitBox <gi...@apache.org> on 2020/02/06 10:17:59 UTC

[GitHub] [submarine] lowc1012 opened a new pull request #174: SUBMARINE-202. submarine core need to support MXNet

lowc1012 opened a new pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174
 
 
   ### What is this PR for?
   To support the MXNet framework in Submarine.
   
   ### What type of PR is it?
   [Improvement]
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   [SUBMARINE-202](https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-202)
   
   ### How should this be tested?
   [passed CI](https://travis-ci.org/lowc1012/submarine/builds/646785076)
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * Do the license files need an update? No
   * Are there breaking changes for older versions? No
   * Does this need documentation? No
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@submarine.apache.org
For additional commands, e-mail: dev-help@submarine.apache.org


[GitHub] [submarine] pingsutw commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
pingsutw commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#discussion_r377139787
 
 

 ##########
 File path: submarine-server/server-submitter/submitter-yarn/src/main/java/org/apache/submarine/server/submitter/yarn/YarnUtils.java
 ##########
 @@ -203,4 +147,118 @@ private static Resource getResource(Parameter parametersHolder, String option)
     }
     return ResourceUtils.createResourceFromString(resourceStr);
   }
+
+  private static void setParametersForWorker (Configuration tonyConf,
+          ParametersHolder parameters) throws YarnException, ParseException {
+    tonyConf.setStrings(
+            TonyConfigurationKeys.getInstancesKey(Constants.WORKER_JOB_NAME),
+            parameters.getOptionValue(CliConstants.N_WORKERS));
+
+    if (parameters.getOptionValue(CliConstants.WORKER_RES) != null) {
+      Resource workerResource = getResource(parameters, CliConstants.WORKER_RES);
+
+      tonyConf.setInt(
+              TonyConfigurationKeys.getResourceKey(Constants.WORKER_JOB_NAME,
+                      Constants.VCORES),
+              workerResource.getVirtualCores());
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.WORKER_JOB_NAME,
+                      Constants.MEMORY),
+              ResourceUtils.getMemorySize(workerResource));
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.WORKER_JOB_NAME,
+                      Constants.GPUS),
+              ResourceUtils.getResourceValue(workerResource,
+                      ResourceUtils.GPU_URI));
+    }
+
+    if (parameters.getOptionValue(CliConstants.WORKER_DOCKER_IMAGE) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getDockerImageKey(Constants.WORKER_JOB_NAME),
+              parameters.getOptionValue(CliConstants.WORKER_DOCKER_IMAGE));
+      tonyConf.setBoolean(TonyConfigurationKeys.DOCKER_ENABLED, true);
+    }
+
+    if (parameters.getOptionValue(CliConstants.WORKER_LAUNCH_CMD) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getExecuteCommandKey(Constants.WORKER_JOB_NAME),
+              parameters.getOptionValue(CliConstants.WORKER_LAUNCH_CMD));
+    }
+  }
+
+  private static void setParametersForPS (Configuration tonyConf,
+          ParametersHolder parameters) throws YarnException, ParseException {
+    String jobName = Constants.PS_JOB_NAME;
+    if (parameters.getFramework() == Framework.MXNET) {
+      jobName = Constants.SERVER_JOB_NAME;
+    }
+
+    if (parameters.getOptionValue(CliConstants.N_PS) != null) {
+      tonyConf.setStrings(
+              TonyConfigurationKeys.getInstancesKey(jobName),
+              parameters.getOptionValue(CliConstants.N_PS));
+    }
+    if (parameters.getOptionValue(CliConstants.PS_RES) != null) {
+      Resource psResource = getResource(parameters, CliConstants.PS_RES);
+
+      tonyConf.setInt(
+              TonyConfigurationKeys.getResourceKey(jobName,
+                      Constants.VCORES),
+              psResource.getVirtualCores());
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(jobName,
+                      Constants.MEMORY),
+              ResourceUtils.getMemorySize(psResource));
+    }
+
+    if (parameters.getOptionValue(CliConstants.PS_LAUNCH_CMD) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getExecuteCommandKey(jobName),
+              parameters.getOptionValue(CliConstants.PS_LAUNCH_CMD));
+    }
+
+    if (parameters.getOptionValue(CliConstants.PS_DOCKER_IMAGE) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getDockerImageKey(jobName),
+              parameters.getOptionValue(CliConstants.PS_DOCKER_IMAGE));
+      tonyConf.setBoolean(TonyConfigurationKeys.DOCKER_ENABLED, true);
+    }
+  }
+
+  private static void setParametersForScheduler (Configuration tonyConf,
+          ParametersHolder parameters) throws YarnException, ParseException {
+    if (parameters.getOptionValue(CliConstants.N_SCHEDULERS) != null) {
+      tonyConf.setStrings(
+              TonyConfigurationKeys.getInstancesKey(Constants.SCHEDULER_JOB_NAME),
+              parameters.getOptionValue(CliConstants.N_SCHEDULERS));
+    }
+
+    if (parameters.getOptionValue(CliConstants.SCHEDULER_RES) != null) {
+      Resource schedulerResource = getResource(parameters, CliConstants.SCHEDULER_RES);
+
+      tonyConf.setInt(
+              TonyConfigurationKeys.getResourceKey(Constants.SCHEDULER_JOB_NAME,
+                      Constants.VCORES),
+              schedulerResource.getVirtualCores());
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.SCHEDULER_JOB_NAME,
+                      Constants.MEMORY),
+              ResourceUtils.getMemorySize(schedulerResource));
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.SCHEDULER_JOB_NAME,
 
 Review comment:
   If the MXNet scheduler doesn't use GPUs, we should remove these lines as well.
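
   To illustrate the suggestion, here is a minimal, self-contained sketch of scheduler resource settings that carry only vcores and memory. It is not the PR's code: a plain `Map` stands in for the Hadoop `Configuration`, the class and method names are hypothetical, and the `tony.<role>.<resource>` key layout is an assumption that should be checked against `TonyConfigurationKeys`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; a Map stands in for the Hadoop Configuration object. */
final class SchedulerResourceSketch {
  private SchedulerResourceSketch() {}

  // Assumed key layout "tony.<role>.<resource>"; verify against TonyConfigurationKeys.
  static String resourceKey(String role, String resource) {
    return String.format("tony.%s.%s", role, resource);
  }

  /** Builds the scheduler's resource settings with CPU and memory only. */
  static Map<String, String> schedulerResources(int vcores, long memoryMb) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(resourceKey("scheduler", "vcores"), Integer.toString(vcores));
    conf.put(resourceKey("scheduler", "memory"), Long.toString(memoryMb));
    // Deliberately no "gpus" entry: the scheduler role is assumed not to need one.
    return conf;
  }

  public static void main(String[] args) {
    System.out.println(schedulerResources(2, 2048L));
  }
}
```

   If the scheduler ever did need GPUs, a third `put` with a `gpus` key would mirror the worker block in the hunk above.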



[GitHub] [submarine] yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-583202932
 
 
   @lowc1012,
   Thanks for the contribution. Can you take a look at the Travis error?



[GitHub] [submarine] yuanzac edited a comment on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
yuanzac edited a comment on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-586082542
 
 
   @lowc1012,
   Thanks, LGTM.
   @pingsutw, the modification of YarnUtils may conflict with your PR #179. Can you help take a look?
   Thanks



[GitHub] [submarine] liuxunorg commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
liuxunorg commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-584080614
 
 
   @lowc1012 Can you rebase the development branch and commit it?
   
   You need to run commands like the following:
   ```
   git rebase upstream/master
   git push -f origin SUBMARINE-202:SUBMARINE-202
   ```



[GitHub] [submarine] pingsutw commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
pingsutw commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-586237356
 
 
   +1, LGTM. Thanks @lowc1012
   @yuanzac He just adds a `setParametersForScheduler(tonyConf, parameters);` call for MXNet.
   After it merges, I will rebase PR #179 and fix the conflict.



[GitHub] [submarine] yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-585521799
 
 
   Any more comments? @lowc1012, @pingsutw, @liuxunorg, @tangzhankun



[GitHub] [submarine] pingsutw commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
pingsutw commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#discussion_r377138834
 
 

 ##########
 File path: submarine-server/server-submitter/submitter-yarn/src/main/java/org/apache/submarine/server/submitter/yarn/YarnUtils.java
 ##########
 @@ -54,67 +55,22 @@ public static Configuration tonyConfFromClientContext(
             parameters.getFramework().getValue());
     tonyConf.setStrings(TonyConfigurationKeys.APPLICATION_NAME,
             parameters.getParameters().getName());
-    tonyConf.setStrings(
-        TonyConfigurationKeys.getInstancesKey(Constants.WORKER_JOB_NAME),
-            parameters.getOptionValue(CliConstants.N_WORKERS));
-    if (parameters.getOptionValue(CliConstants.N_PS) != null) {
-      tonyConf.setStrings(
-              TonyConfigurationKeys.getInstancesKey(Constants.PS_JOB_NAME),
-              parameters.getOptionValue(CliConstants.N_PS));
-    }
-    // Resources for PS & Worker
-    if (parameters.getOptionValue(CliConstants.PS_RES) != null) {
-      Resource psResource = getResource(parameters, CliConstants.PS_RES);
 
-      tonyConf.setInt(
-          TonyConfigurationKeys.getResourceKey(Constants.PS_JOB_NAME,
-              Constants.VCORES),
-              psResource.getVirtualCores());
-      tonyConf.setLong(
-          TonyConfigurationKeys.getResourceKey(Constants.PS_JOB_NAME,
-              Constants.MEMORY),
-          ResourceUtils.getMemorySize(psResource));
-    }
-    if (parameters.getOptionValue(CliConstants.WORKER_RES) != null) {
-      Resource workerResource = getResource(parameters, CliConstants.WORKER_RES);
+    setParametersForWorker(tonyConf, parameters);
+    setParametersForPS(tonyConf, parameters);
+    setParametersForScheduler(tonyConf, parameters);
 
 Review comment:
   We could add a function that checks whether the Submarine job uses MXNet,
   and call `setParametersForScheduler` only when it does.
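
   A minimal sketch of such a guard follows. It is self-contained rather than taken from the PR: the `Framework` enum, the `usesMxnet` helper, the `Map` standing in for the TonY configuration, and the key names are all hypothetical stand-ins chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of gating scheduler settings on the job's framework. */
final class SchedulerGuardSketch {
  private SchedulerGuardSketch() {}

  /** Stand-in for Submarine's framework enum. */
  enum Framework { TENSORFLOW, PYTORCH, MXNET }

  /** The check proposed in the review comment. */
  static boolean usesMxnet(Framework framework) {
    return framework == Framework.MXNET;
  }

  /** Builds a toy configuration; key names are illustrative only. */
  static Map<String, String> buildConf(Framework framework,
      String numWorkers, String numSchedulers) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("worker.instances", numWorkers);
    // Scheduler settings are applied only for MXNet jobs that request one.
    if (usesMxnet(framework) && numSchedulers != null) {
      conf.put("scheduler.instances", numSchedulers);
    }
    return conf;
  }

  public static void main(String[] args) {
    System.out.println(buildConf(Framework.MXNET, "2", "1"));
    System.out.println(buildConf(Framework.TENSORFLOW, "2", "1"));
  }
}
```

   Keeping the check in one small helper means other call sites, such as the parameter-server role naming, could reuse it instead of comparing against the enum directly.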



[GitHub] [submarine] liuxunorg commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
liuxunorg commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-584156724
 
 
   @pingsutw @yuanzac @tangzhankun  Can you help review this PR?



[GitHub] [submarine] asfgit closed pull request #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174
 
 
   



[GitHub] [submarine] pingsutw commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
pingsutw commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-582865062
 
 
   @lowc1012 Thanks for the contribution.
   - We also need to add `RunJobCliParsingMxnetTest.java` and `RunJobCliParsingMxnetYamlTest.java`, modeled on the tests in `submarine/submarine-client/src/test/java/org/apache/submarine/client/cli/runjob/pytorch/`:
   https://github.com/apache/submarine/tree/master/submarine-client/src/test/java/org/apache/submarine/client/cli/runjob/pytorch
   - This PR appears to support running MXNet only on the TonY runtime. We need to make sure it also works with the RPC runtime and the RESTful runtime; you could file another JIRA ticket for that.



[GitHub] [submarine] yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-586082542
 
 
   @lowc1012,
   Thanks, LGTM.
   @pingsutw, the modification of YarnUtils may conflict with your PR #179. Can you help take a look?
   Thanks



[GitHub] [submarine] lowc1012 commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
lowc1012 commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#discussion_r377434457
 
 

 ##########
 File path: submarine-server/server-submitter/submitter-yarn/src/main/java/org/apache/submarine/server/submitter/yarn/YarnUtils.java
 ##########
 @@ -203,4 +147,118 @@ private static Resource getResource(Parameter parametersHolder, String option)
     }
     return ResourceUtils.createResourceFromString(resourceStr);
   }
+
+  private static void setParametersForWorker (Configuration tonyConf,
+          ParametersHolder parameters) throws YarnException, ParseException {
+    tonyConf.setStrings(
+            TonyConfigurationKeys.getInstancesKey(Constants.WORKER_JOB_NAME),
+            parameters.getOptionValue(CliConstants.N_WORKERS));
+
+    if (parameters.getOptionValue(CliConstants.WORKER_RES) != null) {
+      Resource workerResource = getResource(parameters, CliConstants.WORKER_RES);
+
+      tonyConf.setInt(
+              TonyConfigurationKeys.getResourceKey(Constants.WORKER_JOB_NAME,
+                      Constants.VCORES),
+              workerResource.getVirtualCores());
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.WORKER_JOB_NAME,
+                      Constants.MEMORY),
+              ResourceUtils.getMemorySize(workerResource));
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.WORKER_JOB_NAME,
+                      Constants.GPUS),
+              ResourceUtils.getResourceValue(workerResource,
+                      ResourceUtils.GPU_URI));
+    }
+
+    if (parameters.getOptionValue(CliConstants.WORKER_DOCKER_IMAGE) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getDockerImageKey(Constants.WORKER_JOB_NAME),
+              parameters.getOptionValue(CliConstants.WORKER_DOCKER_IMAGE));
+      tonyConf.setBoolean(TonyConfigurationKeys.DOCKER_ENABLED, true);
+    }
+
+    if (parameters.getOptionValue(CliConstants.WORKER_LAUNCH_CMD) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getExecuteCommandKey(Constants.WORKER_JOB_NAME),
+              parameters.getOptionValue(CliConstants.WORKER_LAUNCH_CMD));
+    }
+  }
+
+  private static void setParametersForPS (Configuration tonyConf,
+          ParametersHolder parameters) throws YarnException, ParseException {
+    String jobName = Constants.PS_JOB_NAME;
+    if (parameters.getFramework() == Framework.MXNET) {
+      jobName = Constants.SERVER_JOB_NAME;
+    }
+
+    if (parameters.getOptionValue(CliConstants.N_PS) != null) {
+      tonyConf.setStrings(
+              TonyConfigurationKeys.getInstancesKey(jobName),
+              parameters.getOptionValue(CliConstants.N_PS));
+    }
+    if (parameters.getOptionValue(CliConstants.PS_RES) != null) {
+      Resource psResource = getResource(parameters, CliConstants.PS_RES);
+
+      tonyConf.setInt(
+              TonyConfigurationKeys.getResourceKey(jobName,
+                      Constants.VCORES),
+              psResource.getVirtualCores());
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(jobName,
+                      Constants.MEMORY),
+              ResourceUtils.getMemorySize(psResource));
+    }
+
+    if (parameters.getOptionValue(CliConstants.PS_LAUNCH_CMD) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getExecuteCommandKey(jobName),
+              parameters.getOptionValue(CliConstants.PS_LAUNCH_CMD));
+    }
+
+    if (parameters.getOptionValue(CliConstants.PS_DOCKER_IMAGE) != null) {
+      tonyConf.set(
+              TonyConfigurationKeys.getDockerImageKey(jobName),
+              parameters.getOptionValue(CliConstants.PS_DOCKER_IMAGE));
+      tonyConf.setBoolean(TonyConfigurationKeys.DOCKER_ENABLED, true);
+    }
+  }
+
+  private static void setParametersForScheduler (Configuration tonyConf,
+          ParametersHolder parameters) throws YarnException, ParseException {
+    if (parameters.getOptionValue(CliConstants.N_SCHEDULERS) != null) {
+      tonyConf.setStrings(
+              TonyConfigurationKeys.getInstancesKey(Constants.SCHEDULER_JOB_NAME),
+              parameters.getOptionValue(CliConstants.N_SCHEDULERS));
+    }
+
+    if (parameters.getOptionValue(CliConstants.SCHEDULER_RES) != null) {
+      Resource schedulerResource = getResource(parameters, CliConstants.SCHEDULER_RES);
+
+      tonyConf.setInt(
+              TonyConfigurationKeys.getResourceKey(Constants.SCHEDULER_JOB_NAME,
+                      Constants.VCORES),
+              schedulerResource.getVirtualCores());
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.SCHEDULER_JOB_NAME,
+                      Constants.MEMORY),
+              ResourceUtils.getMemorySize(schedulerResource));
+      tonyConf.setLong(
+              TonyConfigurationKeys.getResourceKey(Constants.SCHEDULER_JOB_NAME,
 
 Review comment:
   Thanks for the review! I will remove it.



[GitHub] [submarine] lowc1012 commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
lowc1012 commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#discussion_r377434429
 
 

 ##########
 File path: submarine-client/src/main/java/org/apache/submarine/client/cli/runjob/RunJobCli.java
 ##########
 @@ -219,6 +224,23 @@ private void addTensorboardOptions(Options options) {
             CAN_BE_USED_WITH_TF_ONLY);
   }
 
+  private void addSchedulerOptions(Options options) {
+    options.addOption(CliConstants.N_SCHEDULERS, true,
+        "Number of scheduler tasks of the job. " +
+        "It should be 1 or 0, by default it's 0."+
+        CAN_BE_USED_WITH_MXNET_ONLY);
+    options.addOption(CliConstants.SCHEDULER_DOCKER_IMAGE, true,
+        "Specify docker image for scheduler, when this is not specified, " +
+        "scheduler uses --" + CliConstants.DOCKER_IMAGE +
+        " as default. " + CAN_BE_USED_WITH_MXNET_ONLY);
+    options.addOption(CliConstants.SCHEDULER_LAUNCH_CMD, true,
+        "Commandline of scheduler, arguments will be " +
+        "directly used to launch the scheduler. " + CAN_BE_USED_WITH_MXNET_ONLY);
+    options.addOption(CliConstants.SCHEDULER_RES, true,
+        "Resource of each scheduler, for example " +
+        "memory-mb=2048,vcores=2,yarn.io/gpu=2. " + CAN_BE_USED_WITH_MXNET_ONLY);
 
 Review comment:
   Thanks for the review! I will remove it.



[GitHub] [submarine] yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
yuanzac commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-585006515
 
 
   LGTM~
   @lowc1012,
   Thanks for the contribution. Have you tested the MXNet job on a YARN cluster?
   With this PR, we can run an MXNet job with both command-line parameters and YAML parameters, right?



[GitHub] [submarine] pingsutw commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
pingsutw commented on a change in pull request #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#discussion_r377134005
 
 

 ##########
 File path: submarine-client/src/main/java/org/apache/submarine/client/cli/runjob/RunJobCli.java
 ##########
 @@ -219,6 +224,23 @@ private void addTensorboardOptions(Options options) {
             CAN_BE_USED_WITH_TF_ONLY);
   }
 
+  private void addSchedulerOptions(Options options) {
+    options.addOption(CliConstants.N_SCHEDULERS, true,
+        "Number of scheduler tasks of the job. " +
+        "It should be 1 or 0, by default it's 0."+
+        CAN_BE_USED_WITH_MXNET_ONLY);
+    options.addOption(CliConstants.SCHEDULER_DOCKER_IMAGE, true,
+        "Specify docker image for scheduler, when this is not specified, " +
+        "scheduler uses --" + CliConstants.DOCKER_IMAGE +
+        " as default. " + CAN_BE_USED_WITH_MXNET_ONLY);
+    options.addOption(CliConstants.SCHEDULER_LAUNCH_CMD, true,
+        "Commandline of scheduler, arguments will be " +
+        "directly used to launch the scheduler. " + CAN_BE_USED_WITH_MXNET_ONLY);
+    options.addOption(CliConstants.SCHEDULER_RES, true,
+        "Resource of each scheduler, for example " +
+        "memory-mb=2048,vcores=2,yarn.io/gpu=2. " + CAN_BE_USED_WITH_MXNET_ONLY);
 
 Review comment:
   Will the MXNet scheduler use GPUs? If not, we can remove `yarn.io/gpu=2` from the example.
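
   For readers unfamiliar with the resource string in the help text, the sketch below shows how a comma-separated `name=value` spec breaks down into individual resources. It is a simplified, hypothetical parser written for illustration: it accepts plain integer values only and is not the YARN or Submarine implementation, which may also handle units and validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Submarine or YARN. */
final class ResourceSpecSketch {
  private ResourceSpecSketch() {}

  /** Parses "name=value,name=value" into an ordered map of resource amounts. */
  static Map<String, Long> parse(String spec) {
    Map<String, Long> resources = new LinkedHashMap<>();
    for (String pair : spec.split(",")) {
      // Split on the first '=' only, so the resource name may contain '/' or '.'.
      String[] kv = pair.trim().split("=", 2);
      if (kv.length != 2 || kv[0].isEmpty()) {
        throw new IllegalArgumentException("Malformed resource entry: " + pair);
      }
      resources.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
    }
    return resources;
  }

  public static void main(String[] args) {
    // A scheduler-style spec without a GPU entry, as the review suggests.
    System.out.println(parse("memory-mb=2048,vcores=2"));
    // The spec from the help text, which also requests GPUs.
    System.out.println(parse("memory-mb=2048,vcores=2,yarn.io/gpu=2"));
  }
}
```

   With the GPU entry dropped from the help text, the scheduler example would reduce to the first spec shown in `main`.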



[GitHub] [submarine] lowc1012 commented on issue #174: SUBMARINE-202. submarine core need to support MXNet

Posted by GitBox <gi...@apache.org>.
lowc1012 commented on issue #174: SUBMARINE-202. submarine core need to support MXNet
URL: https://github.com/apache/submarine/pull/174#issuecomment-585545961
 
 
   @yuanzac, @pingsutw 
   Thanks for your review. I have tested it on my YARN cluster.
   Yes, we can submit MXNet jobs with this PR.
   I also made some modifications; please review them. Thank you!
   
   
