Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/05/17 06:57:50 UTC

[GitHub] liuzx32 opened a new issue #10984: Some wrong with training on yarn!?

URL: https://github.com/apache/incubator-mxnet/issues/10984
 
 
   Running with --cluster=local succeeds:
   ../../tools/launch.py -n 2 -s 2 --cluster=local python train_mnist.py --network lenet --kv-store dist_sync
   So I changed --cluster to yarn to run MXNet on YARN:
   ../../tools/launch.py -n 2 -s 2 --cluster=yarn python train_mnist.py --network lenet --kv-store dist_sync
   But the job failed with exit code 1:
   18/05/16 12:55:16 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
   18/05/16 12:55:16 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
   18/05/16 12:55:17 INFO dmlc.ApplicationMaster: [DMLC] Task 0 exited with status 1 Diagnostics:Exception from container-launch.
   Container id: container_1498108406715_361607_01_000002
   Exit code: 1
   Stack trace: ExitCodeException exitCode=1: 
   	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
   	at org.apache.hadoop.util.Shell.run(Shell.java:479)
   	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
   	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   	at java.lang.Thread.run(Thread.java:745)
   
   
   Container exited with a non-zero exit code 1
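
The stack trace above only shows that the container launch returned 1. The actual error is usually in the container's own stdout/stderr. A minimal sketch of how to get at it, assuming log aggregation is enabled on the cluster and that the container ID follows the usual YARN layout (container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>, optionally with an e<epoch> part). The helper names here are hypothetical and not part of launch.py or dmlc; only the `yarn logs -applicationId` command is an existing Hadoop CLI call.

```python
import re

# Assumption: the usual YARN container ID layout, with an optional epoch part:
#   container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
_CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def application_id_from_container(container_id):
    """Derive the YARN application ID from a container ID (hypothetical helper)."""
    match = _CONTAINER_RE.match(container_id.strip())
    if match is None:
        raise ValueError("unrecognized container ID: %r" % container_id)
    cluster_timestamp, app_seq = match.group(1), match.group(2)
    return "application_{}_{}".format(cluster_timestamp, app_seq)


def yarn_logs_command(container_id):
    """Build the argv for fetching the aggregated logs of the owning application."""
    return ["yarn", "logs", "-applicationId",
            application_id_from_container(container_id)]


if __name__ == "__main__":
    # Container ID taken from the diagnostics in this report.
    print(" ".join(yarn_logs_command("container_1498108406715_361607_01_000002")))
```

Running the printed command on a machine with the Hadoop client configured should dump the logs of every container in the job, including the stderr of the failed worker, provided the cluster keeps aggregated logs.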

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services