Posted to commits@mxnet.apache.org by gi...@git.apache.org on 2017/07/31 02:41:16 UTC

[GitHub] herryz commented on issue #6548: [feature req / question ] slurm scheduler support for launch.py? (distrubted processing / multi-machines support)
URL: https://github.com/apache/incubator-mxnet/issues/6548#issuecomment-318953613
 
 
   @al-rigazzi Yes, you understand correctly, but in my case every node runs 1 server process and 1 worker process, and the master node also runs 1 root process.
   
   And, sorry, I'm a little confused. In your solution, do you mean I have to use launch.py to start my job?
   For example, by adding 'python ../../tools/launch.py -n 2 -H hosts --launcher=slurm python train_cifar10.py' to my Slurm job file?
   
   In my case, I first request resources in the job file (for example, 2 nodes with 2 GPUs on each node), then consume them with 5 srun calls (1 root, 2 servers, 2 workers):
   root:   'srun -N 1 -n 1 --gres=gpu:0 --nodelist=node1 -l python ...'
   server: 'srun -N 1 -n 1 --gres=gpu:0 --nodelist=node1 -l python ...'
   worker: 'srun -N 1 -n 1 --gres=gpu:1 --nodelist=node1 -l python ...'
   server: 'srun -N 1 -n 1 --gres=gpu:0 --nodelist=node2 -l python ...'
   worker: 'srun -N 1 -n 1 --gres=gpu:1 --nodelist=node2 -l python ...'
   The root and server 'srun' calls then fail with an error saying they cannot find a GPU resource (because I use 'context=mx.gpu()' in my code).
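
   To make the layout above concrete, here is a rough Python sketch that only builds the five command lines; it does not run them. Several things in it are my assumptions and are not stated above: that the "root" process maps to the ps-lite 'scheduler' role, that the environment consists of the usual DMLC_* variables, the port number 9091, the helper name build_srun_commands, and the script name (borrowed from the launch.py line, since the real srun commands are elided). It uses the Slurm spelling '--gres'.

```python
import shlex


def build_srun_commands(nodes, script, root_port=9091):
    """Return (role, command) pairs: one scheduler on the first node,
    plus one server and one worker on every node.

    Hypothetical helper: the DMLC_* values and root_port are assumptions.
    """
    common = {
        "DMLC_PS_ROOT_URI": nodes[0],         # assumed: scheduler runs on the first node
        "DMLC_PS_ROOT_PORT": str(root_port),  # assumed port, pick any free one
        "DMLC_NUM_SERVER": str(len(nodes)),
        "DMLC_NUM_WORKER": str(len(nodes)),
    }

    # (role, node, number of GPUs requested through --gres)
    layout = [("scheduler", nodes[0], 0)]
    for node in nodes:
        layout.append(("server", node, 0))
        layout.append(("worker", node, 1))

    commands = []
    for role, node, gpus in layout:
        env = dict(common, DMLC_ROLE=role)
        env_part = " ".join(
            f"{key}={shlex.quote(value)}" for key, value in sorted(env.items())
        )
        cmd = (
            f"{env_part} srun -N 1 -n 1 --gres=gpu:{gpus} "
            f"--nodelist={node} -l python {script}"
        )
        commands.append((role, cmd))
    return commands


if __name__ == "__main__":
    for role, cmd in build_srun_commands(["node1", "node2"], "train_cifar10.py"):
        print(f"{role:9s} {cmd}")
```

   Only the worker steps request a GPU here, so any process started with '--gres=gpu:0' has no GPU allocated to its step.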
   
   And of course I set the environment variables before each 'srun'.
   So, do you have any idea?
 