Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/05/03 01:18:55 UTC

[GitHub] huyangc opened a new issue #10788: Start a process for training. The training get stuck

huyangc opened a new issue #10788: Start a process for training. The training get stuck
URL: https://github.com/apache/incubator-mxnet/issues/10788
 
 
   I start a separate process for the training procedure, which includes loading data, preparing a module, and calling module.fit. When I run the training procedure directly, it starts successfully. When I launch the same procedure in a new process, it gets stuck: GPU memory is allocated, but GPU-Util stays at 0%.
   
   Some pseudocode:
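   ```
   import multiprocessing as mp
   import mxnet as mx

   def train_net(args):
           sym = build_net()  # placeholder: builds the network symbol
           # Train on GPUs 0 through 7
           mod = mx.mod.Module(sym, context=[mx.gpu(i) for i in range(8)])
           initializer = mx.init.Xavier()
           dataiter = get_dataiter()  # placeholder: returns the training data iterator
           mod.fit(...)

   def train(args):
           p = mp.Process(target=train_net, args=(args,))
           p.start()
           p.join()  # the parent waits here while the child is stuck
   
   ```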
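
One way to narrow this down is to run the target with a join timeout and to try each multiprocessing start method in turn. The helper below is a minimal sketch using only the Python standard library. `run_with_timeout` and its parameters are hypothetical names, not part of MXNet, and whether the start method has any effect on this hang is an assumption to test, not a confirmed cause.

```python
import multiprocessing as mp


def run_with_timeout(target, args=(), timeout=5.0, start_method=None):
    """Run `target(*args)` in a child process and report whether it finished.

    Returns (finished, exitcode). `start_method` may be "fork", "spawn",
    "forkserver", or None for the platform default. (Hypothetical helper.)
    """
    ctx = mp.get_context(start_method)
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join(timeout)          # wait at most `timeout` seconds
    if p.is_alive():         # still running: treat it as stuck
        p.terminate()
        p.join()             # reap the terminated child
        return False, p.exitcode
    return True, p.exitcode
```

For example, calling `run_with_timeout(train_net, args=(args,), timeout=600, start_method="spawn")` and then the same with `start_method="fork"` would show whether the hang depends on how the child is created. With "spawn" or "forkserver", the call has to sit under an `if __name__ == "__main__":` guard and `train_net` has to be importable from the main module.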
           

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services