Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/10/01 18:24:28 UTC

[GitHub] fhieber opened a new issue #12710: Process Deadlock with mxnet-mkl and mkl-optimized numpy

fhieber opened a new issue #12710: Process Deadlock with mxnet-mkl and mkl-optimized numpy
URL: https://github.com/apache/incubator-mxnet/issues/12710
 
 
   ## Description
   mxnet-mkl hangs indefinitely when trying to spawn subprocesses (using mxnet) in an environment that uses MKL-optimized numpy. This is a recent issue we are observing with Sockeye and may be related to #8532, but it can be reproduced without Sockeye (see below).
   
   ## Environment info (Required)
   - Python 3.6.6
   - macOS
   - mxnet-mkl==1.3.0.post0
   - Anaconda Numpy (with MKL optimization): `conda install mkl ; conda install numpy`
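
To make the environment above easier to compare across machines, the installed versions could be collected with a small script. This is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`, which the Python 3.6.6 listed above does not ship), and the helper names `installed_version` and `describe_environment` are hypothetical. Packages installed only through conda (such as `mkl`) may not be visible as Python distributions and would then be reported as `None`.

```python
import platform
import sys
from importlib import metadata  # assumption: Python 3.8+


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(distribution_names=('mxnet-mkl', 'mxnet', 'numpy', 'mkl')):
    """Collect interpreter, platform, and package versions into a dict."""
    info = {'python': sys.version.split()[0], 'platform': platform.platform()}
    for name in distribution_names:
        info[name] = installed_version(name)
    return info


if __name__ == '__main__':
    for key, value in describe_environment().items():
        print('%s: %s' % (key, value))
```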
   
   ## Minimum reproducible example
   The following code reliably reproduces the deadlock/indefinite hang in the main process.
   It creates a minimal module and 'trains' for 500 iterations, spawning itself in 'testing mode' every 100 iterations. The testing mode runs the same mxnet code for fewer iterations. The main process is supposed to wait until the previous subprocess finishes before starting the next one.
   
   code.py:
   ```python
   import subprocess
   import sys
   
   import mxnet as mx
   
   if __name__ == '__main__':
   
       if len(sys.argv) > 1:
           print("TESTING")
           test = True
           iterations = 50
       else:
           print("TRAINING")
           test = False
           iterations = 500
   
       x = mx.sym.Variable('x')
       y = mx.sym.Variable('y')
   
       sym = mx.sym.FullyConnected(x, num_hidden=5)
       sym = mx.sym.SoftmaxOutput(sym, y)
   
       x_data = mx.nd.uniform(0, 1, (32, 16))
       y_data = mx.nd.zeros((32, 5))
       batch = mx.io.DataBatch(data=[x_data], label=[y_data])
   
       mod = mx.mod.Module(sym, data_names=['x'], label_names=['y'])
       mod.bind(data_shapes=[mx.io.DataDesc('x', shape=x_data.shape)],
                label_shapes=[mx.io.DataDesc('y', shape=y_data.shape)],
                for_training=True, grad_req='write' if not test else 'null')
       mod.init_params()
       mod.init_optimizer()
       process = None
       for i in range(iterations):
           mod.forward(batch)
           if not test:
               mod.backward()
               mod.update()
           if i % 100 == 0 and i > 0:
               print(i)
               if not test:
                   if process:
                       print("Waiting for process")
                       process.wait()
                   cmd = [sys.executable, sys.argv[0], 'test']
                   print("Starting process: '%s'" % " ".join(cmd))
                   process = subprocess.Popen(cmd)
       if process:
           process.wait()
   ```
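
When running the reproducer unattended, the blocking `process.wait()` calls could be swapped for a bounded wait so that a stuck child is reported instead of blocking forever. The following is a minimal sketch, not part of the original report: the helper name `wait_with_timeout` and the 60-second limit are hypothetical, and it only helps if the main process is blocked in `wait()`. If the hang occurs elsewhere (for example inside `subprocess.Popen` or in mxnet itself), this would not detect it.

```python
import subprocess
import sys


def wait_with_timeout(process, timeout_seconds):
    """Wait for `process`; return its exit code, or None if it timed out.

    On timeout the child is killed so the caller does not block indefinitely.
    """
    try:
        return process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()  # reap the killed child
        return None


if __name__ == '__main__':
    # Stand-in child that exits immediately; the reproducer would pass
    # its own Popen object here instead.
    child = subprocess.Popen([sys.executable, '-c', 'pass'])
    exit_code = wait_with_timeout(child, timeout_seconds=60)
    if exit_code is None:
        print("Subprocess did not finish in time")
    else:
        print("Subprocess exited with code %d" % exit_code)
```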
   
   ## Steps to reproduce
   1. conda install mkl
   2. conda install numpy
   3. pip install mxnet-mkl
   4. python3 code.py
   
   ## What have you tried to solve it?
   Replacing `mxnet-mkl` with `mxnet`, or replacing the conda NumPy with a pip-installed NumPy (`conda uninstall numpy; conda uninstall mkl; pip install numpy`), resolves the issue. The output is then as expected:
   ```
   TRAINING
   100
   Starting process: '/Users/fhieber/miniconda3/bin/python3 sockeye/process_test.py test'
   200
   Waiting for process
   TESTING
   Starting process: '/Users/fhieber/miniconda3/bin/python3 sockeye/process_test.py test'
   300
   Waiting for process
   TESTING
   Starting process: '/Users/fhieber/miniconda3/bin/python3 sockeye/process_test.py test'
   400
   Waiting for process
   TESTING
   Starting process: '/Users/fhieber/miniconda3/bin/python3 sockeye/process_test.py test'
   TESTING
   ```
   
