
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/09/28 15:31:19 UTC

[GitHub] fhieber edited a comment on issue #8532: mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL

URL: https://github.com/apache/incubator-mxnet/issues/8532#issuecomment-425473905
 
 
   Thanks @pengzhao-intel, here is a minimal example to reproduce the issue.
   You can run this:
   ```
   conda install mkl
   conda install numpy
   pip install mxnet-mkl
   git clone https://github.com/awslabs/sockeye.git
   cd sockeye
   python -m sockeye.train --num-layers 1 -s setup.py -t setup.py -vs setup.py -vt setup.py -o test_model --batch-size 5 --batch-type sentence --num-embed 8 --transformer-model-size 8 --overwrite-output --use-cpu --transformer-attention-heads 1 --checkpoint-frequency 100 --decode-and-evaluate 2
   ```
   This trains a tiny model on the setup.py file, but hangs once it reaches 100 updates and spawns a CheckpointDecoder subprocess to decode 2 sentences of the validation data.
   The last log line before the hang is:
   ```
   [INFO:sockeye.training] Starting process: Decoder-1
   ```
   
   If you set `--decode-and-evaluate 0`, no decoder subprocess will be started at each checkpoint, and training runs fine.
   
   If you run
   ```
   conda uninstall mkl
   conda uninstall numpy
   pip install numpy
   ```
   and run the same training with `--decode-and-evaluate` set to a value greater than 0, the hang does not occur.
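
To tell a hang of this kind apart from a slow start, one option is to start the child process with a timeout and check whether it is still alive afterwards. The following is only a minimal sketch and is not taken from Sockeye: it assumes the subprocess is started with Python's `multiprocessing` using the `fork` start method on a POSIX system, and `finishes_within`, `quick_task`, and `stuck_task` are hypothetical names used for illustration.

```python
import multiprocessing as mp
import time


def finishes_within(target, timeout_s, args=()):
    """Run target(*args) in a child process; return True if it exits in time."""
    # Assumption: POSIX platform where the "fork" start method is available.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=target, args=args, name="Decoder-probe")
    proc.start()
    proc.join(timeout_s)      # wait at most timeout_s seconds
    if proc.is_alive():       # still running: treat it as hung
        proc.terminate()      # send SIGTERM so the child does not linger
        proc.join()
        return False
    return True


def quick_task():
    # Hypothetical stand-in for a child that completes normally.
    return sum(range(1000))


def stuck_task():
    # Hypothetical stand-in for a child that never makes progress.
    time.sleep(60)


if __name__ == "__main__":
    print(finishes_within(quick_task, 5.0))
    print(finishes_within(stuck_task, 1.0))
```

Replacing `quick_task` with a function that first imports the conda-installed numpy and mxnet-mkl in the parent and then runs a small computation in the child may help narrow down whether the hang depends on Sockeye at all, though that has not been verified here.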
