Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/09/28 15:31:19 UTC
[GitHub] fhieber edited a comment on issue #8532: mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
URL: https://github.com/apache/incubator-mxnet/issues/8532#issuecomment-425473905
Thanks @pengzhao-intel, here is a minimal example to reproduce the issue.
You can run this:
```
conda install mkl
conda install numpy
pip install mxnet-mkl
git clone https://github.com/awslabs/sockeye.git
cd sockeye
python -m sockeye.train --num-layers 1 -s setup.py -t setup.py -vs setup.py -vt setup.py -o test_model --batch-size 5 --batch-type sentence --num-embed 8 --transformer-model-size 8 --overwrite-output --use-cpu --transformer-attention-heads 1 --checkpoint-frequency 100 --decode-and-evaluate 2
```
This trains a tiny model on the setup.py file, but it hangs once it reaches 100 updates and spawns a CheckpointDecoder subprocess to decode 2 sentences of the validation data.
The last log line before the hang is:
```
[INFO:sockeye.training] Starting process: Decoder-1
```
If you set `--decode-and-evaluate 0`, no decoder subprocess will be started at each checkpoint, and training runs fine.
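To tell a genuine hang in a child process apart from slow progress, one option is to run the child under a timeout. The following is only a minimal sketch using the Python standard library, not Sockeye's CheckpointDecoder code: `run_with_timeout` is a hypothetical helper, and the two one-line workloads are stand-ins for the real decoder.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_s: float) -> str:
    """Run `code` in a fresh Python interpreter.

    Returns "ok" (exit code 0), "failed" (non-zero exit code),
    or "hung" (no exit within `timeout_s` seconds).
    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in for a child that finishes normally.
    print(run_with_timeout("print('decoded')", timeout_s=30))
    # Stand-in for a child that never returns within the time limit.
    print(run_with_timeout("import time; time.sleep(60)", timeout_s=1))
```

To apply this to the case above, the child code would have to import mxnet and numpy and do some actual work, which is not shown here.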
If you run
```
conda uninstall mkl
conda uninstall numpy
pip install numpy
```
and rerun the same training with `--decode-and-evaluate` set to a value greater than 0, the hang does not occur.
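Since the two setups differ in which numpy build is active, it can help to confirm which one the interpreter actually picks up. The sketch below is a rough heuristic under stated assumptions: `numpy_mentions_mkl` is a hypothetical helper that captures the text printed by `numpy.show_config()` and searches it for "mkl". The layout of that text varies between numpy versions, so treat the result as a hint, not proof.

```python
import contextlib
import io


def numpy_mentions_mkl():
    """Return True/False if numpy's build config mentions MKL, None if numpy is missing.

    Heuristic only: scans the printed output of numpy.show_config().
    """
    try:
        import numpy
    except ImportError:
        return None
    buffer = io.StringIO()
    # show_config() prints to stdout, so capture the output temporarily.
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return "mkl" in buffer.getvalue().lower()


if __name__ == "__main__":
    print("numpy build mentions MKL:", numpy_mentions_mkl())
```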