Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/10 21:35:17 UTC
[GitHub] [incubator-mxnet] szhengac opened a new issue #18024: Transformer Model Segfault
szhengac opened a new issue #18024: Transformer Model Segfault
URL: https://github.com/apache/incubator-mxnet/issues/18024
Training the transformer model in [gluonnlp](https://github.com/dmlc/gluon-nlp/tree/master/scripts/machine_translation) with the master branch leads to a segmentation fault. Training succeeded with the nightly build from March 12th. I used the AWS Linux AMI with CUDA 10.0 (cu100).
Command:
```
python train_transformer.py --dataset WMT2014BPE --src_lang en --tgt_lang de --batch_size 2700 --optimizer adam --num_accumulated 16 --lr 3.0 --warmup_steps 4000 --save_dir transformer_en_de_u512 --epochs 30 --gpus 0,1,2,3,4,5,6,7 --scaled --average_start 5 --num_buckets 20 --bucket_scheme exp --bleu 13a --log_interval 10
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-mxnet] szhengac commented on issue #18024: Transformer Model Segfault
Posted by GitBox <gi...@apache.org>.
szhengac commented on issue #18024: Transformer Model Segfault
URL: https://github.com/apache/incubator-mxnet/issues/18024#issuecomment-612329865
@leezu There is no other output besides the segmentation fault message.
[GitHub] [incubator-mxnet] leezu commented on issue #18024: Transformer Model Segfault
Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18024: Transformer Model Segfault
URL: https://github.com/apache/incubator-mxnet/issues/18024#issuecomment-612329359
Can you include the backtrace?
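When a process dies with only "Segmentation fault", one way to get at least a Python-level backtrace is the standard-library `faulthandler` module. The sketch below is an illustration, not something used in this thread. It assumes the crash happens in the training script's own process. The `main()` function is a hypothetical placeholder for the body of `train_transformer.py`.

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that
# dump the Python traceback of every thread to stderr before the process dies.
faulthandler.enable(all_threads=True)


def main():
    # Hypothetical placeholder: in practice this would be the training
    # entry point of train_transformer.py.
    return None


if __name__ == "__main__":
    main()
```

You can get the same effect without editing the script. Run it as `python -X faulthandler train_transformer.py` with the same arguments, or set `PYTHONFAULTHANDLER=1`. `faulthandler` only shows Python frames. To see the native frames inside libmxnet, run the script under gdb with `gdb --args python train_transformer.py` and the same arguments. Then type `run`, and after the crash type `bt`.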