Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/06/01 10:37:06 UTC

[GitHub] juliusshufan opened a new issue #11122: Loss value is "nan" when using gluon vision models training CIFAR10 when model is hybridized.

URL: https://github.com/apache/incubator-mxnet/issues/11122
 
 
   ## Description
   I am drafting a training script based on example/gluon/image_classification.py; the script is available at https://github.com/juliusshufan/mxnet/blob/master/gluonmodel/image_classification.py
   The main change is that I try to report the loss value (cross-entropy loss in this case) during training (see lines 30 and 246), but the reported loss value is always nan.
   
   However, if I set --mode to "symbolic", with my changes on lines 264~266, the loss is reported with a normal value.
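
To narrow down where the nan first appears, the per-batch loss could be checked as it is accumulated. The following is a minimal sketch only: `LossTracker` is a hypothetical helper that is not part of image_classification.py, and it assumes each batch loss has already been converted to a plain Python float (in Gluon, for example, via `loss.mean().asscalar()`). The sample values in the loop are made up for illustration.

```python
import math


class LossTracker:
    """Accumulates per-batch scalar losses and records the first non-finite one.

    Hypothetical helper for illustration; not part of image_classification.py.
    """

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.first_bad_batch = None  # index of the first NaN/inf batch, if any

    def update(self, batch_index, loss_value):
        # loss_value is assumed to be a plain float, e.g. obtained in Gluon
        # via loss.mean().asscalar(); adapt this to the actual script.
        if not math.isfinite(loss_value):
            if self.first_bad_batch is None:
                self.first_bad_batch = batch_index
            return
        self.total += loss_value
        self.count += 1

    def mean(self):
        # Mean over finite batches only; NaN if no finite batch was seen.
        return self.total / self.count if self.count else float("nan")


# Made-up loss values standing in for real per-batch results.
tracker = LossTracker()
for i, value in enumerate([2.30, 2.10, float("nan"), 1.90]):
    tracker.update(i, value)

print("first non-finite batch:", tracker.first_bad_batch)
print("mean finite loss: %.2f" % tracker.mean())
```

If `first_bad_batch` is 0 in the default mode but stays `None` with --mode=symbolic, that would suggest the nan is present from the very first forward pass rather than appearing after some updates.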
   
   ## Environment info (Required)
   CentOS 7.2 CUDA-9.0
   ```
   What to do:
   Run the script:
   python image_classification.py --dataset=cifar10 --model=resnet50_v1 --batch-size=64 --gpus=0
   The loss is always nan.
   Run it with --mode=symbolic instead:
   python image_classification.py --dataset=cifar10 --model=resnet50_v1 --batch-size=64 --gpus=0 --mode=symbolic
   The loss value is normal.
   ```
   
   Package used (Python/R/Scala/Julia):
   Python
   
   ## Build info (Required if built from source)
   make -j USE_CUDA=1 USE_CUDNN=1 USE_CUDA_PATH=cudapath USE_BLAS=openblas USE_OPENCV=1
   
   Compiler (gcc/clang/mingw/visual studio):
   GCC 4.8.5
   MXNet commit hash:
   (Paste the output of `git rev-parse HEAD` here.)
   
   ## Steps to reproduce
   Clone my script: https://github.com/juliusshufan/mxnet/tree/master/gluonmodel 
   
   Run the script:
   python image_classification.py --dataset=cifar10 --model=resnet50_v1 --batch-size=64 --gpus=0
   The loss is always nan.
   Run it with **--mode=symbolic** instead:
   python image_classification.py --dataset=cifar10 --model=resnet50_v1 --batch-size=64 --gpus=0 **--mode=symbolic**
   The loss value is normal.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services