Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/10/27 06:40:31 UTC

[GitHub] [incubator-mxnet] ZHEQIUSHUI opened a new issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

ZHEQIUSHUI opened a new issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430


   ## Description
   Training the C++ ResNet example on MNIST makes no progress: training accuracy stays near 0.1 and training loss stays near 2.30.
   
   ```
   [14:35:41] F:\BaiduNetdiskDownload\incubator-mxnet\src\io\iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(16,784)
   [14:35:41] F:\BaiduNetdiskDownload\incubator-mxnet\src\io\iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(16,784)
   [14:35:41] F:\BaiduNetdiskDownload\incubator-mxnet\src\executor\graph_executor.cc:2061: Subgraph backend MKLDNN is activated.
   [14:35:41] F:\Code\mxnet_classify_train\train.cpp:126: Epoch: 0
   [14:35:42] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 1 Train Accuracy: 0.125 Train Loss: 2.30259
   [14:35:42] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 2 Train Accuracy: 0.1875 Train Loss: 2.30146
   [14:35:42] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 3 Train Accuracy: 0.1875 Train Loss: 2.30162
   [14:35:42] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 4 Train Accuracy: 0.171875 Train Loss: 2.30361
   [14:35:43] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 5 Train Accuracy: 0.15 Train Loss: 2.30592
   [14:35:43] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 6 Train Accuracy: 0.135417 Train Loss: 2.30718
   [14:35:43] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 7 Train Accuracy: 0.142857 Train Loss: 2.30081
   [14:35:43] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 8 Train Accuracy: 0.132813 Train Loss: 2.30486
   [14:35:43] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 9 Train Accuracy: 0.145833 Train Loss: 2.3007
   [14:35:44] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 10 Train Accuracy: 0.13125 Train Loss: 2.31614
   [14:35:44] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 11 Train Accuracy: 0.119318 Train Loss: 2.3079
   [14:35:44] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 12 Train Accuracy: 0.114583 Train Loss: 2.30001
   [14:35:44] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 13 Train Accuracy: 0.105769 Train Loss: 2.30863
   [14:35:44] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 14 Train Accuracy: 0.102679 Train Loss: 2.3078
   [14:35:45] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 15 Train Accuracy: 0.1125 Train Loss: 2.29704
   [14:35:45] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 16 Train Accuracy: 0.109375 Train Loss: 2.30283
   [14:35:45] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 17 Train Accuracy: 0.117647 Train Loss: 2.2879
   [14:35:45] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 18 Train Accuracy: 0.111111 Train Loss: 2.30765
   [14:35:46] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 19 Train Accuracy: 0.108553 Train Loss: 2.29667
   [14:35:46] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 20 Train Accuracy: 0.10625 Train Loss: 2.30143
   [14:35:46] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 21 Train Accuracy: 0.104167 Train Loss: 2.31742
   [14:35:46] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 22 Train Accuracy: 0.107955 Train Loss: 2.3004
   [14:35:46] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 23 Train Accuracy: 0.105978 Train Loss: 2.30668
   [14:35:47] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 24 Train Accuracy: 0.104167 Train Loss: 2.3081
   [14:35:47] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 25 Train Accuracy: 0.105 Train Loss: 2.31609
   [14:35:47] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 26 Train Accuracy: 0.103365 Train Loss: 2.2924
   [14:35:47] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 27 Train Accuracy: 0.101852 Train Loss: 2.29345
   [14:35:47] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 28 Train Accuracy: 0.100446 Train Loss: 2.32269
   [14:35:48] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 29 Train Accuracy: 0.0969828 Train Loss: 2.30808
   [14:35:48] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 30 Train Accuracy: 0.0958333 Train Loss: 2.29545
   [14:35:48] F:\Code\mxnet_classify_train\train.cpp:148: EPOCH: 0 ITER: 31 Train Accuracy: 0.0947581 Train Loss: 2.32122
   ```
   
   
   
   ## What have you tried to solve it?
   
   1. Changed the learning rate
   2. Added an LRScheduler
   
   ## Environment
   
   Windows x64, C++
   MXNet v1.7.x compiled with MKL (no CUDA)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717119210


   @bgawrych 




[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717625298


   @bgawrych It works! Thank you so much!
   
   ![image](https://user-images.githubusercontent.com/46700201/97376786-42b4ca80-18f9-11eb-8a53-0495b0196e4c.png)
   However, the same (default) parameters work in Python, so maybe there is some internal difference.




[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717039319


   @roywei Expert, please help!




[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-718327035


   > I don't know if you're using the example from _cpp-package/example/resnet.cpp_, but I couldn't run this example. I first fixed it by replacing
   > `Xavier xavier = Xavier(Xavier::gaussian, Xavier::in, 2);`
   > with:
   > `Xavier xavier = Xavier(Xavier::gaussian, Xavier::out, 2);`
   > The first run worked fine and the net was learning, but the second run showed an issue similar to yours.
   > It is probably a problem with parameter initialization.
   > I then fixed it by replacing the Xavier initializer with the default one:
   > 
   > ```
   >   //Xavier xavier = Xavier(Xavier::gaussian, Xavier::out, 2); <- Replace xavier with default Initializer
   >   Initializer init = Initializer();
   >   for (auto &arg : args_map) {
   >     std::cout << arg.first << std::endl;
   >     init(arg.first, &arg.second); // <- changed variable name
   >   }
   > ```
   
   Did you check in the fix?




[GitHub] [incubator-mxnet] ZHEQIUSHUI closed issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI closed issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430


   




[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717626515


   ```
   // initialize parameters
   // Xavier xavier = Xavier(Xavier::gaussian, Xavier::out, 3.0f);
   Initializer init = Initializer();
   for (auto& arg : args_map) {
       // xavier(arg.first, &arg.second);
       init(arg.first, &arg.second);
   }
   ```
   It works now!




[GitHub] [incubator-mxnet] bgawrych commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
bgawrych commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717219270


   I don't know if you're using the example from _cpp-package/example/resnet.cpp_, but I couldn't run this example. I first fixed it by replacing
   `Xavier xavier = Xavier(Xavier::gaussian, Xavier::in, 2);`
   with:
   `Xavier xavier = Xavier(Xavier::gaussian, Xavier::out, 2);`
   The first run worked fine and the net was learning, but the second run showed an issue similar to yours.
   It is probably a problem with parameter initialization.
   I then fixed it by replacing the Xavier initializer with the default one:
   ```
     //Xavier xavier = Xavier(Xavier::gaussian, Xavier::out, 2); <- Replace xavier with default Initializer
     Initializer init = Initializer();
     for (auto &arg : args_map) {
       std::cout << arg.first << std::endl;
       init(arg.first, &arg.second); // <- changed variable name
     }
   ```
   




[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717038836


   ![image](https://user-images.githubusercontent.com/46700201/97268422-d9847700-1866-11eb-8fd2-50eeb1e44a5d.png)
   I get almost the same accuracy and loss for every epoch.




[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717112193


   I have now confirmed that mlp.cpp and alexnet.cpp train normally (accuracy increases, loss decreases), but resnet is still wrong. I don't know whether the issue is in the network or in the training process.
   
   I put the ResNet network into alexnet.cpp and the AlexNet network into resnet.cpp; both show the same issue (accuracy does not increase).




[GitHub] [incubator-mxnet] ZHEQIUSHUI edited a comment on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI edited a comment on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717112193


   I have now confirmed that mlp.cpp and alexnet.cpp train normally (accuracy increases, loss decreases), but resnet is still wrong. I don't know whether the issue is in the network or in the training process.
   
   I tried putting the ResNet network into alexnet.cpp and the AlexNet network into resnet.cpp; both show the same issue (accuracy does not increase).




[GitHub] [incubator-mxnet] ZHEQIUSHUI commented on issue #19430: CPP resnet examples training acc does not increase,loss does not decrease

Posted by GitBox <gi...@apache.org>.
ZHEQIUSHUI commented on issue #19430:
URL: https://github.com/apache/incubator-mxnet/issues/19430#issuecomment-717056407


   ![image](https://user-images.githubusercontent.com/46700201/97272020-4a7a5d80-186c-11eb-9eaf-a663a67b2853.png)
   Why does every entry of the pred vector have the same value?

