
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/04 06:25:26 UTC

[GitHub] [incubator-mxnet] djaym7 opened a new issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch

djaym7 opened a new issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch
URL: https://github.com/apache/incubator-mxnet/issues/17971
 
 
   Given below are times for CPU-np, mxnet-CPU, mxnet-GPU, pytorch-CPU, pytorch-GPU.
   CPU times for NumPy (np) and PyTorch are comparable, but MXNet is far slower.
   On GPU, PyTorch is much faster than MXNet.
   
   Hardware/software: SageMaker p3.2x, latest MXNet (1.6.0 cu102mkl)
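
For anyone reproducing these timings: MXNet executes operators asynchronously and PyTorch queues CUDA kernels, so a timer placed around the call alone may measure only the dispatch. Below is a minimal sketch of a timing helper, not the script used for the screenshots. The names `benchmark` and `pure_python_dot` and all parameters are hypothetical, and the pure-Python workload is only a stand-in so the snippet runs without any framework installed.

```python
import statistics
import time


def benchmark(op, sync=None, warmup=3, repeats=20):
    """Return the median wall-clock time of op() in microseconds.

    sync is an optional callable that blocks until queued work is done.
    For MXNet that could be mx.nd.waitall; for PyTorch on GPU,
    torch.cuda.synchronize. NumPy runs synchronously and needs none.
    """
    def run_once():
        op()
        if sync is not None:
            sync()

    # Warm-up runs keep one-time costs (allocation, lazy init) out of the samples.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


def pure_python_dot(a, b):
    """Naive matrix product on nested lists (stand-in workload only)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


if __name__ == "__main__":
    n = 32
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    print(f"{n}x{n}: {benchmark(lambda: pure_python_dot(a, b)):.1f} us")
```

To compare frameworks, the same helper would be called once per library with the matching `sync` argument, so that every measurement includes the time until the result is actually available.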
   
   
   ![image](https://user-images.githubusercontent.com/12378820/78420214-fa068e80-7601-11ea-9629-1f82cf7f8665.png)
   
   ![image](https://user-images.githubusercontent.com/12378820/78420204-e6f3be80-7601-11ea-8aea-702cf7b0de52.png)
   ![image](https://user-images.githubusercontent.com/12378820/78420210-f1ae5380-7601-11ea-8a8f-c1616fffdb83.png)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch

Posted by GitBox <gi...@apache.org>.

pengzhao-intel commented on issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch
URL: https://github.com/apache/incubator-mxnet/issues/17971#issuecomment-610396391
 
 
   @anko-intel please take a look at this issue, thanks.


[GitHub] [incubator-mxnet] anko-intel commented on issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch

Posted by GitBox <gi...@apache.org>.

anko-intel commented on issue #17971:
URL: https://github.com/apache/incubator-mxnet/issues/17971#issuecomment-633696597

Hi @djaym7
Thank you for your results. I see similar results when measuring locally on a Skylake-X i9-7920X with the MXNet 1.6.0 cu102mkl binary. The only exception is the time for the 512x512 tensor on MXNet(?).
MXNet compiled from the master branch (at b2144777b - fix (#18313)) uses MKL if available, and the results are much better. But MXNet is still slower than NumPy for smaller tensors.
![image](https://user-images.githubusercontent.com/58251767/82837689-13bda700-9eca-11ea-90af-0fbd15f9e2c5.png)

Additional measurements on master with the MXNet Profiler enabled show that more than 80 us is spent between the Python call and the time recorded by the Profiler for the dot operation.
This appears to be an already known issue (#14883 and #17097) about the cost of crossing the Python/C++ boundary. It seems to me that fixing the Python-MXNet binding overhead should also fix this issue.
![image](https://user-images.githubusercontent.com/58251767/82837709-20da9600-9eca-11ea-8df1-75f4e8f0be56.png)

The results in the table below show that, neglecting measurement noise, the difference between the time measured in Python and the time reported by MKL is almost the same as the difference between the Python time and the MXNet Profiler time, which confirms the Python <-> C++ API issue.
![image](https://user-images.githubusercontent.com/58251767/82837739-364fc000-9eca-11ea-827c-5202bfcf87b1.png)

The last table shows results for MXNet with both the profiler and MKL verbose output enabled (which adds extra time to both measurements). Here the difference between the Python time and the profiler time is similar to the results in the previous tables, and it is the most significant one.
![image](https://user-images.githubusercontent.com/58251767/82837761-4cf61700-9eca-11ea-8eba-8091f0c48deb.png)
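
The comparison used in these tables is a simple subtraction: the time seen from Python minus the time the profiler attributes to the operator. A minimal sketch of that arithmetic follows. The function names are hypothetical and the numbers in the example are placeholders, not values taken from the tables or logs in this thread.

```python
def binding_overhead(python_us, profiler_us):
    """Time spent outside the operator itself, in microseconds."""
    return python_us - profiler_us


def overhead_table(rows):
    """rows: iterable of (label, python_us, profiler_us) tuples.

    Returns a list of (label, overhead_us, fraction_of_python_time).
    """
    out = []
    for label, python_us, profiler_us in rows:
        overhead = binding_overhead(python_us, profiler_us)
        fraction = overhead / python_us if python_us else 0.0
        out.append((label, overhead, fraction))
    return out


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute measured values.
    placeholder_rows = [
        ("small tensor", 100.0, 20.0),
        ("large tensor", 1000.0, 920.0),
    ]
    for label, overhead, fraction in overhead_table(placeholder_rows):
        print(f"{label}: overhead {overhead:.1f} us ({fraction:.0%} of Python time)")
```

If the overhead stays roughly constant while the operator time grows with tensor size, the fixed cost dominates small tensors and becomes negligible for large ones, which is the pattern described above.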

The exact results of my measurements can be found in the logs: [dot_issue_logs.zip](https://github.com/apache/incubator-mxnet/files/4678496/dot_issue_logs.zip)


[GitHub] [incubator-mxnet] djaym7 closed issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch

Posted by GitBox <gi...@apache.org>.

djaym7 closed issue #17971:
URL: https://github.com/apache/incubator-mxnet/issues/17971


   



[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch

Posted by GitBox <gi...@apache.org>.

pengzhao-intel commented on issue #17971:
URL: https://github.com/apache/incubator-mxnet/issues/17971#issuecomment-637907045


   @djaym7 please check whether @anko-intel answered your question :)



[GitHub] [incubator-mxnet] anko-intel commented on issue #17971: DOT product too slow on CPU and GPU compared to np and pytorch

Posted by GitBox <gi...@apache.org>.

anko-intel commented on issue #17971:
URL: https://github.com/apache/incubator-mxnet/issues/17971#issuecomment-621407717


   Hi @djaym7
   I reproduced the issue locally on a Skylake-X i9-7920X, but I am not sure whether I observed the same issue. Could you run the following Python scripts in your environment and upload the results?
   [test_dot_issue.py.txt](https://github.com/apache/incubator-mxnet/files/4553919/test_dot_issue.py.txt)
   [test_dot_issue_logs.py.txt](https://github.com/apache/incubator-mxnet/files/4553920/test_dot_issue_logs.py.txt)
   
   
   

