Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/02 18:32:11 UTC

[GitHub] [arrow] Harshitg opened a new issue #7882: Performance difference between pc.multiply vs pd.multiply

Harshitg opened a new issue #7882:
URL: https://github.com/apache/arrow/issues/7882


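   I want to report a performance difference observed between pandas and PyArrow: multiplying a 100-million-element column by itself is slower with `pc.multiply` than with `DataFrame.multiply`.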
   
   ```
   import numpy as np
   import pandas as pd
   import pyarrow as pa
   import pyarrow.compute as pc
   
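   # 100 million random float64 values in a single-column DataFrame
   df = pd.DataFrame(np.random.randn(100000000))
   %timeit -n 5 -r 5 df.multiply(df)
   
   # Same data as an Arrow table; table[0] is the first column
   table = pa.Table.from_pandas(df)
   %timeit -n 5 -r 5 pc.multiply(table[0], table[0])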
   ```
   
   Results:
   ```
   %timeit -n 5 -r 5 df.multiply(df)
   374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
   ```
   
   ```
   %timeit -n 5 -r 5 pc.multiply(table[0],table[0])
   698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
   ```
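
For readers who want to repeat the comparison outside IPython, the following is a minimal sketch using the standard-library `timeit` module in the spirit of `%timeit -n 5 -r 5`. The helper names `bench` and `compare_multiply` are hypothetical and not part of the original report. The default size is reduced from 100 million to 1 million elements as an assumption to keep memory use small, and the column lookup is hoisted out of the timed call, so the numbers are not directly comparable to those above.

```python
import statistics
import timeit


def bench(fn, number=5, repeat=5):
    """Time fn in the style of `%timeit -n number -r repeat`.

    Returns (mean, std) in seconds per loop across the repeated runs.
    """
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


def compare_multiply(n=1_000_000):
    """Time the two calls from the report on n random float64 values."""
    # Imported here so bench() stays usable without these packages installed.
    import numpy as np
    import pandas as pd
    import pyarrow as pa
    import pyarrow.compute as pc

    df = pd.DataFrame(np.random.randn(n))
    table = pa.Table.from_pandas(df)
    column = table[0]  # first column, a pyarrow.ChunkedArray
    return {
        "df.multiply": bench(lambda: df.multiply(df)),
        "pc.multiply": bench(lambda: pc.multiply(column, column)),
    }
```

Calling `compare_multiply()` returns a dictionary mapping each label to its `(mean, std)` pair; pass `n=100_000_000` to match the size used in the report.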


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zacqed closed issue #7882: Performance difference between pc.multiply vs pd.multiply

Posted by GitBox <gi...@apache.org>.
zacqed closed issue #7882:
URL: https://github.com/apache/arrow/issues/7882


   





[GitHub] [arrow] zacqed commented on issue #7882: Performance difference between pc.multiply vs pd.multiply

Posted by GitBox <gi...@apache.org>.
zacqed commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667924646


   Opened https://issues.apache.org/jira/browse/ARROW-9623 in JIRA. Closing this.





[GitHub] [arrow] emkornfield commented on issue #7882: Performance difference between pc.multiply vs pd.multiply

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667807803


   Thanks. Also, if you are using prebuilt wheels, I think this could be due to the target compiler settings. I would guess that numpy's multiply uses dynamic dispatch to take advantage of more recent CPUs, which is something we are still working on implementing.
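
One way to check the compiler-settings hypothesis on a given installation is to compare the SIMD level the Arrow build is using with the level detected on the CPU. The sketch below assumes a pyarrow release that provides `pyarrow.runtime_info()`, which is newer than the 1.0.0 discussed in this thread; the helper names `describe_simd` and `current_simd` are hypothetical.

```python
def describe_simd(info):
    """Summarise an object exposing simd_level and detected_simd_level."""
    return (
        f"in use: {info.simd_level}, "
        f"detected on CPU: {info.detected_simd_level}"
    )


def current_simd():
    """Report the SIMD levels for the installed pyarrow build."""
    # Imported lazily; assumes a pyarrow version that has runtime_info().
    import pyarrow as pa

    return describe_simd(pa.runtime_info())
```

If the level in use is lower than the detected one, the build is not exploiting everything the CPU offers, which would be consistent with the guess above but does not by itself prove it explains the timing gap.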





[GitHub] [arrow] emkornfield commented on issue #7882: Performance difference between pc.multiply vs pd.multiply

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667784048


   @zacqed thanks for the report. We use [JIRA](https://issues.apache.org/jira/secure/Dashboard.jspa) for issue tracking; could you report this there? Also, which version of pyarrow are you using?





[GitHub] [arrow] zacqed commented on issue #7882: Performance difference between pc.multiply vs pd.multiply

Posted by GitBox <gi...@apache.org>.
zacqed commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667800722


   Sure, I will open a JIRA issue and confirm. The pyarrow version is 1.0.0.

