Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/02 18:32:11 UTC
[GitHub] [arrow] Harshitg opened a new issue #7882: Performance difference between pc.multiply vs pd.multiply
Harshitg opened a new issue #7882:
URL: https://github.com/apache/arrow/issues/7882
I wanted to report a performance difference I observed between pandas and pyarrow when multiplying a column by itself.
```
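import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc

# One float64 column with 100 million rows (roughly 800 MB).
df = pd.DataFrame(np.random.randn(100000000))
%timeit -n 5 -r 5 df.multiply(df)

# Convert to an Arrow table; table[0] is the first column (a ChunkedArray).
table = pa.Table.from_pandas(df)
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])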
```
Results:
```
%timeit -n 5 -r 5 df.multiply(df)
374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```
```
%timeit -n 5 -r 5 pc.multiply(table[0],table[0])
698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
```
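For anyone who wants to reproduce the measurement outside IPython, here is a minimal sketch using only the standard library `timeit` module. The helper names `bench` and `compare` are hypothetical (they are not part of the original report), and the default size is deliberately smaller than the 100 million rows used above so it fits in modest memory. It reports the mean and population standard deviation per call, which roughly mirrors the `%timeit` output format.

```python
import statistics
import timeit


def bench(fn, number=5, repeat=5):
    """Return (mean_ms, std_ms) per call of fn over `repeat` runs of `number` loops."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    # timeit.repeat returns the total time of each run, so divide by the loop count.
    per_loop_ms = [t / number * 1000.0 for t in totals]
    return statistics.mean(per_loop_ms), statistics.pstdev(per_loop_ms)


def compare(n=1_000_000):
    """Hypothetical helper: time the two multiplications from the report on n rows."""
    # Imports live here so bench() stays usable without these packages installed.
    import numpy as np
    import pandas as pd
    import pyarrow as pa
    import pyarrow.compute as pc

    df = pd.DataFrame(np.random.randn(n))
    table = pa.Table.from_pandas(df)
    column = table[0]  # first column of the table, as in the report
    return {
        "pandas": bench(lambda: df.multiply(df)),
        "pyarrow": bench(lambda: pc.multiply(column, column)),
    }
```

Calling `compare(100_000_000)` should correspond to the setup in the report, assuming enough memory is available. The numbers will vary by machine and by how the installed packages were built.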
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] zacqed closed issue #7882: Performance difference between pc.multiply vs pd.multiply
Posted by GitBox <gi...@apache.org>.
zacqed closed issue #7882:
URL: https://github.com/apache/arrow/issues/7882
[GitHub] [arrow] zacqed commented on issue #7882: Performance difference between pc.multiply vs pd.multiply
Posted by GitBox <gi...@apache.org>.
zacqed commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667924646
Reported in JIRA as https://issues.apache.org/jira/browse/ARROW-9623. Closing this.
[GitHub] [arrow] emkornfield commented on issue #7882: Performance difference between pc.multiply vs pd.multiply
Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667807803
Thanks. Also, if you are using prebuilt wheels, I think this could be due to the target compiler settings. I would guess that NumPy's multiply has dynamic dispatch for more recent CPUs, whereas this is something we are still working on implementing.
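Since the build and version details matter here, a small sketch like the following could be attached to a report. The function name `describe_environment` is hypothetical. It assumes that newer pyarrow releases expose `pyarrow.runtime_info()` with `simd_level` and `detected_simd_level` attributes; pyarrow 1.0.0 does not have it, so the lookup is guarded and simply skipped when absent.

```python
import importlib
import platform


def describe_environment():
    """Collect version details that are useful when reporting a benchmark."""
    info = {"python": platform.python_version(), "machine": platform.machine()}
    for name in ("numpy", "pandas", "pyarrow"):
        try:
            module = importlib.import_module(name)
        except Exception:
            info[name] = None  # not installed or not importable here
            continue
        info[name] = getattr(module, "__version__", "unknown")
        if name != "pyarrow":
            continue
        # Assumption: runtime_info() only exists in newer pyarrow releases.
        runtime_info = getattr(module, "runtime_info", None)
        if callable(runtime_info):
            rt = runtime_info()
            info["pyarrow_simd_level"] = getattr(rt, "simd_level", None)
            info["pyarrow_detected_simd_level"] = getattr(rt, "detected_simd_level", None)
    return info
```

Printing the returned dictionary shows which versions were measured and, where available, which SIMD level the Arrow build targets compared with what the CPU supports.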
[GitHub] [arrow] emkornfield commented on issue #7882: Performance difference between pc.multiply vs pd.multiply
Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667784048
@zacqed thanks for the report. We use [JIRA](https://issues.apache.org/jira/secure/Dashboard.jspa) for issue tracking; could you report this there? Also, what version of pyarrow are you using?
[GitHub] [arrow] zacqed commented on issue #7882: Performance difference between pc.multiply vs pd.multiply
Posted by GitBox <gi...@apache.org>.
zacqed commented on issue #7882:
URL: https://github.com/apache/arrow/issues/7882#issuecomment-667800722
Sure, I will open a JIRA issue and confirm. The pyarrow version is 1.0.0.