Posted to issues@arrow.apache.org by "zfoobar (via GitHub)" <gi...@apache.org> on 2023/04/02 18:45:57 UTC
[GitHub] [arrow] zfoobar opened a new issue, #34845: Performance of explicit array comparison faster vs. Arrow compute
zfoobar opened a new issue, #34845:
URL: https://github.com/apache/arrow/issues/34845
### Describe the usage question you have. Please include as many useful details as possible.
I added timing code around compute_and_write_csv_example.cc to measure explicit array comparison against the arrow::compute method. The output below suggests the explicit comparison is much faster. This may be a dumb question (apologies), but is this expected? I am exploring Arrow both as an end user for my AI consultancy and as a contributor, and I want to understand its performance benefits and use cases.
output:
(pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./compute_and_write_csv_example
Array explicitly compared
0.000183625 seconds
Arrays compared using a compute function
0.0404125 seconds
Table created
Writing CSV file
### Component(s)
Benchmarking, C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] zfoobar commented on issue #34845: Performance of explicit array comparison faster vs. Arrow compute
Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34845:
URL: https://github.com/apache/arrow/issues/34845#issuecomment-1500916907
He indeed made a good point. Here is my test code; I'm eager to hear your thoughts:
https://gist.github.com/zfoobar/f61c25271f20d54ea72b065b8e3c6a4c
Here are some sample runs with order-of-magnitude increases in input size. You can see where Arrow's performance advantage kicks in (marked in bold below):
(pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 10000
Generating 10000 random values..
Array explicitly compared
Time: 0.216 ms
Arrays compared using a compute function
Time: 9.757 ms
Table created
Writing CSV file
(pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 100000
Generating 100000 random values..
Array explicitly compared
Time: 1.148 ms
Arrays compared using a compute function
Time: 13.329 ms
Table created
Writing CSV file
(pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 1000000
Generating 1000000 random values..
Array explicitly compared
Time: 12.184 ms
Arrays compared using a compute function
**Time: 9.332 ms**
Table created
Writing CSV file
(pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 10000000
Generating 10000000 random values..
Array explicitly compared
Time: 87.849 ms
Arrays compared using a compute function
**Time: 39.409 ms**
Table created
Writing CSV file
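
To make the crossover in the runs above easier to read, the reported timings can be turned into ratios. The sketch below is plain C++ with no Arrow dependency; the `Run` struct and the helper functions are hypothetical names introduced here for illustration, and the numbers are copied from the four runs above. A ratio above 1 means the explicit loop was faster; a ratio below 1 means the compute function was faster.

```cpp
#include <cstddef>
#include <vector>

// One benchmark run as reported in the thread (times in milliseconds).
struct Run {
  std::size_t n;       // number of random values generated
  double explicit_ms;  // hand-written comparison loop
  double compute_ms;   // comparison via a compute function
};

// Timings copied from the four sample runs above.
inline std::vector<Run> reported_runs() {
  return {
      {10000, 0.216, 9.757},
      {100000, 1.148, 13.329},
      {1000000, 12.184, 9.332},
      {10000000, 87.849, 39.409},
  };
}

// Ratio > 1: the explicit loop was faster.
// Ratio < 1: the compute function was faster.
inline double compute_to_explicit_ratio(const Run& r) {
  return r.compute_ms / r.explicit_ms;
}

// Smallest reported size at which the compute function was faster,
// or 0 if it never was.
inline std::size_t first_size_where_compute_wins(const std::vector<Run>& runs) {
  for (const Run& r : runs) {
    if (r.compute_ms < r.explicit_ms) return r.n;
  }
  return 0;
}
```

With these numbers the ratio falls from roughly 45 at 10,000 values to roughly 0.45 at 10,000,000. Each size was run only once, so the exact crossover point is indicative at best.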
[GitHub] [arrow] westonpace commented on issue #34845: Performance of explicit array comparison faster vs. Arrow compute
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34845:
URL: https://github.com/apache/arrow/issues/34845#issuecomment-1498181206
@assignUser's point is a good one. In addition, are you including the construction and resizing of the boolean builder as part of your benchmark? This will be a part of the compute function execution (it needs to allocate a result array).
There's a good chance the compiler is able to optimize that comparison function pretty effectively. It may be able to do a better job than the compute function since the size of the arrays is known ahead of time.
Are you able to share your benchmark code?
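
One way to check the allocation point is to time two variants of the hand-written loop: one that writes into an output the caller has already sized, and one that allocates the output inside the timed region, as a compute function has to. The sketch below uses only the standard library. `std::vector` stands in for Arrow arrays and builders, and `best_of_ms`, `compare_into`, and `compare_alloc` are hypothetical helpers, not Arrow APIs.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Run fn() `reps` times and return the fastest wall-clock time in
// milliseconds. Taking the minimum reduces noise from one-off costs
// such as cold caches or first-touch page faults.
template <typename Fn>
double best_of_ms(Fn&& fn, int reps) {
  double best = std::numeric_limits<double>::infinity();
  for (int i = 0; i < reps; ++i) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}

// Element-wise a[i] > b[i], written into `out`.
// The caller must size `out` to at least min(a.size(), b.size()),
// so allocation cost is NOT part of this function.
inline void compare_into(const std::vector<std::int64_t>& a,
                         const std::vector<std::int64_t>& b,
                         std::vector<std::uint8_t>& out) {
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] > b[i] ? 1 : 0;
}

// Same comparison, but the result is allocated inside the call,
// so allocation cost IS included when this function is timed.
inline std::vector<std::uint8_t> compare_alloc(
    const std::vector<std::int64_t>& a,
    const std::vector<std::int64_t>& b) {
  std::vector<std::uint8_t> out(std::min(a.size(), b.size()));
  compare_into(a, b, out);
  return out;
}
```

Timing `compare_into` and `compare_alloc` with the same inputs through `best_of_ms` gives a rough idea of how much of the gap is allocation. A compiler may remove work whose result is never read, so a real benchmark should consume the output, for example by summing it.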
[GitHub] [arrow] zfoobar closed issue #34845: Performance of explicit array comparison faster vs. Arrow compute
Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar closed issue #34845: Performance of explicit array comparison faster vs. Arrow compute
URL: https://github.com/apache/arrow/issues/34845
[GitHub] [arrow] assignUser commented on issue #34845: Performance of explicit array comparison faster vs. Arrow compute
Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on issue #34845:
URL: https://github.com/apache/arrow/issues/34845#issuecomment-1494452521
Looking at the example, it uses only 8 values, so my guess (as I am not a C++ maintainer) is that the overhead of calling the compute function eats up any advantage in comparison time, leading to the above result.
You could try creating much bigger arrays to compare and see if the result changes.
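
The overhead argument can be written as a simple cost model: each approach pays a fixed cost per call plus a cost per element, and the approach with the larger fixed cost only wins once the arrays are long enough. The sketch below is plain C++; `Cost`, `total_ms`, and `break_even_n` are hypothetical names, and any numbers passed in are illustrative assumptions, not measured Arrow values.

```cpp
#include <cstddef>
#include <limits>

// Linear cost model: time(n) = fixed_ms + per_element_ms * n.
struct Cost {
  double fixed_ms;        // per-call overhead (dispatch, allocation, ...)
  double per_element_ms;  // incremental cost of each compared element
};

inline double total_ms(const Cost& c, std::size_t n) {
  return c.fixed_ms + c.per_element_ms * static_cast<double>(n);
}

// Array length at which `high_overhead` becomes as fast as `low_overhead`.
// Returns 0 if it has no extra fixed cost to amortise, and infinity if
// its per-element cost is not lower (it never catches up).
inline double break_even_n(const Cost& low_overhead,
                           const Cost& high_overhead) {
  const double fixed_gap = high_overhead.fixed_ms - low_overhead.fixed_ms;
  const double rate_gap =
      low_overhead.per_element_ms - high_overhead.per_element_ms;
  if (fixed_gap <= 0.0) return 0.0;
  if (rate_gap <= 0.0) return std::numeric_limits<double>::infinity();
  return fixed_gap / rate_gap;
}
```

Under this model, a path with a large call overhead looks slow on an 8-element array no matter how fast its inner loop is, which is why comparing at several array sizes is more informative than a single small run.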