Posted to issues@arrow.apache.org by "zfoobar (via GitHub)" <gi...@apache.org> on 2023/04/02 18:45:57 UTC

[GitHub] [arrow] zfoobar opened a new issue, #34845: Performance of explicit array comparison faster vs. Arrow compute

zfoobar opened a new issue, #34845:
URL: https://github.com/apache/arrow/issues/34845

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I wrapped the code in compute_and_write_csv_example.cc with timing calls to measure the explicit array comparison against the arrow::compute function. The output below suggests the explicit comparison is much faster. This may be a dumb question (apologies), but is this expected? I am exploring Arrow both as an end user for my AI consultancy and as a contributor, and I want to understand its performance benefits and use cases.
   
   output:
   
   (pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./compute_and_write_csv_example 
   Array explicitly compared
   0.000183625 seconds
   Arrays compared using a compute function
   0.0404125 seconds
   Table created
   Writing CSV file
   
   
   ### Component(s)
   
   Benchmarking, C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zfoobar commented on issue #34845: Performance of explicit array comparison faster vs. Arrow compute

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar commented on issue #34845:
URL: https://github.com/apache/arrow/issues/34845#issuecomment-1500916907

   He indeed made a good point. Here is my test code; I am eager to hear your thoughts:
   https://gist.github.com/zfoobar/f61c25271f20d54ea72b065b8e3c6a4c
   
   Here are some sample runs with order-of-magnitude increases in input size. You can see where Arrow's performance advantage kicks in (marked in bold below).
   
   (pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 10000
   Generating 10000 random values..
   Array explicitly compared
   Time: 0.216 ms
   Arrays compared using a compute function
   Time: 9.757 ms
   Table created
   Writing CSV file
   
   (pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 100000
   Generating 100000 random values..
   Array explicitly compared
   Time: 1.148 ms
   Arrays compared using a compute function
   Time: 13.329 ms
   Table created
   Writing CSV file
   
   (pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 1000000
   Generating 1000000 random values..
   Array explicitly compared
   Time: 12.184 ms
   Arrays compared using a compute function
   **Time: 9.332 ms**
   Table created
   Writing CSV file
   
   (pyarrow-dev) Zacharys-MacBook-Air:arrow_scratch zacharyfierstadt$ ./my_example 10000000
   Generating 10000000 random values..
   Array explicitly compared
   Time: 87.849 ms
   Arrays compared using a compute function
   **Time: 39.409 ms**
   Table created
   Writing CSV file
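
One way to read the runs above is to divide each reported time by the element count. The following is a small sketch of that arithmetic in C++. The helper names `ns_per_element` and `print_reported_costs` are hypothetical, and the inputs are only the timings printed above, not new measurements.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: convert a wall-clock time in milliseconds for n
// elements into nanoseconds per element.
double ns_per_element(double time_ms, std::size_t n) {
  return time_ms * 1e6 / static_cast<double>(n);
}

// Print the per-element cost for the smallest and largest runs reported
// in this comment (10,000 and 10,000,000 values).
void print_reported_costs() {
  std::printf("explicit, 10^4 values: %.2f ns/elem\n",
              ns_per_element(0.216, 10000));
  std::printf("compute,  10^4 values: %.2f ns/elem\n",
              ns_per_element(9.757, 10000));
  std::printf("explicit, 10^7 values: %.2f ns/elem\n",
              ns_per_element(87.849, 10000000));
  std::printf("compute,  10^7 values: %.2f ns/elem\n",
              ns_per_element(39.409, 10000000));
}
```

On these numbers the compute function drops from roughly 976 ns per element at 10,000 values to roughly 3.9 ns at 10,000,000, while the explicit loop goes from roughly 21.6 ns to roughly 8.8 ns. That pattern would be consistent with a fixed per-call cost being spread over more elements, although single runs like these are noisy.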
   




[GitHub] [arrow] westonpace commented on issue #34845: Performance of explicit array comparison faster vs. Arrow compute

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34845:
URL: https://github.com/apache/arrow/issues/34845#issuecomment-1498181206

   @assignUser's point is a good one. In addition, are you including the construction and resizing of the boolean builder as part of your benchmark? This will be a part of the compute function execution (it needs to allocate a result array).
   
   There's a good chance the compiler is able to optimize that comparison function pretty effectively. It may be able to do a better job than the compute function since the size of the arrays is known ahead of time.
   
   Are you able to share your benchmark code?
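
To illustrate the allocation question, here is a minimal stdlib-only sketch, not the benchmark from this issue. It assumes a less-than comparison and uses `std::vector<std::uint8_t>` as a stand-in for the Arrow boolean builder. The names `time_ms`, `compare_into`, and `compare_alloc` are hypothetical. The first variant writes into a buffer the caller allocated before the timer started; the second allocates its own result, as a compute function has to.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Run fn once and return the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Variant 1: the caller owns the output buffer, so when only this call is
// timed, the allocation of `out` is excluded from the measurement.
void compare_into(const std::vector<std::int64_t>& a,
                  const std::vector<std::int64_t>& b,
                  std::vector<std::uint8_t>& out) {
  assert(a.size() == b.size() && out.size() == a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] < b[i];  // assumed comparison: element-wise less-than
  }
}

// Variant 2: allocates the result inside the call, so timing this call
// includes the allocation.
std::vector<std::uint8_t> compare_alloc(const std::vector<std::int64_t>& a,
                                        const std::vector<std::int64_t>& b) {
  std::vector<std::uint8_t> out(a.size());
  compare_into(a, b, out);
  return out;
}
```

Timing `time_ms([&] { compare_into(a, b, out); })` against `time_ms([&] { auto r = compare_alloc(a, b); })` on the same inputs gives a rough idea of how much of the gap comes from allocating the result rather than from the comparison itself.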




[GitHub] [arrow] zfoobar closed issue #34845: Performance of explicit array comparison faster vs. Arrow compute

Posted by "zfoobar (via GitHub)" <gi...@apache.org>.
zfoobar closed issue #34845: Performance of explicit array comparison faster vs. Arrow compute
URL: https://github.com/apache/arrow/issues/34845




[GitHub] [arrow] assignUser commented on issue #34845: Performance of explicit array comparison faster vs. Arrow compute

Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on issue #34845:
URL: https://github.com/apache/arrow/issues/34845#issuecomment-1494452521

   Looking at the example, it uses only 8 values, so my guess (as I am not a C++ maintainer) is that the overhead of calling the compute function eats up any advantage in comparison time, leading to the above result.
   
   You could try creating much bigger arrays to compare and see if the result changes.
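
The overhead argument can be sketched with a toy linear cost model. This is an assumption for illustration only and says nothing about how Arrow is actually implemented: each approach pays a fixed cost per call plus a cost per element. `CostModel` and `break_even_elements` are hypothetical names, and any numbers fed into them are made up rather than measured.

```cpp
#include <cstddef>

// Toy model (an assumption, not Arrow's real behaviour):
// total time = fixed per-call overhead + per-element cost * n.
struct CostModel {
  double fixed_ns;     // one-off cost per call (dispatch, allocation, ...)
  double per_elem_ns;  // marginal cost per element

  double total_ns(std::size_t n) const {
    return fixed_ns + per_elem_ns * static_cast<double>(n);
  }
};

// Array length at which both models cost the same. Only meaningful when
// `loop` has the lower fixed cost and `kernel` has the lower per-element
// cost, i.e. loop.per_elem_ns > kernel.per_elem_ns.
double break_even_elements(const CostModel& loop, const CostModel& kernel) {
  return (kernel.fixed_ns - loop.fixed_ns) /
         (loop.per_elem_ns - kernel.per_elem_ns);
}
```

For example, with the invented values `CostModel loop{0.0, 10.0}` and `CostModel kernel{9000.0, 4.0}`, the break-even point is 1500 elements. At 8 elements the loop costs 80 ns in this model and the kernel 9032 ns, which is the kind of gap a very small input would show.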

