Posted to issues@arrow.apache.org by "Tobias Zagorni (Jira)" <ji...@apache.org> on 2022/05/17 19:19:00 UTC
[jira] [Created] (ARROW-16599) [C++] Implementation of ExecuteScalarExpressionOverhead benchmarks without arrow for comparison
Tobias Zagorni created ARROW-16599:
--------------------------------------
Summary: [C++] Implementation of ExecuteScalarExpressionOverhead benchmarks without arrow for comparison
Key: ARROW-16599
URL: https://issues.apache.org/jira/browse/ARROW-16599
Project: Apache Arrow
Issue Type: Sub-task
Components: C++
Reporter: Tobias Zagorni
Assignee: Tobias Zagorni
The ExecuteScalarExpressionOverhead group of benchmarks currently gives us values that we can compare across batch sizes or across expressions. But it does not show how well Arrow does compared to what is possible in general.
The simple_expression (negate x) and complex_expression (x>0 and x<20) benchmarks, which perform an actual operation on data, can be implemented in pure C++ for comparison.
I implemented the complex_expression benchmark using technically unnecessary intermediate buffers for the > and < operator results, to match what happens in the Arrow expression.
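To make the shape of such a baseline concrete, here is a minimal sketch of both kernels in plain C++. It is not the code from the actual patch: the names NegateBaseline, ComplexBaseline and ComplexScratch are hypothetical, and the int64_t element type and byte-per-value boolean buffers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical baseline for simple_expression: out[i] = -x[i].
// Assumes int64_t values; callers pre-size `out` to x.size().
inline void NegateBaseline(const std::vector<int64_t>& x,
                           std::vector<int64_t>& out) {
  assert(out.size() == x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = -x[i];
}

// Intermediate results of the two comparisons, one byte per value
// (an assumption; a bit-packed layout would be another option).
struct ComplexScratch {
  std::vector<uint8_t> greater;  // x[i] > 0
  std::vector<uint8_t> less;     // x[i] < 20
};

// Hypothetical baseline for complex_expression: (x > 0) and (x < 20).
// The scratch buffers are not needed for correctness; they only mimic
// an evaluation that materializes each sub-expression separately.
inline void ComplexBaseline(const std::vector<int64_t>& x,
                            ComplexScratch& scratch,
                            std::vector<uint8_t>& out) {
  const std::size_t n = x.size();
  assert(scratch.greater.size() == n && scratch.less.size() == n);
  assert(out.size() == n);
  for (std::size_t i = 0; i < n; ++i) scratch.greater[i] = x[i] > 0;
  for (std::size_t i = 0; i < n; ++i) scratch.less[i] = x[i] < 20;
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = scratch.greater[i] & scratch.less[i];
  }
}
```

Running three separate passes instead of one fused loop is deliberate here: a fused loop would likely be faster, but it would no longer do the same amount of memory traffic as an expression evaluated operator by operator.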
What may seem unfair is that I currently reuse the input/output/intermediate buffers across all iterations. I also tried using new and delete each time, but could not measure a difference in performance. Reusing the buffers allows using std::vector for slightly cleaner code. Re-creating a vector each time would result in a lot of overhead from initializing the vector values and is therefore not useful.
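The two buffer strategies can be compared with a small standalone timing harness along the following lines. This is only a sketch under assumptions: it uses std::chrono rather than the benchmark framework, the names Timing, TimeReusedBuffer and TimeFreshBuffer are hypothetical, and int64_t is assumed as the element type.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Timing {
  double seconds;    // wall-clock time over all iterations
  int64_t checksum;  // one output element per iteration, keeps the work observable
};

// Kernel under test: out[i] = -x[i]. Both buffers must have equal size.
inline void Negate(const std::vector<int64_t>& x, std::vector<int64_t>& out) {
  assert(out.size() == x.size());
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = -x[i];
}

// Variant A: the output vector is created once and reused.
inline Timing TimeReusedBuffer(const std::vector<int64_t>& x, int iterations) {
  std::vector<int64_t> out(x.size());
  int64_t checksum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int it = 0; it < iterations; ++it) {
    Negate(x, out);
    if (!out.empty()) checksum += out.front();
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {elapsed.count(), checksum};
}

// Variant B: a new vector per iteration. The constructor zero-fills
// every element before the kernel overwrites it, which is the extra
// initialization cost mentioned above.
inline Timing TimeFreshBuffer(const std::vector<int64_t>& x, int iterations) {
  int64_t checksum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int it = 0; it < iterations; ++it) {
    std::vector<int64_t> out(x.size());
    Negate(x, out);
    if (!out.empty()) checksum += out.front();
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {elapsed.count(), checksum};
}
```

Both variants compute the same checksum, so any difference in the reported seconds comes from allocation and initialization rather than from the kernel itself. How large that difference is depends on the allocator, the compiler and the batch size, so it has to be measured rather than assumed.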
Example output (columns: benchmark name, real time, CPU time, iterations, counters):
{{ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:1 3328161 ns 3326213 ns 1277 batches_per_second=300.466k/s rows_per_second=300.466M/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:16 754880 ns 11940432 ns 5680 batches_per_second=1.32471M/s rows_per_second=1.32471G/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:10000/real_time/threads:1 1370993 ns 1370182 ns 3047 batches_per_second=72.9398k/s rows_per_second=729.398M/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:10000/real_time/threads:16 213412 ns 3377187 ns 20608 batches_per_second=468.578k/s rows_per_second=4.68578G/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:100000/real_time/threads:1 1194552 ns 1192163 ns 3494 batches_per_second=8.37134k/s rows_per_second=837.134M/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:100000/real_time/threads:16 193390 ns 3047981 ns 22576 batches_per_second=51.709k/s rows_per_second=5.1709G/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000000/real_time/threads:1 1243416 ns 1240591 ns 3325 batches_per_second=804.236/s rows_per_second=804.236M/s
ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000000/real_time/threads:16 449956 ns 7057594 ns 9216 batches_per_second=2.22244k/s rows_per_second=2.22244G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:1000/real_time/threads:1 1153192 ns 1151060 ns 3580 batches_per_second=867.158k/s rows_per_second=867.158M/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:1000/real_time/threads:16 297876 ns 4705702 ns 15152 batches_per_second=3.3571M/s rows_per_second=3.3571G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:10000/real_time/threads:1 519083 ns 518087 ns 8027 batches_per_second=192.647k/s rows_per_second=1.92647G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:10000/real_time/threads:16 70329 ns 1106796 ns 62320 batches_per_second=1.42189M/s rows_per_second=14.2189G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:100000/real_time/threads:1 420460 ns 419404 ns 9878 batches_per_second=23.7835k/s rows_per_second=2.37835G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:100000/real_time/threads:16 75645 ns 1189925 ns 56864 batches_per_second=132.196k/s rows_per_second=13.2196G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:1000000/real_time/threads:1 425360 ns 424499 ns 9404 batches_per_second=2.35095k/s rows_per_second=2.35095G/s
ExecuteScalarExpressionOverhead/simple_expression/rows_per_batch:1000000/real_time/threads:16 1057920 ns 16308254 ns 3984 batches_per_second=945.251/s rows_per_second=945.251M/s
}}
{{baseline:
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:1000/real_time/threads:1 876620 ns 876032 ns 4787 batches_per_second=1.14075M/s rows_per_second=1.14075G/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:1000/real_time/threads:16 106371 ns 1657205 ns 41536 batches_per_second=9.40109M/s rows_per_second=9.40109G/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:10000/real_time/threads:1 993787 ns 993262 ns 4219 batches_per_second=100.625k/s rows_per_second=1006.25M/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:10000/real_time/threads:16 114770 ns 1812652 ns 37520 batches_per_second=871.311k/s rows_per_second=8.71311G/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:100000/real_time/threads:1 996150 ns 995562 ns 4209 batches_per_second=10.0386k/s rows_per_second=1003.86M/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:100000/real_time/threads:16 122580 ns 1936209 ns 35168 batches_per_second=81.5791k/s rows_per_second=8.15791G/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:1000000/real_time/threads:1 988198 ns 987316 ns 4231 batches_per_second=1011.94/s rows_per_second=1011.94M/s
ExecuteScalarExpressionBaseline<ComplexExpressionBaseline>/rows_per_batch:1000000/real_time/threads:16 445864 ns 6984471 ns 9296 batches_per_second=2.24284k/s rows_per_second=2.24284G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000/real_time/threads:1 362262 ns 361985 ns 11352 batches_per_second=2.76043M/s rows_per_second=2.76043G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000/real_time/threads:16 40944 ns 646932 ns 105312 batches_per_second=24.4234M/s rows_per_second=24.4234G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:10000/real_time/threads:1 375894 ns 375244 ns 11230 batches_per_second=266.032k/s rows_per_second=2.66032G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:10000/real_time/threads:16 44526 ns 703275 ns 96704 batches_per_second=2.2459M/s rows_per_second=22.459G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:100000/real_time/threads:1 377450 ns 376698 ns 11013 batches_per_second=26.4936k/s rows_per_second=2.64936G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:100000/real_time/threads:16 67216 ns 1054881 ns 62400 batches_per_second=148.774k/s rows_per_second=14.8774G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000000/real_time/threads:1 396841 ns 396078 ns 10461 batches_per_second=2.5199k/s rows_per_second=2.5199G/s
ExecuteScalarExpressionBaseline<SimpleExpressionBaseline>/rows_per_batch:1000000/real_time/threads:16 1046650 ns 16071057 ns 4016 batches_per_second=955.429/s rows_per_second=955.429M/s
}}
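For reading the numbers, the following sketch shows how the rows_per_second counter relates to the time column and how two runs can be compared. It rests on an assumption inferred from the output rather than stated in the issue: every iteration appears to process 1,000,000 rows in total, split into batches of rows_per_batch. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rows per second from the per-iteration real time (in nanoseconds).
// rows_per_iteration is assumed to be 1,000,000 for the runs above.
inline double RowsPerSecond(int64_t rows_per_iteration, double real_time_ns) {
  assert(real_time_ns > 0.0);
  return static_cast<double>(rows_per_iteration) / (real_time_ns * 1e-9);
}

// Ratio of baseline throughput to Arrow throughput for the same
// rows_per_batch and thread count; values above 1 mean the plain
// C++ baseline is faster.
inline double Speedup(double baseline_rows_per_second,
                      double arrow_rows_per_second) {
  assert(arrow_rows_per_second > 0.0);
  return baseline_rows_per_second / arrow_rows_per_second;
}
```

For example, RowsPerSecond(1000000, 3328161.0) reproduces the rows_per_second counter of the first complex_expression line, and Speedup(1.14075e9, 300.466e6) compares the baseline with Arrow for complex_expression at rows_per_batch:1000 and threads:1.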