Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/04 23:29:18 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #12755: ARROW-16014: [C++] Create more benchmarks for measuring expression evaluation overhead

westonpace commented on code in PR #12755:
URL: https://github.com/apache/arrow/pull/12755#discussion_r842194217


##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -69,6 +70,25 @@ static void SimplifyFilterWithGuarantee(benchmark::State& state, Expression filt
   }
 }
 
+static void BatchedExecution(benchmark::State& state, Expression expr) {
+  const auto array_batches = static_cast<int32_t>(state.range(0));
+  const auto array_size = 10000000 / array_batches;
+
+  ExecContext ctx;
+  auto dataset_schema = schema({
+      field("x", int64()),
+  });
+  ExecBatch input({Datum(ConstantArrayGenerator::Int64(array_size, 5))},
+                  /*length=*/1);
+
+  ASSIGN_OR_ABORT(auto bound, expr.Bind(*dataset_schema));
+  for (auto _ : state) {
+    for (int it = 0; it < array_batches; ++it)
+      ABORT_NOT_OK(ExecuteScalarExpression(bound, input, &ctx).status());
+  }
+  state.SetItemsProcessed(state.iterations() * array_size * array_batches);

Review Comment:
   Let's put a comment somewhere that we are measuring `rows per second` (my first thought when I saw "items" was "ExecuteScalarExpression calls per second").  Or ideally we can use a custom counter and call the column "rows_per_second".
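
   A minimal sketch of the custom-counter option (the counter name is just a suggestion; `array_size` and `array_batches` are the locals defined above):

   ```
   // Report throughput as rows per second rather than generic "items".
   state.counters["rows_per_second"] = benchmark::Counter(
       static_cast<double>(state.iterations()) * array_size * array_batches,
       benchmark::Counter::kIsRate);
   ```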



##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -92,6 +112,11 @@ auto guarantee = and_(equal(field_ref("a"), literal(int64_t(99))),
 auto guarantee_dictionary = and_(equal(field_ref("a"), literal(ninety_nine_dict)),
                                  equal(field_ref("b"), literal(ninety_nine_dict)));
 
+auto expression_with_copy =
+    and_(less(field_ref("x"), literal(20)), greater(field_ref("x"), literal(0)));
+
+auto expression_without_copy = field_ref("x");

Review Comment:
   `field_ref("x")` is maybe a little too easy of an expression, based on the results.
   
   Let's test the following:
   
   ```
   auto complex_expression =
       and_(less(field_ref("x"), literal(20)), greater(field_ref("x"), literal(0)));
   auto simple_expression = call("negate", {field_ref("x")});
   auto zero_copy_expression =
       call("cast", {field_ref("x")}, compute::CastOptions::Safe(timestamp(TimeUnit::NANO)));
   auto ref_only_expression = field_ref("x");
   ```
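
   (Roughly: the first one chains two comparisons plus a conjunction and has to materialize intermediate results, the second is a single kernel call, the cast to `timestamp(NANO)` should be close to zero-copy since timestamps are stored as int64, and the bare field reference mostly measures per-call dispatch overhead.)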



##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -119,6 +144,11 @@ BENCHMARK_CAPTURE(BindAndEvaluate, nested_array,
                   field_ref(FieldRef("struct_arr", "float")));
 BENCHMARK_CAPTURE(BindAndEvaluate, nested_scalar,
                   field_ref(FieldRef("struct_scalar", "float")));
-
+BENCHMARK_CAPTURE(BatchedExecution, execution_with_copy, expression_with_copy)
+    ->RangeMultiplier(10)
+    ->Range(1, 10000000);

Review Comment:
   Right now this parameter is "number of batches".  Can we make the parameter "rows per batch" (and give the parameter a name too)?  We will have to invert the math a little, but I think it will be much more readable.
   
   Also, let's make the minimum "10 rows per batch".  Testing "1 row per batch" takes too long on my system and isn't adding any value.
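
   A sketch of what I have in mind (`kTotalRows` is just a stand-in for the 10000000 constant above, and the names follow the other suggestions in this review):

   ```
   // Inside the benchmark: interpret the argument as rows per batch.
   const auto rows_per_batch = static_cast<int32_t>(state.range(0));
   const auto num_batches = kTotalRows / rows_per_batch;

   // At registration: name the argument and start at 10 rows per batch.
   BENCHMARK_CAPTURE(ExecuteScalarExpressionOverhead, complex_expression, complex_expression)
       ->ArgNames({"rows_per_batch"})
       ->RangeMultiplier(10)
       ->Range(10, 10000000);
   ```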



##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -69,6 +70,25 @@ static void SimplifyFilterWithGuarantee(benchmark::State& state, Expression filt
   }
 }
 
+static void BatchedExecution(benchmark::State& state, Expression expr) {

Review Comment:
   We are measuring the overhead of `ExecuteScalarExpression`.  That is important for enabling batched execution, but it is not directly related to batching itself, so this benchmark should probably be named `ExecuteScalarExpressionOverhead` or something like that.



##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -119,6 +144,11 @@ BENCHMARK_CAPTURE(BindAndEvaluate, nested_array,
                   field_ref(FieldRef("struct_arr", "float")));
 BENCHMARK_CAPTURE(BindAndEvaluate, nested_scalar,
                   field_ref(FieldRef("struct_scalar", "float")));
-
+BENCHMARK_CAPTURE(BatchedExecution, execution_with_copy, expression_with_copy)
+    ->RangeMultiplier(10)
+    ->Range(1, 10000000);
+BENCHMARK_CAPTURE(BatchedExecution, execution_without_copy, expression_without_copy)
+    ->RangeMultiplier(10)
+    ->Range(1, 10000000);

Review Comment:
   Add `->DenseThreadRange(1, std::thread::hardware_concurrency(), std::thread::hardware_concurrency())` to all of these benchmarks.
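
   i.e. something like this, per benchmark (needs `#include <thread>`; the names follow the other suggestions in this review):

   ```
   BENCHMARK_CAPTURE(ExecuteScalarExpressionOverhead, complex_expression, complex_expression)
       ->RangeMultiplier(10)
       ->Range(10, 10000000)
       // Run single-threaded and with one thread per hardware core.
       ->DenseThreadRange(1, std::thread::hardware_concurrency(),
                          std::thread::hardware_concurrency());
   ```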



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org