You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/16 16:48:47 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #14659: ARROW-18334: [C++] Handle potential non-commutativity by rebinding

westonpace commented on code in PR #14659:
URL: https://github.com/apache/arrow/pull/14659#discussion_r1024116017


##########
cpp/src/arrow/compute/exec/expression_test.cc:
##########
@@ -69,6 +69,54 @@ Expression true_unless_null(Expression argument) {
   return call("true_unless_null", {std::move(argument)});
 }
 
+Expression add(Expression l, Expression r) {
+  return call("add", {std::move(l), std::move(r)});
+}
+
+Expression timestamp_literal(std::string repr, TimeUnit::type unit) {
+  return literal(*MakeScalar(std::move(repr))->CastTo(timestamp(TimeUnit::NANO)));
+}
+
+struct duration_literal {

Review Comment:
   This seems useful too



##########
cpp/src/arrow/compute/exec/expression_test.cc:
##########
@@ -69,6 +69,54 @@ Expression true_unless_null(Expression argument) {
   return call("true_unless_null", {std::move(argument)});
 }
 
+Expression add(Expression l, Expression r) {
+  return call("add", {std::move(l), std::move(r)});
+}
+
+Expression timestamp_literal(std::string repr, TimeUnit::type unit) {
+  return literal(*MakeScalar(std::move(repr))->CastTo(timestamp(TimeUnit::NANO)));
+}

Review Comment:
   This seems generally useful beyond unit tests



##########
cpp/src/arrow/compute/exec/expression_test.cc:
##########
@@ -258,10 +292,11 @@ TEST(Expression, ToString) {
   EXPECT_EQ(literal(std::make_shared<BinaryScalar>(Buffer::FromString("az"))).ToString(),
             "\"617A\"");
 
-  auto ts = *MakeScalar("1990-10-23 10:23:33")->CastTo(timestamp(TimeUnit::NANO));
-  EXPECT_EQ(literal(ts).ToString(), "1990-10-23 10:23:33.000000000");
+  using namespace arrow_literals;

Review Comment:
   This is an unexpected place to find the `using` statement.  Why not pull it outside the test case?



##########
cpp/src/arrow/compute/exec/expression_test.cc:
##########
@@ -69,6 +70,39 @@ Expression true_unless_null(Expression argument) {
   return call("true_unless_null", {std::move(argument)});
 }
 
+Expression add(Expression l, Expression r) {
+  return call("add", {std::move(l), std::move(r)});
+}
+
+namespace arrow_literals {
+
+using namespace std::chrono_literals;
+
+inline auto operator""_ts(const char* c, size_t s) {
+  return [s = StringScalar(std::string{c, s})](TimeUnit::type unit) {
+    return literal(*s.CastTo(timestamp(unit)));
+  };
+}
+
+}  // namespace arrow_literals
+
+template <typename Rep, typename Period>
+Expression literal(std::chrono::duration<Rep, Period> d) {
+  int64_t int_value = d.count();
+  TimeUnit::type unit;
+  if constexpr (std::is_same_v<Period, std::nano>) {
+    unit = TimeUnit::NANO;
+  } else if constexpr (std::is_same_v<Period, std::micro>) {
+    unit = TimeUnit::MICRO;
+  } else if constexpr (std::is_same_v<Period, std::milli>) {
+    unit = TimeUnit::MILLI;
+  } else {
+    unit = TimeUnit::SECOND;
+    int_value = std::chrono::duration_cast<std::chrono::seconds>(d).count();
+  }
+  return literal(*MakeScalar(int_value)->CastTo(duration(unit)));
+}

Review Comment:
   nit: This seems generally useful (note, I thought that on the earlier version before the literals route too) to be something hidden in a test case.  Maybe a follow-up PR to add tests and expose the functionality at some point (though it's probably just one in a sea of QoL features we could offer C++ users).



##########
cpp/src/arrow/compute/exec/expression_test.cc:
##########
@@ -1098,6 +1122,12 @@ TEST(Expression, CanonicalizeAnd) {
 
   // catches and_kleene even when it's a subexpression
   ExpectCanonicalizesTo(is_valid(and_(b, true_)), is_valid(and_(true_, b)));
+
+  auto ts = field_ref("ts_ns");
+  using namespace arrow_literals;
+  ExpectCanonicalizesTo(add(ts, literal(5min)), add(literal(5min), ts));
+  ExpectCanonicalizesTo(add(add(ts, literal(5min)), add(literal(5min), literal(5min))),
+                        add(add(add(literal(5min), literal(5min)), literal(5min)), ts));

Review Comment:
   Wouldn't `add(literal(5min), literal(5min))` simplify to `add(literal(10min)` since all the args are literals?



##########
cpp/src/arrow/compute/exec/expression.cc:
##########
@@ -856,6 +870,27 @@ bool IsBinaryAssociativeCommutative(const Expression::Call& call) {
   return it != binary_associative_commutative.end();
 }
 
+Result<Expression> HandleInconsistentTypes(Expression::Call call,
+                                           compute::ExecContext* exec_context) {
+  // ARROW-18334: due to reordering of arguments, the call may have
+  // inconsistent argument types. For example, the call's kernel may
+  // correspond to `timestamp + duration` but the arguments happen to
+  // be `duration, timestamp`. The addition itself is still commutative,
+  // but the mismatch in declared argument types is potentially problematic
+  // if we ever start using the Expression::Call::kernel field more than
+  // we do currently. Check and rebind if necessary.
+  //
+  // The more correct fix for this problem is to ensure that all kernels of
+  // functions which are commutative be commutative as well, which would
+  // obviate rebinding like this. In the context of ARROW-18334, this
+  // would require rewriting KernelSignature so that a single kernel can
+  // handle both `timestamp + duration` and `duration + timestamp`.
+  if (call.kernel->signature->MatchesInputs(GetTypes(call.arguments))) {

Review Comment:
   Potentially naive question: are we guaranteed to be bound at this point (e.g. is `call.kernel` guaranteed to exist)?



##########
cpp/src/arrow/compute/exec/expression.cc:
##########
@@ -634,6 +634,20 @@ std::optional<Out> FoldLeft(It begin, It end, const BinOp& bin_op) {
   return folded;
 }
 
+template <typename BinOp, typename It,
+          typename Value = typename std::iterator_traits<It>::value_type,
+          typename Out = decltype(std::declval<BinOp>()(std::declval<Value>(),
+                                                        std::declval<Value>()))>
+std::optional<Out> MaybeFoldLeft(It begin, It end, const BinOp& bin_op) {

Review Comment:
   Nit: Is this the same thing as `std::accumulate`?  Could we maybe use `MaybeAccumulate` or, in a brief comment, reference `std::accumulate` for new readers who might not be familiar with the term "folding" (I often think of this as "reduce" but I just looked it up and it seems there is a `std::reduce` but it doesn't guarantee **left**-folding and thus can run in parallel)



##########
cpp/src/arrow/compute/exec/expression_test.cc:
##########
@@ -69,6 +70,39 @@ Expression true_unless_null(Expression argument) {
   return call("true_unless_null", {std::move(argument)});
 }
 
+Expression add(Expression l, Expression r) {
+  return call("add", {std::move(l), std::move(r)});
+}
+
+namespace arrow_literals {
+
+using namespace std::chrono_literals;
+
+inline auto operator""_ts(const char* c, size_t s) {
+  return [s = StringScalar(std::string{c, s})](TimeUnit::type unit) {
+    return literal(*s.CastTo(timestamp(unit)));
+  };
+}
+
+}  // namespace arrow_literals
+
+template <typename Rep, typename Period>
+Expression literal(std::chrono::duration<Rep, Period> d) {
+  int64_t int_value = d.count();
+  TimeUnit::type unit;
+  if constexpr (std::is_same_v<Period, std::nano>) {
+    unit = TimeUnit::NANO;
+  } else if constexpr (std::is_same_v<Period, std::micro>) {
+    unit = TimeUnit::MICRO;
+  } else if constexpr (std::is_same_v<Period, std::milli>) {
+    unit = TimeUnit::MILLI;
+  } else {
+    unit = TimeUnit::SECOND;
+    int_value = std::chrono::duration_cast<std::chrono::seconds>(d).count();

Review Comment:
   So we grab the largest possible resolution and then rely on casting later?  Seems reasonable but just asking for clarification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org