You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "js8544 (via GitHub)" <gi...@apache.org> on 2023/06/22 06:14:39 UTC

[GitHub] [arrow] js8544 opened a new pull request, #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

js8544 opened a new pull request, #36231:
URL: https://github.com/apache/arrow/pull/36231

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Currently durations can only be multiplied with int64, but no other integer types.
   
   ### What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Allow duration types to be multiplied with any integer type.
   1. For `multiply`, new kernels are added to support the new types.
   2. For `multiply_checked`, integers will be casted to int64. We can't add new kernels because the MultiplyChecked op class requires both operands to have the same type.
   
   ### Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   4. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   Yes.
   
   ### Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1241864301


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {
+        // MultiplyChecked requires the same type for both operands, therefore we can't
+        // easily make kernels for integers other than int64, so we just cast integers to
+        // int64
+        if ((*types)[0].id() == Type::DURATION && is_integer((*types)[1].id())) {
+          ReplaceTypes(int64(), &(*types)[1], 1);
+        } else if ((*types)[1].id() == Type::DURATION && is_integer((*types)[0].id())) {
+          ReplaceTypes(int64(), &(*types)[0], 1);
+        }
+      }

Review Comment:
   Looks much cleaner now, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36231:
URL: https://github.com/apache/arrow/pull/36231#issuecomment-1602067009

   :warning: GitHub issue #36128 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36231:
URL: https://github.com/apache/arrow/pull/36231#issuecomment-1602418185

   The CI failure on AMD64 Conda C++ is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1239969698


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -616,6 +617,21 @@ ArrayKernelExec GenerateArithmeticWithFixedIntOutType(detail::GetTypeId get_id)
   }
 }
 
+void PromoteIntegerForDurationArithmetic(std::vector<TypeHolder>* types) {
+  bool has_duration = std::any_of(types->begin(), types->end(), [](const TypeHolder& t) {

Review Comment:
   :rocket:



##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -616,6 +617,21 @@ ArrayKernelExec GenerateArithmeticWithFixedIntOutType(detail::GetTypeId get_id)
   }
 }
 
+void PromoteIntegerForDurationArithmetic(std::vector<TypeHolder>* types) {

Review Comment:
   Please declare this in `codegen_internal.h` next to the other DispatchBest helpers for findability



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238524727


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc:
##########
@@ -1152,6 +1152,53 @@ TEST(TestBinaryArithmetic, AddWithImplicitCastsUint64EdgeCase) {
                                      ArrayFromJSON(uint64(), "[18446744073709551615]")}));
 }
 
+TEST(TestBinaryArithmetic, MultiplyDuration) {

Review Comment:
   Oh, I didn't notice this. I've changed the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238755160


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {
+        // MultiplyChecked requires the same type for both operands, therefore we can't
+        // easily make kernels for integers other than int64, so we just cast integers to
+        // int64
+        if ((*types)[0].id() == Type::DURATION && is_integer((*types)[1].id())) {
+          ReplaceTypes(int64(), &(*types)[1], 1);
+        } else if ((*types)[1].id() == Type::DURATION && is_integer((*types)[0].id())) {
+          ReplaceTypes(int64(), &(*types)[0], 1);
+        }
+      }

Review Comment:
   I don't remember any, but it's been quite a while since I worked with Acero.



##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {
+        // MultiplyChecked requires the same type for both operands, therefore we can't
+        // easily make kernels for integers other than int64, so we just cast integers to
+        // int64
+        if ((*types)[0].id() == Type::DURATION && is_integer((*types)[1].id())) {
+          ReplaceTypes(int64(), &(*types)[1], 1);
+        } else if ((*types)[1].id() == Type::DURATION && is_integer((*types)[0].id())) {
+          ReplaceTypes(int64(), &(*types)[0], 1);
+        }
+      }

Review Comment:
   I think we can leave it as-is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1239991741


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -616,6 +617,21 @@ ArrayKernelExec GenerateArithmeticWithFixedIntOutType(detail::GetTypeId get_id)
   }
 }
 
+void PromoteIntegerForDurationArithmetic(std::vector<TypeHolder>* types) {

Review Comment:
   My bad. I've moved it over. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36231:
URL: https://github.com/apache/arrow/pull/36231#issuecomment-1612019043

   Conbench analyzed the 6 benchmark runs on commit `5dd4cc08`.
   
   There were 8 benchmark results indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-26 12:10:47Z](http://conbench.ursa.dev/compare/runs/7aba330555f84b6b837ddab44c7939cc...2f6cc1b1aac645e6b61cf67f0464a891/)
     - [params=65536/1, source=cpp-micro, suite=arrow-compute-scalar-compare-benchmark](http://conbench.ursa.dev/compare/benchmarks/064997b8ee3f7d818000396c71d8359d...06499805165a7a9e8000580b8850904e)
     - [params=65536/2, source=cpp-micro, suite=arrow-compute-scalar-compare-benchmark](http://conbench.ursa.dev/compare/benchmarks/064997b8f0dd7fe680005c42754ac3e5...064998051844716680003c3dee8b079c)
   - and 6 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14635808363) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238394688


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc:
##########
@@ -1152,6 +1152,53 @@ TEST(TestBinaryArithmetic, AddWithImplicitCastsUint64EdgeCase) {
                                      ArrayFromJSON(uint64(), "[18446744073709551615]")}));
 }
 
+TEST(TestBinaryArithmetic, MultiplyDuration) {

Review Comment:
   We already have `TestTemporalMultiplyDuration` in `scalar_temporal_test.cc`, can you augment that test instead?
   
   Also, notice `TestTemporalDivideDuration` which hints that the same improvement could be done for division...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238395288


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {

Review Comment:
   As pointed out below, duration also supports dividing by an integer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238525789


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {

Review Comment:
   I've added this to div and div_checked



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238391248


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {
+        // MultiplyChecked requires the same type for both operands, therefore we can't
+        // easily make kernels for integers other than int64, so we just cast integers to
+        // int64
+        if ((*types)[0].id() == Type::DURATION && is_integer((*types)[1].id())) {
+          ReplaceTypes(int64(), &(*types)[1], 1);
+        } else if ((*types)[1].id() == Type::DURATION && is_integer((*types)[0].id())) {
+          ReplaceTypes(int64(), &(*types)[0], 1);
+        }
+      }

Review Comment:
   Are there other places where a similar pattern may be desired? Should this be factored out somehow?
   cc @lidavidm @bkietz  for opinions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1238901471


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {
+        // MultiplyChecked requires the same type for both operands, therefore we can't
+        // easily make kernels for integers other than int64, so we just cast integers to
+        // int64
+        if ((*types)[0].id() == Type::DURATION && is_integer((*types)[1].id())) {
+          ReplaceTypes(int64(), &(*types)[1], 1);
+        } else if ((*types)[1].id() == Type::DURATION && is_integer((*types)[0].id())) {
+          ReplaceTypes(int64(), &(*types)[0], 1);
+        }
+      }

Review Comment:
   Personally, I'd prefer to see a uniform approach to adding this support to `multiply` and `multiply_checked`: either both should have kernels matching each type or both should just use the numeric promotion. (Adding kernels to multiply_checked would require adding Call overloads to the MultiplyChecked op, but is doable.) Of those two options, I'd prefer to see promotion for consistency; multiply(int8, int64) requires an implicit cast to int64 so it seems odd to specifically add direct support for multiply(int8, duration) but not for any of the other arithmetic kernels.
   
   More directly in answer to your question, I would also like to see this logic extracted to a helper function. It may only be used in this DispatchBest, but just in case another Function requires this logic it'd be best to try to avoid copypasta. Furthermore it'll keep this DispatchBest from getting too unreadable. Something like:
   
   ```c++
   if (name_ == "multiply" || name_ == "multiply_checked" ||
       name_ == "divide" || name_ == "divide_checked") {
     PromoteIntegerForDurationArithmetic(types);
   }
   
   void PromoteIntegerForDurationArithmetic(std::vector<TypeHolder>* types) {
     bool has_duration = false;
     for (const auto& type : *types) {
       if (type.id() == Type::DURATION) {
         has_duration = true;
       }
     }
     if (!has_duration) return;
   
     // Require implicit casts to int64 to match duration's bit width
     for (auto& type : *types) {
       if (is_integer(type.id())) {
         type = int64();
       }
     }
   }
   ```
   
   It should be fine to let this function do the replacement `(int8, duration) -> (int64, duration)` even for divide; this just defers the failed kernel lookup to the following DispatchExact call.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #36231:
URL: https://github.com/apache/arrow/pull/36231#issuecomment-1606030280

   CI failures are unrelated. @pitrou @bkietz Is this PR good to merge?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36231:
URL: https://github.com/apache/arrow/pull/36231#discussion_r1239768483


##########
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:
##########
@@ -640,6 +642,17 @@ struct ArithmeticFunction : ScalarFunction {
           ReplaceTypes(type, types);
         }
       }
+
+      if (name_ == "multiply_checked") {
+        // MultiplyChecked requires the same type for both operands, therefore we can't
+        // easily make kernels for integers other than int64, so we just cast integers to
+        // int64
+        if ((*types)[0].id() == Type::DURATION && is_integer((*types)[1].id())) {
+          ReplaceTypes(int64(), &(*types)[1], 1);
+        } else if ((*types)[1].id() == Type::DURATION && is_integer((*types)[0].id())) {
+          ReplaceTypes(int64(), &(*types)[0], 1);
+        }
+      }

Review Comment:
   It does make more sense considering that muliply uses type promotion already. I've updated the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #36231: GH-36128: [C++][Compute] Allow multiplication between duration and all integer types

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36231:
URL: https://github.com/apache/arrow/pull/36231


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org