You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "js8544 (via GitHub)" <gi...@apache.org> on 2023/06/14 03:32:59 UTC

[GitHub] [arrow] js8544 opened a new pull request, #36058: GH-36047: [C++][Compute] Add support of duration types to IndexIn and IsIn

js8544 opened a new pull request, #36058:
URL: https://github.com/apache/arrow/pull/36058

   Add support of duration types to IndexIn and IsIn


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36058:
URL: https://github.com/apache/arrow/pull/36058#issuecomment-1590534157

   :warning: GitHub issue #36047 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36058:
URL: https://github.com/apache/arrow/pull/36058#issuecomment-1612228959

   Conbench analyzed the 6 benchmark runs on commit `00b599cd`.
   
   There was 1 benchmark result indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-26 16:01:26Z](http://conbench.ursa.dev/compare/runs/2f6cc1b1aac645e6b61cf67f0464a891...63b7a499eae045daa1e3b979a331dbb2/)
     - [params=<Int32Type>/ConvertToSparseCOOTensorInt32, source=cpp-micro, suite=arrow-tensor-conversion-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649980843cd73f48000bead0eb3ae49...06499b6946df784e800083d263add741)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14640144947) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36058:
URL: https://github.com/apache/arrow/pull/36058#discussion_r1238403057


##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc:
##########
@@ -238,6 +239,29 @@ TEST_F(TestIsInKernel, TimeTimestamp) {
             "[true, false, true]");
 }
 
+TEST_F(TestIsInKernel, TimeDuration) {
+  for (const auto& type : DurationTypes()) {
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+
+    // Duplicates in right array
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+  }
+
+  // Different units, invalid cast

Review Comment:
   Interesting... ideally, the values could be cast to MILLI instead of casting the value_set to SECOND. Don't have to solve this here, though. :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36058:
URL: https://github.com/apache/arrow/pull/36058#discussion_r1238466388


##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc:
##########
@@ -238,6 +239,29 @@ TEST_F(TestIsInKernel, TimeTimestamp) {
             "[true, false, true]");
 }
 
+TEST_F(TestIsInKernel, TimeDuration) {
+  for (const auto& type : DurationTypes()) {
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+
+    // Duplicates in right array
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+  }
+
+  // Different units, invalid cast

Review Comment:
   #36204 will solve this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #36058:
URL: https://github.com/apache/arrow/pull/36058#issuecomment-1600062537

   > This looks like a good to me. I tested it out a bit in python and it seems to work. I do find the casting to be a bit arbitrary here. For example, it seems odd that I can do `is_in(milliseconds, seconds)` but not `is_in(seconds, milliseconds)`.
   > 
   > However, users can always explicitly cast, and I think this is better for a follow-up if someone feels strongly about it.
   
   Actually it's possible to do `is_in(seconds, milliseconds)` if the casting is safe. For example, `is_in([1s, 2s], [1000ms, 2000ms])` would return `[true, true]`, while `is_in([1s, 2s], [1ms, 2ms])` would throw an error.
   
   Currently the kernels only try to cast `value_set` into query keys' type. It does make more sense if they cast both ways. I'll try to implement this in another PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36058:
URL: https://github.com/apache/arrow/pull/36058#discussion_r1238401828


##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc:
##########
@@ -238,6 +239,29 @@ TEST_F(TestIsInKernel, TimeTimestamp) {
             "[true, false, true]");
 }
 
+TEST_F(TestIsInKernel, TimeDuration) {
+  for (const auto& type : DurationTypes()) {
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+
+    // Duplicates in right array
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+  }
+
+  // Different units, invalid cast
+  ASSERT_RAISES(Invalid, IsIn(ArrayFromJSON(duration(TimeUnit::SECOND), "[0, 1, 2]"),
+                              ArrayFromJSON(duration(TimeUnit::MILLI), "[0, 2]")));
+
+  // Different units, valid cast
+  CheckIsIn(ArrayFromJSON(duration(TimeUnit::MILLI), "[0, 1, 2000]"),
+            ArrayFromJSON(duration(TimeUnit::SECOND), "[0, 2]"), "[true, false, true]");

Review Comment:
   What happens if conversion from SECOND to MILLI overflows?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36058:
URL: https://github.com/apache/arrow/pull/36058


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36058: GH-36047: [C++][Compute] Add support of duration types to IndexIn and IsIn

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36058:
URL: https://github.com/apache/arrow/pull/36058#issuecomment-1590404924

   :warning: GitHub issue #36047 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #36058: GH-36047: [C++][Compute] Add support for duration types to IndexIn and IsIn

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #36058:
URL: https://github.com/apache/arrow/pull/36058#discussion_r1238469522


##########
cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc:
##########
@@ -238,6 +239,29 @@ TEST_F(TestIsInKernel, TimeTimestamp) {
             "[true, false, true]");
 }
 
+TEST_F(TestIsInKernel, TimeDuration) {
+  for (const auto& type : DurationTypes()) {
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, null]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+
+    // Duplicates in right array
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, true, false, true, true]", /*skip_nulls=*/false);
+    CheckIsIn(type, "[1, null, 5, 1, 2]", "[2, 1, 1, null, 2]",
+              "[true, false, false, true, true]", /*skip_nulls=*/true);
+  }
+
+  // Different units, invalid cast
+  ASSERT_RAISES(Invalid, IsIn(ArrayFromJSON(duration(TimeUnit::SECOND), "[0, 1, 2]"),
+                              ArrayFromJSON(duration(TimeUnit::MILLI), "[0, 2]")));
+
+  // Different units, valid cast
+  CheckIsIn(ArrayFromJSON(duration(TimeUnit::MILLI), "[0, 1, 2000]"),
+            ArrayFromJSON(duration(TimeUnit::SECOND), "[0, 2]"), "[true, false, true]");

Review Comment:
   It calls Cast with CastOptions::Safe(), so it should return an error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org