You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/15 19:30:46 UTC

[GitHub] [arrow] edponce opened a new pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

edponce opened a new pull request #10727:
URL: https://github.com/apache/arrow/pull/10727


   This PR adds floor, ceiling, and truncate scalar kernels. For all integral inputs, output is a 64-bit floating-point value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671261181



##########
File path: cpp/src/arrow/compute/api_scalar.cc
##########
@@ -357,6 +357,9 @@ SCALAR_ARITHMETIC_BINARY(Power, "power", "power_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftLeft, "shift_left", "shift_left_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftRight, "shift_right", "shift_right_checked")
 SCALAR_EAGER_BINARY(Atan2, "atan2")
+SCALAR_EAGER_UNARY(Floor, "floor")
+SCALAR_EAGER_UNARY(Ceiling, "ceiling")

Review comment:
       I agree that `ceil` should be the name used to invoked the function. The only reason for using the long form for the internal variable names is to be somewhat consistent with other compute functions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm closed pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #10727:
URL: https://github.com/apache/arrow/pull/10727


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<signed/unsigned long long int>::min/max()` due to an invalid range when checking integer with floating-point result.
   The tests are:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   
   The error message generated is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   
   Here is some context of what these numbers represent:
   ```
   max(unsigned long long int) = 18446744073709551615
   max(mantissa of double is 54 bits) = 9007199254740992
   ```
   
   This error is most likely due to the [Arrow testing logic when checking integer ranges](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/int_util.cc#L601-L618) for comparing values in test assertions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671414103



##########
File path: docs/source/cpp/compute.rst
##########
@@ -348,6 +348,22 @@ Bit-wise functions
   out of bounds for the data type.  However, an overflow when shifting the
   first input is not error (truncated bits are silently discarded).
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+Rounding functions convert a numeric input into an approximate value with a
+simpler representation based on the rounding strategy.

Review comment:
       Ah, ok, sounds good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671411470



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##########
@@ -1032,47 +1032,90 @@ TEST(TestBinaryArithmetic, AddWithImplicitCastsUint64EdgeCase) {
 }
 
 TEST(TestUnaryArithmetic, DispatchBest) {
-  // All arithmetic
-  for (std::string name : {"negate", "abs", "abs_checked", "sign"}) {
+  // All types (with _checked variant)
+  for (std::string name : {"abs"}) {
+    for (std::string suffix : {"", "_checked"}) {
+      name += suffix;
+      for (const auto& ty : {int8(), int16(), int32(), int64(), uint8(), uint16(),
+                             uint32(), uint64(), float32(), float64()}) {
+        CheckDispatchBest(name, {ty}, {ty});
+        CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+      }
+    }
+  }
+
+  // All types
+  for (std::string name : {"negate", "sign"}) {
     for (const auto& ty : {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(),
                            uint64(), float32(), float64()}) {
       CheckDispatchBest(name, {ty}, {ty});
       CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
     }
   }
 
-  // Signed arithmetic
+  // Fail on null type (with _checked variant)
+  for (std::string name : {"negate", "abs", "ln", "log2", "log10", "log1p", "sin", "cos",
+                           "tan", "asin", "acos"}) {
+    for (std::string suffix : {"", "_checked"}) {
+      name += suffix;
+      CheckDispatchFails(name, {null()});
+    }
+  }
+
+  // Fail on null type
+  for (std::string name : {"atan", "sign", "floor", "ceil", "trunc"}) {
+    CheckDispatchFails(name, {null()});
+  }
+
+  // Signed types
   for (std::string name : {"negate_checked"}) {
     for (const auto& ty : {int8(), int16(), int32(), int64(), float32(), float64()}) {
       CheckDispatchBest(name, {ty}, {ty});
       CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
     }
   }
 
-  // Null input
-  for (std::string name : {"negate", "negate_checked", "abs", "abs_checked", "sign"}) {
-    CheckDispatchFails(name, {null()});
-  }
-
+  // Float types (with _checked variant)
   for (std::string name :
        {"ln", "log2", "log10", "log1p", "sin", "cos", "tan", "asin", "acos"}) {
     for (std::string suffix : {"", "_checked"}) {
       name += suffix;
+      for (const auto& ty : {float32(), float64()}) {
+        CheckDispatchBest(name, {ty}, {ty});
+        CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+      }
+    }
+  }
 
-      CheckDispatchBest(name, {int32()}, {float64()});
-      CheckDispatchBest(name, {uint8()}, {float64()});
+  // Float types
+  for (std::string name : {"atan", "floor", "ceil", "trunc"}) {
+    for (const auto& ty : {float32(), float64()}) {
+      CheckDispatchBest(name, {ty}, {ty});
+      CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+    }
+  }
 
-      CheckDispatchBest(name, {dictionary(int8(), int64())}, {float64()});
+  // Integer -> Float64 (with _checked variant)
+  for (std::string name :
+       {"ln", "log2", "log10", "log1p", "sin", "cos", "tan", "asin", "acos"}) {
+    for (std::string suffix : {"", "_checked"}) {
+      name += suffix;
+      for (const auto& ty :
+           {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(), uint64()}) {
+        CheckDispatchBest(name, {ty}, {float64()});
+        CheckDispatchBest(name, {dictionary(int8(), ty)}, {float64()});
+      }
     }
   }
 
-  CheckDispatchBest("atan", {int32()}, {float64()});
-  CheckDispatchBest("atan2", {int32(), float64()}, {float64(), float64()});
-  CheckDispatchBest("atan2", {int32(), uint8()}, {float64(), float64()});
-  CheckDispatchBest("atan2", {int32(), null()}, {float64(), float64()});
-  CheckDispatchBest("atan2", {float32(), float64()}, {float64(), float64()});
-  // Integer always promotes to double
-  CheckDispatchBest("atan2", {float32(), int8()}, {float64(), float64()});
+  // Integer -> Float64
+  for (std::string name : {"atan", "floor", "ceil", "trunc"}) {
+    for (const auto& ty :
+         {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(), uint64()}) {
+      CheckDispatchBest(name, {ty}, {float64()});
+      CheckDispatchBest(name, {dictionary(int8(), ty)}, {float64()});
+    }
+  }

Review comment:
       Thanks for catching this, I had moved `atan2` to the binary `DispatchBest` but apparently skipped it during a rebase/merge. I will add them back.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880957322


   https://issues.apache.org/jira/browse/ARROW-12745


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   *Food for thought:* Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. Similarly, if output is `Scalar[Float32]` tests will fail for `Int32/Uint32` cases. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.
   
   An example test is:
   ```
   auto min = std::numeric_limits<unsigned long long>::min();
   auto max = std::numeric_limits<unsigned long long>::max();
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671366668



##########
File path: docs/source/cpp/compute.rst
##########
@@ -348,6 +348,22 @@ Bit-wise functions
   out of bounds for the data type.  However, an overflow when shifting the
   first input is not error (truncated bits are silently discarded).
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+Rounding functions convert a numeric input into an approximate value with a
+simpler representation based on the rounding strategy.

Review comment:
       This is worded a little confusingly in my opinion. If we're going to reference rounding strategy here, the notes column should describe the rounding behavior for each function (even if it's just the 'obvious' or 'expected' one).
   
   Or alternatively, something like 'rounding functions find the nearest integer (as a floating-point value) to the argument based on a rounding strategy'.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##########
@@ -1032,47 +1032,90 @@ TEST(TestBinaryArithmetic, AddWithImplicitCastsUint64EdgeCase) {
 }
 
 TEST(TestUnaryArithmetic, DispatchBest) {
-  // All arithmetic
-  for (std::string name : {"negate", "abs", "abs_checked", "sign"}) {
+  // All types (with _checked variant)
+  for (std::string name : {"abs"}) {
+    for (std::string suffix : {"", "_checked"}) {
+      name += suffix;
+      for (const auto& ty : {int8(), int16(), int32(), int64(), uint8(), uint16(),
+                             uint32(), uint64(), float32(), float64()}) {
+        CheckDispatchBest(name, {ty}, {ty});
+        CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+      }
+    }
+  }
+
+  // All types
+  for (std::string name : {"negate", "sign"}) {
     for (const auto& ty : {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(),
                            uint64(), float32(), float64()}) {
       CheckDispatchBest(name, {ty}, {ty});
       CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
     }
   }
 
-  // Signed arithmetic
+  // Fail on null type (with _checked variant)
+  for (std::string name : {"negate", "abs", "ln", "log2", "log10", "log1p", "sin", "cos",
+                           "tan", "asin", "acos"}) {
+    for (std::string suffix : {"", "_checked"}) {
+      name += suffix;
+      CheckDispatchFails(name, {null()});
+    }
+  }
+
+  // Fail on null type
+  for (std::string name : {"atan", "sign", "floor", "ceil", "trunc"}) {
+    CheckDispatchFails(name, {null()});
+  }
+
+  // Signed types
   for (std::string name : {"negate_checked"}) {
     for (const auto& ty : {int8(), int16(), int32(), int64(), float32(), float64()}) {
       CheckDispatchBest(name, {ty}, {ty});
       CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
     }
   }
 
-  // Null input
-  for (std::string name : {"negate", "negate_checked", "abs", "abs_checked", "sign"}) {
-    CheckDispatchFails(name, {null()});
-  }
-
+  // Float types (with _checked variant)
   for (std::string name :
        {"ln", "log2", "log10", "log1p", "sin", "cos", "tan", "asin", "acos"}) {
     for (std::string suffix : {"", "_checked"}) {
       name += suffix;
+      for (const auto& ty : {float32(), float64()}) {
+        CheckDispatchBest(name, {ty}, {ty});
+        CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+      }
+    }
+  }
 
-      CheckDispatchBest(name, {int32()}, {float64()});
-      CheckDispatchBest(name, {uint8()}, {float64()});
+  // Float types
+  for (std::string name : {"atan", "floor", "ceil", "trunc"}) {
+    for (const auto& ty : {float32(), float64()}) {
+      CheckDispatchBest(name, {ty}, {ty});
+      CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+    }
+  }
 
-      CheckDispatchBest(name, {dictionary(int8(), int64())}, {float64()});
+  // Integer -> Float64 (with _checked variant)
+  for (std::string name :
+       {"ln", "log2", "log10", "log1p", "sin", "cos", "tan", "asin", "acos"}) {
+    for (std::string suffix : {"", "_checked"}) {
+      name += suffix;
+      for (const auto& ty :
+           {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(), uint64()}) {
+        CheckDispatchBest(name, {ty}, {float64()});
+        CheckDispatchBest(name, {dictionary(int8(), ty)}, {float64()});
+      }
     }
   }
 
-  CheckDispatchBest("atan", {int32()}, {float64()});
-  CheckDispatchBest("atan2", {int32(), float64()}, {float64(), float64()});
-  CheckDispatchBest("atan2", {int32(), uint8()}, {float64(), float64()});
-  CheckDispatchBest("atan2", {int32(), null()}, {float64(), float64()});
-  CheckDispatchBest("atan2", {float32(), float64()}, {float64(), float64()});
-  // Integer always promotes to double
-  CheckDispatchBest("atan2", {float32(), int8()}, {float64(), float64()});
+  // Integer -> Float64
+  for (std::string name : {"atan", "floor", "ceil", "trunc"}) {
+    for (const auto& ty :
+         {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(), uint64()}) {
+      CheckDispatchBest(name, {ty}, {float64()});
+      CheckDispatchBest(name, {dictionary(int8(), ty)}, {float64()});
+    }
+  }

Review comment:
       This drops the tests for atan2?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<signed/unsigned long long int>::min/max()` due to an invalid range when checking integer with floating-point result.
   The tests are:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   
   The error message generated is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   
   Here is some context of what these numbers represent:
   ```
   max(unsigned long long int) = 18446744073709551615
   max(mantissa of double is 53 bits) = 9007199254740992
   ```
   
   This error is most likely due to the [Arrow testing logic when checking integer ranges](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/int_util.cc#L601-L618) for comparing values in test assertions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<unsigned long long>::min/max()` due to an invalid range when checking integer with floating-point result.
   The tests are:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   
   The error message generated is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   
   Here is some context of what these numbers represent:
   ```
   max(unsigned long long int) = 18446744073709551615
   max(mantissa of double is 54 bits) = 9007199254740992
   ```
   
   This error is most likely due to the [Arrow testing logic when checking integer ranges](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/int_util.cc#L601-L618) for comparing values in test assertions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   *Food for thought:* Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. Similarly, if output is `Scalar[Float32]` tests will fail for `Int32/Uint32` cases. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.
   
   An example test is:
   ```
   auto min = std::numeric_limits<unsigned long long>::min();
   auto max = std::numeric_limits<unsigned long long>::max();
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```
   
   There are two alternatives to handle min/max tests:
   1. Do not test min/max for kernels that have integral inputs and floating-point outputs
   2. Create a `TYPED_TEST_SUITE` that uses integral types of up to a width that is less than the mantissa of the floating-point output type


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671140639



##########
File path: cpp/src/arrow/compute/api_scalar.cc
##########
@@ -357,6 +357,9 @@ SCALAR_ARITHMETIC_BINARY(Power, "power", "power_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftLeft, "shift_left", "shift_left_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftRight, "shift_right", "shift_right_checked")
 SCALAR_EAGER_BINARY(Atan2, "atan2")
+SCALAR_EAGER_UNARY(Floor, "floor")
+SCALAR_EAGER_UNARY(Ceiling, "ceiling")

Review comment:
       ```suggestion
   SCALAR_EAGER_UNARY(Ceil, "ceil")
   ```
   
   Is there a reason to have it "ceiling" instead of "ceil" (the C++ function is ceil, as well as numpy. SQL seems to have both)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. This failure also occurs for 32-bit floating-point values. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions.
   
   An example test is:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<T>::min/max()` due to an invalid range when checking integer with floating-point result.
   The tests are:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   
   The error message generated is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   
   Here is some context of what these numbers represent:
   ```
   max(long long int) = 18446744073709551615
   max(mantissa of double is 54 bits) = 9007199254740992
   ```
   
   This error is most likely due to the [Arrow testing logic when checking integer ranges](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/int_util.cc#L601-L618) for comparing values in test assertions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   *Food for thought:* Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. This failure also occurs for 32-bit floating-point values. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.
   
   An example test is:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   *Food for thought:* Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. This failure also occurs for 32-bit floating-point values. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.
   
   An example test is:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(std::numeric_limits<long long>::min()), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(std::numeric_limits<long long>::max()), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671261181



##########
File path: cpp/src/arrow/compute/api_scalar.cc
##########
@@ -357,6 +357,9 @@ SCALAR_ARITHMETIC_BINARY(Power, "power", "power_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftLeft, "shift_left", "shift_left_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftRight, "shift_right", "shift_right_checked")
 SCALAR_EAGER_BINARY(Atan2, "atan2")
+SCALAR_EAGER_UNARY(Floor, "floor")
+SCALAR_EAGER_UNARY(Ceiling, "ceiling")

Review comment:
       I agree that `ceil` should be the name. Thanks for this review!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   *Food for thought:* Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. Similarly, if output is `Scalar[Float32]` tests will fail for `Int32/Uint32` cases. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.
   
   An example test is:
   ```
   auto min = std::numeric_limits<unsigned long long>::min();
   auto max = std::numeric_limits<unsigned long long>::max();
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```
   
   There are two alternatives to handle min/max tests:
   1. Do not test min/max for cases that have integral inputs and floating-point outputs
   2. Create a `TYPED_TEST_SUITE` that uses integral types of up to a width that is less than the mantissa of the floating-point output type


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671323690



##########
File path: cpp/src/arrow/compute/api_scalar.cc
##########
@@ -357,6 +357,9 @@ SCALAR_ARITHMETIC_BINARY(Power, "power", "power_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftLeft, "shift_left", "shift_left_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftRight, "shift_right", "shift_right_checked")
 SCALAR_EAGER_BINARY(Atan2, "atan2")
+SCALAR_EAGER_UNARY(Floor, "floor")
+SCALAR_EAGER_UNARY(Ceiling, "ceiling")

Review comment:
       @jorisvandenbossche Compute function names (for both C++ API and `CallFunction` name) are inconsistent w.r.t. to short vs. long form. For example, *Ceiling, Negate, Power*, etc. I think this should be revisited in another JIRA issue and will require updating Python and R bindings.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671413628



##########
File path: docs/source/cpp/compute.rst
##########
@@ -348,6 +348,22 @@ Bit-wise functions
   out of bounds for the data type.  However, an overflow when shifting the
   first input is not error (truncated bits are silently discarded).
 
+Rounding functions
+~~~~~~~~~~~~~~~~~~
+
+Rounding functions convert a numeric input into an approximate value with a
+simpler representation based on the rounding strategy.

Review comment:
       I added the rounding functions section/text based on the soon-to-be-ready [`round` and `mround` functions](https://github.com/apache/arrow/pull/10349). Although, `floor`, `ceil`, and `trunc` do round to the nearest integer, `round/mround` do not necessarily. They have options to specify fractional precision and have options for various rounding strategies.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<unsigned long long int>::min/max()` due to an invalid range when checking integer with floating-point result.
   The tests are:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   
   The error message generated is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   
   Here is some context of what these numbers represent:
   ```
   max(unsigned long long int) = 18446744073709551615
   max(mantissa of double is 54 bits) = 9007199254740992
   ```
   
   This error is most likely due to the [Arrow testing logic when checking integer ranges](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/int_util.cc#L601-L618) for comparing values in test assertions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671142050



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -1567,6 +1588,24 @@ const FunctionDoc log1p_checked_doc{
      "Use function \"log1p\" if you want non-positive values to return "
      "-inf or NaN."),
     {"x"}};
+
+const FunctionDoc floor_doc{
+    "Calculate the greatest integer in magnitude less than or equal to the "
+    "argument element-wise",
+    "",

Review comment:
       ```suggestion
       "Round down to the nearest integer",
       "Calculate the greatest integer in magnitude less than or equal to the "
       "argument element-wise",
   ```
   
   (that's how you explained it in the api_scalar.h  doc comments, which I find easier to understand as short summary of the function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r671353488



##########
File path: cpp/src/arrow/compute/api_scalar.cc
##########
@@ -357,6 +357,9 @@ SCALAR_ARITHMETIC_BINARY(Power, "power", "power_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftLeft, "shift_left", "shift_left_checked")
 SCALAR_ARITHMETIC_BINARY(ShiftRight, "shift_right", "shift_right_checked")
 SCALAR_EAGER_BINARY(Atan2, "atan2")
+SCALAR_EAGER_UNARY(Floor, "floor")
+SCALAR_EAGER_UNARY(Ceiling, "ceiling")

Review comment:
       I used the short form for compute function names in this PR, and opened [this JIRA](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-13359) to revisit the names of compute functions. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   *Food for thought:* Tests fail for `std::numeric_limits<Int64/Uint64>::min/max()` due to an invalid range when a `Scalar[Int64/Uint64]` input is checked against a `Scalar[Float64]` output. This failure also occurs for 32-bit floating-point values. This error is triggered by the [Arrow testing logic when casting integer-to-floating](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_numeric.cc#L195-L209) for comparing values in test assertions. This is normal behavior as not all integers have a floating-point representation.
   
   An example test is:
   ```
   auto min = std::numeric_limits<unsigned long long>::min();
   auto max = std::numeric_limits<unsigned long long>::max();
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   and the error message is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   The meaning of these numbers is:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992  // mantissa of Float64 = 53
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#discussion_r670807189



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -1495,6 +1516,21 @@ const FunctionDoc log1p_checked_doc{
      "Use function \"log1p\" if you want non-positive values to return "
      "-inf or NaN."),
     {"x"}};
+
+const FunctionDoc floor_doc{
+    "Calculate the greatest integer less than or equal to the argument element-wise",
+    "",
+    {"x"}};
+
+const FunctionDoc ceiling_doc{
+    "Calculate the least integer greater than or equal to the argument element-wise",
+    "",
+    {"x"}};
+
+const FunctionDoc truncate_doc{
+    "Calculate the nearest integer not greater than to the argument element-wise",

Review comment:
       Ah, not greater in *magnitude* according to cppreference. Maybe that should be clarified.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -1495,6 +1516,21 @@ const FunctionDoc log1p_checked_doc{
      "Use function \"log1p\" if you want non-positive values to return "
      "-inf or NaN."),
     {"x"}};
+
+const FunctionDoc floor_doc{
+    "Calculate the greatest integer less than or equal to the argument element-wise",
+    "",
+    {"x"}};
+
+const FunctionDoc ceiling_doc{
+    "Calculate the least integer greater than or equal to the argument element-wise",
+    "",
+    {"x"}};
+
+const FunctionDoc truncate_doc{
+    "Calculate the nearest integer not greater than to the argument element-wise",

Review comment:
       This means we truncate towards negative infinity? (e.g. truncate(-1.1) = -2 since -1 > -1.1?)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-881667961


   Thanks @edponce!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #10727: ARROW-12745: [C++][Compute] Add floor, ceiling, and truncate kernels

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #10727:
URL: https://github.com/apache/arrow/pull/10727#issuecomment-880964640


   Tests fail for `std::numeric_limits<signed/unsigned long long int>::min/max()` due to an invalid range when checking integer with floating-point result.
   The tests are:
   ```
   this->AssertUnaryOp(floor, this->MakeScalar(min), *arrow::MakeScalar(float64(), min));
   this->AssertUnaryOp(floor, this->MakeScalar(max), *arrow::MakeScalar(float64(), max));
   ```
   
   The error message generated is:
   ```
   '_error_or_value11.status()' failed with Invalid: Integer value 18446744073709551615
     not in range: 0 to 9007199254740992
   ```
   
   Here is some context of what these numbers represent:
   ```
   max(Uint64) = 18446744073709551615
   max(2^53) = 9007199254740992, mantissa of Float64 = 53
   ```
   
   This error is most likely due to the [Arrow testing logic when checking integer ranges](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/int_util.cc#L601-L618) for comparing values in test assertions.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org