Posted to github@arrow.apache.org by "danepitkin (via GitHub)" <gi...@apache.org> on 2023/04/06 19:40:36 UTC

[GitHub] [arrow] danepitkin commented on issue #34901: Inconsistent cast behavior between array and scalar for int64

danepitkin commented on issue #34901:
URL: https://github.com/apache/arrow/issues/34901#issuecomment-1499529735

   Hi @rohanjain101,
   
   I am able to reproduce this example on pyarrow v11 using macOS 13.3.
   
   What you are experiencing is the difference between safe and unsafe casting: the number you chose probably cannot be represented exactly in the new type. It is not true that all int64 values can be safely converted to float64. A float64 has a 53-bit significand, so beyond 2^53 it skips integers that int64 can still represent. See https://en.wikipedia.org/wiki/Double-precision_floating-point_format, which states: `Integers from −2^53 to 2^53 (−9,007,199,254,740,992 to 9,007,199,254,740,992) can be exactly represented`. My guess is that the underlying implementation enforces this as a hard limit, since I believe some int64 values beyond it are still exactly representable in float64 (such as 18,014,398,509,481,984, which is 2^54, but not 18,014,398,509,481,983).
   ```
   >>> arr = pa.array([18014398509481984], type=pa.int64())
   >>> arr.cast(pa.float64())
   Traceback (most recent call last):
   ...
   pyarrow.lib.ArrowInvalid: Integer value 18014398509481984 not in range: -9007199254740992 to 9007199254740992
   ```
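To make the distinction concrete, here is a small standard-library sketch (no pyarrow involved). The helper names `is_exactly_representable` and `within_safe_range` are made up for illustration, and the range check only mirrors the bound quoted in the error message; I have not verified that this is how Arrow implements it.

```python
# Bound quoted in the error message above: 2**53 = 9007199254740992.
FLOAT64_EXACT_INT_LIMIT = 2**53


def is_exactly_representable(n: int) -> bool:
    """True if the integer survives a round trip through float64 unchanged."""
    return int(float(n)) == n


def within_safe_range(n: int) -> bool:
    """Conservative range check, assumed to mirror the bound in the error message."""
    return -FLOAT64_EXACT_INT_LIMIT <= n <= FLOAT64_EXACT_INT_LIMIT


# Compare the exact round-trip test with the conservative range test.
for n in (2**53, 2**53 + 1, 2**54 - 1, 2**54):
    print(n, is_exactly_representable(n), within_safe_range(n))
```

The last value, 2**54, round-trips exactly but falls outside the range check, which would explain why the safe cast rejects it.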
   
   It appears the scalar cast defaults to unsafe casting, while the array cast defaults to safe casting. You can allow unsafe casting for the array like this:
   ```
   >>> arr.cast(pa.float64(), safe=False)
   <pyarrow.lib.DoubleArray object at 0x126a40ee0>
   [
     6.312878760374612e+18
   ]
   ```
   
   There are no options to choose between safe and unsafe casting in the scalar APIs at the moment. The documentation does state that the scalar cast is safe, though, which does not match the behavior here: https://arrow.apache.org/docs/python/generated/pyarrow.Int64Scalar.html#pyarrow.Int64Scalar
   
   This is either a bug in scalar safe casting or an error in the documentation. Ideally, scalars would also let you choose between safe and unsafe casting with an option. Either way, more investigation is needed.
   
   
   

