Posted to issues@arrow.apache.org by "wirable23 (via GitHub)" <gi...@apache.org> on 2023/05/12 20:16:10 UTC

[GitHub] [arrow] wirable23 opened a new issue, #35576: Unexpected float32 to decimal128 cast result

wirable23 opened a new issue, #35576:
URL: https://github.com/apache/arrow/issues/35576

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```pycon
   >>> import pyarrow as pa
   >>> a = pa.array([545803904.0], type=pa.float32())
   >>> a.cast(pa.decimal128(38, 18))
   <pyarrow.lib.Decimal128Array object at 0x000001DE8FF59840>
   [
     545803886.966396699654750208
   ]
   ```
   
   While the encodings of decimal128 and float32 are very different, the difference after the cast seems much larger than expected.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] pitrou commented on issue #35576: [C++] Unexpected float32/64 to decimal128 cast result

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on issue #35576:
URL: https://github.com/apache/arrow/issues/35576#issuecomment-1563012237

   There are also very strange precision-related phenomena going on:
   ```pycon
   >>> pa.array([1234567890.]).cast(pa.decimal128(38, 10))
   <pyarrow.lib.Decimal128Array object at 0x7f05f49238e0>
   [
     1234567890.0000000000
   ]
   >>> pa.array([1234567890.]).cast(pa.decimal128(38, 11))
   <pyarrow.lib.Decimal128Array object at 0x7f05f4a3f1c0>
   [
     1234567889.99999995904
   ]
   >>> pa.array([1234567890.]).cast(pa.decimal128(38, 12))
   <pyarrow.lib.Decimal128Array object at 0x7f05f494f9a0>
   [
     1234567890.000000057344
   ]
   ```
   



[GitHub] [arrow] pitrou commented on issue #35576: [C++] Unexpected float32/64 to decimal128 cast result

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on issue #35576:
URL: https://github.com/apache/arrow/issues/35576#issuecomment-1563041988

   The issue is with the algorithm used in `Decimal128::FromReal`. We can reproduce the underlying precision problem by repeating the first steps of the algorithm in Python:
   ```pycon
   >>> powers_of_ten = [1e-38, 1e-37, 1e-36, 1e-35, 1e-34, 1e-33, 1e-32, 1e-31, 1e-30, 1e-29, 1e-28,
   ...     1e-27, 1e-26, 1e-25, 1e-24, 1e-23, 1e-22, 1e-21, 1e-20, 1e-19, 1e-18, 1e-17,
   ...     1e-16, 1e-15, 1e-14, 1e-13, 1e-12, 1e-11, 1e-10, 1e-9,  1e-8,  1e-7,  1e-6,
   ...     1e-5,  1e-4,  1e-3,  1e-2,  1e-1,  1e0,   1e1,   1e2,   1e3,   1e4,   1e5,
   ...     1e6,   1e7,   1e8,   1e9,   1e10,  1e11,  1e12,  1e13,  1e14,  1e15,  1e16,
   ...     1e17,  1e18,  1e19,  1e20,  1e21,  1e22,  1e23,  1e24,  1e25,  1e26,  1e27,
   ...     1e28,  1e29,  1e30,  1e31,  1e32,  1e33,  1e34,  1e35,  1e36,  1e37,  1e38]
   >>> int(powers_of_ten[38+10]*1234567890.)
   12345678900000000000
   >>> int(powers_of_ten[38+11]*1234567890.)
   123456788999999995904
   >>> int(powers_of_ten[38+12]*1234567890.)
   1234567890000000057344
   ```
   
   Also it looks like I'm the culprit:
   https://github.com/apache/arrow/pull/7612
   :-)
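   
   For comparison, a correctly rounded result can be computed with exact rational arithmetic instead of a floating-point multiply. The sketch below is only a Python illustration, not Arrow's actual code or fix, and `exact_scaled_int` is a hypothetical name: `fractions.Fraction(value)` holds the exact binary value of the float, so scaling by `10**scale` loses nothing, and the only rounding happens in the final `round()` (half to even).
   ```python
   from fractions import Fraction


   def exact_scaled_int(value: float, scale: int) -> int:
       """Hypothetical helper: the unscaled integer of a decimal with `scale`
       fractional digits, obtained by rounding the float's exact value once."""
       # Fraction(value) is exact; Fraction(10) ** scale also works for negative scales.
       exact = Fraction(value) * Fraction(10) ** scale
       # round() on a Fraction returns an int, rounding half to even.
       return round(exact)


   for scale in (10, 11, 12):
       print(scale, exact_scaled_int(1234567890.0, scale))
   # 10 12345678900000000000
   # 11 123456789000000000000
   # 12 1234567890000000000000
   ```
   Unlike the `powers_of_ten` products above, every scale yields the same digits, because 1234567890.0 is exactly representable as a float64.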



[GitHub] [arrow] pitrou commented on issue #35576: [C++] Unexpected float32/64 to decimal128 cast result

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on issue #35576:
URL: https://github.com/apache/arrow/issues/35576#issuecomment-1563025491

   @rok @felipecrv Does either of you want to take a look at this?



[GitHub] [arrow] pitrou commented on issue #35576: [C++] Unexpected float32/64 to decimal128 cast result

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on issue #35576:
URL: https://github.com/apache/arrow/issues/35576#issuecomment-1563007667

   `float32` seems to be a red herring here, as even `float64` exhibits the issue:
   ```pycon
   >>> pa.array([1234567890.]).cast(pa.decimal128(38, 18))
   <pyarrow.lib.Decimal128Array object at 0x7f05f4a3f0a0>
   [
     1234567889.999999973827018752
   ]
   ```
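   
   Because `1e18` is exactly representable as a float64, this value can be reproduced in plain Python with a single float64 multiply. A minimal sketch, assuming the conversion at scale 18 amounts to `x * 1e18` in double precision followed by rounding to an integer:
   ```python
   x = 1234567890.0

   # One float64 multiply: both operands are exact, but the product
   # (1234567890 * 10**18) needs more than 53 significant bits and gets rounded.
   via_float64 = int(x * 1e18)

   # The exact product, using Python's arbitrary-precision integers.
   exact = int(x) * 10**18

   print(via_float64)          # 1234567889999999973827018752
   print(exact - via_float64)  # 26172981248
   ```
   The first printed value matches the digits of the cast result above, so the error is already present in float64 arithmetic.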



[GitHub] [arrow] pitrou closed issue #35576: [C++] Decimal{128,256}::FromReal accuracy loss on non-small scale values

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou closed issue #35576: [C++] Decimal{128,256}::FromReal accuracy loss on non-small scale values
URL: https://github.com/apache/arrow/issues/35576



[GitHub] [arrow] emkornfield commented on issue #35576: Unexpected float32 to decimal128 cast result

Posted by "emkornfield (via GitHub)" <gi...@apache.org>.

emkornfield commented on issue #35576:
URL: https://github.com/apache/arrow/issues/35576#issuecomment-1547404763

   The first part of the error comes from down-casting from float64 (Python's default float representation) to float32:
   ```pycon
   >>> import pyarrow as pa
   >>> a = pa.array([545803904.0], type=pa.float32())
   >>> a
   <pyarrow.lib.FloatArray object at 0x3ebb737bb400>
   [
     545803900
   ]
   ```
   The same happens with NumPy:
   ```pycon
   >>> import numpy
   >>> numpy.float32(545803904.0)
   545803900.0
   ```
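   
   The `545803900` printed by both libraries is a shortened decimal rendering; the stored float32 can be inspected exactly with the standard library, using `struct` to round a float64 through IEEE 754 single precision and `decimal.Decimal` to print the exact binary value. A minimal sketch (`to_float32` is a hypothetical helper):
   ```python
   import struct
   from decimal import Decimal


   def to_float32(v: float) -> float:
       """Round a float64 to the nearest float32, then widen it back to float64."""
       return struct.unpack("<f", struct.pack("<f", v))[0]


   # Decimal(float) prints the exact binary value, not a shortest repr.
   print(Decimal(to_float32(545803904.0)))  # 545803904
   print(Decimal(to_float32(545803900.0)))  # 545803904
   ```
   Both inputs map to the same float32, whose exact value is 545803904 (float32 values near 5.4e8 are spaced 64 apart).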
   
   I think the second part of the error is likely due to the implementation, which looks like it might perform an extra cast through an intermediate value:
   
   ```python
   import pyarrow as pa

   a = pa.array([545803900.0], type=pa.float64())
   print(a.cast(pa.decimal128(38, 18)))
   print(a.cast(pa.float32()).cast(pa.decimal128(38, 18)))
   ```
   gives:
   ```
   [
     545803899.999999976169013248
   ]
   [
     545803886.966396699654750208
   ]
   ```
   
   I think the second source of error might be: https://github.com/apache/arrow/blob/cd6e2a4d2b9373b942da18b4cc82cb41431764d9/cpp/src/arrow/util/decimal.cc#L158
   since the computation there appears to be done in float space (instead of first casting to double), which can cause further loss of precision.
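   
   That hypothesis can be checked without Arrow by emulating a multiply carried out entirely in float32. A minimal sketch, assuming the conversion multiplies the value by a float32 copy of 10**18 and rounds the product to float32 (`to_float32` is a hypothetical helper built on `struct`):
   ```python
   import struct


   def to_float32(v: float) -> float:
       """Round a float64 to the nearest float32, then widen it back to float64."""
       return struct.unpack("<f", struct.pack("<f", v))[0]


   x = to_float32(545803904.0)
   factor = to_float32(1e18)  # 10**18 is not exactly representable in float32
   # The product of two float32 values fits exactly in a float64, so rounding
   # it once to float32 gives the same result as a single float32 multiply.
   product = to_float32(x * factor)

   print(int(factor))   # 999999984306749440
   print(int(product))  # 545803886966396699654750208
   ```
   Under those assumptions the emulated result has exactly the digits reported for the float32 cast (545803886.966396699654750208 at scale 18).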
   
   
   

