Posted to user@arrow.apache.org by Chris Comeau <ch...@gmail.com> on 2023/05/08 18:07:24 UTC

[Python] "OverflowError: int too big to convert" with target_type float64 - allow loss of precision?

Is there any way to have pa.compute.cast handle int -> float64 with an
accepted loss of precision?

The source value is a Python int that's too long for int64, like
12345678901234567890, and I'd like to put it into a float64 field in an
Arrow table.

Using pyarrow 12.0.0:

pa.array([12345678901234567890], type=pa.float64())
-> ArrowInvalid: PyLong is too large to fit int64

Converting it myself works, with the expected loss of precision:
pa.array([float(12345678901234567890)], type=pa.float64())

-> [1.2345678901234567e+19]


But I can't get pa.compute to do the same. Some examples:


pa.compute.cast([20033613169503999008], target_type=pa.float64(),
safe=False)
-> OverflowError: int too big to convert

pa.compute.cast(
    [12345678901234567890],
    options = pa.compute.CastOptions.unsafe(target_type=pa.float64())
)
-> OverflowError: int too big to convert

I also tried the other options, like allow_int_overflow and allow_float_truncation, with no luck.
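
For example, something along these lines (flag names from pa.compute.CastOptions; the exact combinations I tried may have varied slightly) still raises the same error:

import pyarrow as pa
import pyarrow.compute as pc

# Explicitly allowing overflow/truncation doesn't help, presumably because
# the input list is converted to an Arrow array (with int64 type inference)
# before the cast kernel ever runs.
opts = pc.CastOptions(
    target_type=pa.float64(),
    allow_int_overflow=True,
    allow_float_truncation=True,
)
pc.cast([12345678901234567890], options=opts)
-> OverflowError: int too big to convert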

Asking Arrow to infer types hits the same error:
pa_array = pa.array([12345678901234567890])
-> OverflowError: int too big to convert

Casting to decimal128(38, 0) works if the type is set explicitly:
pa.array([12345678901234567890], type=pa.decimal128(38, 0))

<pyarrow.lib.Decimal128Array object at 0x000001C45FAA8B80>
-> [12345678901234567890]

I'm working around it by doing the float() conversion myself, but of
course this is slower.
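
For completeness, the workaround I'm using looks roughly like this (pure-Python conversion before the data ever reaches Arrow):

import pyarrow as pa

values = [12345678901234567890]

# Convert in Python first; each value is rounded to the nearest float64.
pa.array([float(v) for v in values], type=pa.float64())
-> [1.2345678901234567e+19]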

Re: [Python] "OverflowError: int too big to convert" with target_type float64 - allow loss of precision?

Posted by Chris Comeau <ch...@gmail.com>.
Yes, if I tell it to make a decimal128 array from the same ints, that
works (sketch below). It's something specific to handling longer Python
ints; it hits a snag on these two paths:
- making an array with automatic type inference (no target type specified)
- making a float64 array
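
Roughly what the two-step route looks like (the decimal128 step is the part I've verified; the final cast to float64 is your suggestion):

import pyarrow as pa

values = [12345678901234567890]

# Step 1: decimal128(38, 0) can hold the full integer value exactly.
dec_arr = pa.array(values, type=pa.decimal128(38, 0))

# Step 2: cast the decimal array to float64, accepting rounding
# (pass safe=False if the cast complains about precision loss).
dec_arr.cast(pa.float64())
-> expected: [1.2345678901234567e+19]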

On Thu, May 11, 2023, 8:00 a.m. Felipe Oliveira Carvalho <felipekde@gmail.com> wrote:

> Does creating a decimal128 array, then casting that array to float64 work?
>

Re: [Python] "OverflowError: int too big to convert" with target_type float64 - allow loss of precision?

Posted by Felipe Oliveira Carvalho <fe...@gmail.com>.
Does creating a decimal128 array, then casting that array to float64 work?
