Posted to dev@arrow.apache.org by Bryan Cutler <cu...@gmail.com> on 2019/01/10 22:19:51 UTC

pyarrow data type casting problem when safe=True

Hi All,

I have a question about using pyarrow.Array.from_pandas with the safe flag
set to True.  When the Pandas data contains integers and NULL values, Pandas
stores it with a floating point dtype, and if the type is then cast back to
an integer in Arrow, it raises the error "ArrowInvalid: Floating point value
truncated". Is this the expected behavior? I'm guessing the check doesn't
look at the actual values, only at the types being converted. Is there a way
around this specific error besides setting safe to False?  Here is a concise
example:

>>> pa.Array.from_pandas(pd.Series([1, None]), type=pa.int32(), safe=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 474, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 169, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Floating point value truncated

I came across this issue in https://github.com/apache/spark/pull/22807,
specifically in this discussion:
https://github.com/apache/spark/pull/22807#discussion_r246859417.

Thanks!
Bryan

Re: pyarrow data type casting problem when safe=True

Posted by Krisztián Szűcs <sz...@gmail.com>.

Verified, issue: https://issues.apache.org/jira/browse/ARROW-4258

On Mon, Jan 14, 2019 at 12:31 AM Wes McKinney <we...@gmail.com> wrote:

> This seems like a bug to me; I would not expect this to fail. It's too
> bad it didn't get fixed in time for 0.12

Re: pyarrow data type casting problem when safe=True

Posted by Wes McKinney <we...@gmail.com>.

This seems like a bug to me; I would not expect this to fail. It's too
bad it didn't get fixed in time for 0.12.
