Posted to issues@arrow.apache.org by "jbrockmendel (via GitHub)" <gi...@apache.org> on 2023/05/22 22:26:10 UTC

[GitHub] [arrow] jbrockmendel opened a new issue, #35717: pa.array([Decimal("nan")]) raises

jbrockmendel opened a new issue, #35717:
URL: https://github.com/apache/arrow/issues/35717

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
   pa.array([Decimal("nan")])
   
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
     File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Decimal precision out of range [1, 38]: -2147483648
   ```
   
   Based on how pyarrow handles numpy nans I'd expect this to get converted to a pyarrow null.
   
   pyarrow 11.0.0 on mac
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] xbarra commented on issue #35717: [Python] pa.array([Decimal("nan")]) raises

Posted by "xbarra (via GitHub)" <gi...@apache.org>.
xbarra commented on issue #35717:
URL: https://github.com/apache/arrow/issues/35717#issuecomment-1697128121

   In pyarrow 13 we have the same issue. And, in the case most interesting to us, we also have problems with Infinity:
   
   ```
   pa.array([Decimal(2),Decimal('inf'),3])
   
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   Cell In[23], line 1
   ----> 1 pa.array([Decimal(2),Decimal('inf'),3])
   
   File c:\ProgramData\Anaconda3\Lib\site-packages\pyarrow\array.pxi:327, in pyarrow.lib.array()
   
   File c:\ProgramData\Anaconda3\Lib\site-packages\pyarrow\array.pxi:39, in pyarrow.lib._sequence_to_array()
   
   File c:\ProgramData\Anaconda3\Lib\site-packages\pyarrow\error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
   
   TypeError: 'str' object cannot be interpreted as an integer
   ```
   
   The same error occurs with `pa.array([Decimal(2), Decimal('inf'), 3], from_pandas=True)`.
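Because `from_pandas=True` only turns NaN into null and does not help with Infinity, one workaround is to clean the values in Python before handing them to pyarrow. A minimal sketch, assuming that mapping every non-finite `Decimal` (NaN, sNaN, +/-Infinity) to `None` is acceptable for your data; `non_finite_decimals_to_none` is a hypothetical helper name, not a pyarrow API:

```python
from decimal import Decimal


def non_finite_decimals_to_none(values):
    """Return a copy of `values` with NaN/sNaN/Infinity Decimals replaced by None.

    Hypothetical helper, not part of pyarrow. Non-Decimal values
    (ints, None, ...) are passed through unchanged.
    """
    return [
        None if isinstance(v, Decimal) and not v.is_finite() else v
        for v in values
    ]


cleaned = non_finite_decimals_to_none([Decimal(2), Decimal("inf"), 3])
print(cleaned)  # [Decimal('2'), None, 3]
```

The cleaned list can then be passed to `pa.array(...)`; `None` is treated as null whether or not `from_pandas` is set.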


[GitHub] [arrow] jbrockmendel commented on issue #35717: [Python] pa.array([Decimal("nan")]) raises

Posted by "jbrockmendel (via GitHub)" <gi...@apache.org>.
jbrockmendel commented on issue #35717:
URL: https://github.com/apache/arrow/issues/35717#issuecomment-1559472221

   Thanks @jorisvandenbossche, using from_pandas should help with the motivating use case.


[GitHub] [arrow] jorisvandenbossche commented on issue #35717: pa.array([Decimal("nan")]) raises

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35717:
URL: https://github.com/apache/arrow/issues/35717#issuecomment-1558715711

   > Based on how pyarrow handles numpy nans I'd expect this to get converted to a pyarrow null.
   
   We actually preserve NaNs by default when passing them like that:
   
   ```
   In [16]: pa.array([float("nan")])
   Out[16]: 
   <pyarrow.lib.DoubleArray object at 0x7f41b7c90ee0>
   [
     nan
   ]
   ```
   
   It's only when you specify `from_pandas=True` that we convert NaN to null, but this argument is set to True automatically when you pass a pandas object (Series or Index; we should expand this to check for pandas arrays as well), see https://arrow.apache.org/docs/python/generated/pyarrow.array.html
   
   That aside, the error you get here is a bit confusing, and that seems to come from a bug in the precision/scale inference. If the first value is not a NaN, we see a better error message:
   
   ```
   In [17]: pa.array([Decimal("1.20"), Decimal("nan")])
   ...
   ArrowInvalid: The string 'NaN' is not a valid decimal128 number
   ```
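For reference, Python's `decimal` module reports a non-integer exponent for non-finite values: `Decimal("nan").as_tuple().exponent` is `'n'` and `Decimal("inf").as_tuple().exponent` is `'F'`. Any precision/scale inference that treats the exponent as an integer therefore has to skip these values first, which would be consistent with the `'str' object cannot be interpreted as an integer` error reported for `Decimal('inf')` in this thread. The sketch below infers a common precision and scale from the finite values only; `infer_precision_scale` is a hypothetical illustration, not pyarrow's actual inference code:

```python
from decimal import Decimal


def infer_precision_scale(values):
    """Infer a (precision, scale) pair covering every finite Decimal in `values`.

    Non-finite values (NaN, sNaN, Infinity) are skipped because their
    as_tuple().exponent is a string ('n', 'N' or 'F'), not an int.
    Returns None if there is no finite Decimal at all.
    """
    max_int_digits = 0  # digits left of the decimal point
    max_scale = 0       # digits right of the decimal point
    found = False
    for v in values:
        if not isinstance(v, Decimal) or not v.is_finite():
            continue
        found = True
        t = v.as_tuple()
        n_digits = len(t.digits)
        if t.exponent >= 0:
            precision, scale = n_digits + t.exponent, 0
        else:
            scale = -t.exponent
            precision = max(n_digits, scale)
        max_int_digits = max(max_int_digits, precision - scale)
        max_scale = max(max_scale, scale)
    if not found:
        return None
    return max_int_digits + max_scale, max_scale


print(infer_precision_scale([Decimal("1.20"), Decimal("nan")]))  # (3, 2)
```

The resulting pair could then be passed explicitly as `type=pa.decimal128(precision, scale)` so that pyarrow does not need to infer it.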
   
   So we generally don't support NaN (or +/- Inf) for decimal data. Given that we don't support it, we should maybe consider converting it to nulls instead (or at least give the option to do so). Also, casting float to decimal will raise an error for NaN/Inf values:
   
   ```
   In [32]: pa.array([1.2, 0.0]).cast(pa.decimal128(3, 2))
   Out[32]: 
   <pyarrow.lib.Decimal128Array object at 0x7f41b7fd8c40>
   [
     1.20,
     0.00
   ]
   
   In [33]: pa.array([1.2, 0.0, np.nan]).cast(pa.decimal128(3, 2))
   ...
   ArrowInvalid: Cannot convert nan to Decimal128
   ```
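As a sketch of what an opt-in "convert NaN/Inf to null" behaviour could look like for the float-to-decimal case, here is a pure-Python version that quantizes floats to a fixed scale and maps non-finite values to `None`. Both `floats_to_decimals` and its `nan_to_null` flag are hypothetical; they are not existing pyarrow options:

```python
import math
from decimal import Decimal


def floats_to_decimals(values, precision, scale, nan_to_null=True):
    """Convert floats to Decimals with `scale` fractional digits.

    Hypothetical helper: non-finite floats become None when
    nan_to_null is True; otherwise a ValueError is raised, similar in
    spirit to the ArrowInvalid raised by the cast above.
    """
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    out = []
    for x in values:
        if not math.isfinite(x):
            if nan_to_null:
                out.append(None)
                continue
            raise ValueError(f"Cannot convert {x} to Decimal")
        d = Decimal(x).quantize(quantum)  # default context rounds half-even
        if len(d.as_tuple().digits) > precision:
            raise ValueError(f"{d} does not fit in precision {precision}")
        out.append(d)
    return out


print(floats_to_decimals([1.2, 0.0, float("nan")], precision=3, scale=2))
# [Decimal('1.20'), Decimal('0.00'), None]
```

Making this an explicit flag rather than the default keeps the current strict behaviour for users who want NaN/Inf to be reported as errors.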
   
   For the `array(..)` constructor, the `from_pandas=True` argument will already ensure this gets converted to null:
   
   ```
   In [34]: pa.array([Decimal("1.20"), Decimal("nan")], from_pandas=True)
   Out[34]: 
   <pyarrow.lib.Decimal128Array object at 0x7f41c415cac0>
   [
     1.20,
     null
   ]
   ```
   


[GitHub] [arrow] jbrockmendel commented on issue #35717: [Python] pa.array([Decimal("nan")]) raises

Posted by "jbrockmendel (via GitHub)" <gi...@apache.org>.
jbrockmendel commented on issue #35717:
URL: https://github.com/apache/arrow/issues/35717#issuecomment-1560103010

   Hmm looks like `pa.array([Decimal("nan")], from_pandas=True)` gives a null-type instead of a decimal-type.


[GitHub] [arrow] jbrockmendel commented on issue #35717: [Python] pa.array([Decimal("nan")]) raises

Posted by "jbrockmendel (via GitHub)" <gi...@apache.org>.
jbrockmendel commented on issue #35717:
URL: https://github.com/apache/arrow/issues/35717#issuecomment-1561371009

   Makes sense, thanks. Context is implementing something like https://github.com/pandas-dev/pandas/pull/53025 for Decimal objects.


[GitHub] [arrow] jorisvandenbossche commented on issue #35717: [Python] pa.array([Decimal("nan")]) raises

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35717:
URL: https://github.com/apache/arrow/issues/35717#issuecomment-1560707158

   Yes, for a generic sequence of python objects, we currently just check if an object is "null" or not using:
   
   https://github.com/apache/arrow/blob/53c0d338e6e45cb28a3c1973522da1257eaea761/python/pyarrow/src/arrow/python/helpers.cc#L359-L372
   
   but once we determined an object to be null, we don't take the actual type of object into account for the type inference.
   
   So that also means that currently you could mix all kinds of null-likes together when specifying `from_pandas=True` (also for cases where pandas would not allow this), for example:
   
   ```
   In [18]: pa.array([1, pd.NaT], from_pandas=True)
   Out[18]: 
   <pyarrow.lib.Int64Array object at 0x7fcb37d62ec0>
   [
     1,
     null
   ]
   ```
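To illustrate the behaviour described above (objects are first classified as null or not, and only the non-null ones feed type inference), here is a rough pure-Python model. It approximates the linked `helpers.cc` check rather than porting it: `is_null_like` and `infer_kind` are hypothetical names, and pandas' `NaT`/`NA` are recognised by type name only so that the sketch runs without pandas installed.

```python
import math
from decimal import Decimal


def is_null_like(obj, from_pandas):
    """Rough model of pyarrow's null check for generic Python objects."""
    if obj is None:
        return True  # None is null regardless of from_pandas
    if not from_pandas:
        return False
    if isinstance(obj, float) and math.isnan(obj):  # also numpy float64 NaN
        return True
    if isinstance(obj, Decimal) and obj.is_nan():  # Infinity is NOT null-like
        return True
    # pandas.NaT / pandas.NA, matched by type name to avoid importing pandas
    return type(obj).__name__ in ("NaTType", "NAType")


def infer_kind(values, from_pandas):
    """Infer a coarse type name from the non-null values only (simplified)."""
    kinds = {type(v).__name__ for v in values if not is_null_like(v, from_pandas)}
    if not kinds:
        return "null"  # e.g. [Decimal("nan")] with from_pandas=True
    if len(kinds) == 1:
        return kinds.pop()
    return "mixed"
```

In this model, as reported in the thread, `[Decimal("nan")]` with `from_pandas=True` ends up with a null type because the only value was classified as null before inference. Passing an explicit `type=` (for example `pa.decimal128(3, 2)`) to `pa.array` should sidestep inference in that case.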
