You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "spenczar (via GitHub)" <gi...@apache.org> on 2023/05/16 19:13:28 UTC

[GitHub] [arrow] spenczar opened a new issue, #35624: [Python] Can't provide Arrow array when filling nulls of a fixed size list array

spenczar opened a new issue, #35624:
URL: https://github.com/apache/arrow/issues/35624

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Suppose you have a fixed-sized list array with null values:
   
   ```py
   array = pa.nulls(3, pa.list_(pa.float64(), 2))
   print(array)
   # [
   #  null,
   #  null,
   #  null
   #]
   ```
   
   How do you fill the nulls? Like, maybe I want it to look like `[[null, null], [null, null], [null, null]]`. Or even `[[0, 0], [0, 0], [0, 0]]`.
   
   You can't call `fill_null` with a pyarrow array value, because the arrays don't have an '`as_py`' method:
   ```py
   array.fill_null(pa.array([0.0, 0.0]))
   ```
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/array.pxi", line 1296, in pyarrow.lib.Array.fill_null
     File "/Users/swnelson/scratch/.arrow-venv/lib/python3.11/site-packages/pyarrow/compute.py", line 523, in fill_null
       fill_value = pa.scalar(fill_value.as_py(), type=values.type)
                              ^^^^^^^^^^^^^^^^
   AttributeError: 'pyarrow.lib.DoubleArray' object has no attribute 'as_py'
   ```
   
   But you _can_ do it with a plain old Python list!
   
   ```py
   array.fill_null([0.0, 0.0])
   ```
   ```
   <pyarrow.lib.FixedSizeListArray object at 0x135647460>
   [
     [
       0,
       0
     ],
     [
       0,
       0
     ],
     [
       0,
       0
     ]
   ]
   ```
   
   This seems like an oversight. If I give an array of the right shape, surely it should be permitted.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #35624: [Python] Can't provide Arrow array when filling nulls of a fixed size list array

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35624:
URL: https://github.com/apache/arrow/issues/35624#issuecomment-1560683539

   Thanks for the report!
   
   The code in question is:
   
   https://github.com/apache/arrow/blob/6d3d2fca2c9693231fa1e52c142ceef563fc23f9/python/pyarrow/compute.py#L520-L525
   
   So if the types don't match, it tries to convert the fill_value to a _scalar_ of the correct type. There seems to be multiple things fishy about this: 1) this should probably cast to the correct type instead of going through `as_py`, and 2) the current version also only works for scalars, not arrays .. 
   We can see this with a simple (non-nested array) example as well:
   
   ```python
   >>> arr = pa.array([1, 2, None, 4, None])
   >>> arr.fill_null(pa.array([10, 20, 30, 40, 50]))
   <pyarrow.lib.Int64Array object at 0x7f41c56983a0>
   [
     1,
     2,
     30,
     4,
     50
   ]
   >>> arr.fill_null(pa.array([10, 20, 30, 40, 50], type="int32"))
   ...
   AttributeError: 'pyarrow.lib.Int32Array' object has no attribute 'as_py'
   ```
   
   
   ---
   
   Now, for your original example using a list array, the situation is a bit more complicated. Because for the case where you pass a pyarrow.Array, the types also don't match (list of float vs float), and automatically casting in that case also wouldn't work. 
   The reason that the python list works is because we convert that to a scalar of the correct type under the hood (the `a.scalar(fill_value, type=values.type)` in the code snippet above). 
   
   But so if you pass a pyarrow object, I think we will need to require that you pass a "correct" scalar instead (because when passing an array, we assume that this is meant for filling with an array element-wise, and not for filling with a scalar). If you start from a pyarrow array, you can convert this to a scalar of the correct type before passing it to `fill_null`, but the problem is that this also fails at the moment:
   
   ```
   >>> pa.scalar(pa.array([0.0, 0.0]),  pa.list_(pa.float64(), 2))
   ...
   ArrowInvalid: Could not convert <pyarrow.DoubleScalar: 0.0> with type pyarrow.lib.DoubleScalar: tried to convert to double
   ```
   
   So that's also something we should fix. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on issue #35624: [Python] Can't provide Arrow array when filling nulls of a fixed size list array

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35624:
URL: https://github.com/apache/arrow/issues/35624#issuecomment-1561396556

   > Oh wow, `fill_null` works quite differently than I expected. Your example of passing it an array and it working _elementwise_ is very surprising to me! `fill_value`'s documentation is a bit misleading, then; it says
   
   Yeah, that's indeed certainly not properly documented. I am not sure this was fully intentional (maybe when it was added, mostly the scalar fill_value was considered), but this is essentially an alias for `coalesce`, and it is this function that supports this behaviour. 
   And also for example `fillna` in pandas has the same ability of either filling with a scalar or filling with the corresponding values of a same-length array.
   
   Your suggested rewording sounds good!
   
   > This issue seems to fracture into three issues:
   
   Yes, that's a good summary. 
   
   > can't make an explicit scalar element of a fixed-size list (which sounds kind of like https://github.com/apache/arrow/issues/18987 perhaps)
   
   That's indeed a related issue. It's also related to https://github.com/apache/arrow/issues/21761, for accepting pyarrow values in `pa.array(..)` constructor (and not `pa.scalar(..)`). For this one, there is an open PR, but it will probably depend on how that PR implements the fix whether that would also fix the scalar() version (we don't call `pa.array(..)` inside `pa.scalar(..)`, but directly the lower level `ConvertPySequence`).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] spenczar commented on issue #35624: [Python] Can't provide Arrow array when filling nulls of a fixed size list array

Posted by "spenczar (via GitHub)" <gi...@apache.org>.
spenczar commented on issue #35624:
URL: https://github.com/apache/arrow/issues/35624#issuecomment-1561345012

   Oh wow, `fill_null` works quite differently than I expected. Your example of passing it an array and it working _elementwise_ is very surprising to me! `fill_value`'s documentation is a bit misleading, then; it says
   
   > Replace each null element in values with fill_value. 
   
   But it _should_ say something like "Replace each null element in values with a corresponding element from fill_value." 
   
   That's not a perfect phrasing, since it's a bit complicated and hard to fit into a short sentence. As I understand it, the behavior is:
    - If `fill_value` is scalar-like, replace each null element in `values` with `fill_value`.
    - If `fill_value` is array-like, replace the `i`th null element in `values` with the `i`th element in `fill_value`.
   
   This issue seems to fracture into three issues:
    - `as_py` is too crude
    - documentation needs a tweak
    - can't make an explicit scalar element of a fixed-size list (which sounds kind of like https://github.com/apache/arrow/issues/18987, perhaps)
   
   Does that seem right?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org