Posted to issues@arrow.apache.org by "mosalx (via GitHub)" <gi...@apache.org> on 2023/04/27 21:28:35 UTC

[GitHub] [arrow] mosalx opened a new issue, #35358: Inconsistent output between ListArray.flatten() and ChunkedArray.flatten() with struct type

mosalx opened a new issue, #35358:
URL: https://github.com/apache/arrow/issues/35358

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   # Summary
   `ListArray.flatten` and `FixedSizeListArray.flatten` return an array of the list values, sliced according to the offsets of the parent array.
   When the same array is wrapped in a `ChunkedArray` and the list values are a `StructArray`, the output is a chunked array of the parent array. In other words, `ChunkedArray.flatten` returns the array itself instead of its values.
   
   
   # Environment
   Observed on Windows 10
   python=3.11.2
   pyarrow=11.0.0
   
   
   # Details
   
   ```python
   import pyarrow as pa
   
   array = pa.array([
       [{'a': 5}, {'a': 6}],
       [{'a': 7}]
   ])
   
   # same array wrapped in a ChunkedArray
   array_chunked = pa.chunked_array([array])
   ```
   
   Now let's flatten each of these two arrays.
   Output of `array.flatten()`
   ```python
   <pyarrow.lib.StructArray object at 0x000001DAA018D8A0>
   -- is_valid: all not null
   -- child 0 type: int64
     [
       5,
       6,
       7
     ]
   ```
   
   Output of `array_chunked.flatten()`
   ```python
   [<pyarrow.lib.ChunkedArray object at 0x000001DAAE852A20>
    [
      [
        -- is_valid: all not null
        -- child 0 type: int64
          [
            5,
            6
          ],
        -- is_valid: all not null
        -- child 0 type: int64
          [
            7
          ]
      ]
    ]]
   ```
   
   In other words, the first chunk of the flattened chunked array is equal to the original array, which should not happen:
   ```python
   assert not array.equals(array_chunked.flatten()[0].chunk(0))  # AssertionError
   ```
   
   The same issue is observed with `FixedSizeListArray`:
   ```python
   array = pa.array([[{'a': 5}], [{'a': 7}]],
                    type=pa.list_(pa.struct([('a', pa.int32())]), list_size=1))
   array_chunked = pa.chunked_array([array])
   assert not array.equals(array_chunked.flatten()[0].chunk(0))  # AssertionError
   ```
   
   Expected behavior:
   The flattened chunked array is expected to be a chunked array wrapping the values of the original array, that is, a `StructArray`.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #35358: Inconsistent output between ListArray.flatten() and ChunkedArray.flatten() with struct type

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35358:
URL: https://github.com/apache/arrow/issues/35358#issuecomment-1572156332

   Closing as this appears to be expected behavior but feel free to reopen if needed.




[GitHub] [arrow] westonpace closed issue #35358: Inconsistent output between ListArray.flatten() and ChunkedArray.flatten() with struct type

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace closed issue #35358: Inconsistent output between ListArray.flatten() and ChunkedArray.flatten() with struct type
URL: https://github.com/apache/arrow/issues/35358




[GitHub] [arrow] benibus commented on issue #35358: Inconsistent output between ListArray.flatten() and ChunkedArray.flatten() with struct type

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on issue #35358:
URL: https://github.com/apache/arrow/issues/35358#issuecomment-1546716017

   This is actually expected behavior - although your assumption here isn't unwarranted since there's some unfortunate naming at play.
   
   Basically, `flatten` has two different meanings depending on whether we're dealing with a `ListArray` or `StructArray`... For the former, the list values get concatenated into a single array like you mentioned. For the latter, "flattening" a struct means returning each of its child fields as a separate array. `ChunkedArray`'s `flatten` method strictly mirrors the `StructArray` version - so, since the underlying type isn't a struct, it will just return the `ListArray` chunks as-is.
   
   Note that the original `array`'s list types _should_ be irrelevant here - e.g. flattening a chunked array of `list<int64>` should also yield a single chunked array of `list<int64>` (just like with `list<struct<a: int64>>`).
   
   If you're suggesting that we modify `ChunkedArray::flatten` to handle lists differently than structs then I don't disagree in principle, although I suspect adding an independent method that does what you've described would be more appropriate.

