Posted to jira@arrow.apache.org by "Alessandro Molina (Jira)" <ji...@apache.org> on 2021/04/20 14:05:00 UTC

[jira] [Commented] (ARROW-5869) [Python] Need a way to access UnionArray's children as Arrays in pyarrow

    [ https://issues.apache.org/jira/browse/ARROW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325817#comment-17325817 ] 

Alessandro Molina commented on ARROW-5869:
------------------------------------------

This seems to have been addressed already: it is now possible to access {{UnionArray}} children as arrays using {{UnionArray.field}}.


{code:python}
>>> import pyarrow as pa
>>> first = pa.array([1, 2, 3])
>>> second = pa.array(["A", "B", "C"])
>>> ua = pa.UnionArray.from_sparse(pa.array([0, 0, 1], type=pa.int8()), [first, second])
>>> ua.field(0)
<pyarrow.lib.Int64Array object at 0x126d84520>
[
  1,
  2,
  3
]
>>> ua.field(1)
<pyarrow.lib.StringArray object at 0x126d844c0>
[
  "A",
  "B",
  "C"
]
{code}
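The same accessor also appears to cover dense unions. The sketch below assumes a recent pyarrow release, not necessarily the version current when this comment was written. It builds one sparse and one dense union and reads every child back with {{field}}. Both constructors are given int8 type ids, because {{from_sparse}}/{{from_dense}} expect them. The helper {{union_children}} is hypothetical and is not part of pyarrow.

{code:python}
import pyarrow as pa

# Sparse union: every child has the same length as the union itself.
sparse = pa.UnionArray.from_sparse(
    pa.array([0, 0, 1], type=pa.int8()),          # type ids must be int8
    [pa.array([1, 2, 3]), pa.array(["A", "B", "C"])],
)

# Dense union: each child holds only its own values; the int32
# value_offsets say where each slot's value sits inside its child.
dense = pa.UnionArray.from_dense(
    pa.array([0, 0, 1], type=pa.int8()),
    pa.array([0, 1, 0], type=pa.int32()),
    [pa.array([1, 2]), pa.array(["A"])],
)


def union_children(ua):
    """Hypothetical helper: return every child of a union as a pyarrow Array."""
    return [ua.field(i) for i in range(ua.type.num_fields)]


for ua in (sparse, dense):
    print(ua.type.mode, [c.to_pylist() for c in union_children(ua)], ua.to_pylist())
{code}

For the sparse union, {{field(1)}} returns the full-length string child, including slots whose type id selects child 0. For the dense union, it returns only the values that actually belong to that child.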



> [Python] Need a way to access UnionArray's children as Arrays in pyarrow
> ------------------------------------------------------------------------
>
>                 Key: ARROW-5869
>                 URL: https://issues.apache.org/jira/browse/ARROW-5869
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: Jim Pivarski
>            Priority: Major
>
>  
> There doesn't seem to be a way to get to the children of sparse or dense UnionArrays. For other types, there are:
>  * ListType: array.flatten()
>  * StructType: array.field("fieldname")
>  * DictionaryType: array.indices and now array.dictionary (in 0.14.0)
>  * (other types have no children, I think...)
> The reason this comes up now is that I have a downstream library that does a zero-copy view of Arrow by recursively walking over its types and interpreting the list of buffers for each type. In the past, I didn't need the _array_ children of each array; I popped the right number of buffers off the list depending on the type. In 0.14.0, however, the dictionary for DictionaryType moved from the type object to the array object. Since it is in neither the buffers list nor the type tree, I need to walk the tree of arrays in tandem with the tree of types.
> That would be okay, except that I don't see how to descend from a UnionArray to its children.
> This is the function where I do the walk down types (tpe), and now arrays (array), while interpreting the right number of buffers at each step.
> [https://github.com/scikit-hep/awkward-array/blob/7c5961405cc39bbf2b489fad171652019c8de41b/awkward/arrow.py#L228-L364]
> Simply exposing the std::vector named "children" as a Python sequence or a child(int i) method would provide a way to descend UnionTypes and make this kind of access uniform across all types.
> Alternatively, putting the array.dictionary in the list of buffers would also do it (and make it unnecessary for me to walk over the arrays), but in general it seems like a good idea to make arrays accessible. It seems like it belongs in the buffers, but that would probably be a big change, not to be undertaken for minor reasons.
> Thanks!
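The tandem walk over types and arrays described in the report can be sketched in pyarrow as follows. This assumes a release where {{UnionArray.field}} is available. The function {{walk}} is hypothetical and belongs to neither pyarrow nor awkward-array. It descends using the child accessors listed in the report ({{flatten()}}, {{field()}}, {{indices}}/{{dictionary}}) plus {{UnionArray.field}}, and it does not interpret buffers.

{code:python}
import pyarrow as pa
import pyarrow.types as pat


def walk(array, depth=0, out=None):
    """Hypothetical helper: visit `array` and its child arrays depth-first,
    collecting (depth, type) pairs in visiting order."""
    if out is None:
        out = []
    t = array.type
    out.append((depth, t))
    if pat.is_list(t) or pat.is_large_list(t):
        # One child: the list values, adjusted for slicing and null slots.
        children = [array.flatten()]
    elif pat.is_struct(t):
        # One child per struct field, looked up by position.
        children = [array.field(i) for i in range(t.num_fields)]
    elif pat.is_dictionary(t):
        # Since 0.14.0 the dictionary lives on the array, not on the type.
        children = [array.indices, array.dictionary]
    elif pat.is_union(t):
        # UnionArray.field works for both sparse and dense unions.
        children = [array.field(i) for i in range(t.num_fields)]
    else:
        # Every other type is treated as a leaf in this sketch.
        children = []
    for child in children:
        walk(child, depth + 1, out)
    return out


nested = pa.array([{"x": [1, 2], "y": "a"}, {"x": [], "y": "b"}])
for depth, t in walk(nested):
    print("  " * depth + str(t))
{code}

A zero-copy buffer reader may prefer {{ListArray.values}} over {{flatten()}}. {{flatten()}} takes the slicing offset into account and skips values behind null list slots, so its result does not line up one-to-one with the raw child buffers.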



--
This message was sent by Atlassian Jira
(v8.3.4#803005)