Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2019/12/10 14:36:00 UTC

[jira] [Commented] (ARROW-6775) Proposal for several Array utility functions

    [ https://issues.apache.org/jira/browse/ARROW-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992607#comment-16992607 ] 

Joris Van den Bossche commented on ARROW-6775:
----------------------------------------------

[~brillsp] thanks for opening the issue, and sorry for the slow reply.

I would recommend opening separate issues for the different items you mention (possibly after some more feedback here, once we agree they would be good to add).

{quote}1/ ListLengthFromListArray(ListArray&): Returns lengths of lists in a ListArray, as an Int32Array (or Int64Array for large lists). For example:{quote}

This can be calculated relatively easily from the offsets, I think? (And the offsets are now exposed in Python.)
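A minimal sketch of that calculation, written in plain Python over lists that model the Arrow layout rather than against pyarrow itself. `list_lengths` is a hypothetical helper; `offsets` is assumed to hold length + 1 entries and `validity` one bool per slot. Null slots are reported as 0, as the proposal asks, and are masked explicitly because the Arrow format allows a null slot to span a non-empty range of the child array.

```python
from typing import List, Optional


def list_lengths(offsets: List[int], validity: Optional[List[bool]] = None) -> List[int]:
    """Hypothetical helper: per-slot lengths of a list array, computed from its offsets.

    offsets has len(array) + 1 entries; slot i covers child values
    offsets[i]:offsets[i + 1]. Null slots (validity[i] is False) map to 0.
    """
    n = len(offsets) - 1
    if validity is None:
        validity = [True] * n
    return [
        (offsets[i + 1] - offsets[i]) if validity[i] else 0
        for i in range(n)
    ]


# [[1, 2, 3], [], None] -> offsets [0, 3, 3, 3], third slot null
print(list_lengths([0, 3, 3, 3], [True, True, False]))  # [3, 0, 0]
```

Because nulls become 0 instead of None, the result has no validity bitmap and converts to a plain integer numpy array.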

{quote}3/ GetArrayNullBitmapAsByteArray(Array&): Returns the array's null bitmap as a UInt8Array (which can be efficiently converted to a bool numpy array){quote}

I think this is certainly something we want to add in some form. It is also related to exposing an "IsNull" that returns a BooleanArray from the bitmap; see ARROW-971 and the discussion in the PR.
Maybe a utility to convert the bitmap to a BooleanArray is more general, as the conversion from a (bitmap-backed) BooleanArray to a bool/int8 numpy array is already implemented.
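To make the bitmap layout concrete, here is a sketch in plain Python (not the pyarrow API) that expands an Arrow validity bitmap into one byte per slot, the UInt8 form proposed in item 3, and into per-slot IsNull flags. It assumes the Arrow convention: bit i lives in byte i // 8 at bit position i % 8 (least-significant bit first), and a set bit means the slot is valid. The function names and the `offset` parameter (for sliced arrays) are hypothetical.

```python
from typing import List


def bitmap_to_bytes(bitmap: bytes, length: int, offset: int = 0) -> List[int]:
    """Hypothetical helper: expand an LSB-first validity bitmap to one 0/1 value per slot."""
    out = []
    for i in range(offset, offset + length):
        byte = bitmap[i // 8]
        out.append((byte >> (i % 8)) & 1)  # 1 = valid, 0 = null
    return out


def is_null(bitmap: bytes, length: int, offset: int = 0) -> List[bool]:
    """Hypothetical helper: the inverse mask, True where the slot is null."""
    return [bit == 0 for bit in bitmap_to_bytes(bitmap, length, offset)]


# Slots [valid, valid, null, valid] -> bits 0b1011 -> byte 0x0B
bitmap = bytes([0x0B])
print(bitmap_to_bytes(bitmap, 4))  # [1, 1, 0, 1]
print(is_null(bitmap, 4))          # [False, False, True, False]
```

The `is_null` output is the per-slot content an "IsNull" BooleanArray would carry; repacking it into a bitmap would give that array's data buffer.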

{quote}4/ GetFlattenedArrayParentIndices(ListArray&)

Makes an int32 array of the same length as the flattened ListArray. returned_array[i] == j means i-th element in the flattened ListArray came from j-th list in the ListArray.

For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]{quote}

Can you explain this one a bit more?
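One reading of the description: the result repeats each list's index once per child value that list contributes, so empty and null lists contribute nothing. A minimal plain-Python sketch of that reading over an offsets/validity pair (the function name is hypothetical, not an existing Arrow API) reproduces the example from the issue:

```python
from typing import List, Optional


def flattened_parent_indices(offsets: List[int], validity: Optional[List[bool]] = None) -> List[int]:
    """Hypothetical helper: parent list index for each value of the flattened list array.

    Slot j covers child values offsets[j]:offsets[j + 1]. Null slots are skipped,
    so the result lines up with a flatten that drops the contents of null lists.
    """
    n = len(offsets) - 1
    if validity is None:
        validity = [True] * n
    parents = []
    for j in range(n):
        if validity[j]:
            parents.extend([j] * (offsets[j + 1] - offsets[j]))
    return parents


# [[1, 2, 3], [], None, [4, 5]] -> offsets [0, 3, 3, 3, 5], third slot null
print(flattened_parent_indices([0, 3, 3, 3, 5], [True, True, False, True]))
# [0, 0, 0, 3, 3]
```

Under this reading, the result can be used to broadcast a per-list value (for example a row label) onto every flattened element.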

> Proposal for several Array utility functions
> --------------------------------------------
>
>                 Key: ARROW-6775
>                 URL: https://issues.apache.org/jira/browse/ARROW-6775
>             Project: Apache Arrow
>          Issue Type: Wish
>            Reporter: Zhuo Peng
>            Priority: Minor
>
> Hi,
> We developed several utilities that compute / access certain properties of Arrays and wonder whether it makes sense to get them upstream (into both the C++ API and pyarrow), and, assuming yes, where the best place to put them is.
> Maybe I have overlooked existing APIs that already do the same; in that case, please point them out.
>  
> 1/ ListLengthFromListArray(ListArray&)
> Returns lengths of lists in a ListArray, as an Int32Array (or Int64Array for large lists). For example:
> [[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned array can be converted to numpy)
>  
> 2/ GetBinaryArrayTotalByteSize(BinaryArray&)
> Returns the total byte size of a BinaryArray (basically offsets[len] - offsets[0], since the offsets buffer has len + 1 entries).
> Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.
>  
> 3/ GetArrayNullBitmapAsByteArray(Array&)
> Returns the array's null bitmap as a UInt8Array (which can be efficiently converted to a bool numpy array)
>  
> 4/ GetFlattenedArrayParentIndices(ListArray&)
> Makes an int32 array of the same length as the flattened ListArray. returned_array[i] == j means i-th element in the flattened ListArray came from j-th list in the ListArray.
> For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]
>  
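For item 2, a sketch of the offset arithmetic in plain Python, modeling a BinaryArray as an offsets list with length + 1 entries plus its data bytes (hypothetical helper names, not the pyarrow API). The total size is offsets[length] - offsets[0]; subtracting offsets[0] keeps it correct for sliced arrays whose offsets do not start at zero. Slicing the data with the same two offsets gives the "flattened" UInt8 view mentioned as the alternative. Bytes behind null slots, if any, are counted too.

```python
from typing import List


def binary_total_byte_size(offsets: List[int]) -> int:
    """Hypothetical helper: bytes spanned by all values of a binary array.

    offsets has len(array) + 1 entries, so the last value ends at offsets[-1].
    """
    return offsets[-1] - offsets[0]


def binary_flatten(offsets: List[int], data: bytes) -> bytes:
    """Hypothetical helper: the concatenated value bytes (what a UInt8 view would expose)."""
    return data[offsets[0]:offsets[-1]]


# ["ab", "", "cde"] -> offsets [0, 2, 2, 5], data b"abcde"
print(binary_total_byte_size([0, 2, 2, 5]))  # 5
# A slice of the last two values keeps the original offsets: [2, 2, 5]
print(binary_total_byte_size([2, 2, 5]))     # 3
print(binary_flatten([2, 2, 5], b"abcde"))   # b'cde'
```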



--
This message was sent by Atlassian Jira
(v8.3.4#803005)