Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/10/19 16:12:00 UTC

[jira] [Commented] (ARROW-18097) [C++] Add a "list_contains" kernel

    [ https://issues.apache.org/jira/browse/ARROW-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620437#comment-17620437 ] 

Antoine Pitrou commented on ARROW-18097:
----------------------------------------

Then there should probably be a "list_index" function as well, similar to "is_in" vs. "index_in"?
{code}
pc.list_index(arr, "b")
# -> 1, None, 0
{code}
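
To make the intended semantics concrete, here is a pure-Python reference sketch of the two proposed functions. It is not Arrow code: the names list_contains_ref and list_index_ref are hypothetical, and the handling of empty lists, null lists, and repeated values (first match wins) is an assumption that the issue does not settle.

```python
from typing import Any, List, Optional, Sequence


def list_contains_ref(
    lists: Sequence[Optional[Sequence[Any]]], value: Any
) -> List[Optional[bool]]:
    """For each list, report whether it contains `value`.

    Assumption: a null (None) list yields None, an empty list yields False.
    """
    return [None if lst is None else value in lst for lst in lists]


def list_index_ref(
    lists: Sequence[Optional[Sequence[Any]]], value: Any
) -> List[Optional[int]]:
    """For each list, report the position of the first occurrence of `value`.

    Assumption: a null list or a list without `value` yields None.
    """
    out: List[Optional[int]] = []
    for lst in lists:
        if lst is None or value not in lst:
            out.append(None)
        else:
            out.append(list(lst).index(value))
    return out


arr = [["a", "b"], ["a", "c"], ["b", "c", "d"]]
contains_a = list_contains_ref(arr, "a")
index_b = list_index_ref(arr, "b")
```

With the example array from the issue, contains_a is [True, True, False] and index_b is [1, None, 0]. Whether a real kernel should return null or false for a null list is left open here.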


> [C++] Add a "list_contains" kernel
> ----------------------------------
>
>                 Key: ARROW-18097
>                 URL: https://issues.apache.org/jira/browse/ARROW-18097
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: compute, kernel
>
> Assume you have a list array:
> {code}
> arr = pa.array([["a", "b"], ["a", "c"], ["b", "c", "d"]])
> {code}
> And you want to know for each list if it contains a certain value (of the same type as the list's values). A "list_contains" function (or other name) would be useful for that:
> {code}
> pc.list_contains(arr, "a")
> # -> True, True, False
> {code}
> The current workaround that I found was flattening, checking equality, and then reducing again with groupby, but this is quite tedious:
> {code}
> >>> temp = pa.table({'index': pc.list_parent_indices(arr), 'contains_value': pc.equal(pc.list_flatten(arr), "a")})
> >>> temp.group_by('index').aggregate([('contains_value', 'any')])['contains_value_any'].chunk(0)
> <pyarrow.lib.BooleanArray object at 0x7ffaf3f8de20>
> [
>   true,
>   true,
>   false
> ]
> {code}
> But this also only works if there are no empty or missing list values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)