Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/19 08:19:00 UTC

[jira] [Created] (ARROW-18097) [C++] Add a "list_contains" kernel

Joris Van den Bossche created ARROW-18097:
---------------------------------------------

             Summary: [C++] Add a "list_contains" kernel
                 Key: ARROW-18097
                 URL: https://issues.apache.org/jira/browse/ARROW-18097
             Project: Apache Arrow
          Issue Type: Task
          Components: C++
            Reporter: Joris Van den Bossche


Assume you have a list array:

{code}
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array([["a", "b"], ["a", "c"], ["b", "c", "d"]])
{code}

And you want to know for each list if it contains a certain value (of the same type as the list's values). A "list_contains" function (or other name) would be useful for that:

{code}
pc.list_contains(arr, "a")
# -> True, True, False
{code}
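
To make the intended behaviour concrete, here is a plain-Python sketch of what such a function could return. The name list_contains_reference is hypothetical. The handling of missing lists (returning None) and empty lists (returning False) is an assumption for illustration and is not specified in this issue.

```python
def list_contains_reference(lists, value):
    """Return one result per list: True/False, or None for a missing list.

    Assumption (not settled in the issue): a missing list gives None and
    an empty list gives False.
    """
    results = []
    for lst in lists:
        if lst is None:
            # Missing list: propagate the missing value.
            results.append(None)
        else:
            # An empty list never contains the value, so any() gives False.
            results.append(any(item == value for item in lst))
    return results


example = [["a", "b"], ["a", "c"], ["b", "c", "d"], [], None]
print(list_contains_reference(example, "a"))
# [True, True, False, False, None]
```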

The workaround I found is to flatten the lists, check the values for equality, and then reduce again with a group-by, but this is quite tedious:

{code}
>>> temp = pa.table({'index': pc.list_parent_indices(arr), 'contains_value': pc.equal(pc.list_flatten(arr), "a")})
>>> temp.group_by('index').aggregate([('contains_value', 'any')])['contains_value_any'].chunk(0)
<pyarrow.lib.BooleanArray object at 0x7ffaf3f8de20>
[
  true,
  true,
  false
]
{code}

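But this also only works if there are no empty or missing list values: such lists contribute no flattened values, so their index never appears in the grouped result and the output ends up shorter than the input.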



--
This message was sent by Atlassian Jira
(v8.20.10#820010)