Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/12/02 14:26:00 UTC

[jira] [Commented] (ARROW-14946) [C++][Python] An operator for finding indices of a value

    [ https://issues.apache.org/jira/browse/ARROW-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452435#comment-17452435 ] 

Joris Van den Bossche commented on ARROW-14946:
-----------------------------------------------

This is also related to numpy's {{nonzero}} in combination with an equality comparison:

{code}
In [66]: values = np.array([1, 2, 2, 3, 4, 1])

In [67]: np.nonzero(values == 1)
Out[67]: (array([0, 5]),)
{code}

which is also being discussed in ARROW-13035.
For this case, though, going through an intermediate boolean array only to find the indices might add overhead (this might be worth experimenting with).
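
To make the overhead question concrete, here is a minimal pure-Python sketch (not Arrow or numpy code) contrasting the two strategies. The function names `indices_via_mask` and `indices_direct` are hypothetical and exist only for this illustration; the sketch shows the shape of the work, not actual kernel performance.

```python
def indices_via_mask(values, target):
    # Step 1: materialize an intermediate boolean mask (analogous to `values == 1`).
    mask = [v == target for v in values]
    # Step 2: collect the positions where the mask is true (analogous to `nonzero`).
    return [i for i, hit in enumerate(mask) if hit]


def indices_direct(values, target):
    # Single pass over the input, with no intermediate boolean array.
    return [i for i, v in enumerate(values) if v == target]


values = [1, 2, 2, 3, 4, 1]
print(indices_via_mask(values, 1))  # [0, 5]
print(indices_direct(values, 1))    # [0, 5]
```

Both give the same result; the difference is the extra boolean array allocated in the first variant, which is the overhead that would need measuring.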

---

> This would be a binary vector kernel IMO.

For a scalar right-hand value (as in your example above), the expected behaviour is clear. But would it be limited to scalars? The expected behaviour for non-scalars is not really obvious to me.
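
As an illustration of why the non-scalar case is ambiguous, here is a pure-Python sketch of two plausible readings. Neither is proposed in the issue; the names `find_indices_elementwise` and `find_indices_membership` are hypothetical, and a real kernel could well choose something else.

```python
def find_indices_elementwise(values, other):
    # Reading A: compare position by position (requires equal lengths).
    if len(values) != len(other):
        raise ValueError("length mismatch")
    return [i for i, (v, o) in enumerate(zip(values, other)) if v == o]


def find_indices_membership(values, candidates):
    # Reading B: keep positions whose value appears anywhere in `candidates`.
    wanted = set(candidates)
    return [i for i, v in enumerate(values) if v in wanted]


values = [1, 2, 2, 3, 4, 1]
print(find_indices_elementwise(values, [1, 0, 2, 0, 0, 0]))  # [0, 2]
print(find_indices_membership(values, [1, 3]))               # [0, 3, 5]
```

The two readings answer different questions for the same inputs, which is why the scalar case is the only one with an obvious definition.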

> [C++][Python] An operator for finding indices of a value 
> ---------------------------------------------------------
>
>                 Key: ARROW-14946
>                 URL: https://issues.apache.org/jira/browse/ARROW-14946
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: Niranda Perera
>            Priority: Major
>
> As discussed in this mail thread [1], it would be nice to have a search operator returning the indices of a Value. 
> ex:
> {code:java}
> values = pa.array([1, 2, 2, 3, 4, 1])
> indices = find_indices(values, 1) #  expected = [0, 5]{code}
> currently there is an option to get the "first index" of a value using aggregates.index method. This would be a binary vector kernel IMO. 
> This is somewhat similar to `numpy.where` [2] but without a `y` input. 
>  
> [1] [https://lists.apache.org/thread/o8d4m905fxswcg0qjjx7gj3ql2d582k4]
> [2] https://numpy.org/doc/stable/reference/generated/numpy.where.html


