Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/19 14:44:00 UTC

[jira] [Comment Edited] (ARROW-10640) [C++] A "where" kernel to combine two arrays based on a mask

    [ https://issues.apache.org/jira/browse/ARROW-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235523#comment-17235523 ] 

Joris Van den Bossche edited comment on ARROW-10640 at 11/19/20, 2:43 PM:
--------------------------------------------------------------------------

bq. A boolean entry can be true, false or null. Shouldn't the kernel combine three arrays? (of course, you're free to pass the same input twice).

Yes, that's indeed a possibility. Alternatively, we could decide that a null in the mask (or the indices) becomes a null in the output. I suppose the 3-array approach is more flexible in general, but it might also be less performant than simply propagating the validity bitmask?
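
To make the two options concrete, here is a rough pure-Python sketch of the semantics only (not a proposal for the C++ implementation). Nulls are modelled as {{None}}, and the function names {{where_propagate_null}} and {{where_three_way}} are made up for this illustration:

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def where_propagate_null(
    mask: Sequence[Optional[bool]], x: Sequence[T], y: Sequence[T]
) -> List[Optional[T]]:
    """Option 1 (hypothetical name): a null in the mask becomes a null in the output."""
    out: List[Optional[T]] = []
    for m, a, b in zip(mask, x, y):
        if m is None:
            out.append(None)  # propagate the null from the mask
        else:
            out.append(a if m else b)
    return out


def where_three_way(
    mask: Sequence[Optional[bool]],
    if_true: Sequence[T],
    if_false: Sequence[T],
    if_null: Sequence[T],
) -> List[T]:
    """Option 2 (hypothetical name): a third input supplies values where the mask is null."""
    out: List[T] = []
    for m, a, b, c in zip(mask, if_true, if_false, if_null):
        if m is None:
            out.append(c)
        else:
            out.append(a if m else b)
    return out


mask = [True, False, None, True]
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]

print(where_propagate_null(mask, x, y))              # [1, 20, None, 4]
print(where_three_way(mask, x, y, [-1, -1, -1, -1]))  # [1, 20, -1, 4]
```

In the first variant the output validity depends only on the mask validity (and the selected input), which is why it might be cheaper than reading a third array.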

---

{{np.choose}} indeed sounds like a generalization of where. No idea how much it is used though, or whether it would be worth the added complexity compared to {{np.where}} (although it might not be that much more complex).
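
A minimal pure-Python sketch of why "choose" generalizes "where" (semantics only, ignoring nulls; the helper names {{choose}} and {{where}} are just for this illustration and are not existing Arrow functions):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def choose(indices: Sequence[int], choices: Sequence[Sequence[T]]) -> List[T]:
    """For each position, take the element at that position from the array
    selected by the index (the basic behaviour of np.choose)."""
    return [choices[i][pos] for pos, i in enumerate(indices)]


def where(mask: Sequence[bool], x: Sequence[T], y: Sequence[T]) -> List[T]:
    """'where' expressed as 'choose' with two choices:
    index 1 selects from x (mask is true), index 0 selects from y."""
    return choose([1 if m else 0 for m in mask], [y, x])


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [100, 200, 300, 400]

print(choose([0, 2, 1, 0], [a, b, c]))            # [1, 200, 30, 4]
print(where([True, False, True, False], a, b))    # [1, 20, 3, 40]
```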

The "zip" seems useful as well. Basically, you could say it is the same as "where", except that each input array only has as many elements as are needed to fill in the output array. In that sense, we could also have a generalized zip (or interleave), similar to how "choose" generalizes "where": {{zip(array[int], x1, x2, ..., xn)}}, using an array of indices instead of a boolean mask.
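
As a rough sketch of what I mean (pure Python, ignoring nulls; {{interleave}} and {{zip_by_mask}} are hypothetical names for this illustration), each input is consumed in order and only needs as many elements as the number of times it is selected:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave(indices: Sequence[int], *arrays: Sequence[T]) -> List[T]:
    """Generalized zip: indices[k] names the input array that supplies output
    element k. Each input is read front to back, so it needs exactly as many
    elements as its index occurs in `indices`."""
    cursors = [iter(arr) for arr in arrays]
    return [next(cursors[i]) for i in indices]


def zip_by_mask(mask: Sequence[bool], x: Sequence[T], y: Sequence[T]) -> List[T]:
    """Boolean-mask version: true takes the next element of x, false the next of y."""
    return interleave([0 if m else 1 for m in mask], x, y)


# x has one element per true entry, y has one element per false entry
print(zip_by_mask([True, False, False, True], ["a", "d"], ["b", "c"]))
# ['a', 'b', 'c', 'd']

print(interleave([2, 0, 0, 1, 2], [10, 11], [20], [30, 31]))
# [30, 10, 11, 20, 31]
```

Compare with "where"/"choose", where every input has the same length as the output and the unselected elements are simply skipped.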





> [C++] A "where" kernel to combine two arrays based on a mask
> ------------------------------------------------------------
>
>                 Key: ARROW-10640
>                 URL: https://issues.apache.org/jira/browse/ARROW-10640
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> (from discussion in ARROW-9489 with [~maartenbreddels])
> A "where" kernel like {{np.where}} (https://numpy.org/doc/stable/reference/generated/numpy.where.html) seems a generally useful kernel to have, and could also help mimic some other Python (setitem-like) operations.
> The concrete use case in ARROW-9489 is to basically do a {{fill_null(array[string], array[string])}} which could be expressed as {{where(is_null(arr), arr2, arr)}}. 
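
The {{fill_null}} use case from the issue description can be sketched in pure Python as follows (semantics only, with nulls modelled as {{None}}; the helper definitions below are written for this illustration and are not the Arrow kernels themselves):

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def is_null(arr: Sequence[Optional[T]]) -> List[bool]:
    """Boolean mask that is true where the input element is null."""
    return [v is None for v in arr]


def where(mask: Sequence[bool], x: Sequence[T], y: Sequence[T]) -> List[T]:
    """Element-wise selection: take from x where the mask is true, else from y."""
    return [a if m else b for m, a, b in zip(mask, x, y)]


def fill_null(arr: Sequence[Optional[T]], fill: Sequence[T]) -> List[T]:
    """fill_null(arr, arr2) expressed as where(is_null(arr), arr2, arr)."""
    return where(is_null(arr), fill, arr)


arr = ["a", None, "c", None]
arr2 = ["w", "x", "y", "z"]
print(fill_null(arr, arr2))  # ['a', 'x', 'c', 'z']
```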


