Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 14:02:16 UTC

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #9164: ARROW-10663: [C++] Fix is_in and index_in behaviour

jorisvandenbossche edited a comment on pull request #9164:
URL: https://github.com/apache/arrow/pull/9164#issuecomment-758671039


   Thanks for the PR! 
   One additional thing I wondered about while testing it out: would we ever want a behaviour where a `null` in the input gives a `null` in the output? Right now a null input can only give `false` (if there is no null in the value_set, or if skip_nulls=True) or `true` (if there is a null in the value_set and skip_nulls=False).
   
   So something like `isin([1, 2, null], value_set=[1, 3]) -> [true, false, null]`
   
   If you see "isin" as a shortcut for writing multiple equality comparisons (`isin(input, value_set=[1, 3, ...])` -> `(input == 1) | (input == 3) | ...`), then you would get such behaviour.
   So the question is whether "isin" should use "equality" semantics or "identity/lookup" semantics for nulls. Given that it's now called "SetLookup" in the function names, we clearly go for the second, but I am not fully sure which of the two is more useful / expected in practice.
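
   To make the two options concrete, here is a minimal pure-Python sketch, with `None` standing in for null. It is not the Arrow implementation: the helper names `isin_lookup` and `isin_equality` are made up for illustration, and the equality variant assumes the `|` in the expansion above combines nulls with Kleene logic.

```python
def isin_lookup(values, value_set, skip_nulls=False):
    """Lookup semantics: a null input is matched against the set like any other value."""
    has_null = any(s is None for s in value_set)
    non_null = {s for s in value_set if s is not None}
    out = []
    for v in values:
        if v is None:
            # A null input is "found" only if the set contains a null
            # and nulls are not being skipped.
            out.append(has_null and not skip_nulls)
        else:
            out.append(v in non_null)
    return out


def isin_equality(values, value_set):
    """Equality semantics: (v == s1) | (v == s2) | ... (Kleene OR assumed)."""
    out = []
    for v in values:
        result = False
        for s in value_set:
            # Comparing with a null on either side yields null.
            eq = None if (v is None or s is None) else (v == s)
            if result is True or eq is True:
                result = True
            elif result is None or eq is None:
                result = None
        out.append(result)
    return out


lookup_result = isin_lookup([1, 2, None], [1, 3])
equality_result = isin_equality([1, 2, None], [1, 3])
```

   Under these assumptions, `lookup_result` is `[True, False, False]` and `equality_result` is `[True, False, None]`. One consequence of the equality variant is that a null in the value_set turns every non-matching input into null as well, which may or may not be what users expect.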

