Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/11/08 02:55:00 UTC

[jira] [Comment Edited] (ARROW-10356) [Rust] [DataFusion] Add support for is_in

    [ https://issues.apache.org/jira/browse/ARROW-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227899#comment-17227899 ] 

Jorge Leitão edited comment on ARROW-10356 at 11/8/20, 2:54 AM:
----------------------------------------------------------------

Hi [~apatniv], Sure!

In broad steps, we can split this issue in two:
 * changes in the arrow crate
 * changes in the datafusion crate

Arrow:
 # Fork the repo on GitHub and clone it to your computer
 # Bootstrap your working environment (e.g. VS Code + git + rust-analyzer plugin)
 # Create a new module `is_in.rs` in the arrow crate under compute/kernels.
 # Design the function. E.g. `is_in` is something like `fn is_in(array1, array2) -> boolean array`.
 ## Can we leverage the arrow format somehow, or do we need to do it directly on the values? E.g. `is_in` seems to be an NxM problem, as we need to check each element of array1 against array2. Is there any way to speed up this process?
 ## Which utility methods might we need?
 ## Any edge cases?
 # Implement. Check other kernels for inspiration. It is fine to PR only a subset of all types. e.g. start with the implementation for only primitive arrays (int32, int64, etc.)
 # Add tests (can be done first)
 # Update README
 # PR (optional, good to get initial feedback :))
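
On the NxM question above, one possible approach is to build a hash set from array2 once and then probe it for each element of array1, which brings the cost down to roughly N + M. The sketch below is only an illustration under stated assumptions: it uses plain slices of `Option<T>` as a stand-in for arrow arrays with nulls (it does not use the arrow crate API), and it assumes one particular null behavior (a null input yields a null output, and nulls in the list are ignored). The actual kernel may well choose different semantics.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical sketch of `is_in`: for each element of `values`, report
/// whether it occurs in `list`. `Option<T>` stands in for nullable arrow
/// slots; this is not the arrow crate's API.
pub fn is_in<T: Eq + Hash>(values: &[Option<T>], list: &[Option<T>]) -> Vec<Option<bool>> {
    // Build the lookup set once (O(M)) instead of scanning `list`
    // for every one of the N values. Nulls in `list` are skipped
    // (an assumption made for this sketch).
    let set: HashSet<&T> = list.iter().flatten().collect();

    values
        .iter()
        // Assumption: a null input produces a null output.
        .map(|v| v.as_ref().map(|x| set.contains(x)))
        .collect()
}
```

Note that `f32` and `f64` do not implement `Eq` or `Hash` in Rust, so float arrays would need separate handling; that is one of the edge cases worth deciding during the design step.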

DataFusion:
 # Use the kernel defined above in DataFusion, in physical_plan/functions.rs
 # Add the function `is_in` to the `impl` of Expr in logical_plan.rs
 # Update README
 # PR

Let us know if you have any questions!


was (Author: jorgecarleitao):
Hi [~apatniv], Sure!

In broad steps, we can split this issue in two:
 * changes in the arrow crate
 * changes in the datafusion crate

Arrow:
 # Fork the repo on GitHub and clone it to your computer
 # Bootstrap your working environment (e.g. VS Code + git + rust-analyzer plugin)
 # Create a new module `is_in.rs` in the arrow crate under compute/kernels.
 # Design the kernel. E.g. `is_in` is something like `fn is_in(array1, array2) -> boolean array`.
 ## Can we leverage the arrow format somehow, or do we need to do it directly on the values? E.g. `is_in` seems to be an NxM problem, as we need to check each element of array1 against array2. Is there any way to speed up this process?
 ## Which utility methods might we need?
 ## Any edge cases?
 # Implement. Check other kernels for inspiration. It is fine to PR only a subset of all types. e.g. start with the implementation for only primitive arrays (int32, int64, etc.)
 # Add tests (can be done first)
 # Update README
 # PR (optional, good to get initial feedback :))

DataFusion:
 # Use the kernel defined above in DataFusion, in physical_plan/functions.rs
 # Add the function `is_in` to the `impl` of Expr in logical_plan.rs
 # Update README
 # PR

Let us know if you have any questions!

> [Rust] [DataFusion] Add support for is_in
> -----------------------------------------
>
>                 Key: ARROW-10356
>                 URL: https://issues.apache.org/jira/browse/ARROW-10356
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust, Rust - DataFusion
>            Reporter: Jorge Leitão
>            Priority: Major
>              Labels: beginner
>
> Add support for a function that returns true if an element belongs to a finite list of elements.
> The list should be an arrow array (to be dynamically-typed).
>  