Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/11/08 02:55:00 UTC
[jira] [Comment Edited] (ARROW-10356) [Rust] [DataFusion] Add support for is_in
[ https://issues.apache.org/jira/browse/ARROW-10356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227899#comment-17227899 ]
Jorge Leitão edited comment on ARROW-10356 at 11/8/20, 2:54 AM:
----------------------------------------------------------------
Hi [~apatniv], sure!
In broad steps, we can split this issue in two:
* changes in the arrow crate
* changes in the datafusion crate
Arrow:
# Fork the repo on GitHub and clone it to your computer
# Bootstrap your working environment (e.g. VS Code + git + the rust-analyzer plugin)
# Create a new module `is_in.rs` in the arrow crate, under compute/kernels.
# Design the function. E.g. `is_in` is something like `fn is_in(array1, array2) -> boolean array`.
## Can we leverage the arrow format somehow, or do we need to work directly on the values? E.g. `is_in` seems to be an NxM problem, as we need to check each element of array1 against array2. Is there any way to speed up this process?
## Which utility methods might we need?
## Any edge cases?
# Implement. Check other kernels for inspiration. It is fine to PR only a subset of all types, e.g. start with an implementation for primitive arrays only (int32, int64, etc.)
# Add tests (can be done first)
# Update README
# PR (optional, good to get initial feedback :))
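The Arrow steps above can be sketched in a few lines. This is a minimal, std-only sketch, not the arrow crate's API: it models a nullable primitive array as a slice of `Option<T>` (with `None` as a null slot), and the name `is_in` and its signature are assumptions. To avoid the NxM scan, it hashes the list once, so the expected cost is roughly O(N + M) instead of O(N * M); hashing is one possible answer to the speed-up question, not necessarily the approach the final kernel should take. For the edge cases it assumes SQL-style three-valued logic for nulls.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical std-only sketch of an `is_in` kernel for primitive values.
/// A nullable array is modeled as `&[Option<T>]` (`None` = null slot); this is
/// NOT the arrow crate's API, which would use `PrimitiveArray<T>` instead.
///
/// Null handling follows SQL three-valued logic (an assumption):
/// - a null value yields null;
/// - a value found in `list` yields true;
/// - a value not found yields null if `list` contains a null, else false.
pub fn is_in<T: Eq + Hash + Copy>(values: &[Option<T>], list: &[Option<T>]) -> Vec<Option<bool>> {
    // Hash the non-null list entries once (expected O(M)) so each lookup is
    // expected O(1), instead of scanning `list` for every value (O(N * M)).
    let set: HashSet<T> = list.iter().flatten().copied().collect();
    // Whether the list itself contains a null changes the "not found" answer.
    let list_has_null = list.iter().any(|v| v.is_none());

    values
        .iter()
        .map(|v| match v {
            // Null value: the result is null.
            None => None,
            // Found: true, regardless of nulls in the list.
            Some(x) if set.contains(x) => Some(true),
            // Not found, but the list has a null: the answer is unknown (null).
            Some(_) if list_has_null => None,
            // Not found and the list has no nulls: false.
            Some(_) => Some(false),
        })
        .collect()
}
```

Note that `T: Eq + Hash` excludes `f32`/`f64`, which implement neither trait, so float arrays would need separate handling (another edge case). A real kernel would also build the result's validity bitmap; in this sketch that role is played by the `None` entries of the returned `Vec`.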
DataFusion:
# Use the kernel defined above in DataFusion, in physical_plan/functions.rs
# Add the function `is_in` to the `impl` of Expr in logical_plan.rs
# Update README
# PR
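For the DataFusion side, the method added to the `impl` of `Expr` could look roughly like the following. This is a hypothetical, heavily simplified model: the `Expr` enum here, its `IsIn` variant, and the `col`/`lit` helpers are illustrative stand-ins, not DataFusion's actual definitions in logical_plan.rs. It only shows the shape of the change: a builder method that wraps `self` and the list into a new expression node, which the physical planner would later translate into a call to the arrow kernel.

```rust
use std::fmt;

/// Hypothetical, heavily simplified stand-in for a logical `Expr`;
/// only the variants needed to illustrate `is_in` are modeled.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    /// `expr IN (list...)`, or `expr NOT IN (list...)` when `negated` is true.
    IsIn { expr: Box<Expr>, list: Vec<Expr>, negated: bool },
}

/// Hypothetical helper: reference a column by name.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Hypothetical helper: an integer literal.
pub fn lit(value: i64) -> Expr {
    Expr::Literal(value)
}

impl Expr {
    /// Builder method, e.g. `col("a").is_in(vec![lit(1), lit(2)])`.
    pub fn is_in(self, list: Vec<Expr>) -> Expr {
        Expr::IsIn { expr: Box::new(self), list, negated: false }
    }

    /// Negated form, e.g. `col("a").not_in(vec![lit(1)])`.
    pub fn not_in(self, list: Vec<Expr>) -> Expr {
        Expr::IsIn { expr: Box::new(self), list, negated: true }
    }
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{}", name),
            Expr::Literal(v) => write!(f, "{}", v),
            Expr::IsIn { expr, list, negated } => {
                // Render the list as a comma-separated sequence.
                let items: Vec<String> = list.iter().map(|e| e.to_string()).collect();
                let op = if *negated { "NOT IN" } else { "IN" };
                write!(f, "{} {} ({})", expr, op, items.join(", "))
            }
        }
    }
}
```

For example, `col("a").is_in(vec![lit(1), lit(2), lit(3)])` displays as `a IN (1, 2, 3)`.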
Let us know if you have any questions!
> [Rust] [DataFusion] Add support for is_in
> -----------------------------------------
>
> Key: ARROW-10356
> URL: https://issues.apache.org/jira/browse/ARROW-10356
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust, Rust - DataFusion
> Reporter: Jorge Leitão
> Priority: Major
> Labels: beginner
>
> Add support for a function that returns true if an element belongs to a finite list of elements.
> The list should be an Arrow array (so that it can be dynamically typed).
>
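Because the list is meant to be an Arrow array whose type is only known at runtime, the kernel needs a runtime type check followed by dispatch to typed code. A minimal sketch, assuming a hypothetical `DynArray` enum as a std-only stand-in for Arrow's dynamically-typed arrays (the real crate works with `ArrayRef` and its `DataType` instead): matching value and list types dispatch to one generic helper, and mismatched types return an error rather than being cast. Null handling is simplified to null-in, null-out, and nulls inside the list are ignored.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical std-only stand-in for a dynamically-typed array:
/// the variant carries the data type, `None` marks a null slot.
#[derive(Debug)]
pub enum DynArray {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

impl DynArray {
    /// Name of the runtime data type, used in error messages.
    fn data_type(&self) -> &'static str {
        match self {
            DynArray::Int32(_) => "Int32",
            DynArray::Int64(_) => "Int64",
            DynArray::Utf8(_) => "Utf8",
        }
    }
}

/// Generic membership helper; nulls in `list` are ignored (a simplification).
fn contains_each<T: Eq + Hash>(values: &[Option<T>], list: &[Option<T>]) -> Vec<Option<bool>> {
    // Borrow the non-null list entries into a set.
    let set: HashSet<&T> = list.iter().flatten().collect();
    // A null value stays null; otherwise report membership.
    values.iter().map(|v| v.as_ref().map(|x| set.contains(x))).collect()
}

/// Checks that both arrays share a data type at runtime, then dispatches.
pub fn is_in_dyn(values: &DynArray, list: &DynArray) -> Result<Vec<Option<bool>>, String> {
    match (values, list) {
        (DynArray::Int32(v), DynArray::Int32(l)) => Ok(contains_each(v, l)),
        (DynArray::Int64(v), DynArray::Int64(l)) => Ok(contains_each(v, l)),
        (DynArray::Utf8(v), DynArray::Utf8(l)) => Ok(contains_each(v, l)),
        // Mismatched types are rejected instead of being implicitly cast.
        _ => Err(format!(
            "is_in: cannot compare {} with {}",
            values.data_type(),
            list.data_type()
        )),
    }
}
```

Whether mismatched types should be an error or trigger a cast is a design decision for the kernel; this sketch takes the stricter option.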
--
This message was sent by Atlassian Jira
(v8.3.4#803005)