You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by jo...@apache.org on 2021/01/02 23:29:01 UTC

[arrow] branch master updated: ARROW-11086: [Rust] Extend take implementation to more index types

This is an automated email from the ASF dual-hosted git repository.

jorgecarleitao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 5228ede  ARROW-11086: [Rust] Extend take implementation to more index types
5228ede is described below

commit 5228ede9abf8ecf9b4bb68a06075cd16af3523e9
Author: Heres, Daniel <da...@gmail.com>
AuthorDate: Sat Jan 2 23:28:03 2021 +0000

    ARROW-11086: [Rust] Extend take implementation to more index types
    
    ## Context
    The context of this PR is that I want to experiment with a simplified implementation of the hash join in DataFusion which directly can index the build-side array instead of keeping a list of batches. This array could grow beyond 2 ^ 32 billion elements, so would need indexes of type `UInt64` rather than `UInt32`.
    
    ## Implementation
    
    In the PR I just extend the public `take` to take any `IndexType` which implements `ArrowNumericType` and `ToPrimitive`.
    I am not sure about the consideration before to restrict `take` to only `UInt32Array`.
    
    Closes #9057 from Dandandan/take_index
    
    Authored-by: Heres, Daniel <da...@gmail.com>
    Signed-off-by: Jorge C. Leitao <jo...@gmail.com>
---
 rust/arrow/src/compute/kernels/take.rs         | 12 ++++++++----
 rust/datafusion/src/physical_plan/hash_join.rs |  3 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/rust/arrow/src/compute/kernels/take.rs b/rust/arrow/src/compute/kernels/take.rs
index c8a2a02..85567ad 100644
--- a/rust/arrow/src/compute/kernels/take.rs
+++ b/rust/arrow/src/compute/kernels/take.rs
@@ -76,12 +76,16 @@ macro_rules! downcast_dict_take {
 /// # Ok(())
 /// # }
 /// ```
-pub fn take(
+pub fn take<IndexType>(
     values: &Array,
-    indices: &UInt32Array,
+    indices: &PrimitiveArray<IndexType>,
     options: Option<TakeOptions>,
-) -> Result<ArrayRef> {
-    take_impl::<UInt32Type>(values, indices, options)
+) -> Result<ArrayRef>
+where
+    IndexType: ArrowNumericType,
+    IndexType::Native: ToPrimitive,
+{
+    take_impl(values, indices, options)
 }
 
 fn take_impl<IndexType>(
diff --git a/rust/datafusion/src/physical_plan/hash_join.rs b/rust/datafusion/src/physical_plan/hash_join.rs
index a9d5963..50800df 100644
--- a/rust/datafusion/src/physical_plan/hash_join.rs
+++ b/rust/datafusion/src/physical_plan/hash_join.rs
@@ -316,7 +316,8 @@ fn build_batch_from_indices(
     // 2. based on the pick, `take` items from the different recordBatches
     let mut columns: Vec<Arc<dyn Array>> = Vec::with_capacity(schema.fields().len());
 
-    let right_indices = indices.iter().map(|(_, join_index)| join_index).collect();
+    let right_indices: UInt32Array =
+        indices.iter().map(|(_, join_index)| join_index).collect();
 
     for field in schema.fields() {
         // pick the column (left or right) based on the field name.