Posted to commits@arrow.apache.org by jo...@apache.org on 2021/01/02 23:29:01 UTC
[arrow] branch master updated: ARROW-11086: [Rust] Extend take implementation to more index types
This is an automated email from the ASF dual-hosted git repository.
jorgecarleitao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 5228ede ARROW-11086: [Rust] Extend take implementation to more index types
5228ede is described below
commit 5228ede9abf8ecf9b4bb68a06075cd16af3523e9
Author: Heres, Daniel <da...@gmail.com>
AuthorDate: Sat Jan 2 23:28:03 2021 +0000
ARROW-11086: [Rust] Extend take implementation to more index types
## Context
The context of this PR is that I want to experiment with a simplified implementation of the hash join in DataFusion that indexes the build-side array directly instead of keeping a list of batches. This array could grow beyond 2^32 (about 4.3 billion) elements, so it would need indexes of type `UInt64` rather than `UInt32`.
## Implementation
In this PR I extend the public `take` to accept indices of any `IndexType` that implements `ArrowNumericType` and whose native type implements `ToPrimitive`.
I am not sure why `take` was previously restricted to `UInt32Array` only.
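
As a rough illustration of the idea (not the Arrow kernel itself), a gather over plain slices can be made generic over the index type in the same spirit. Everything in the sketch below is an assumption made for illustration: the name `take_generic` is hypothetical, `Option<I>` stands in for a nullable index array, and the standard library's `TryFrom` is used in place of the `ToPrimitive` bound that the actual change relies on.

```rust
/// Simplified stand-in for a `take` kernel over plain slices (hypothetical helper,
/// not part of the arrow crate). For each index `i` it gathers `values[i]`.
/// A `None` index models a null entry and produces a `None` output slot.
/// Returns `Err` if an index cannot be converted to `usize` (for example a
/// negative value) or if it is out of bounds for `values`.
pub fn take_generic<T, I>(
    values: &[T],
    indices: &[Option<I>],
) -> Result<Vec<Option<T>>, String>
where
    T: Clone,
    I: Copy,
    // Assumption: std's `TryFrom` plays the role of the `ToPrimitive` bound here.
    usize: std::convert::TryFrom<I>,
{
    let mut out = Vec::with_capacity(indices.len());
    for slot in indices {
        match slot {
            // Null index: propagate a null into the output.
            None => out.push(None),
            Some(raw) => {
                // Convert the index to `usize`, whatever its concrete integer type.
                let pos = <usize as std::convert::TryFrom<I>>::try_from(*raw)
                    .map_err(|_| String::from("index does not fit in usize"))?;
                // Bounds-checked lookup instead of panicking on a bad index.
                let value = values.get(pos).ok_or_else(|| {
                    format!("index {} out of bounds for length {}", pos, values.len())
                })?;
                out.push(Some(value.clone()));
            }
        }
    }
    Ok(out)
}
```

With this shape, the same function accepts `u32`, `u64`, or signed indices; the caller only has to make the index type known, which is also why the `hash_join.rs` call site in this commit gains an explicit `UInt32Array` annotation once `take` becomes generic.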
Closes #9057 from Dandandan/take_index
Authored-by: Heres, Daniel <da...@gmail.com>
Signed-off-by: Jorge C. Leitao <jo...@gmail.com>
---
rust/arrow/src/compute/kernels/take.rs | 12 ++++++++----
rust/datafusion/src/physical_plan/hash_join.rs | 3 ++-
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/rust/arrow/src/compute/kernels/take.rs b/rust/arrow/src/compute/kernels/take.rs
index c8a2a02..85567ad 100644
--- a/rust/arrow/src/compute/kernels/take.rs
+++ b/rust/arrow/src/compute/kernels/take.rs
@@ -76,12 +76,16 @@ macro_rules! downcast_dict_take {
/// # Ok(())
/// # }
/// ```
-pub fn take(
+pub fn take<IndexType>(
values: &Array,
- indices: &UInt32Array,
+ indices: &PrimitiveArray<IndexType>,
options: Option<TakeOptions>,
-) -> Result<ArrayRef> {
- take_impl::<UInt32Type>(values, indices, options)
+) -> Result<ArrayRef>
+where
+ IndexType: ArrowNumericType,
+ IndexType::Native: ToPrimitive,
+{
+ take_impl(values, indices, options)
}
fn take_impl<IndexType>(
diff --git a/rust/datafusion/src/physical_plan/hash_join.rs b/rust/datafusion/src/physical_plan/hash_join.rs
index a9d5963..50800df 100644
--- a/rust/datafusion/src/physical_plan/hash_join.rs
+++ b/rust/datafusion/src/physical_plan/hash_join.rs
@@ -316,7 +316,8 @@ fn build_batch_from_indices(
// 2. based on the pick, `take` items from the different recordBatches
let mut columns: Vec<Arc<dyn Array>> = Vec::with_capacity(schema.fields().len());
- let right_indices = indices.iter().map(|(_, join_index)| join_index).collect();
+ let right_indices: UInt32Array =
+ indices.iter().map(|(_, join_index)| join_index).collect();
for field in schema.fields() {
// pick the column (left or right) based on the field name.