You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/05 13:04:42 UTC

[GitHub] [arrow] alamb commented on a change in pull request #8556: ARROW-10378: [Rust] Update take() kernel with support for LargeList.

alamb commented on a change in pull request #8556:
URL: https://github.com/apache/arrow/pull/8556#discussion_r518033636



##########
File path: rust/arrow/src/compute/util.rs
##########
@@ -100,41 +99,55 @@ pub(super) fn compare_option_bitmap(
 /// Where a list array has indices `[0,2,5,10]`, taking indices of `[2,0]` returns
 /// an array of the indices `[5..10, 0..2]` and offsets `[0,5,7]` (5 elements and 2
 /// elements)
-pub(super) fn take_value_indices_from_list(
+pub(super) fn take_value_indices_from_list<IndexType, OffsetType>(
     values: &ArrayRef,
-    indices: &UInt32Array,
-) -> (UInt32Array, Vec<i32>) {
+    indices: &PrimitiveArray<IndexType>,
+) -> Result<(PrimitiveArray<OffsetType>, Vec<OffsetType::Native>)>
+where
+    IndexType: ArrowNumericType,
+    IndexType::Native: ToPrimitive,
+    OffsetType: ArrowNumericType,
+    OffsetType::Native: OffsetSizeTrait + Add + Zero + One,
+    PrimitiveArray<OffsetType>: From<Vec<Option<OffsetType::Native>>>,
+{
     // TODO: benchmark this function, there might be a faster unsafe alternative
     // get list array's offsets
-    let list: &ListArray = values.as_any().downcast_ref::<ListArray>().unwrap();
-    let offsets: Vec<u32> = (0..=list.len())
-        .map(|i| list.value_offset(i) as u32)
-        .collect();
+    let list = values
+        .as_any()
+        .downcast_ref::<GenericListArray<OffsetType::Native>>()
+        .unwrap();
+    let offsets: Vec<OffsetType::Native> =
+        (0..=list.len()).map(|i| list.value_offset(i)).collect();
+
     let mut new_offsets = Vec::with_capacity(indices.len());
     let mut values = Vec::new();
-    let mut current_offset = 0;
+    let mut current_offset = OffsetType::Native::zero();
     // add first offset
-    new_offsets.push(0);
+    new_offsets.push(OffsetType::Native::zero());
     // compute the value indices, and set offsets accordingly
     for i in 0..indices.len() {
         if indices.is_valid(i) {
-            let ix = indices.value(i) as usize;
+            let ix = ToPrimitive::to_usize(&indices.value(i)).ok_or_else(|| {
+                ArrowError::ComputeError("Cast to usize failed".to_string())
+            })?;
             let start = offsets[ix];
             let end = offsets[ix + 1];
-            current_offset += (end - start) as i32;
+            current_offset = current_offset + (end - start);
             new_offsets.push(current_offset);
+
+            let mut curr = start;
+
             // if start == end, this slot is empty
-            if start != end {

Review comment:
       is there a reason to change this loop from a `collect`?

##########
File path: rust/arrow/src/compute/kernels/take.rs
##########
@@ -195,15 +217,21 @@ where
 
     let null_slice = null_buf.data_mut();
 
-    let new_values: Vec<T::Native> = (0..data_len)
-        .map(|i| {
-            let index = indices.value(i) as usize;
-            if array.is_null(index) {
-                bit_util::unset_bit(null_slice, i);
-            }
-            array.value(index)
-        })
-        .collect();
+    let mut new_values: Vec<T::Native> = Vec::with_capacity(data_len);
+    let mut i = 0;
+    while i < data_len {

Review comment:
       I wonder if there was any particular reason to rewrite this (at least to me) Rust idomatic `map` to a `while` loop?
   
   It seems like you could perhaps do something like this (untested) if you wanted to keep the same loop structure and only change the index calculation:
   
   ```
       let new_values: Vec<T::Native> = (0..data_len)
           .map(|i| {
               let index = ToPrimitive::to_usize(&indices.value(i)).ok_or_else(|| {
                 ArrowError::ComputeError("Cast to usize failed".to_string())
               })?;
               if array.is_null(index) {
                   bit_util::unset_bit(null_slice, i);
               }
               Ok(array.value(index))
           })
           .collect::<Result<Vec<_>>()?;
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org