Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/29 18:07:04 UTC

[GitHub] [arrow] alamb commented on a change in pull request #8303: ARROW-10136: [Rust][Arrow]: Fix null handling in StringArray and BinaryArray filtering, add BinaryArray::from_opt_vec

alamb commented on a change in pull request #8303:
URL: https://github.com/apache/arrow/pull/8303#discussion_r496930511



##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -353,15 +353,19 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {

Review comment:
       This is the same pattern as in the handling for primitive arrays: https://github.com/apache/arrow/pull/8303/files#diff-d7b0b7cde1850e8744ceda458c6dea81R294-L298
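
The null-check pattern discussed here can be sketched without the Arrow crate. The following is a minimal illustration only: `ToyStringArray` and `filter_preserving_nulls` are hypothetical names invented for this sketch, and a plain `&[bool]` stands in for the packed 64-bit filter batches used by `FilterContext`.

```rust
/// Hypothetical stand-in for a string array: one value and one validity flag per slot.
/// This is not an Arrow type; it exists only to illustrate the pattern.
struct ToyStringArray {
    values: Vec<String>,
    valid: Vec<bool>,
}

impl ToyStringArray {
    /// True when the slot at `i` holds a null.
    fn is_null(&self, i: usize) -> bool {
        !self.valid[i]
    }

    /// Raw value at `i`; for a null slot this is only a placeholder.
    fn value(&self, i: usize) -> &str {
        &self.values[i]
    }
}

/// Keep every slot whose filter flag is set, carrying nulls through as `None`
/// instead of pushing the placeholder value of a null slot.
fn filter_preserving_nulls<'a>(
    input: &'a ToyStringArray,
    filter: &[bool],
) -> Vec<Option<&'a str>> {
    let mut out = Vec::new();
    for (i, keep) in filter.iter().enumerate() {
        if *keep {
            if input.is_null(i) {
                out.push(None);
            } else {
                out.push(Some(input.value(i)));
            }
        }
    }
    out
}
```

Without the `is_null` branch, a selected null slot would be emitted as whatever placeholder value happens to be stored there, which is the bug this change addresses.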

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -373,7 +377,11 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {

Review comment:
       Likewise, this special case appears to have been missing the null check too.

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -353,15 +353,19 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {
+                                values.push(None)
+                            } else {
+                                values.push(Some(input_array.value(data_index)))
+                            }
                         }
                     }
                 }
                 Ok(Arc::new(BinaryArray::from(values)))
             }
             DataType::Utf8 => {
                 let input_array = array.as_any().downcast_ref::<StringArray>().unwrap();
-                let mut values: Vec<&str> = Vec::with_capacity(self.filtered_count);
+                let mut values: Vec<Option<&str>> = Vec::with_capacity(self.filtered_count);

Review comment:
       Note that using an `Option` is likely to increase the temporary storage requirements a bit.
   
   It would likely be possible to avoid this allocation entirely if we used the lower-level `ArrayBuilder::with_bit_buffer`.
   
   I chose to follow the style of the rest of this module, though I would love opinions on perf-checking or optimizing this (maybe a follow-on JIRA ticket is enough?).
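
For the follow-up discussion, here is a rough sketch of what skipping the intermediate `Vec<Option<&str>>` might look like: the filtered values are written straight into an offsets buffer, a byte buffer, and a packed validity bitmap, mirroring the Arrow variable-length layout. It does not use the builder API mentioned above; `FilteredBuffers` and `filter_into_buffers` are hypothetical names, and the inputs are plain slices rather than Arrow arrays.

```rust
/// Hypothetical output buffers, laid out like a variable-length string array:
/// `offsets[i]..offsets[i + 1]` is the byte range of slot `i` in `data`,
/// and bit `i` of `validity` (least-significant bit first) is set when slot `i` is non-null.
struct FilteredBuffers {
    offsets: Vec<i32>,
    data: Vec<u8>,
    validity: Vec<u8>,
    len: usize,
}

/// Filter `values` by `filter`, writing directly into the output buffers.
/// `valid[i] == false` marks slot `i` of the input as null.
fn filter_into_buffers(values: &[&str], valid: &[bool], filter: &[bool]) -> FilteredBuffers {
    let mut out = FilteredBuffers {
        offsets: vec![0],
        data: Vec::new(),
        validity: Vec::new(),
        len: 0,
    };
    for i in 0..values.len() {
        if !filter[i] {
            continue;
        }
        let slot = out.len;
        // Start a new validity byte every eight output slots.
        if slot % 8 == 0 {
            out.validity.push(0);
        }
        if valid[i] {
            out.data.extend_from_slice(values[i].as_bytes());
            out.validity[slot / 8] |= 1 << (slot % 8);
        }
        // A null slot gets an empty byte range: its end offset equals its start offset.
        out.offsets.push(out.data.len() as i32);
        out.len += 1;
    }
    out
}
```

Whether this is actually faster or smaller than the `Option`-based version would need measuring; the sketch only shows that the null information can be carried in the bitmap rather than in the temporary vector.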
   



