You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/30 02:23:04 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8303: ARROW-10136: [Rust]: Fix null handling in StringArray and BinaryArray filtering, add BinaryArray::from_opt_vec

jorgecarleitao commented on a change in pull request #8303:
URL: https://github.com/apache/arrow/pull/8303#discussion_r497204788



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -1267,6 +1267,36 @@ impl<OffsetSize: OffsetSizeTrait> GenericBinaryArray<OffsetSize> {
         GenericBinaryArray::<OffsetSize>::from(array_data)
     }
 
+    fn from_opt_vec(v: Vec<Option<&[u8]>>, data_type: DataType) -> Self {

Review comment:
       is this code tested somewhere?

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -675,6 +683,29 @@ mod tests {
         assert_eq!(true, d.is_null(0));
     }
 
+    #[test]
+    fn test_filter_string_array_with_null() {
+        let a = StringArray::from(vec![Some("hello"), None, Some("world"), None]);
+        let b = BooleanArray::from(vec![true, false, false, true]);
+        let c = filter(&a, &b).unwrap();
+        let d = c.as_ref().as_any().downcast_ref::<StringArray>().unwrap();
+        assert_eq!(2, d.len());
+        assert_eq!("hello", d.value(0));

Review comment:
       I suggest that we test the 3 quantities: `d.is_null(0)`, `d.value(0)`, `d.is_null(1)`. Alternatively, 
   
   ```rust
   let expected = StringArray::from(vec![Some("hello"), None]);
   assert_eq!(d, expected);
   ```

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -675,6 +683,29 @@ mod tests {
         assert_eq!(true, d.is_null(0));
     }
 
+    #[test]
+    fn test_filter_string_array_with_null() {
+        let a = StringArray::from(vec![Some("hello"), None, Some("world"), None]);
+        let b = BooleanArray::from(vec![true, false, false, true]);
+        let c = filter(&a, &b).unwrap();
+        let d = c.as_ref().as_any().downcast_ref::<StringArray>().unwrap();
+        assert_eq!(2, d.len());
+        assert_eq!("hello", d.value(0));
+        assert_eq!(true, d.is_null(1));
+    }
+
+    #[test]
+    fn test_filter_binary_array_with_null() {
+        let data: Vec<Option<&[u8]>> = vec![Some(b"hello"), None, Some(b"world"), None];
+        let a = BinaryArray::from(data);
+        let b = BooleanArray::from(vec![true, false, false, true]);
+        let c = filter(&a, &b).unwrap();
+        let d = c.as_ref().as_any().downcast_ref::<BinaryArray>().unwrap();
+        assert_eq!(2, d.len());
+        assert_eq!(b"hello", d.value(0));

Review comment:
       same here.

##########
File path: rust/arrow/src/compute/kernels/filter.rs
##########
@@ -353,15 +353,19 @@ impl FilterContext {
                         // foreach bit in batch:
                         if (filter_batch & self.filter_mask[j]) != 0 {
                             let data_index = (i * 64) + j;
-                            values.push(input_array.value(data_index));
+                            if input_array.is_null(data_index) {
+                                values.push(None)
+                            } else {
+                                values.push(Some(input_array.value(data_index)))
+                            }
                         }
                     }
                 }
                 Ok(Arc::new(BinaryArray::from(values)))
             }
             DataType::Utf8 => {
                 let input_array = array.as_any().downcast_ref::<StringArray>().unwrap();
-                let mut values: Vec<&str> = Vec::with_capacity(self.filtered_count);
+                let mut values: Vec<Option<&str>> = Vec::with_capacity(self.filtered_count);

Review comment:
       Doesn't an `Option` of a reference leverages null pointer optimization (on which `None` is represented by the null pointer, e.g. https://github.com/rust-lang/rust/issues/9378)?
   
   IMO we should follow up on this: for kernels we have been using a mutable buffer with null masks as much as possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org