You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/18 10:03:06 UTC

[GitHub] [arrow] alamb commented on a change in pull request #10073: ARROW-12426: [Rust] Fix concatentation of arrow dictionaries

alamb commented on a change in pull request #10073:
URL: https://github.com/apache/arrow/pull/10073#discussion_r615374353



##########
File path: rust/arrow/src/compute/kernels/concat.rs
##########
@@ -384,4 +384,51 @@ mod tests {
 
         Ok(())
     }
+
+    fn collect_string_dictionary(
+        dictionary: &DictionaryArray<Int32Type>,
+    ) -> Vec<Option<String>> {
+        let values = dictionary.values();
+        let values = values.as_any().downcast_ref::<StringArray>().unwrap();
+
+        (0..dictionary.len())
+            .map(move |i| match dictionary.keys().is_valid(i) {
+                true => {
+                    let key = dictionary.keys().value(i);
+                    Some(values.value(key as _).to_string())
+                }
+                false => None,
+            })
+            .collect()
+    }
+
+    #[test]
+    fn test_string_dictionary_array() -> Result<()> {
+        let input_1: DictionaryArray<Int32Type> =
+            vec!["hello", "A", "B", "hello", "hello", "C"]
+                .into_iter()
+                .collect();
+        let input_2: DictionaryArray<Int32Type> =
+            vec!["hello", "E", "E", "hello", "F", "E"]
+                .into_iter()
+                .collect();
+
+        let concat = concat(&[&input_1 as _, &input_2 as _]).unwrap();
+        let concat = concat
+            .as_any()
+            .downcast_ref::<DictionaryArray<Int32Type>>()
+            .unwrap();
+
+        let concat_collected = collect_string_dictionary(concat);
+        let input_1_collected = collect_string_dictionary(&input_1);
+        let input_2_collected = collect_string_dictionary(&input_2);
+        let expected: Vec<_> = input_1_collected
+            .into_iter()
+            .chain(input_2_collected.into_iter())
+            .collect();
+
+        assert_eq!(concat_collected, expected);
+
+        Ok(())

Review comment:
       I suggest adding at least one Null value to the dictionary to test the nulls case

##########
File path: rust/arrow/src/compute/kernels/concat.rs
##########
@@ -384,4 +384,51 @@ mod tests {
 
         Ok(())
     }
+
+    fn collect_string_dictionary(
+        dictionary: &DictionaryArray<Int32Type>,
+    ) -> Vec<Option<String>> {
+        let values = dictionary.values();
+        let values = values.as_any().downcast_ref::<StringArray>().unwrap();
+
+        (0..dictionary.len())
+            .map(move |i| match dictionary.keys().is_valid(i) {
+                true => {
+                    let key = dictionary.keys().value(i);
+                    Some(values.value(key as _).to_string())
+                }
+                false => None,
+            })
+            .collect()
+    }
+
+    #[test]
+    fn test_string_dictionary_array() -> Result<()> {
+        let input_1: DictionaryArray<Int32Type> =
+            vec!["hello", "A", "B", "hello", "hello", "C"]
+                .into_iter()
+                .collect();
+        let input_2: DictionaryArray<Int32Type> =
+            vec!["hello", "E", "E", "hello", "F", "E"]
+                .into_iter()
+                .collect();
+
+        let concat = concat(&[&input_1 as _, &input_2 as _]).unwrap();
+        let concat = concat
+            .as_any()
+            .downcast_ref::<DictionaryArray<Int32Type>>()
+            .unwrap();
+

Review comment:
       I recommend testing the actual values from the dictionary array (aka test the content is as expected) as well here explicitly -- something like
   
   ```
   
           let expected_str = vec![
               "hello", "A", "B", "hello", "hello", "C",
               "hello", "E", "E", "hello", "F", "E",
           ];
           let values = concat.values();
           let values = values
               .as_any()
               .downcast_ref::<StringArray>()
               .unwrap();
   
           let concat_str = concat.keys().iter().map(|key| {
               let key = key.unwrap();
               values.value(key as usize)
           })
               .collect::<Vec<&str>>();
           assert_eq!(expected_str, concat_str);
   ```
   
   (tested and it passes for me locally)

##########
File path: rust/arrow/src/compute/kernels/concat.rs
##########
@@ -384,4 +384,51 @@ mod tests {
 
         Ok(())
     }
+
+    fn collect_string_dictionary(
+        dictionary: &DictionaryArray<Int32Type>,
+    ) -> Vec<Option<String>> {
+        let values = dictionary.values();
+        let values = values.as_any().downcast_ref::<StringArray>().unwrap();
+
+        (0..dictionary.len())
+            .map(move |i| match dictionary.keys().is_valid(i) {
+                true => {
+                    let key = dictionary.keys().value(i);
+                    Some(values.value(key as _).to_string())
+                }
+                false => None,
+            })
+            .collect()

Review comment:
       I think you can probably iterate over the keys directly if you wanted: 
   
   ```suggestion
           dictionary.keys()
               .iter()
               .map(|key| {
                   key.map(|key| values.value(key as _).to_string())
               })
               .collect()
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org