You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/14 20:41:12 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #4068: Add DictionaryArray Constructors (#3879)

alamb commented on code in PR #4068:
URL: https://github.com/apache/arrow-rs/pull/4068#discussion_r1167256392


##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -237,33 +238,72 @@ impl<K: ArrowDictionaryKeyType> Clone for DictionaryArray<K> {
 impl<K: ArrowDictionaryKeyType> DictionaryArray<K> {
     /// Attempt to create a new DictionaryArray with a specified keys
     /// (indexes into the dictionary) and values (dictionary)
-    /// array. Returns an error if there are any keys that are outside
-    /// of the dictionary array.
+    /// array.
+    ///
+    /// # Panics
+    ///
+    /// Panics if [`Self::try_new`] returns an error
+    pub fn new(keys: PrimitiveArray<K>, values: ArrayRef) -> Self {
+        Self::try_new(keys, values).unwrap()
+    }
+
+    /// Attempt to create a new DictionaryArray with a specified keys
+    /// (indexes into the dictionary) and values (dictionary)
+    /// array.
+    ///
+    /// # Errors
+    ///
+    /// Returns an error if any `keys[i] >= values.len() || keys[i] < 0`
     pub fn try_new(
-        keys: &PrimitiveArray<K>,
-        values: &dyn Array,
+        keys: PrimitiveArray<K>,
+        values: ArrayRef,
     ) -> Result<Self, ArrowError> {
-        let dict_data_type = DataType::Dictionary(
+        let data_type = DataType::Dictionary(
             Box::new(keys.data_type().clone()),
             Box::new(values.data_type().clone()),
         );
 
-        // Note: This use the ArrayDataBuilder::build_unchecked and afterwards
-        // call the new function which only validates that the keys are in bounds.
-        let data = keys.to_data();
-        let builder = data
-            .into_builder()
-            .data_type(dict_data_type)
-            .add_child_data(values.to_data());
+        let zero = K::Native::usize_as(0);
+        let values_len = values.len();
 
-        // Safety: `validate` ensures key type is correct, and
-        //  `validate_values` ensures all offsets are within range
-        let array = unsafe { builder.build_unchecked() };
+        if let Some((idx, v)) = keys.values().iter().enumerate().find(|(idx, v)| {

Review Comment:
   Isn't this code now redundant with the validity checks in ArrayData?



##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -1057,23 +1096,23 @@ mod tests {
 
     #[test]
     #[should_panic(
-        expected = "Value at position 1 out of bounds: 3 (should be in [0, 1])"
+        expected = "Invalid dictionary key 3 at index 1, expected 0 <= key < 2"

Review Comment:
   that is certainly a nicer error message



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org