Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/06 03:58:24 UTC

[GitHub] [arrow-rs] sunchao commented on a diff in pull request #1990: Support DictionaryArray in unary kernel

sunchao commented on code in PR #1990:
URL: https://github.com/apache/arrow-rs/pull/1990#discussion_r914389732


##########
arrow/src/compute/kernels/arity.rs:
##########
@@ -78,10 +82,120 @@ where
     PrimitiveArray::<O>::from(data)
 }
 
+macro_rules! unary_dict_op {
+    ($array: expr, $op: expr, $value_ty: ty) => {{
+        // Safety justification: Since the inputs are valid Arrow arrays, all values are
+        // valid indexes into the dictionary (which is verified during construction)
+
+        let array_iter = unsafe {
+            $array
+                .values()
+                .as_any()
+                .downcast_ref::<$value_ty>()
+                .unwrap()
+                .take_iter_unchecked($array.keys_iter())
+        };
+
+        let values = array_iter.map(|v| v.map(|value| $op(value))).collect();
+
+        Ok(values)
+    }};
+}
+
+/// A helper function that applies an unary function to a dictionary array with primitive value type.
+fn unary_dict<K, F, T>(array: &DictionaryArray<K>, op: F) -> Result<PrimitiveArray<T>>

Review Comment:
   Does this need to be public so that other modules, such as `arithmetic.rs`, can use it?



##########
arrow/src/compute/kernels/arity.rs:
##########
@@ -78,10 +82,120 @@ where
     PrimitiveArray::<O>::from(data)
 }
 
+macro_rules! unary_dict_op {
+    ($array: expr, $op: expr, $value_ty: ty) => {{
+        // Safety justification: Since the inputs are valid Arrow arrays, all values are
+        // valid indexes into the dictionary (which is verified during construction)
+
+        let array_iter = unsafe {
+            $array
+                .values()
+                .as_any()
+                .downcast_ref::<$value_ty>()
+                .unwrap()
+                .take_iter_unchecked($array.keys_iter())

Review Comment:
   Hmm, is it possible to apply `op` directly to the dictionary values? If the values are large strings, the current approach first has to decode the dictionary into a "plain" array and then apply `op` to every element in it, which is expensive.
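
   To make the suggestion concrete, here is a minimal sketch using only the standard library. `Dict`, `unary_decoded`, `unary_on_values` and `decode` are hypothetical stand-ins invented for illustration, not the arrow-rs API. The sketch assumes `op` is pure and infallible, because it also runs on dictionary values that no key references.

   ```rust
   /// Hypothetical stand-in for a dictionary-encoded array (NOT arrow-rs's
   /// `DictionaryArray`): `keys[i]` indexes into `values`, `None` is a null slot.
   #[derive(Debug, Clone, PartialEq)]
   struct Dict<V> {
       keys: Vec<Option<usize>>,
       values: Vec<V>,
   }

   /// Decode-then-apply: `op` runs once per row, on a clone of the looked-up value.
   fn unary_decoded<V: Clone, O>(d: &Dict<V>, op: impl Fn(V) -> O) -> Vec<Option<O>> {
       d.keys
           .iter()
           .map(|&k| k.map(|i| op(d.values[i].clone())))
           .collect()
   }

   /// Apply-on-values: `op` runs once per dictionary entry and the keys are
   /// reused unchanged, so the result stays dictionary-encoded.
   fn unary_on_values<V, O>(d: &Dict<V>, op: impl Fn(&V) -> O) -> Dict<O> {
       Dict {
           keys: d.keys.clone(),
           values: d.values.iter().map(op).collect(),
       }
   }

   /// Expand a dictionary into a plain vector, used here only to compare results.
   fn decode<V: Clone>(d: &Dict<V>) -> Vec<Option<V>> {
       d.keys
           .iter()
           .map(|&k| k.map(|i| d.values[i].clone()))
           .collect()
   }
   ```

   With many rows and few distinct values, the second form calls `op` far fewer times, and decoding its result should match what the first form produces row by row.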



##########
arrow/src/compute/kernels/arity.rs:
##########
@@ -78,10 +82,120 @@ where
     PrimitiveArray::<O>::from(data)
 }
 
+macro_rules! unary_dict_op {
+    ($array: expr, $op: expr, $value_ty: ty) => {{
+        // Safety justification: Since the inputs are valid Arrow arrays, all values are
+        // valid indexes into the dictionary (which is verified during construction)
+
+        let array_iter = unsafe {
+            $array
+                .values()
+                .as_any()
+                .downcast_ref::<$value_ty>()
+                .unwrap()
+                .take_iter_unchecked($array.keys_iter())
+        };
+
+        let values = array_iter.map(|v| v.map(|value| $op(value))).collect();
+
+        Ok(values)
+    }};
+}
+
+/// A helper function that applies an unary function to a dictionary array with primitive value type.
+fn unary_dict<K, F, T>(array: &DictionaryArray<K>, op: F) -> Result<PrimitiveArray<T>>
+where
+    K: ArrowNumericType,
+    T: ArrowPrimitiveType,
+    F: Fn(T::Native) -> T::Native,
+{
+    unary_dict_op!(array, op, PrimitiveArray<T>)

Review Comment:
   Do we need this macro? I think we can just inline it:
   ```rust
       let array_iter = unsafe {
           array
               .values()
               .as_any()
               .downcast_ref::<PrimitiveArray<T>>()
               .unwrap()
               .take_iter_unchecked(array.keys_iter())
       };
   
       let values = array_iter.map(|v| v.map(|value| op(value))).collect();
   
       Ok(values)
   ```


