You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/16 14:04:36 UTC

[GitHub] [arrow-rs] alamb opened a new issue #1047: Add `Scalar` / `Datum` support to compute kernels

alamb opened a new issue #1047:
URL: https://github.com/apache/arrow-rs/issues/1047


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   When implementing analytics, users often want to do operations like `array` + `constant` or `array + array`
   
   The Rust implementation of Arrow is strongly typed 🙌  , including the [compute kernels](https://docs.rs/arrow/6.4.0/arrow/compute/kernels/index.html), but often data is passed around in `Arc<dyn Array>` / `ArrayRef` so the types can be dynamic. This then places the burden on the user of the library to determine the exact types of the inputs in order to call the appropriate kernels
   
   Let's take for example, trying to compare an array to find all elements that contain the number `5` and your input an array may have `Int64` or `Int32` values
   
   In the current version of arrow, in order to do so you would need to write code like
   
   ```rust
   use arrow::compute::eq_scalar;
   
   fn find_5(array: ArrayRef) -> Result<BooleanArray>{
     match array.data_type() {
       DataType::Int64 => eq_scalar(array.as_any().downcast_ref::<Int64Array>().unwrap(), 5),
       DataType::UInt64 => eq_scalar(array.as_any().downcast_ref::<UInt64Array>().unwrap(), 5),
       ...
     }
   }
   ```
   This ends up being macroized and is non ideal because the user had to do dynamic type dispatch anyways, so there is no runtime performance benefit to strongly typed kernels
   
   **Describe the solution you'd like**
   
   It would be nice to be able to call a single function and let the rust library dynamically (at runtime) pick the correct kernel to call.
   
   As described and suggested by @jorgecarleitao  and @nevi-me  in https://github.com/apache/arrow-rs/pull/984#issuecomment-994093090 this could go all the way and take the form of following the C++ `Datum` model
   
   ```rust
   fn eq(left: Datum, right: Datum) -> BooleanArray {
   ...
   }
   ```
   
   Where `Datum` looks something like
   
   ```rust
   enum Datum {
     Scalar(ScalarValue),
     Array(ArrayRef)
   }
   ```
   
   And `ScalarValue` which would look something like `ScalarValue` from DataFusion https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/scalar.rs or `Scalar` from arrow2 https://github.com/jorgecarleitao/arrow2/tree/main/src/scalar
   
   **Describe alternatives you've considered**
   The alternative is a continued proliferation of individual kernels such as `eq`, `eq_utf8`, etc
   
   **Additional context**
   There is a lot of additional discussion on https://github.com/apache/arrow-rs/pull/984
   
   cc @matthewmturner @Dandandan @Jimexist @houqp 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #1047: Add `Scalar` / `Datum` support to compute kernels

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1047:
URL: https://github.com/apache/arrow-rs/issues/1047#issuecomment-1001185712


   > but it would be good to still keep the specialized kernels around so users can still leverage them when type info is available at the call site.
   
   The reason I was saying the specialized kernels might not be totally useful was,  my mind, that  the overhead of a type dispatch could be almost entirely amortized -- like if we are comparing 1000 elements in an array, an extra check on the array's type might not be a huge deal
   
   I don't see anything wrong with leaving all the specialized kernels available too (maybe it would help inlining 🤔 ) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] houqp commented on issue #1047: Add `Scalar` / `Datum` support to compute kernels

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #1047:
URL: https://github.com/apache/arrow-rs/issues/1047#issuecomment-1000927046


   I support adding higher level easy to use API for non-performance critical code, but it would be good to still keep the specialized kernels around so users can still leverage them when type info is available at the call site.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org