Posted to github@arrow.apache.org by "yongda-fan (via GitHub)" <gi...@apache.org> on 2024/03/12 01:35:54 UTC

[I] Support Rust UDF [arrow-ballista]

yongda-fan opened a new issue, #993:
URL: https://github.com/apache/arrow-ballista/issues/993

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Ballista currently does not support Rust UDFs, which makes it hard to process data with custom functions or external libraries.
   
   **Describe the solution you'd like**
   
   There are two possible solutions.
   ### Register the UDF directly
   We load a Rust dynamic library into the executor (similar to this PR: https://github.com/apache/arrow-datafusion/pull/1881; partial code already exists at https://github.com/apache/arrow-ballista/tree/main/ballista/core/src/plugin) and register the UDF directly with DataFusion.
   
   Issues:
   1. Rust has never guaranteed a stable ABI (i.e. memory layout), so the fields of UDF-related types in the plugin, e.g. `ColumnarValue` or `Signature`, may be interpreted incorrectly by the executor.
   2. In practice, the same rustc version and the same optimization level give the same ABI (i.e. the same memory layout for a type). This suggests the plugin must be compiled with exactly the same rustc and compiler flags as the executor.
   3. Alternatively, we could use a stable-ABI library such as `abi_stable` or `stabby` to mark all UDF-related types.
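
   A minimal sketch of how the executor could guard against issue 2, assuming the plugin exports a `#[repr(C)]` declaration that records the toolchain it was built with. `PluginDeclaration`, `opt_level`, and `check_compatible` are hypothetical names used for illustration, not the existing Ballista plugin API:

   ```rust
   use std::ffi::{c_char, CStr};

   /// Hypothetical declaration exported by a plugin.
   /// `#[repr(C)]` gives it a layout that does not depend on the rustc version.
   #[repr(C)]
   pub struct PluginDeclaration {
       /// NUL-terminated rustc version the plugin was built with.
       pub rustc_version: *const c_char,
       /// NUL-terminated optimization level the plugin was built with.
       pub opt_level: *const c_char,
   }

   /// Returns `Ok(())` only when the plugin toolchain matches the host toolchain.
   ///
   /// # Safety
   /// Both pointers in `decl` must point to valid NUL-terminated strings.
   pub unsafe fn check_compatible(
       decl: &PluginDeclaration,
       host_rustc: &str,
       host_opt: &str,
   ) -> Result<(), String> {
       // Copy the C strings into owned Rust strings before comparing.
       let rustc = unsafe { CStr::from_ptr(decl.rustc_version) }
           .to_string_lossy()
           .into_owned();
       let opt = unsafe { CStr::from_ptr(decl.opt_level) }
           .to_string_lossy()
           .into_owned();

       if rustc != host_rustc {
           return Err(format!("rustc mismatch: plugin {rustc}, host {host_rustc}"));
       }
       if opt != host_opt {
           return Err(format!("opt-level mismatch: plugin {opt}, host {host_opt}"));
       }
       Ok(())
   }
   ```

   The executor would call `check_compatible` right after loading the library and refuse to register any UDF on a mismatch. This only detects incompatible builds; it does not make the layout of `ColumnarValue` or `Signature` stable.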
   
   ### Reconstruct the UDF from a function that takes Arrow data as parameters and returns Arrow data
   We only load a function that takes Arrow data as parameters and returns Arrow data, since the Arrow memory layout is stable. The signature could be passed as a serialized string or something similar.
   
   This is similar to: https://github.com/apache/arrow-datafusion-python/blob/main/src/udf.rs
   
   Issues:
   1. We lose much of the flexibility provided by Rust's `ScalarUDFImpl`, such as changing the signature based on a value, or providing a specialized code path for `ColumnarValue::Scalar`.
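
   A rough sketch of this second approach, using only the standard library. `F64Buffer` is a simplified stand-in for an Arrow array (a real implementation would presumably pass the `ArrowArray`/`ArrowSchema` structs of the Arrow C Data Interface instead), and `UdfFn`, `add_one`, and `call_udf` are hypothetical names:

   ```rust
   /// Simplified stand-in for an Arrow Float64 array crossing the FFI boundary.
   /// Assumption: no null bitmap, no offset, values only.
   #[repr(C)]
   pub struct F64Buffer {
       pub ptr: *const f64,
       pub len: usize,
   }

   /// C-ABI signature a plugin UDF would export.
   /// `out` must have room for `input.len` values; returns 0 on success.
   pub type UdfFn = unsafe extern "C" fn(input: F64Buffer, out: *mut f64) -> i32;

   /// Example plugin-side UDF: adds 1.0 to every value.
   /// A real plugin would also export this symbol so the host can look it up.
   pub unsafe extern "C" fn add_one(input: F64Buffer, out: *mut f64) -> i32 {
       if input.ptr.is_null() || out.is_null() {
           return 1;
       }
       let src = unsafe { std::slice::from_raw_parts(input.ptr, input.len) };
       let dst = unsafe { std::slice::from_raw_parts_mut(out, input.len) };
       for (d, s) in dst.iter_mut().zip(src) {
           *d = s + 1.0;
       }
       0
   }

   /// Host-side wrapper: allocates the output and calls through the C ABI.
   pub fn call_udf(udf: UdfFn, values: &[f64]) -> Result<Vec<f64>, i32> {
       let mut out = vec![0.0; values.len()];
       let input = F64Buffer { ptr: values.as_ptr(), len: values.len() };
       // Safety: `input` and `out` both describe `values.len()` valid f64 slots.
       let status = unsafe { udf(input, out.as_mut_ptr()) };
       if status == 0 { Ok(out) } else { Err(status) }
   }
   ```

   Because the function only ever sees arrays, a scalar argument has to be expanded by the host first, e.g. `call_udf(add_one, &vec![5.0; n])`, which is exactly the `ColumnarValue::Scalar` fast path that this approach gives up.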
   
   **Describe alternatives you've considered**
   
   Alternatively, we could build and deploy a custom Ballista executor each time we want to add or modify a UDF.
   
   **Additional context**
   

