Posted to issues@arrow.apache.org by "Jorge (Jira)" <ji...@apache.org> on 2020/02/08 06:59:00 UTC

[jira] [Comment Edited] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

    [ https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032835#comment-17032835 ] 

Jorge edited comment on ARROW-6947 at 2/8/20 6:58 AM:
------------------------------------------------------

[~andygrove], can I take a shot at this?

My strategy for tackling this:
 # Add support for unary UDFs
 # Add support for binary UDFs
 # Add a macro to support UDFs with an arbitrary number of arguments (I do not see an easy way to implement variadic generics)
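
A rough sketch of how the macro in step 3 could generate one kernel per arity. Everything here is hypothetical: the macro name {{make_udf_kernel}}, the generated function names, and the use of {{Vec<Option<T>>}} as a dependency-free stand-in for an Arrow array ({{None}} meaning null). It illustrates the idea only and is not a proposal for the final API.

```rust
/// Generates an element-wise kernel for a fixed number of arguments.
/// ASSUMPTION: `&[Option<T>]` stands in for an Arrow array (None = null).
macro_rules! make_udf_kernel {
    ($name:ident, $($arg:ident : $ty:ident),+) => {
        pub fn $name<$($ty: Copy,)+ R>(
            op: &dyn Fn($($ty),+) -> Result<R, String>,
            $($arg: &[Option<$ty>],)+
        ) -> Result<Vec<Option<R>>, String> {
            // All argument columns must have the same number of rows.
            let lens = [$($arg.len()),+];
            let len = lens[0];
            if lens.iter().any(|&l| l != len) {
                return Err("arguments differ in length".to_string());
            }
            let mut out = Vec::with_capacity(len);
            for i in 0..len {
                // A null in any argument produces a null result;
                // an error returned by `op` aborts the whole kernel.
                let value = match ($($arg[i],)+) {
                    ($(Some($arg),)+) => Some(op($($arg),+)?),
                    _ => None,
                };
                out.push(value);
            }
            Ok(out)
        }
    };
}

// One invocation per supported arity (names are hypothetical).
make_udf_kernel!(unary_kernel, a: A);
make_udf_kernel!(binary_kernel, a: A, b: B);
make_udf_kernel!(ternary_kernel, a: A, b: B, c: C);
```

With this shape, supporting a new arity costs one macro invocation, which sidesteps the lack of variadic generics at the price of a fixed upper bound on the number of arguments.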

From what I gathered, we need to:
 # Implement a new kernel (à la {{arrow::compute::kernels::arithmetic}}) that executes a closure
 # Implement a generic expression that implements {{PhysicalExpr}}
 # Implement a {{FunctionProvider}} and add it to {{ExecutionContext}} (like the {{datasources}} attribute)
 # Add the remaining bits and pieces for integration
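
To make step 3 more concrete, here is one possible shape for a function provider. The trait methods, the {{ScalarFn}} alias, {{MapFunctionProvider}} and the {{Context}} struct are all assumptions made for illustration; none of them are existing DataFusion APIs, and the function type is fixed to {{f64 -> f64}} only to keep the sketch short.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical scalar function type: one f64 in, one f64 out.
pub type ScalarFn = Arc<dyn Fn(f64) -> Result<f64, String> + Send + Sync>;

/// Hypothetical provider trait: resolves a function name to an implementation.
pub trait FunctionProvider {
    fn get_function(&self, name: &str) -> Option<ScalarFn>;
}

/// In-memory provider backed by a map from name to function.
#[derive(Default)]
pub struct MapFunctionProvider {
    functions: HashMap<String, ScalarFn>,
}

impl MapFunctionProvider {
    pub fn register(&mut self, name: &str, f: ScalarFn) {
        self.functions.insert(name.to_string(), f);
    }
}

impl FunctionProvider for MapFunctionProvider {
    fn get_function(&self, name: &str) -> Option<ScalarFn> {
        self.functions.get(name).cloned()
    }
}

/// Stand-in for the execution context; here it only holds the provider.
pub struct Context {
    provider: Box<dyn FunctionProvider>,
}

impl Context {
    pub fn new(provider: Box<dyn FunctionProvider>) -> Self {
        Self { provider }
    }

    /// Looks the function up by name and applies it to a single value.
    pub fn call(&self, name: &str, arg: f64) -> Result<f64, String> {
        let f = self
            .provider
            .get_function(name)
            .ok_or_else(|| format!("unknown function: {}", name))?;
        f(arg)
    }
}
```

Keeping the provider behind a trait would let the context resolve names at planning time without knowing where the functions come from.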

The design I was planning for the expression:

 
{code:java}
pub struct UnaryFunctionExpr<T, R>
where
    T: ArrowPrimitiveType, // argument type
    R: ArrowPrimitiveType, // return type
{
    /// name of the function
    name: String,
    /// closure applied to each value of the argument
    func: Arc<dyn Fn(T::Native) -> arrow::error::Result<R::Native> + Sync + Send + 'static>,
    /// expression that produces the argument
    arg: Arc<dyn PhysicalExpr>,
    arg_type: DataType,
    return_type: DataType,
}
{code}
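
As a dependency-free illustration of how such an expression could evaluate, the sketch below wraps a closure around a child expression. The {{PhysicalExpr}} trait here is a simplified stand-in written for this example (not DataFusion's actual trait), {{Column}}, {{Batch}} and {{ColumnExpr}} are hypothetical, and the types are fixed to {{f64}} instead of being generic over {{ArrowPrimitiveType}}.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow column: `None` marks a null value.
type Column = Vec<Option<f64>>;
/// Stand-in for a record batch: a list of columns of equal length.
type Batch = Vec<Column>;

/// Simplified stand-in for a physical expression (ASSUMPTION, not the real trait).
trait PhysicalExpr {
    fn evaluate(&self, batch: &Batch) -> Result<Column, String>;
}

/// Reads one column of the batch by index.
struct ColumnExpr {
    index: usize,
}

impl PhysicalExpr for ColumnExpr {
    fn evaluate(&self, batch: &Batch) -> Result<Column, String> {
        batch
            .get(self.index)
            .cloned()
            .ok_or_else(|| format!("no column at index {}", self.index))
    }
}

/// Applies a user-supplied function to the result of another expression.
struct UnaryFunctionExpr {
    name: String,
    func: Arc<dyn Fn(f64) -> Result<f64, String> + Send + Sync>,
    arg: Arc<dyn PhysicalExpr>,
}

impl PhysicalExpr for UnaryFunctionExpr {
    fn evaluate(&self, batch: &Batch) -> Result<Column, String> {
        // Evaluate the argument first, then map the function over it.
        let input = self.arg.evaluate(batch)?;
        input
            .into_iter()
            .map(|value| match value {
                // Prefix errors with the function name to ease debugging.
                Some(x) => (self.func)(x)
                    .map(Some)
                    .map_err(|e| format!("{}: {}", self.name, e)),
                // Nulls pass through without calling the function.
                None => Ok(None),
            })
            .collect()
    }
}
```

Because the argument is itself an expression, function calls nest naturally: the {{arg}} of one {{UnaryFunctionExpr}} can be another {{UnaryFunctionExpr}}.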
 

And for the kernel:
{code:java}
pub fn unary_op<T, R>(
 op: &dyn Fn(T::Native) -> Result<R::Native>,
 arg: &PrimitiveArray<T>,
) -> Result<PrimitiveArray<R>>{code}
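
A possible body for such a kernel, sketched without the arrow crate. {{SimpleArray}} is a hypothetical stand-in for {{PrimitiveArray<T>}} (a values buffer plus a validity mask), and the error type is a plain {{String}}; both are assumptions made to keep the example self-contained.

```rust
/// Simplified stand-in for `PrimitiveArray<T>` (ASSUMPTION): a values buffer
/// plus a validity mask, where `false` marks a null slot.
#[derive(Debug, PartialEq)]
pub struct SimpleArray<T> {
    pub values: Vec<T>,
    pub validity: Vec<bool>,
}

/// Applies `op` to every non-null value of `arg`, producing a new array
/// whose element type may differ from the input's.
pub fn unary_op<T: Copy, R: Default>(
    op: &dyn Fn(T) -> Result<R, String>,
    arg: &SimpleArray<T>,
) -> Result<SimpleArray<R>, String> {
    if arg.values.len() != arg.validity.len() {
        return Err("values and validity differ in length".to_string());
    }
    let mut values = Vec::with_capacity(arg.values.len());
    for (value, &is_valid) in arg.values.iter().zip(&arg.validity) {
        if is_valid {
            // The first error returned by `op` aborts the kernel.
            values.push(op(*value)?);
        } else {
            // `op` never runs on null slots; a placeholder keeps the
            // values buffer aligned with the validity mask.
            values.push(R::default());
        }
    }
    Ok(SimpleArray {
        values,
        validity: arg.validity.clone(),
    })
}
```

Skipping null slots means a closure such as a division never sees the arbitrary placeholder stored under a null, so it cannot fail on data the user never supplied.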
 


> [Rust] [DataFusion] Add support for scalar UDFs
> -----------------------------------------------
>
>                 Key: ARROW-6947
>                 URL: https://issues.apache.org/jira/browse/ARROW-6947
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>
> As a user, I would like to be able to define my own functions and then use them in SQL statements.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)