Posted to issues@arrow.apache.org by "Kyle McCarthy (Jira)" <ji...@apache.org> on 2019/11/10 20:53:00 UTC

[jira] [Issue Comment Deleted] (ARROW-6947) [Rust] [DataFusion] Add support for scalar UDFs

     [ https://issues.apache.org/jira/browse/ARROW-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle McCarthy updated ARROW-6947:
---------------------------------
    Comment: was deleted

(was: I am curious to see if you have any ideas about how this would work. I have been working on a PoC, but will probably need to make some design decisions and would like to see if they align with yours.

At a high level, I see this working by composing a UDF with some general ScalarFunction type. Right now I have the ScalarFunction with type: 
{code:java}
Box<dyn Fn(Vec<ScalarValue>) -> Result<ScalarValue>>{code}
so if a user defines a function such as
{code:java}
fn length(s: String) -> usize{code}
we would wrap that and return our ScalarFunction.
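
To make the wrapping step concrete, here is a minimal sketch of how it could look. The ScalarValue enum, the Result alias, and the wrap_string_to_usize helper are hypothetical stand-ins written for this illustration, not DataFusion's actual types or API, and the body of length is an assumed implementation.

```rust
// Hypothetical stand-in for DataFusion's ScalarValue; only two variants for brevity.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Utf8(String),
    UInt64(u64),
}

// Assumed error type: a plain String keeps the sketch self-contained.
type Result<T> = std::result::Result<T, String>;

// The general function type described above.
type ScalarFunction = Box<dyn Fn(Vec<ScalarValue>) -> Result<ScalarValue>>;

// The user's plain Rust function (body assumed for illustration).
fn length(s: String) -> usize {
    s.len()
}

// Hypothetical helper: adapts any `Fn(String) -> usize` to the ScalarFunction shape.
// It checks the argument count and type at call time and converts the result.
fn wrap_string_to_usize<F>(f: F) -> ScalarFunction
where
    F: Fn(String) -> usize + 'static,
{
    Box::new(move |args: Vec<ScalarValue>| {
        let mut args = args.into_iter();
        match (args.next(), args.next()) {
            (Some(ScalarValue::Utf8(s)), None) => Ok(ScalarValue::UInt64(f(s) as u64)),
            (Some(other), None) => Err(format!("expected Utf8 argument, got {:?}", other)),
            _ => Err("expected exactly one argument".to_string()),
        }
    })
}
```

With this sketch, wrap_string_to_usize(length) yields a ScalarFunction that can be called with a Vec containing a single ScalarValue::Utf8. A real implementation would presumably need one such adapter per signature shape, or a macro or trait to generate them.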

I think the composed functions need to be associated with some "static" metadata, similar to the FunctionMeta in the logical plan. We would want to know the DataType of each argument the function expects and whether it is optional, as well as the return type and whether the function is fallible or infallible.

If the UDF accepts and returns primitive Rust types, generating that metadata should be pretty straightforward. However, if the UDF takes or returns ScalarValues, the user would have to provide the metadata explicitly.

We would be able to generate most of the data for the logical plan's FunctionMeta but would still need the function name and the field names for the args.
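
One possible way to derive that metadata from primitive Rust types is a trait with associated constants, sketched below. Everything here (DataType, ArgMeta, UdfMeta, the ScalarType trait, and meta_for_unary) is a hypothetical name invented for this illustration rather than an existing Arrow or DataFusion type. The function name and argument name are passed in by the caller, since they cannot be recovered from the signature.

```rust
// Hypothetical stand-ins; not Arrow's DataType or DataFusion's FunctionMeta.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    UInt64,
    Int64,
    Float64,
    Boolean,
}

#[derive(Debug, Clone, PartialEq)]
struct ArgMeta {
    name: String,
    data_type: DataType,
    optional: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct UdfMeta {
    name: String,
    args: Vec<ArgMeta>,
    return_type: DataType,
    fallible: bool,
}

// Maps a primitive Rust type to its DataType and whether it is optional.
trait ScalarType {
    const DATA_TYPE: DataType;
    const OPTIONAL: bool = false;
}

impl ScalarType for String { const DATA_TYPE: DataType = DataType::Utf8; }
impl ScalarType for usize { const DATA_TYPE: DataType = DataType::UInt64; }
impl ScalarType for i64 { const DATA_TYPE: DataType = DataType::Int64; }
impl ScalarType for f64 { const DATA_TYPE: DataType = DataType::Float64; }
impl ScalarType for bool { const DATA_TYPE: DataType = DataType::Boolean; }

// Option<T> keeps the inner DataType but marks the value as optional.
impl<T: ScalarType> ScalarType for Option<T> {
    const DATA_TYPE: DataType = T::DATA_TYPE;
    const OPTIONAL: bool = true;
}

// The user's plain Rust function (body assumed for illustration).
fn length(s: String) -> usize {
    s.len()
}

// Derives metadata for a one-argument, infallible function from its signature.
// The function is only used for type inference; it is never called here.
fn meta_for_unary<A, R, F>(name: &str, arg_name: &str, _f: &F) -> UdfMeta
where
    A: ScalarType,
    R: ScalarType,
    F: Fn(A) -> R,
{
    UdfMeta {
        name: name.to_string(),
        args: vec![ArgMeta {
            name: arg_name.to_string(),
            data_type: A::DATA_TYPE,
            optional: A::OPTIONAL,
        }],
        return_type: R::DATA_TYPE,
        fallible: false,
    }
}
```

Calling meta_for_unary("length", "s", &length) would produce metadata with one non-optional Utf8 argument and a UInt64 return type. A UDF that takes or returns ScalarValue directly has no such mapping, which is why its metadata would have to be supplied by hand.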

So far, I haven't done anything related to aggregate UDFs or to actually registering UDFs with the ExecutionContext.)

> [Rust] [DataFusion] Add support for scalar UDFs
> -----------------------------------------------
>
>                 Key: ARROW-6947
>                 URL: https://issues.apache.org/jira/browse/ARROW-6947
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>
> As a user, I would like to be able to define my own functions and then use them in SQL statements.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)