Posted to dev@arrow.apache.org by Jaroslaw Nowosad <ya...@gmail.com> on 2023/01/12 15:34:26 UTC

[RUST][Datafusion] SQL UDF in Datafusion

Hi all,

I have been tasked with investigating how to extend DataFusion to support
UDFs written in plain SQL.
Background: our existing Java (Spark) solutions contain a large number of
SQL UDFs, but we are starting to move into the Rust ecosystem, and
DataFusion/Arrow/Ballista looks like the right way to go.

Question:
Could I get some pointers on how to extend DataFusion to add "CREATE FUNCTION
AAA (p1:int, p2: int) RETURN INT AS '<sql style body here>'"?
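
To make the question concrete, here is a minimal, standalone sketch of pulling
the pieces out of a statement in the syntax quoted above. It uses only the Rust
standard library and is not DataFusion or sqlparser API. The names `SqlUdfDef`,
`strip_keyword` and `parse_create_function` are hypothetical, and the rules it
applies (upper-casing type names, requiring a single-quoted body) are
assumptions made for illustration.

```rust
/// Parsed form of the hypothetical statement
/// `CREATE FUNCTION name (p1:int, p2: int) RETURN INT AS '<body>'`.
#[derive(Debug, PartialEq)]
pub struct SqlUdfDef {
    pub name: String,
    /// (parameter name, type name); type names are upper-cased (assumption).
    pub params: Vec<(String, String)>,
    pub return_type: String,
    pub body: String,
}

/// Strips a leading keyword (case-insensitive) and returns the remainder.
/// Returns None if the keyword is absent or runs into a longer identifier.
fn strip_keyword<'a>(s: &'a str, kw: &str) -> Option<&'a str> {
    let s = s.trim_start();
    let head = s.get(..kw.len())?;
    let rest = &s[kw.len()..];
    let at_boundary = !rest.starts_with(|c: char| c.is_alphanumeric() || c == '_');
    (head.eq_ignore_ascii_case(kw) && at_boundary).then_some(rest)
}

/// Hand-rolled parser for the statement shape above. Illustration only:
/// it does not handle quoted identifiers or escaped quotes in the body.
pub fn parse_create_function(sql: &str) -> Result<SqlUdfDef, String> {
    let rest = strip_keyword(sql, "CREATE")
        .and_then(|r| strip_keyword(r, "FUNCTION"))
        .ok_or("expected CREATE FUNCTION")?;

    // Function name is everything before the parameter list.
    let open = rest.find('(').ok_or("missing '('")?;
    let close = open + rest[open..].find(')').ok_or("missing ')'")?;
    let name = rest[..open].trim();
    if name.is_empty() {
        return Err("missing function name".to_string());
    }

    // Parameters are written as `name:type`, separated by commas.
    let mut params = Vec::new();
    for p in rest[open + 1..close].split(',') {
        let p = p.trim();
        if p.is_empty() {
            continue;
        }
        let (n, t) = p
            .split_once(':')
            .ok_or_else(|| format!("bad parameter '{p}'"))?;
        params.push((n.trim().to_string(), t.trim().to_uppercase()));
    }

    // Remainder: RETURN <type> AS '<body>'
    let tail = strip_keyword(&rest[close + 1..], "RETURN").ok_or("expected RETURN")?;
    let tail = tail.trim_start();
    let type_end = tail
        .find(char::is_whitespace)
        .ok_or("expected AS after return type")?;
    let return_type = tail[..type_end].to_uppercase();
    let tail = strip_keyword(&tail[type_end..], "AS").ok_or("expected AS")?;

    let quoted = tail.trim().trim_end_matches(';').trim_end();
    let body = quoted
        .strip_prefix('\'')
        .and_then(|b| b.strip_suffix('\''))
        .ok_or("body must be single-quoted")?;

    Ok(SqlUdfDef {
        name: name.to_string(),
        params,
        return_type,
        body: body.to_string(),
    })
}
```

Calling `parse_create_function` on the statement from the question, with a body
such as 'p1 + p2', yields the name AAA, two INT parameters, the return type INT
and the body text. A real implementation would more likely extend the existing
SQL parser than scan strings by hand.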

I have seen some rewrite proposals: extending the SQL parser with a new
command, or creating a separate parser/dialect.

Best Regards,
Jaro

Re: [RUST][Datafusion] SQL UDF in Datafusion

Posted by Andrew Lamb <al...@influxdata.com>.
Hi Jaro,

I do not think DataFusion currently supports creating UDFs in SQL (they
need to be implemented in Rust, or in the system using DataFusion).

I am not sure whether Ballista has any support for them.

Andrew


Re: [RUST][Datafusion] SQL UDF in Datafusion

Posted by Jaroslaw Nowosad <ya...@gmail.com>.
Thanks Jeremy!

Yes, I need some time to dig in, i.e. to figure out how to divide my
problem into smaller tasks.
Thanks for the DaskParser pointer. This is exactly where I want to start,
probably with some simple new SQL statement.
I'd really appreciate any further details, e.g. on how to register
functions and retrieve them later.
If you have any more suggestions or thoughts, please share them.
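
As a starting point for the "register and retrieve" part, here is a minimal
standalone sketch of a name-to-definition registry. It is not DataFusion's
function registry. `SqlUdf` and `UdfRegistry` are hypothetical names, and the
case-insensitive lookup is an assumption about how SQL function names should
resolve.

```rust
use std::collections::HashMap;

/// A SQL-bodied function as it might be stored after parsing (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct SqlUdf {
    pub params: Vec<String>,
    pub body: String,
}

/// Maps function names to their definitions.
#[derive(Default)]
pub struct UdfRegistry {
    functions: HashMap<String, SqlUdf>,
}

impl UdfRegistry {
    /// Registers `udf` under `name`, returning the definition it replaced, if any.
    /// Names are lower-cased so that lookups are case-insensitive (assumption).
    pub fn register(&mut self, name: &str, udf: SqlUdf) -> Option<SqlUdf> {
        self.functions.insert(name.to_ascii_lowercase(), udf)
    }

    /// Looks up a previously registered function by name.
    pub fn get(&self, name: &str) -> Option<&SqlUdf> {
        self.functions.get(&name.to_ascii_lowercase())
    }
}
```

Registering a function under the name AAA and then calling `get("aaa")` returns
the stored definition. A planner could then use the stored body when it meets a
call to that function in a query.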

Thanks,
Jaro
yarenty@gmail.com




Re: [RUST][Datafusion] SQL UDF in Datafusion

Posted by Jeremy Dyer <jd...@gmail.com>.
Hey Jaro,

While it is not written in Java and is not a UDF, there are some examples in
dask-sql [1] (Python-based) where we extend DataFusion with custom
grammar, CREATE MODEL for example. In a nutshell, you want to write some
Rust code that extends the DataFusion parser and then performs any binding
logic required when your custom UDF statement is encountered. The
processing chain is a little lengthy to follow, but you can see where it
starts here [2]. The `DaskParser` maintains a member which is the
DataFusion parser itself. Happy to give more details; I just wanted to give
you a place to start looking.

Thanks,
Jeremy Dyer

[1] - https://github.com/dask-contrib/dask-sql
[2] - https://github.com/dask-contrib/dask-sql/blob/main/dask_planner/src/parser.rs#L385
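
The wrapping pattern described in this message (a custom parser that holds the
underlying parser as a member and only intercepts its own statements) can be
sketched roughly as follows. This is a standalone illustration using only the
standard library and is not the dask-sql or DataFusion code. `InnerParser`,
`ExtendedParser` and `Statement` are hypothetical names, and the inner parser
here is a stand-in that just returns the trimmed statement text.

```rust
/// Stand-in for the wrapped SQL parser. In the pattern described above this
/// role is played by the DataFusion parser; here it only echoes the text.
pub struct InnerParser;

impl InnerParser {
    pub fn parse(&self, sql: &str) -> Result<String, String> {
        let sql = sql.trim();
        if sql.is_empty() {
            Err("empty statement".to_string())
        } else {
            Ok(sql.to_string())
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum Statement {
    /// Anything the wrapped parser already understands.
    Standard(String),
    /// The custom statement, kept as raw text for a later binding step.
    CreateFunction(String),
}

/// Custom parser that keeps the wrapped parser as a member.
pub struct ExtendedParser {
    inner: InnerParser,
}

impl ExtendedParser {
    pub fn new() -> Self {
        Self { inner: InnerParser }
    }

    /// Handles CREATE FUNCTION itself and delegates everything else.
    pub fn parse(&self, sql: &str) -> Result<Statement, String> {
        // Look at the first two keywords, ignoring case and extra whitespace.
        let mut words = sql.split_whitespace().map(|w| w.to_ascii_uppercase());
        let is_create_function = words.next().as_deref() == Some("CREATE")
            && words.next().as_deref() == Some("FUNCTION");

        if is_create_function {
            Ok(Statement::CreateFunction(sql.trim().to_string()))
        } else {
            self.inner.parse(sql).map(Statement::Standard)
        }
    }
}
```

With this shape, `ExtendedParser::new().parse("SELECT 1")` falls through to the
inner parser, while a statement starting with CREATE FUNCTION is returned as
`Statement::CreateFunction` for the binding logic to pick up.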
