You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Ian Bertolacci <ia...@workday.com.INVALID> on 2022/04/04 22:15:02 UTC

Vararg/Variadic UDFs or workarounds?

Howdy!
We’re trying to add a vararg/variadic UDF, but cannot seem to make it work out.

In our system, we define our UDFs to the Schema’s function multimap, and so have classes with methods which are provided to ScalarFunctionImpl.create(theClass, “methodName” ) to create the Function object, and from which the parser/validator draws the SQL function signature.
However, there doesn’t seem to be a way do define such a method where ScalarFunctionImpl infers the variadic function signature, but rather as accepting a single List parameter.

I see that CALCITE-2772 was raised to solve this but the work seems to have stalled in review.

Is there any plan to restart this work?
And in the mean time, is there a good way of defining variadic UDFs?

One solution we’ve come up with is to define a function with all (except the first) optional parameters so that it *looks* like a variadic function, but this is limited to 254 parameters by the JVM.
While 254 parameters is quite a lot, we predict that our users may exceed this limit.
Thanks!
-Ian J. Bertolacci


Re: Vararg/Variadic UDFs or workarounds?

Posted by Julian Hyde <jh...@gmail.com>.
PS I agree with Stamatis that ARRAY arguments are likely to be an effective workaround.

On the other hand I would love it if someone would study this problem and propose a solution consistent with the standard. 

Julian

> On Apr 7, 2022, at 06:34, Julian Hyde <jh...@gmail.com> wrote:
> 
> While reading the standard and implementing function resolution for built-in and user-defined functions I got the impression that there are very different resolution strategies for the two kinds of functions. 
> 
> There’s a “Chesterton fence” between the two, and anyone  proposing to remove or blur the distinction, or just dig in the vicinity as we are doing here with variadic UDFs, needs to demonstrate they understand why it’s there. 
> 
> It relates especially to parameter names and overloading:
> * UDFs are resolved by name alone, not by number or types of arguments
> * UDF parameters may have names and default values and arguments may be specified in different orders when the function is invoked
> * built in functions do not have named parameters and therefore arguments must be specified in a fixed order
> 
> See the work I did on made parameters, read the code, the tests. It’s also possible that recent standards work (see polymorphic table functions) has changed the landscape. 
> 
> Julian
> 
>> On Apr 6, 2022, at 23:16, Stamatis Zampetakis <za...@gmail.com> wrote:
>> 
>> Hi Ian,
>> 
>> From what I recall the work under CALCITE-2772 is an attempt to allow
>> introducing vararg UDFs in a more user friendly way.
>> 
>> Supporting vararg UDFs via Schema and Function interfaces is one way to go
>> although without CALCITE-2772 probably this is not possible.
>> 
>> Another way (and more powerful) would be to use the SqlOperator interface
>> (either directly or extend some existing implementation) and create your
>> own customised operator. Then you can plug-in your own operator table with
>> the custom UDFs in the validator [1].
>> Note that currently Calcite has some vararg functions/operators and the
>> first that comes to mind is CONCAT [2]. You may check the changes
>> introduced by CALCITE-4394 [3] to learn more about this and get inspiration
>> if you end-up going down the path of implementing your own SqlOperator.
>> 
>> Lastly, you could possibly avoid varargs UDF using a small trick that was
>> sufficient for me in some use-cases. Make the UDF accept a parameter of
>> type ARRAY and call it by wrapping the function arguments in an ARRAY
>> constructor. For instance, the queries would look like the following:
>> 
>> SELECT MY_CUSTOM_UDF(ARRAY['A','B','C'])
>> SELECT MY_CUSTOM_UDF(ARRAY['A','C'])
>> 
>> Best,
>> Stamatis
>> 
>> [1]
>> https://github.com/zabetak/calcite-tutorial/blob/31cce59c747e0b763a934109db6a6e7055f175ae/solution/src/main/java/com/github/zabetak/calcite/tutorial/LuceneQueryProcessor.java#L166
>> [2]
>> https://github.com/apache/calcite/blob/a81cfb2ad001589929e190939cf4db928ebac386/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L491
>> [3] https://issues.apache.org/jira/browse/CALCITE-4394
>> 
>>> On Tue, Apr 5, 2022 at 12:15 AM Ian Bertolacci
>>> <ia...@workday.com.invalid> wrote:
>>> 
>>> Howdy!
>>> We’re trying to add a vararg/variadic UDF, but cannot seem to make it work
>>> out.
>>> 
>>> In our system, we define our UDFs to the Schema’s function multimap, and
>>> so have classes with methods which are provided to
>>> ScalarFunctionImpl.create(theClass, “methodName” ) to create the Function
>>> object, and from which the parser/validator draws the SQL function
>>> signature.
>>> However, there doesn’t seem to be a way do define such a method where
>>> ScalarFunctionImpl infers the variadic function signature, but rather as
>>> accepting a single List parameter.
>>> 
>>> I see that CALCITE-2772 was raised to solve this but the work seems to
>>> have stalled in review.
>>> 
>>> Is there any plan to restart this work?
>>> And in the mean time, is there a good way of defining variadic UDFs?
>>> 
>>> One solution we’ve come up with is to define a function with all (except
>>> the first) optional parameters so that it *looks* like a variadic function,
>>> but this is limited to 254 parameters by the JVM.
>>> While 254 parameters is quite a lot, we predict that our users may exceed
>>> this limit.
>>> Thanks!
>>> -Ian J. Bertolacci
>>> 
>>> 

Re: Vararg/Variadic UDFs or workarounds?

Posted by Julian Hyde <jh...@gmail.com>.
While reading the standard and implementing function resolution for built-in and user-defined functions I got the impression that there are very different resolution strategies for the two kinds of functions. 

There’s a “Chesterton fence” between the two, and anyone  proposing to remove or blur the distinction, or just dig in the vicinity as we are doing here with variadic UDFs, needs to demonstrate they understand why it’s there. 

It relates especially to parameter names and overloading:
 * UDFs are resolved by name alone, not by number or types of arguments
 * UDF parameters may have names and default values and arguments may be specified in different orders when the function is invoked
 * built in functions do not have named parameters and therefore arguments must be specified in a fixed order

See the work I did on made parameters, read the code, the tests. It’s also possible that recent standards work (see polymorphic table functions) has changed the landscape. 

Julian

> On Apr 6, 2022, at 23:16, Stamatis Zampetakis <za...@gmail.com> wrote:
> 
> Hi Ian,
> 
> From what I recall the work under CALCITE-2772 is an attempt to allow
> introducing vararg UDFs in a more user friendly way.
> 
> Supporting vararg UDFs via Schema and Function interfaces is one way to go
> although without CALCITE-2772 probably this is not possible.
> 
> Another way (and more powerful) would be to use the SqlOperator interface
> (either directly or extend some existing implementation) and create your
> own customised operator. Then you can plug-in your own operator table with
> the custom UDFs in the validator [1].
> Note that currently Calcite has some vararg functions/operators and the
> first that comes to mind is CONCAT [2]. You may check the changes
> introduced by CALCITE-4394 [3] to learn more about this and get inspiration
> if you end-up going down the path of implementing your own SqlOperator.
> 
> Lastly, you could possibly avoid varargs UDF using a small trick that was
> sufficient for me in some use-cases. Make the UDF accept a parameter of
> type ARRAY and call it by wrapping the function arguments in an ARRAY
> constructor. For instance, the queries would look like the following:
> 
> SELECT MY_CUSTOM_UDF(ARRAY['A','B','C'])
> SELECT MY_CUSTOM_UDF(ARRAY['A','C'])
> 
> Best,
> Stamatis
> 
> [1]
> https://github.com/zabetak/calcite-tutorial/blob/31cce59c747e0b763a934109db6a6e7055f175ae/solution/src/main/java/com/github/zabetak/calcite/tutorial/LuceneQueryProcessor.java#L166
> [2]
> https://github.com/apache/calcite/blob/a81cfb2ad001589929e190939cf4db928ebac386/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L491
> [3] https://issues.apache.org/jira/browse/CALCITE-4394
> 
>> On Tue, Apr 5, 2022 at 12:15 AM Ian Bertolacci
>> <ia...@workday.com.invalid> wrote:
>> 
>> Howdy!
>> We’re trying to add a vararg/variadic UDF, but cannot seem to make it work
>> out.
>> 
>> In our system, we define our UDFs to the Schema’s function multimap, and
>> so have classes with methods which are provided to
>> ScalarFunctionImpl.create(theClass, “methodName” ) to create the Function
>> object, and from which the parser/validator draws the SQL function
>> signature.
>> However, there doesn’t seem to be a way do define such a method where
>> ScalarFunctionImpl infers the variadic function signature, but rather as
>> accepting a single List parameter.
>> 
>> I see that CALCITE-2772 was raised to solve this but the work seems to
>> have stalled in review.
>> 
>> Is there any plan to restart this work?
>> And in the mean time, is there a good way of defining variadic UDFs?
>> 
>> One solution we’ve come up with is to define a function with all (except
>> the first) optional parameters so that it *looks* like a variadic function,
>> but this is limited to 254 parameters by the JVM.
>> While 254 parameters is quite a lot, we predict that our users may exceed
>> this limit.
>> Thanks!
>> -Ian J. Bertolacci
>> 
>> 

Re: Vararg/Variadic UDFs or workarounds?

Posted by Ian Bertolacci <ia...@workday.com.INVALID>.
Hello,
So I’ve been trying to make this work for a while now without a lot of success.
I’ve been able to define our operator in an operator table and have success validating queries with it.

However we extend Avatica’s JDBCMeta to do nearly all the heavy lifting.
The problem being that I have absolutely no idea how to get our operator table down into the validator that CalcitePrepareImpl creates.

What would seem to be the “standard” way of adding a separate library of operators would be to add a new SqlLibrary operator table and set our CalciteConfigurationProperties with it, but that requires modifications to the Calcite code base, and we’re doing just about all we can to not maintain our own fork of Calcite.

Any suggestions?
-Ian J. Bertolacci

From: Ian Bertolacci <ia...@workday.com.INVALID>
Date: Friday, April 8, 2022 at 10:04 AM
To: dev@calcite.apache.org <de...@calcite.apache.org>
Subject: [External Sender] Re: Vararg/Variadic UDFs or workarounds?
This is excellent thanks you so much!.
-Ian

From: Stamatis Zampetakis <za...@gmail.com>
Date: Wednesday, April 6, 2022 at 2:16 PM
To: dev@calcite.apache.org <de...@calcite.apache.org>
Subject: [External Sender] Re: Vararg/Variadic UDFs or workarounds?
Hi Ian,

From what I recall the work under CALCITE-2772 is an attempt to allow
introducing vararg UDFs in a more user friendly way.

Supporting vararg UDFs via Schema and Function interfaces is one way to go
although without CALCITE-2772 probably this is not possible.

Another way (and more powerful) would be to use the SqlOperator interface
(either directly or extend some existing implementation) and create your
own customised operator. Then you can plug-in your own operator table with
the custom UDFs in the validator [1].
Note that currently Calcite has some vararg functions/operators and the
first that comes to mind is CONCAT [2]. You may check the changes
introduced by CALCITE-4394 [3] to learn more about this and get inspiration
if you end-up going down the path of implementing your own SqlOperator.

Lastly, you could possibly avoid varargs UDF using a small trick that was
sufficient for me in some use-cases. Make the UDF accept a parameter of
type ARRAY and call it by wrapping the function arguments in an ARRAY
constructor. For instance, the queries would look like the following:

SELECT MY_CUSTOM_UDF(ARRAY['A','B','C'])
SELECT MY_CUSTOM_UDF(ARRAY['A','C'])

Best,
Stamatis

[1]
https://github.com/zabetak/calcite-tutorial/blob/31cce59c747e0b763a934109db6a6e7055f175ae/solution/src/main/java/com/github/zabetak/calcite/tutorial/LuceneQueryProcessor.java#L166
[2]
https://github.com/apache/calcite/blob/a81cfb2ad001589929e190939cf4db928ebac386/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L491
[3] https://issues.apache.org/jira/browse/CALCITE-4394

On Tue, Apr 5, 2022 at 12:15 AM Ian Bertolacci
<ia...@workday.com.invalid> wrote:

> Howdy!
> We’re trying to add a vararg/variadic UDF, but cannot seem to make it work
> out.
>
> In our system, we define our UDFs to the Schema’s function multimap, and
> so have classes with methods which are provided to
> ScalarFunctionImpl.create(theClass, “methodName” ) to create the Function
> object, and from which the parser/validator draws the SQL function
> signature.
> However, there doesn’t seem to be a way do define such a method where
> ScalarFunctionImpl infers the variadic function signature, but rather as
> accepting a single List parameter.
>
> I see that CALCITE-2772 was raised to solve this but the work seems to
> have stalled in review.
>
> Is there any plan to restart this work?
> And in the mean time, is there a good way of defining variadic UDFs?
>
> One solution we’ve come up with is to define a function with all (except
> the first) optional parameters so that it *looks* like a variadic function,
> but this is limited to 254 parameters by the JVM.
> While 254 parameters is quite a lot, we predict that our users may exceed
> this limit.
> Thanks!
> -Ian J. Bertolacci
>
>

Re: Vararg/Variadic UDFs or workarounds?

Posted by Ian Bertolacci <ia...@workday.com.INVALID>.
This is excellent thanks you so much!.
-Ian

From: Stamatis Zampetakis <za...@gmail.com>
Date: Wednesday, April 6, 2022 at 2:16 PM
To: dev@calcite.apache.org <de...@calcite.apache.org>
Subject: [External Sender] Re: Vararg/Variadic UDFs or workarounds?
Hi Ian,

From what I recall the work under CALCITE-2772 is an attempt to allow
introducing vararg UDFs in a more user friendly way.

Supporting vararg UDFs via Schema and Function interfaces is one way to go
although without CALCITE-2772 probably this is not possible.

Another way (and more powerful) would be to use the SqlOperator interface
(either directly or extend some existing implementation) and create your
own customised operator. Then you can plug-in your own operator table with
the custom UDFs in the validator [1].
Note that currently Calcite has some vararg functions/operators and the
first that comes to mind is CONCAT [2]. You may check the changes
introduced by CALCITE-4394 [3] to learn more about this and get inspiration
if you end-up going down the path of implementing your own SqlOperator.

Lastly, you could possibly avoid varargs UDF using a small trick that was
sufficient for me in some use-cases. Make the UDF accept a parameter of
type ARRAY and call it by wrapping the function arguments in an ARRAY
constructor. For instance, the queries would look like the following:

SELECT MY_CUSTOM_UDF(ARRAY['A','B','C'])
SELECT MY_CUSTOM_UDF(ARRAY['A','C'])

Best,
Stamatis

[1]
https://github.com/zabetak/calcite-tutorial/blob/31cce59c747e0b763a934109db6a6e7055f175ae/solution/src/main/java/com/github/zabetak/calcite/tutorial/LuceneQueryProcessor.java#L166
[2]
https://github.com/apache/calcite/blob/a81cfb2ad001589929e190939cf4db928ebac386/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L491
[3] https://issues.apache.org/jira/browse/CALCITE-4394

On Tue, Apr 5, 2022 at 12:15 AM Ian Bertolacci
<ia...@workday.com.invalid> wrote:

> Howdy!
> We’re trying to add a vararg/variadic UDF, but cannot seem to make it work
> out.
>
> In our system, we define our UDFs to the Schema’s function multimap, and
> so have classes with methods which are provided to
> ScalarFunctionImpl.create(theClass, “methodName” ) to create the Function
> object, and from which the parser/validator draws the SQL function
> signature.
> However, there doesn’t seem to be a way do define such a method where
> ScalarFunctionImpl infers the variadic function signature, but rather as
> accepting a single List parameter.
>
> I see that CALCITE-2772 was raised to solve this but the work seems to
> have stalled in review.
>
> Is there any plan to restart this work?
> And in the mean time, is there a good way of defining variadic UDFs?
>
> One solution we’ve come up with is to define a function with all (except
> the first) optional parameters so that it *looks* like a variadic function,
> but this is limited to 254 parameters by the JVM.
> While 254 parameters is quite a lot, we predict that our users may exceed
> this limit.
> Thanks!
> -Ian J. Bertolacci
>
>

Re: Vararg/Variadic UDFs or workarounds?

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Ian,

From what I recall the work under CALCITE-2772 is an attempt to allow
introducing vararg UDFs in a more user friendly way.

Supporting vararg UDFs via Schema and Function interfaces is one way to go
although without CALCITE-2772 probably this is not possible.

Another way (and more powerful) would be to use the SqlOperator interface
(either directly or extend some existing implementation) and create your
own customised operator. Then you can plug-in your own operator table with
the custom UDFs in the validator [1].
Note that currently Calcite has some vararg functions/operators and the
first that comes to mind is CONCAT [2]. You may check the changes
introduced by CALCITE-4394 [3] to learn more about this and get inspiration
if you end-up going down the path of implementing your own SqlOperator.

Lastly, you could possibly avoid varargs UDF using a small trick that was
sufficient for me in some use-cases. Make the UDF accept a parameter of
type ARRAY and call it by wrapping the function arguments in an ARRAY
constructor. For instance, the queries would look like the following:

SELECT MY_CUSTOM_UDF(ARRAY['A','B','C'])
SELECT MY_CUSTOM_UDF(ARRAY['A','C'])

Best,
Stamatis

[1]
https://github.com/zabetak/calcite-tutorial/blob/31cce59c747e0b763a934109db6a6e7055f175ae/solution/src/main/java/com/github/zabetak/calcite/tutorial/LuceneQueryProcessor.java#L166
[2]
https://github.com/apache/calcite/blob/a81cfb2ad001589929e190939cf4db928ebac386/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L491
[3] https://issues.apache.org/jira/browse/CALCITE-4394

On Tue, Apr 5, 2022 at 12:15 AM Ian Bertolacci
<ia...@workday.com.invalid> wrote:

> Howdy!
> We’re trying to add a vararg/variadic UDF, but cannot seem to make it work
> out.
>
> In our system, we define our UDFs to the Schema’s function multimap, and
> so have classes with methods which are provided to
> ScalarFunctionImpl.create(theClass, “methodName” ) to create the Function
> object, and from which the parser/validator draws the SQL function
> signature.
> However, there doesn’t seem to be a way do define such a method where
> ScalarFunctionImpl infers the variadic function signature, but rather as
> accepting a single List parameter.
>
> I see that CALCITE-2772 was raised to solve this but the work seems to
> have stalled in review.
>
> Is there any plan to restart this work?
> And in the mean time, is there a good way of defining variadic UDFs?
>
> One solution we’ve come up with is to define a function with all (except
> the first) optional parameters so that it *looks* like a variadic function,
> but this is limited to 254 parameters by the JVM.
> While 254 parameters is quite a lot, we predict that our users may exceed
> this limit.
> Thanks!
> -Ian J. Bertolacci
>
>