Posted to dev@flink.apache.org by Rong Rong <wa...@gmail.com> on 2018/05/04 18:09:34 UTC

Support more intelligent function lookup in FunctionCatalog for UDF

Hi,

We have been looking into more intelligent UDF support, such as a better
type inference module that automatically infers composite data types [1].

One of the most common pain points we have is that users would like to
re-use a rather generic UDF, for example:

public List<String> eval(Map<String, ?> myMap) {
  return new ArrayList<>(myMap.keySet());
}

In this case, since we are only interested in the key set of the map, the
value type cannot easily be resolved or overridden using concrete types.
Eventually we end up overriding the exact same function in multiple case
classes, so that each one uses a different ValueTypeInfo.

This is rather inefficient in terms of the user development cycle. I was
wondering if there's a better way for the FunctionCatalog lookup to match a
UDF in context.

Best,
Rong

[1] https://issues.apache.org/jira/browse/FLINK-9294
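
A minimal, dependency-free sketch of the situation described in this mail.
The names MapKeysSketch, MapKeys and valueTypeIsWildcard are hypothetical,
and the UDF deliberately does not extend Flink's ScalarFunction. It only
illustrates that the function body works for any value type, while the
eval() signature itself carries no concrete value type to extract:

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapKeysSketch {

    // Plain-Java stand-in for the UDF from the mail (assumption: no Flink
    // base class, so the sketch compiles without any dependencies).
    public static class MapKeys {
        public List<String> eval(Map<String, ?> myMap) {
            return new ArrayList<>(myMap.keySet());
        }
    }

    // True if the value type argument of eval's Map parameter is a wildcard,
    // i.e. reflection on the signature alone yields no concrete value type.
    static boolean valueTypeIsWildcard() {
        for (Method m : MapKeys.class.getDeclaredMethods()) {
            if (m.getName().equals("eval")) {
                Type param = m.getGenericParameterTypes()[0];
                Type[] typeArgs = ((ParameterizedType) param).getActualTypeArguments();
                return typeArgs[1] instanceof WildcardType;
            }
        }
        throw new IllegalStateException("no eval method found");
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        Map<String, Double> scores = new HashMap<>();
        scores.put("b", 2.5);

        MapKeys udf = new MapKeys();
        System.out.println(udf.eval(counts));      // [a]
        System.out.println(udf.eval(scores));      // [b]
        System.out.println(valueTypeIsWildcard()); // true
    }
}
```

The same eval() body serves both maps, which is why duplicating it once per
value type feels wasteful.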

Re: Support more intelligent function lookup in FunctionCatalog for UDF

Posted by Rong Rong <wa...@gmail.com>.

Thanks Fabian & Timo for the comments and suggestions.

I agree we should go step by step to enable this functionality.
I will start working out the implementation details and create umbrella
tickets.

Best,
Rong

On Tue, May 15, 2018 at 6:54 AM, Timo Walther <tw...@apache.org> wrote:

> I added some comments to your document. I think we should work on these
> limitations step by step. A first step could be to support Map<String, ?>
> by considering only the raw types. Another step would be to allow
> eval(Object) as a wildcard for operands.
>
> Regards,
> Timo

Re: Support more intelligent function lookup in FunctionCatalog for UDF

Posted by Timo Walther <tw...@apache.org>.

I added some comments to your document. I think we should work on these
limitations step by step. A first step could be to support Map<String, ?>
by considering only the raw types. Another step would be to allow
eval(Object) as a wildcard for operands.

Regards,
Timo
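
The two steps suggested above can be sketched in plain Java as follows.
This is only an illustration under stated assumptions, not Flink's actual
lookup code: RawTypeLookupSketch, findEval, MapKeys and Describe are
hypothetical names, and operands are assumed to arrive as (boxed) classes.
Matching on raw parameter classes makes Map<String, ?> accept any Map, and
an eval(Object) parameter then acts as a wildcard because Object is
assignable from every reference type:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class RawTypeLookupSketch {

    public static class MapKeys {
        public List<String> eval(Map<String, ?> myMap) {
            return new ArrayList<>(myMap.keySet());
        }
    }

    public static class Describe {
        // eval(Object) accepts any operand when only raw types are compared.
        public String eval(Object anything) {
            return String.valueOf(anything);
        }
    }

    // Returns an eval method whose raw parameter classes accept the given
    // operand classes. Generic type arguments are ignored on purpose.
    static Optional<Method> findEval(Class<?> udfClass, Class<?>... operandClasses) {
        for (Method m : udfClass.getMethods()) {
            if (!m.getName().equals("eval")) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            if (params.length != operandClasses.length) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < params.length; i++) {
                if (!params[i].isAssignableFrom(operandClasses[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findEval(MapKeys.class, HashMap.class).isPresent());  // true
        System.out.println(findEval(MapKeys.class, String.class).isPresent());   // false
        System.out.println(findEval(Describe.class, Integer.class).isPresent()); // true
    }
}
```

With several overloaded eval() methods a real lookup would also need a rule
for picking the most specific match; the sketch simply returns the first one
it finds.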


On 14.05.18 at 18:23, Rong Rong wrote:
> Thanks for the reply, Timo / Fabian.
>
> Yes, that's what I had in mind. The parameter type can be vague, but the
> return type has to be exact.
> I can imagine that, depending on the input parameter type, the output type
> can be different, but I cannot think of a concrete use case as of now.
>
> I actually created a doc [1] regarding the use cases we currently have and
> some very preliminary solution possibilities.
>
> Please kindly take a look when you have time; any comments and suggestions
> are highly appreciated.
>
> --
> Rong
>
> [1]
> https://docs.google.com/document/d/1zKSY1z0lvtQdfOgwcLnCMSRHew3weeJ6QfQjSD0zWas/edit?usp=sharing

Re: Support more intelligent function lookup in FunctionCatalog for UDF

Posted by Rong Rong <wa...@gmail.com>.

Thanks for the reply, Timo / Fabian.

Yes, that's what I had in mind. The parameter type can be vague, but the
return type has to be exact.
I can imagine that, depending on the input parameter type, the output type
can be different, but I cannot think of a concrete use case as of now.

I actually created a doc [1] regarding the use cases we currently have and
some very preliminary solution possibilities.

Please kindly take a look when you have time; any comments and suggestions
are highly appreciated.

--
Rong

[1]
https://docs.google.com/document/d/1zKSY1z0lvtQdfOgwcLnCMSRHew3weeJ6QfQjSD0zWas/edit?usp=sharing

On Mon, May 14, 2018 at 4:36 AM, Timo Walther <tw...@apache.org> wrote:

> Hi Rong,
>
> Yes, I think we can improve the type inference at this point. Input
> parameter type inference can be more tolerant, but return types should be
> as exact as possible.
>
> The change should only touch ScalarSqlFunction and
> UserDefinedFunctionUtils#createEvalOperandTypeInference, right?
>
> Regards,
> Timo

Re: Support more intelligent function lookup in FunctionCatalog for UDF

Posted by Timo Walther <tw...@apache.org>.

Hi Rong,

Yes, I think we can improve the type inference at this point. Input
parameter type inference can be more tolerant, but return types should be
as exact as possible.

The change should only touch ScalarSqlFunction and 
UserDefinedFunctionUtils#createEvalOperandTypeInference, right?

Regards,
Timo
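
One way to picture the "tolerant inputs, exact outputs" rule is a check on
the reflected eval() signature. The sketch below is plain Java and makes no
claim about how ScalarSqlFunction or UserDefinedFunctionUtils work;
ExactResultTypeSketch, evalOf and isFullyResolved are hypothetical names. A
type counts as exact here when it contains no wildcard or type variable at
any depth (note that a raw class such as List would also pass this check):

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ExactResultTypeSketch {

    public static class MapKeys {
        public List<String> eval(Map<String, ?> myMap) {
            return new ArrayList<>(myMap.keySet());
        }
    }

    // True if the type contains no wildcards or type variables at any depth.
    static boolean isFullyResolved(Type type) {
        if (type instanceof Class<?>) {
            return true;
        }
        if (type instanceof ParameterizedType) {
            for (Type arg : ((ParameterizedType) type).getActualTypeArguments()) {
                if (!isFullyResolved(arg)) {
                    return false;
                }
            }
            return true;
        }
        if (type instanceof GenericArrayType) {
            return isFullyResolved(((GenericArrayType) type).getGenericComponentType());
        }
        return false; // WildcardType or TypeVariable
    }

    // Returns the eval method declared on the given class.
    static Method evalOf(Class<?> udfClass) {
        for (Method m : udfClass.getDeclaredMethods()) {
            if (m.getName().equals("eval")) {
                return m;
            }
        }
        throw new IllegalStateException("no eval method found");
    }

    public static void main(String[] args) {
        Method eval = evalOf(MapKeys.class);
        // The result type List<String> is exact ...
        System.out.println(isFullyResolved(eval.getGenericReturnType()));        // true
        // ... while the operand type Map<String, ?> is deliberately vague.
        System.out.println(isFullyResolved(eval.getGenericParameterTypes()[0])); // false
    }
}
```

Under this rule the example UDF would be acceptable: only its operand type
is vague, and the result type can be derived from the signature alone.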


On 14.05.18 at 11:52, Fabian Hueske wrote:
> Hi Rong,
>
> I didn't look into the details of the example that you provided, but I
> think if we can improve the internal type resolution of scalar UDFs, we
> should definitely go for it.
> There is quite a bit of information available, such as the signatures
> of the eval() methods and the argument types provided by Calcite's
> analyzer.
> I'm not sure we leverage all that information to the full extent.
> The ScalarFunction interface also provides methods to override some of
> the type extraction behavior.
>
> @Timo, what do you think?
>
> Best,
> Fabian

Re: Support more intelligent function lookup in FunctionCatalog for UDF

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Rong,

I didn't look into the details of the example that you provided, but I
think if we can improve the internal type resolution of scalar UDFs, we
should definitely go for it.
There is quite a bit of information available, such as the signatures of the
eval() methods and the argument types provided by Calcite's analyzer.
I'm not sure we leverage all that information to the full extent.
The ScalarFunction interface also provides methods to override some of the
type extraction behavior.

@Timo, what do you think?

Best,
Fabian




2018-05-04 20:09 GMT+02:00 Rong Rong <wa...@gmail.com>:

> Hi,
>
> We have been looking into more intelligent UDF support, such as a better
> type inference module that automatically infers composite data types [1].
>
> One of the most common pain points we have is that users would like to
> re-use a rather generic UDF, for example:
>
> public List<String> eval(Map<String, ?> myMap) {
>   return new ArrayList<>(myMap.keySet());
> }
>
> In this case, since we are only interested in the key set of the map, the
> value type cannot easily be resolved or overridden using concrete types.
> Eventually we end up overriding the exact same function in multiple case
> classes, so that each one uses a different ValueTypeInfo.
>
> This is rather inefficient in terms of the user development cycle. I was
> wondering if there's a better way for the FunctionCatalog lookup to match a
> UDF in context.
>
> Best,
> Rong
>
> [1] https://issues.apache.org/jira/browse/FLINK-9294
>