Posted to user@flink.apache.org by Darshan Singh <da...@gmail.com> on 2018/03/30 20:07:56 UTC

how to query the output of the scalar table function

Hi,

I have not been able to find the best way to query the output of a scalar
table function.

Suppose I have a table with a column col1 of type String.

I have a scalar function that returns a POJO
{col1V1 String, col1V2 String, col1V3 String}.


I am using the following:

table.select("sf(col1) as sfc")
     .select("sfc.get('col1V1') as v1, sfc.get('col1V2') as v2, sfc.get('col1V3') as v3")

This works, but strangely it calls sf three times (once per get) for the
same row. However, if I use a table function, it is called just once.

So it seems to me that scalar functions are very expensive: if I have 10
columns and the computation is expensive, it will run 10 times. That's why I
thought there might be a better way to return a POJO from a scalar function
than what I have been using.

If this is the best way, then I wonder whether scalar functions should be
used to return a single value only.

Thanks

Re: how to query the output of the scalar table function

Posted by Darshan Singh <da...@gmail.com>.

Thanks, Fabian.

We are going to replace all of our scalar functions with table functions.

Thanks

On Wed, Apr 4, 2018 at 12:16 PM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Darshan,
>
> What you observe is the result of what's supposed to be an optimization.
> By fusing the two select() calls, we reduce the number of operators in the
> resulting plan (one fewer MapFunction).
> This optimization is applied only to ScalarFunctions, not to
> TableFunctions.
> With better cost modeling that increases the cost of user-defined
> ScalarFunctions, it should be possible to "convince" the optimizer not to
> fuse the operators.
>
> For now, you could either use the TableFunction approach or convert the
> result of the first select() into a DataStream (or DataSet, depending on
> your setup) and register that again as a table.
> That would split the plan into two parts that are optimized independently,
> so the select() operators would not be merged.
>
> Best, Fabian

Re: how to query the output of the scalar table function

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Darshan,

What you observe is the result of what's supposed to be an optimization. By
fusing the two select() calls, we reduce the number of operators in the
resulting plan (one fewer MapFunction).
This optimization is applied only to ScalarFunctions, not to
TableFunctions.
With better cost modeling that increases the cost of user-defined
ScalarFunctions, it should be possible to "convince" the optimizer not to
fuse the operators.

For now, you could either use the TableFunction approach or convert the
result of the first select() into a DataStream (or DataSet, depending on
your setup) and register that again as a table.
That would split the plan into two parts that are optimized independently,
so the select() operators would not be merged.

Best, Fabian
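
The effect described in this thread can be illustrated without Flink. The
sketch below is plain Java, not Table API code: Parts, FusionSketch, fused()
and split() are hypothetical names made up for this illustration, and sf() is
only a stand-in for the user-defined scalar function. It assumes that fusing
the two select() calls behaves like inlining sf(col1) wherever sfc is
referenced, which matches the three calls per row reported above.

```java
// Hypothetical stand-in for the POJO returned by the scalar function.
final class Parts {
    final String col1V1;
    final String col1V2;
    final String col1V3;

    Parts(String col1V1, String col1V2, String col1V3) {
        this.col1V1 = col1V1;
        this.col1V2 = col1V2;
        this.col1V3 = col1V3;
    }
}

final class FusionSketch {
    // Counts how often the (presumably expensive) function body runs.
    static int calls = 0;

    // Stand-in for sf(col1); this is not a Flink ScalarFunction.
    static Parts sf(String col1) {
        calls++;
        return new Parts(col1 + "-1", col1 + "-2", col1 + "-3");
    }

    // Fused plan: "sfc" is inlined, so every output column re-evaluates sf(col1).
    static String[] fused(String col1) {
        return new String[] { sf(col1).col1V1, sf(col1).col1V2, sf(col1).col1V3 };
    }

    // Split plan: the result is materialized once, then its fields are read.
    static String[] split(String col1) {
        Parts sfc = sf(col1);
        return new String[] { sfc.col1V1, sfc.col1V2, sfc.col1V3 };
    }

    public static void main(String[] args) {
        calls = 0;
        fused("x");
        System.out.println("fused calls: " + calls); // prints "fused calls: 3"

        calls = 0;
        split("x");
        System.out.println("split calls: " + calls); // prints "split calls: 1"
    }
}
```

Both variants produce the same three values; only the number of function
invocations differs. Converting the intermediate table to a DataStream and
registering it again, or using a TableFunction, corresponds to the split()
case, because the function result is produced once per row before the fields
are projected.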


