Posted to dev@flink.apache.org by "winifred.wenhui.tang@gmail.com" <wi...@gmail.com> on 2018/12/03 08:13:16 UTC

[DISCUSS] Support Higher-order functions in Flink sql

Hello all,

Spark 2.4.0 was released last month. I noticed that Spark 2.4
added a lot of new built-in functions, including higher-order functions, to make dealing with complex data types easier [1].
I wonder whether it would make sense for Flink to add higher-order functions as well, to enhance its capabilities.
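For example, Spark SQL writes these higher-order functions with an inline lambda, e.g. (the table and column names below are just placeholders):

SELECT transform(nums, x -> x + 1) AS incremented,          -- nums / my_table are placeholder names
       filter(nums, x -> x > 0) AS positives,
       aggregate(nums, 0, (acc, x) -> acc + x) AS total
FROM my_table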

By the way, I found that when we want to enhance the functionality of Flink SQL, we often need to modify Calcite. That can be inconvenient, so perhaps we could extend Calcite's core parser within Flink to handle some non-standard SQL syntax, as mentioned in the Flink SQL DDL Design [2].
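For instance, a CREATE TABLE statement carrying connector properties, roughly in the spirit of what that design doc discusses, goes beyond the standard grammar (the table name and property keys below are purely illustrative, not the exact syntax proposed in [2]):

CREATE TABLE kafka_source (        -- illustrative only
  user_id BIGINT,
  message VARCHAR
) WITH (
  'connector.type' = 'kafka',
  'topic' = 'user_messages'
)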

Look forward to your feedback.

Best,
Wen-hui Tang

[1] https://issues.apache.org/jira/browse/SPARK-23899
[2] https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#



Winifred-wenhui Tang

Re: [DISCUSS] Support Higher-order functions in Flink sql

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

thanks for starting the discussion. In general, I like the idea of 
making Flink SQL queries more concise.

However, I don't like to diverge from standard SQL. So far, we have
managed to add a lot of operators and functionality while staying
standard-compliant. Personally, I don't see a good reason for forking the
Calcite parser just for little helper functions that could also be
expressed as subqueries.

Instead, we could think about user-defined functions. Rather than:

TRANSFORM(arrays, element -> element + 1)

we could do:

TRANSFORM(arrays, "element -> element + 1")

The second argument could be either SQL or a more domain-specific
standard language.
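Embedded in a full query, the string-based variant might look like the following sketch (TRANSFORM is a hypothetical UDF here, and the table and column names are invented):

SELECT TRANSFORM(scores, 'element -> element + 1') AS adjusted_scores   -- hypothetical UDF call
FROM exam_results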

Similar efforts have been made for querying JSON data in the new SQL/JSON
standard [1] (they use an XPath/XQuery-style path syntax passed as a
string argument).
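For comparison, a standard SQL/JSON call takes the path expression as a plain string argument (just a sketch; the column and path are made up):

SELECT JSON_VALUE(order_info, '$.customer.name') AS customer_name   -- placeholder column and path
FROM orders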

Just some ideas from my side.

Regards,
Timo

[1] 
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adjsn/query-json-data.html#GUID-119E5069-77F2-45DC-B6F0-A1B312945590


On 05.12.18 at 09:54, TANG Wen-hui wrote:
> Hi XueFu, Jark,
>   
> Thanks for your feedback. That's really helpful.
> Since Flink already supports complex types like MAP and ARRAY,
> it would be possible to add some higher-order functions to deal with MAP and ARRAY, as Presto [1,2] and Spark have done.
> As for the syntax for the lambda function, I have started a discussion on Calcite's mailing list to gather feedback.
> I am willing to follow up on the topic and come up with a design doc later.
>   
> Best,
> Wen-hui
>
>
>
> winifred.wenhui.tang@gmail.com
>   
> From: Jark Wu
> Date: 2018-12-05 10:27
> To: dev; xuefu.z
> Subject: Re: [DISCUSS] Support Higher-order functions in Flink sql
> Hi Wenhui,
>   
> This is a meaningful direction for improving the functionality of Flink SQL.
> As Xuefu suggested, you can come up with a design doc covering the
> functions you'd like to support and the improvements.
> IMO, the main obstacle might be the lambda syntax, which is currently not
> supported in Calcite, e.g. "TRANSFORM(arrays, element -> element + 1)".
> In order to support this syntax, we might need to discuss it with the
> Calcite community. It is not like the DDL parser, which is easy to extend
> in a pluggable way, as Calcite suggests.
>
> It would be great if you could share more thoughts or work on this.
>   
> Best,
> Jark
>   
> On Mon, 3 Dec 2018 at 17:20, Zhang, Xuefu <xu...@alibaba-inc.com> wrote:
>   
>> Hi Wenhui,
>>
>> Thanks for bringing the topics up. Both make sense to me. For higher-order
>> functions, I'd suggest you come up with a list of things you'd like to add.
>> Overall, Flink SQL is weak in handling complex types. Ideally we should
>> have a doc covering the gaps and provide a roadmap for enhancement. It
>> would be great if you can broaden the topic a bit.
>>
>> Thanks,
>> Xuefu
>>
>>
>> ------------------------------------------------------------------
>> Sender:winifred.wenhui.tang@gmail.com <wi...@gmail.com>
>> Sent at:2018 Dec 3 (Mon) 16:13
>> Recipient:dev <de...@flink.apache.org>
>> Subject:[DISCUSS] Support Higher-order functions in Flink sql
>>
>> Hello all,
>>
>> Spark 2.4.0 was released last month. I noticed that Spark 2.4
>> added a lot of new built-in functions, including higher-order functions,
>> to make dealing with complex data types easier [1].
>> I wonder whether it would make sense for Flink to add higher-order
>> functions as well, to enhance its capabilities.
>>
>> By the way, I found that when we want to enhance the functionality of
>> Flink SQL, we often need to modify Calcite. That can be inconvenient, so
>> perhaps we could extend Calcite's core parser within Flink to handle some
>> non-standard SQL syntax, as mentioned in the Flink SQL DDL Design [2].
>>
>> Look forward to your feedback.
>>
>> Best,
>> Wen-hui Tang
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-23899
>> [2]
>> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#
>>
>>
>>
>> Winifred-wenhui Tang
>>
>>


Re: Re: [DISCUSS] Support Higher-order functions in Flink sql

Posted by TANG Wen-hui <wi...@gmail.com>.
Hi XueFu, Jark,
 
Thanks for your feedback. That's really helpful.
Since Flink already supports complex types like MAP and ARRAY,
it would be possible to add some higher-order functions to deal with MAP and ARRAY, as Presto [1,2] and Spark have done.
As for the syntax for the lambda function, I have started a discussion on Calcite's mailing list to gather feedback.
I am willing to follow up on the topic and come up with a design doc later.
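For illustration, Presto's array functions take lambda expressions directly, for example (the input arrays are just sample values):

transform(ARRAY[1, 2, 3], x -> x + 1)                 -- returns [2, 3, 4]
filter(ARRAY[1, -2, 3], x -> x > 0)                   -- returns [1, 3]
reduce(ARRAY[5, 20, 50], 0, (s, x) -> s + x, s -> s)  -- returns 75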
 
Best,
Wen-hui



winifred.wenhui.tang@gmail.com
 
From: Jark Wu
Date: 2018-12-05 10:27
To: dev; xuefu.z
Subject: Re: [DISCUSS] Support Higher-order functions in Flink sql
Hi Wenhui,
 
This is a meaningful direction for improving the functionality of Flink SQL.
As Xuefu suggested, you can come up with a design doc covering the
functions you'd like to support and the improvements.
IMO, the main obstacle might be the lambda syntax, which is currently not
supported in Calcite, e.g. "TRANSFORM(arrays, element -> element + 1)".
In order to support this syntax, we might need to discuss it with the
Calcite community. It is not like the DDL parser, which is easy to extend
in a pluggable way, as Calcite suggests.

It would be great if you could share more thoughts or work on this.
 
Best,
Jark
 
On Mon, 3 Dec 2018 at 17:20, Zhang, Xuefu <xu...@alibaba-inc.com> wrote:
 
> Hi Wenhui,
>
> Thanks for bringing the topics up. Both make sense to me. For higher-order
> functions, I'd suggest you come up with a list of things you'd like to add.
> Overall, Flink SQL is weak in handling complex types. Ideally we should
> have a doc covering the gaps and provide a roadmap for enhancement. It
> would be great if you can broaden the topic a bit.
>
> Thanks,
> Xuefu
>
>
> ------------------------------------------------------------------
> Sender:winifred.wenhui.tang@gmail.com <wi...@gmail.com>
> Sent at:2018 Dec 3 (Mon) 16:13
> Recipient:dev <de...@flink.apache.org>
> Subject:[DISCUSS] Support Higher-order functions in Flink sql
>
> Hello all,
>
> Spark 2.4.0 was released last month. I noticed that Spark 2.4
> added a lot of new built-in functions, including higher-order functions,
> to make dealing with complex data types easier [1].
> I wonder whether it would make sense for Flink to add higher-order
> functions as well, to enhance its capabilities.
>
> By the way, I found that when we want to enhance the functionality of
> Flink SQL, we often need to modify Calcite. That can be inconvenient, so
> perhaps we could extend Calcite's core parser within Flink to handle some
> non-standard SQL syntax, as mentioned in the Flink SQL DDL Design [2].
>
> Look forward to your feedback.
>
> Best,
> Wen-hui Tang
>
> [1] https://issues.apache.org/jira/browse/SPARK-23899
> [2]
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#
>
>
>
> Winifred-wenhui Tang
>
>

Re: [DISCUSS] Support Higher-order functions in Flink sql

Posted by Jark Wu <im...@gmail.com>.
Hi Wenhui,

This is a meaningful direction for improving the functionality of Flink SQL.
As Xuefu suggested, you can come up with a design doc covering the
functions you'd like to support and the improvements.
IMO, the main obstacle might be the lambda syntax, which is currently not
supported in Calcite, e.g. "TRANSFORM(arrays, element -> element + 1)".
In order to support this syntax, we might need to discuss it with the
Calcite community. It is not like the DDL parser, which is easy to extend
in a pluggable way, as Calcite suggests.

It would be great if you could share more thoughts or work on this.

Best,
Jark

On Mon, 3 Dec 2018 at 17:20, Zhang, Xuefu <xu...@alibaba-inc.com> wrote:

> Hi Wenhui,
>
> Thanks for bringing the topics up. Both make sense to me. For higher-order
> functions, I'd suggest you come up with a list of things you'd like to add.
> Overall, Flink SQL is weak in handling complex types. Ideally we should
> have a doc covering the gaps and provide a roadmap for enhancement. It
> would be great if you can broaden the topic a bit.
>
> Thanks,
> Xuefu
>
>
> ------------------------------------------------------------------
> Sender:winifred.wenhui.tang@gmail.com <wi...@gmail.com>
> Sent at:2018 Dec 3 (Mon) 16:13
> Recipient:dev <de...@flink.apache.org>
> Subject:[DISCUSS] Support Higher-order functions in Flink sql
>
> Hello all,
>
> Spark 2.4.0 was released last month. I noticed that Spark 2.4
> added a lot of new built-in functions, including higher-order functions,
> to make dealing with complex data types easier [1].
> I wonder whether it would make sense for Flink to add higher-order
> functions as well, to enhance its capabilities.
>
> By the way, I found that when we want to enhance the functionality of
> Flink SQL, we often need to modify Calcite. That can be inconvenient, so
> perhaps we could extend Calcite's core parser within Flink to handle some
> non-standard SQL syntax, as mentioned in the Flink SQL DDL Design [2].
>
> Look forward to your feedback.
>
> Best,
> Wen-hui Tang
>
> [1] https://issues.apache.org/jira/browse/SPARK-23899
> [2]
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#
>
>
>
> Winifred-wenhui Tang
>
>

Re: [DISCUSS] Support Higher-order functions in Flink sql

Posted by "Zhang, Xuefu" <xu...@alibaba-inc.com>.
Hi Wenhui,

Thanks for bringing the topics up. Both make sense to me. For higher-order functions, I'd suggest you come up with a list of things you'd like to add. Overall, Flink SQL is weak in handling complex types. Ideally we should have a doc covering the gaps and provide a roadmap for enhancement. It would be great if you can broaden the topic a bit.

Thanks,
Xuefu 


------------------------------------------------------------------
Sender:winifred.wenhui.tang@gmail.com <wi...@gmail.com>
Sent at:2018 Dec 3 (Mon) 16:13
Recipient:dev <de...@flink.apache.org>
Subject:[DISCUSS] Support Higher-order functions in Flink sql

Hello all,

Spark 2.4.0 was released last month. I noticed that Spark 2.4
added a lot of new built-in functions, including higher-order functions, to make dealing with complex data types easier [1].
I wonder whether it would make sense for Flink to add higher-order functions as well, to enhance its capabilities.

By the way, I found that when we want to enhance the functionality of Flink SQL, we often need to modify Calcite. That can be inconvenient, so perhaps we could extend Calcite's core parser within Flink to handle some non-standard SQL syntax, as mentioned in the Flink SQL DDL Design [2].

Look forward to your feedback.

Best,
Wen-hui Tang

[1] https://issues.apache.org/jira/browse/SPARK-23899
[2] https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#



Winifred-wenhui Tang