Posted to dev@flink.apache.org by "DONG, Weike" <ky...@connect.hku.hk> on 2020/03/09 04:46:17 UTC

[DISCUSS] Use SET statement to set table config in Flink SQL and implement a unified SQL call method

Hi dev,

Recently we have tested the brand-new SQL Client and Flink SQL module, and
we are amazed at this simple way of programming for streaming data
analysis. However, as far as I know, the SET command is only available in
the SQL Client, not in the SQL API.

We understand that developers can set TableConfig through the
tEnv.getConfig().getConfiguration() API. However, we hope there could be
an API such as sqlSet() that allows table configurations to be set within
the SQL statements themselves. This would pave the way for a unified
interface for writing Flink SQL jobs, without the need to write any Java
or Scala code in a production environment.
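
As a rough illustration only (a sketch, not an existing Flink API), the
parsing half of such a sqlSet() call could look like the following. The
class name SetStatementParser and its parse() method are hypothetical; the
resulting key/value pair would then be written into the Configuration
obtained from tEnv.getConfig().getConfiguration().

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: parses "SET key = 'value'"
// into a key/value pair.
final class SetStatementParser {

    // Matches: SET <key> = <value>, with optional single quotes around the
    // value and an optional trailing semicolon. The SET keyword is matched
    // case-insensitively.
    private static final Pattern SET_PATTERN = Pattern.compile(
            "^\\s*SET\\s+([\\w.\\-]+)\\s*=\\s*'?([^';]*?)'?\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Returns the parsed pair, or an empty Optional if the input is not a
    // SET statement of the expected shape.
    static Optional<Map.Entry<String, String>> parse(String statement) {
        Matcher m = SET_PATTERN.matcher(statement);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(
                new SimpleImmutableEntry<>(m.group(1), m.group(2).trim()));
    }

    public static void main(String[] args) {
        parse("SET table.exec.mini-batch.enabled = 'true'")
                .ifPresent(e ->
                        System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

A regular expression is used here only to keep the sketch short; a real
implementation would more likely extend the SQL parser, so that quoting and
escaping rules stay consistent with other statements.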

Moreover, it would be much better if there were an API that
automatically detects the type of a SQL statement and chooses the right logic
to execute it, instead of making users choose sqlUpdate or sqlQuery manually, e.g.:

sql("CREATE TABLE abc ( a VARCHAR(10), b BIGINT ) WITH ( 'xxx' = 'yyy' )");
sql("SET table.exec.mini-batch.enabled = 'true'");
sql("INSERT INTO sink SELECT * FROM abc");

Users could then simply write their SQL code in .sql files, and Flink
could read them line by line and call the sql() method to parse the code, and
eventually submit the job to the ExecutionEnvironment and run the program in the
cluster. This differs from the current SQL Client, whose interactive way
of programming is not well suited for production usage.
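
To make the idea more concrete, here is a minimal sketch of how a script
could be split into statements and how each statement's type could be
detected. It is not a Flink API: the class SqlScriptDispatcher, the Kind
enum, and both methods are hypothetical names. It splits on semicolons
rather than on line breaks, since a statement may span several lines, and
it ignores SQL comments for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not a Flink API: splits a SQL script into statements
// and picks a handler category from each statement's leading keyword.
final class SqlScriptDispatcher {

    enum Kind { SET, DDL, INSERT, QUERY, UNKNOWN }

    // Splits on ';' while ignoring semicolons inside single-quoted literals.
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : script.toCharArray()) {
            if (c == '\'') {
                // An escaped quote ('') toggles twice, so the state is preserved.
                inQuote = !inQuote;
            }
            if (c == ';' && !inQuote) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    // Classifies a statement by its first keyword only.
    static Kind classify(String statement) {
        String head = statement.trim().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        switch (head) {
            case "SET": return Kind.SET;
            case "CREATE": case "DROP": case "ALTER": return Kind.DDL;
            case "INSERT": return Kind.INSERT;
            case "SELECT": return Kind.QUERY;
            default: return Kind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        String script =
                "CREATE TABLE abc ( a VARCHAR(10), b BIGINT ) WITH ( 'xxx' = 'yyy' );\n"
              + "SET table.exec.mini-batch.enabled = 'true';\n"
              + "INSERT INTO sink SELECT * FROM abc;";
        for (String stmt : split(script)) {
            System.out.println(classify(stmt) + ": " + stmt);
        }
    }
}
```

In a unified sql() method, each Kind would map to the corresponding existing
call (for example, the configuration API for SET, and sqlUpdate or sqlQuery
for the others). Keyword matching is only a stand-in for inspecting the
parsed statement.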

We would like to know whether these proposals contradict the current plans
of the community, or whether there are any other issues that should be
addressed before implementing such features.

Thanks,
Weike

Re: [DISCUSS] Use SET statement to set table config in Flink SQL and implement a unified SQL call method

Posted by Jark Wu <im...@gmail.com>.
Hi Weike and Tison,

This is already covered in FLIP-84 [1]. We will propose a new method,
"executeStatement(String statement)", which can execute arbitrary
statements, including SET and CREATE. This is in progress [2].

Best,
Jark

[1]:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2]: https://issues.apache.org/jira/browse/FLINK-16366


Re: [DISCUSS] Use SET statement to set table config in Flink SQL and implement a unified SQL call method

Posted by "DONG, Weike" <ky...@connect.hku.hk>.
Hi Timo,

After carefully reading FLIP-91 (SQL Client Gateway), I found that it
still focuses on ad-hoc (or real-time) queries of batch data, which is quite
different from the streaming case.

I wonder whether we could combine some features of FLIP-84 (the generic,
all-purpose executeStatement()) with the JDBC-compliant SQL Gateway (FLIP-91)
to make submitting online streaming SQL jobs feasible.

This is just an immature thought, and I would like to know whether the
community plans to do so in the foreseeable future : )

Best,
Weike


Re: [DISCUSS] Use SET statement to set table config in Flink SQL and implement a unified SQL call method

Posted by Timo Walther <tw...@apache.org>.
Hi Weike,

Thanks for your feedback. Your use case is definitely on our agenda. The
redesign of big parts of the API is still in progress. In the mid-term,
most of the SQL Client commands should be present in the SQL API as well,
so that platform teams can build their custom logic (like REST APIs
etc.) around it.

For pure SQL users, there are discussions about making the SQL Client
richer in the future; see:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway

Regards,
Timo




Re: [DISCUSS] Use SET statement to set table config in Flink SQL and implement a unified SQL call method

Posted by "DONG, Weike" <ky...@connect.hku.hk>.
Hi Tison and all,

Thanks for the timely response. I have carefully examined the
aforementioned FLIP-84. As I see it, executeStatement() is akin to
our original design of the sql() method, but with more detailed
considerations included.

However, it does not cover the SET statement for tuning TableConfig, and the
differences among all those environments (TableEnvironment,
StreamTableEnvironment, ExecutionEnvironment, StreamExecutionEnvironment,
etc.) might still confuse users, especially regarding the effects of the
execute() / executeStatements() methods while the old APIs are not yet
completely removed. This poses a heavy burden for newcomers and hinders
user adoption.

Therefore, I believe that the Table API needs a further cohesive redesign
that improves on FLIP-84, or a purely SQL interface that removes the
burden of learning all those complex concepts (i.e., running Flink streaming
or batch programs from SQL files).

Hope to hear any suggestions or questions, thanks : )

Sincerely,
Weike


Re: [DISCUSS] Use SET statement to set table config in Flink SQL and implement a unified SQL call method

Posted by tison <wa...@gmail.com>.
Hi Weike,

Thanks for kicking off this discussion! I cannot agree more with the
proposal for a universal sql() method. It confuses and annoys our users a
lot to have to distinguish between sqlUpdate/sqlQuery, and even insertInto
and so on.

IIRC there is an ongoing FLIP [1] dealing with this problem. You can check
it out to see whether it fits your requirements.

Besides, regarding enabling SET in SQL statements, I agree that it helps
provide a consistent user experience when using *just* SQL to describe a
Flink job. I am looking forward to the maintainers' ideas on the
possibility and plan.

Best,
tison.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878

