Posted to dev@flink.apache.org by Rui Li <li...@gmail.com> on 2021/02/01 06:46:17 UTC

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Thanks Shengkai for the update! The proposed changes look good to me.

On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com> wrote:

> Hi, Rui.
> You are right. I have already modified the FLIP.
>
> The main changes:
>
> # -f parameter has no restriction about the statement type.
> Sometimes users pipe or redirect the results of queries for debugging when
> submitting jobs with the -f parameter. This is much more convenient than
> writing INSERT INTO statements.
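As a sketch of that use case (the file name, table, and redirect target are all hypothetical):

```sql
-- debug.sql, submitted with the new -f option; the query result is
-- redirected to a file for inspection:
--   ./bin/sql-client.sh -f debug.sql > result.txt
SELECT user_id, COUNT(*) AS cnt
FROM orders
GROUP BY user_id;
```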
>
> # Add a new sql client option `sql-client.job.detach`.
> In batch mode, users often prefer to execute jobs one by one. They can set
> this option to false, and the client will not process the next job until
> the current job finishes. The default value of this option is true, which
> means the client submits the next job as soon as the current job has been
> submitted.
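A sketch of the intended batch usage (option name as proposed in this thread; table names hypothetical):

```sql
-- With detach disabled, the client waits for each job to finish before
-- submitting the next statement:
SET 'sql-client.job.detach' = 'false';

INSERT INTO warehouse_a SELECT * FROM staging_orders;  -- waits for completion
INSERT INTO warehouse_b SELECT * FROM staging_orders;  -- starts only afterwards
```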
>
> Best,
> Shengkai
>
>
>
> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
>
>> Hi Shengkai,
>>
>> Regarding #2, maybe the -f options in flink and hive have different
>> implications, and we should clarify the behavior. For example, if the
>> client just submits the job and exits, what happens if the file contains
>> two INSERT statements? I don't think we should treat them as a statement
>> set, because users should explicitly write BEGIN STATEMENT SET in that
>> case. And the client shouldn't asynchronously submit the two jobs, because
>> the 2nd may depend on the 1st, right?
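For reference, the explicit form mentioned above would look roughly like this (statement set syntax as proposed in the FLIP; table names hypothetical):

```sql
-- The two INSERTs are optimized and submitted together as one job only
-- because the user asked for it explicitly:
BEGIN STATEMENT SET;
INSERT INTO order_stats SELECT user_id, COUNT(*) FROM orders GROUP BY user_id;
INSERT INTO order_copy  SELECT * FROM orders;
END;
```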
>>
>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fs...@gmail.com> wrote:
>>
>>> Hi Rui,
>>> Thanks for your feedback. I agree with your suggestions.
>>>
>>> For suggestion 1: Yes, we plan to strengthen the SET command. In the
>>> implementation, it will simply put the key-value pair into the
>>> `Configuration`, which is used to generate the table config. If Hive
>>> supports reading settings from the table config, users will be able to
>>> set Hive-related settings.
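A sketch of how such pass-through settings might look in a session (the second key is a Hive configuration key used purely for illustration):

```sql
-- Key-value pairs handed to the session's `Configuration`:
SET 'table.sql-dialect' = 'hive';
SET 'hive.exec.dynamic.partition' = 'true';  -- hypothetical pass-through key
```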
>>>
>>> For suggestion 2: The -f parameter will submit the job and exit. If the
>>> queries never end, users have to cancel the jobs themselves, which is not
>>> reliable (people may forget their jobs). In most cases, queries are used
>>> to analyze data, so users should run queries in the interactive mode.
>>>
>>> Best,
>>> Shengkai
>>>
>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 3:18 PM:
>>>
>>>> Thanks Shengkai for bringing up this discussion. I think it covers a
>>>> lot of useful features which will dramatically improve the usability of our
>>>> SQL Client. I have two questions regarding the FLIP.
>>>>
>>>> 1. Do you think we can let users set arbitrary configurations via the
>>>> SET command? A connector may have its own configurations and we don't have
>>>> a way to dynamically change such configurations in SQL Client. For example,
>>>> users may want to be able to change hive conf when using hive connector [1].
>>>> 2. Any reason why we have to forbid queries in SQL files specified with
>>>> the -f option? Hive supports a similar -f option but allows queries in the
>>>> file. And a common use case is to run some query and redirect the results
>>>> to a file. So I think maybe flink users would like to do the same,
>>>> especially in batch scenarios.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>>>
>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <li...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Shengkai,
>>>>>
>>>>> Glad to see this improvement. And I have some additional suggestions:
>>>>>
>>>>> #1. Unify the TableEnvironment in ExecutionContext to
>>>>> StreamTableEnvironment for both streaming and batch SQL.
>>>>> #2. Improve the way results are retrieved: the SQL client currently
>>>>> collects the results locally all at once using accumulators, which may
>>>>> cause memory issues in the JM or locally for big query results.
>>>>> Accumulators are only suitable for testing purposes. We may change to
>>>>> use SelectTableSink, which is based on CollectSinkOperatorCoordinator.
>>>>> #3. Do we need to consider the Flink SQL gateway proposed in FLIP-91?
>>>>> It seems that FLIP has not moved forward for a long time. Providing a
>>>>> long-running service out of the box to facilitate SQL submission is
>>>>> necessary.
>>>>>
>>>>> What do you think of these?
>>>>>
>>>>> [1]
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>
>>>>>
>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
>>>>>
>>>>> > Hi devs,
>>>>> >
>>>>> > Jark and I want to start a discussion about FLIP-163:SQL Client
>>>>> > Improvements.
>>>>> >
>>>>> > Many users have complained about problems with the sql client. For
>>>>> > example, users cannot register the tables proposed by FLIP-95.
>>>>> >
>>>>> > The main changes in this FLIP:
>>>>> >
>>>>> > - use the -i parameter to specify a sql file that initializes the
>>>>> > table environment, and deprecate the YAML file;
>>>>> > - add -f to submit a sql file, and deprecate the '-u' parameter;
>>>>> > - add more interactive commands, e.g. ADD JAR;
>>>>> > - support the statement set syntax;
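A minimal sketch of the proposed workflow (file names, paths, and table definitions are all hypothetical):

```sql
-- init.sql, passed via -i to set up the session (replaces the YAML file):
ADD JAR '/path/to/my-udf.jar';
CREATE TABLE orders (
  user_id BIGINT,
  amount  DOUBLE
) WITH ('connector' = 'datagen');

-- job.sql, passed via -f to submit statements:
--   ./bin/sql-client.sh -i init.sql -f job.sql
SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;
```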
>>>>> >
>>>>> >
>>>>> > For more detailed changes, please refer to FLIP-163[1].
>>>>> >
>>>>> > Look forward to your feedback.
>>>>> >
>>>>> >
>>>>> > Best,
>>>>> > Shengkai
>>>>> >
>>>>> > [1]
>>>>> >
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *With kind regards
>>>>> ------------------------------------------------------------
>>>>> Sebastian Liu 刘洋
>>>>> Institute of Computing Technology, Chinese Academy of Science
>>>>> Mobile\WeChat: +86—15201613655
>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
>>>>> QQ: 3239559*
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>>
>>>
>>
>> --
>> Best regards!
>> Rui Li
>>
>

-- 
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
+1 for option 2

Regards,
Timo


On 02.03.21 03:50, Jark Wu wrote:
> I prefer option#2 and I think this can make everyone happy.
> 
> Best,
> Jark
> 
> On Mon, 1 Mar 2021 at 18:22, Shengkai Fang <fs...@gmail.com> wrote:
> 
>> Hi, everyone.
>>
>> After the long discussion, I am fine with both choices. But I prefer the
>> second option that applies to both table modules and sql client. Just as
>> Timo said, the option `table.dml-sync` can improve SQL script
>> portability. Users don't need to modify the script to execute it on
>> different platforms, e.g. a gateway.
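Under this second option, a portable script would control the behavior like so (sketch; semantics as discussed in this thread, table names hypothetical):

```sql
-- The same script runs unchanged on the SQL Client, a gateway, or another
-- platform that honors the option:
SET 'table.dml-sync' = 'true';
INSERT INTO target_a SELECT * FROM source_t;  -- blocks until the job finishes
INSERT INTO target_b SELECT * FROM source_t;  -- submitted only afterwards
```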
>>
>> What do you think? CC Timo, Jark, Leonard.
>>
>> Best,
>> Shengkai.
>>
>> Kurt Young <yk...@gmail.com> wrote on Mon, Mar 1, 2021 at 5:11 PM:
>>
>>> I'm +1 for either:
>>> 1. introduce a sql client specific option, or
>>> 2. Introduce a table config option and make it apply to both table
>> module &
>>> sql client.
>>>
>>> It would be the FLIP owner's call to decide.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <tw...@apache.org> wrote:
>>>
>>>> We could also think about reading this config option in Table API. The
>>>> effect would be to call `await()` directly in an execute call. I could
>>>> also imagine this to be useful esp. when you fire a lot of insert into
>>>> queries. We had cases before where users were confused that the
>>>> execution happens asynchronously; such an option could prevent this
>>>> from happening again.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>> On 01.03.21 05:14, Kurt Young wrote:
>>>>> I also asked some users for their opinion on introducing a config
>>>>> prefixed with "table" that has no effect on methods in the Table API
>>>>> and SQL. All of them were kind of shocked by the question, asking why
>>>>> we would do anything like this.
>>>>>
>>>>> This kind of reaction actually doesn't surprise me a lot, so I jumped
>>> in
>>>>> and challenged this config option even
>>>>> after the FLIP had already been accepted.
>>>>>
>>>>> If we only have to define the execution behavior for multiple
>>>>> statements in the SQL client, we should introduce a config option
>>>>> whose name tells users its scope of effect. Prefixing it with "table"
>>>>> is definitely not a good idea here.
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>>
>>>>> On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <xb...@gmail.com>
>> wrote:
>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> Looks like there's only one divergence in this thread, about the
>>>>>> option [ table | sql-client ].dml-sync; correct me if I'm wrong.
>>>>>>
>>>>>> 1. Leaving the context of this thread, from a user's perspective,
>>>>>> the table.xx configurations should take effect in Table API & SQL,
>>>>>> the sql-client.xx configurations should only take effect in
>>> sql-client.
>>>>>>    In my (the user's) opinion, other explanations are
>>>>>> counterintuitive.
>>>>>>
>>>>>> 2. It should be pointed out that all existing table.xx
>>>>>> configurations, like table.exec.state.ttl,
>>>>>> table.optimizer.agg-phase-strategy, table.local-time-zone, etc., and
>>>>>> the proposed sql-client.xx configurations, like sql-client.verbose
>>>>>> and sql-client.execution.max-table-result.rows, comply with this
>>>>>> convention.
>>>>>>
>>>>>> 3. Considering the portability to support different CLI tools
>>>> (sql-client,
>>>>>> sql-gateway, etc.), I prefer table.dml-sync.
>>>>>>
>>>>>> In addition, I think sql-client/sql-gateway/other CLI tools can be
>>>> placed
>>>>>> out of flink-table module even in an external project, this should
>> not
>>>>>> affect our conclusion.
>>>>>>
>>>>>>
>>>>>> Hope this can help you.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Leonard
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Feb 25, 2021, at 18:51, Shengkai Fang <fs...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi, everyone.
>>>>>>>
>>>>>>> Here is a summary of the discussion about the option. If the
>>>>>>> summary has errors, please correct me.
>>>>>>>
>>>>>>> `table.dml-sync`:
>>>>>>> - take effect for `executeMultiSql` and sql client
>>>>>>> - benefit: SQL script portability. One script for all platforms.
>>>>>>> - drawback: Don't work for `TableEnvironment#executeSql`.
>>>>>>>
>>>>>>> `table.multi-dml-sync`:
>>>>>>> - take effect for `executeMultiSql` and sql client
>>>>>>> - benefit: SQL script portability
>>>>>>> - drawback: It's confusing when the sql script has only one DML
>>>>>>> statement but users still need to set the option
>>>>>>> `table.multi-dml-sync`
>>>>>>>
>>>>>>> `client.dml-sync`:
>>>>>>> - take effect for sql client only
>>>>>>> - benefit: clear definition.
>>>>>>> - drawback: Every platform needs to define its own option. Bad SQL
>>>> script
>>>>>>> portability.
>>>>>>>
>>>>>>> Just as Jark said, I think `table.dml-sync` is a good choice if we
>>>>>>> can extend its scope and make this option work for `executeSql`.
>>>>>>> It's straightforward, and users can use this option in the Table API
>>>>>>> right away. The drawback is that `TableResult#await` plays the same
>>>>>>> role as the option. I don't think this drawback is really critical,
>>>>>>> because many systems have commands that play the same role under
>>>>>>> different names.
>>>>>>>
>>>>>>> Best,
>>>>>>> Shengkai
>>>>>>>
>>>>>>> Timo Walther <tw...@apache.org> wrote on Thu, Feb 25, 2021 at 4:23 PM:
>>>>>>>
>>>>>>>> The `table.` prefix is meant to be a general option in the table
>>>>>>>> ecosystem. Not necessarily attached to Table API or SQL Client.
>>> That's
>>>>>>>> why SQL Client is also located in the `flink-table` module.
>>>>>>>>
>>>>>>>> My main concern is SQL script portability. Declaring the
>>>>>>>> sync/async behavior will happen in many SQL scripts, and users
>>>>>>>> should be able to easily switch from SQL Client to some commercial
>>>>>>>> product without changing the script.
>>>>>>>>
>>>>>>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync`
>>>> later
>>>>>>>> but that would mean introducing future confusion. An app name
>> (what
>>>>>>>> `sql-client` kind of is) should not be part of a config option key
>>> if
>>>>>>>> other apps will need the same kind of option.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24.02.21 08:59, Jark Wu wrote:
>>>>>>>>>>   From my point of view, I also prefer "sql-client.dml-sync",
>>>>>>>>> because the behavior of this configuration is very clear.
>>>>>>>>> Even if we introduce a new config in the future, e.g.
>>>> `table.dml-sync`,
>>>>>>>>> we can also deprecate the sql-client one.
>>>>>>>>>
>>>>>>>>> Introducing a "table."  configuration without any implementation
>>>>>>>>> will confuse users a lot, as they expect it should take effect on
>>>>>>>>> the Table API.
>>>>>>>>>
>>>>>>>>> If we want to introduce a unified "table.dml-sync" option, I
>>>>>>>>> prefer that it be implemented on the Table API and affect all the
>>>>>>>>> DMLs on the Table API (`tEnv.executeSql`, `Table.executeInsert`,
>>>>>>>>> `StatementSet`), as I have mentioned before [1].
>>>>>>>>>
>>>>>>>>>> It would be very straightforward that it affects all the DMLs on
>>> SQL
>>>>>> CLI
>>>>>>>>> and
>>>>>>>>> TableEnvironment (including `executeSql`, `StatementSet`,
>>>>>>>>> `Table#executeInsert`, etc.).
>>>>>>>>> This can also make SQL CLI easy to support this configuration by
>>>>>> passing
>>>>>>>>> through to the TableEnv.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com>
>> wrote:
>>>>>>>>>
>>>>>>>>>> If we all agree the option should only be handled by the sql
>>>>>>>>>> client, then why don't we just call it `sql-client.dml-sync`? As
>>>>>>>>>> you said, calling it `table.dml-sync` while it has no effect on
>>>>>>>>>> `TableEnv.executeSql("INSERT INTO")` will also cause big
>>>>>>>>>> confusion for users.
>>>>>>>>>>
>>>>>>>>>> The only concern I saw is if we introduce
>>>>>>>>>> "TableEnvironment.executeMultiSql()" in the
>>>>>>>>>> future, how do we control the synchronization between
>> statements?
>>>> TBH
>>>>>> I
>>>>>>>>>> don't really
>>>>>>>>>> see a strong requirement for such interfaces. Right now, we
>> have a
>>>>>>>> pretty
>>>>>>>>>> clear semantic
>>>>>>>>>> of `TableEnv.executeSql`, and it's very convenient for users if
>>> they
>>>>>>>> want
>>>>>>>>>> to execute multiple
>>>>>>>>>> sql statements. They can simulate either synced or async
>> execution
>>>>>> with
>>>>>>>>>> this building block.
>>>>>>>>>>
>>>>>>>>>> This will introduce slight overhead for users, but compared to
>> the
>>>>>>>>>> confusion we might
>>>>>>>>>> cause if we introduce such a method of our own, I think it's
>>> better
>>>> to
>>>>>>>> wait
>>>>>>>>>> for some more
>>>>>>>>>> feedback.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kurt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <
>> twalthr@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Kurt,
>>>>>>>>>>>
>>>>>>>>>>> we can also shorten it to `table.dml-sync` if that would help.
>>>>>>>>>>> But then it might confuse users who do a regular
>>>>>>>>>>> `.executeSql("INSERT INTO")` in a notebook session.
>>>>>>>>>>>
>>>>>>>>>>> In any case users will need to learn the semantics of this
>>>>>>>>>>> option. `table.multi-dml-sync` should be described as "If you
>>>>>>>>>>> are in a multi-statement environment, execute DMLs
>>>>>>>>>>> synchronously." I don't have a strong opinion on shortening it
>>>>>>>>>>> to `table.dml-sync`.
>>>>>>>>>>>
>>>>>>>>>>> Just to clarify the implementation: the option should be
>>>>>>>>>>> handled by the SQL Client only, but the name can be shared
>>>>>>>>>>> across platforms.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23.02.21 09:54, Kurt Young wrote:
>>>>>>>>>>>> Sorry for the late reply, but I'm confused by
>>>>>> `table.multi-dml-sync`.
>>>>>>>>>>>>
>>>>>>>>>>>> IIUC this config will take effect with 2 use cases:
>>>>>>>>>>>> 1. SQL client, either interactive mode or executing multiple
>>>>>>>> statements
>>>>>>>>>>> via
>>>>>>>>>>>> -f. In most cases,
>>>>>>>>>>>> there will be only one INSERT INTO statement but we are
>>>> controlling
>>>>>>>> the
>>>>>>>>>>>> sync/async behavior
>>>>>>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of
>>> users.
>>>>>>>>>>> Besides,
>>>>>>>>>>>>
>>>>>>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future
>> work,
>>> we
>>>>>> are
>>>>>>>>>>> also
>>>>>>>>>>>> not sure if we will
>>>>>>>>>>>> really introduce this in the future.
>>>>>>>>>>>>
>>>>>>>>>>>> I would prefer to introduce this option for only sql client.
>> For
>>>>>>>>>>> platforms
>>>>>>>>>>>> Timo mentioned which
>>>>>>>>>>>> need to control such behavior, I think it's easy and flexible
>> to
>>>>>>>>>>> introduce
>>>>>>>>>>>> one on their own.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Kurt
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <
>>> fskmine@gmail.com
>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi everyone.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the late response.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For `execution.runtime-mode`, I think it's much better than
>>>>>>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions!
>>>>>>>>>>>>>
>>>>>>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We
>>>>>>>>>>>>> should clarify the usage of the SHOW CREATE TABLE statement:
>>>>>>>>>>>>> it should accept a fully qualified table name and only work
>>>>>>>>>>>>> for tables created via SQL statements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have updated the FLIP with suggestions. It seems we have
>>>> reached
>>>>>> a
>>>>>>>>>>>>> consensus, I'd like to start a formal vote for the FLIP.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jark Wu <im...@gmail.com> wrote on Mon, Feb 15, 2021 at 10:50 PM:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ingo,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) I think you are right, the table path should be
>>>>>> fully-qualified.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE
>>>>>>>>>>>>>> only aims to print DDL for the tables registered using SQL
>>>> CREATE
>>>>>>>>>> TABLE
>>>>>>>>>>>>>> DDL.
>>>>>>>>>>>>>> If a table is registered using Table API,  e.g.
>>>>>>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String,
>>>> DataStream)`,
>>>>>>>>>>>>>> currently it's not possible to print DDL for such tables.
>>>>>>>>>>>>>> I think we should point it out in the FLIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <ingo@ververica.com
>>>
>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE
>>>> statement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Contrary to the example in the FLIP, I think the
>>>>>>>>>>>>>>> returned DDL should always have the table identifier fully
>>>>>>>>>>>>>>> qualified. Otherwise the DDL depends on the current context
>>>>>>>>>>>>>>> (catalog/database), which could be surprising, especially
>>>>>>>>>>>>>>> since "the same" table can behave differently if created in
>>>>>>>>>>>>>>> different catalogs.
>>>>>>>>>>>>>>> 2) How should this handle tables which cannot be fully
>>>>>>>>>>>>>>> characterized by properties only? I don't know if there's
>>>>>>>>>>>>>>> an example for this yet, but hypothetically this is not
>>>>>>>>>>>>>>> currently a requirement, right? This isn't as much of a
>>>>>>>>>>>>>>> problem if this syntax is SQL-client-specific, but if it's
>>>>>>>>>>>>>>> general Flink SQL syntax we should consider this (one way
>>>>>>>>>>>>>>> or another).
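To illustrate point 1, the returned DDL might look like this (all names and properties are hypothetical):

```sql
SHOW CREATE TABLE orders;

-- Returned DDL with a fully qualified identifier, independent of the
-- session's current catalog/database:
CREATE TABLE `my_catalog`.`my_db`.`orders` (
  `user_id` BIGINT,
  `amount` DOUBLE
) WITH (
  'connector' = 'kafka'
);
```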
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Ingo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <
>>>> twalthr@apache.org
>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks for updating the FLIP.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have one last comment for the option
>>> `table.execution.mode`.
>>>>>>>>>> Should
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> already use the global Flink option
>> `execution.runtime-mode`
>>>>>>>>>> instead?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We are using Flink's options where possible (e.g.
>>>>>>>>>>>>>>>> `pipeline.name` and `parallelism.default`), so why not
>>>>>>>>>>>>>>>> also for batch/streaming mode?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The description of the option matches the Blink planner
>>>>>>>>>>>>>>>> behavior:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> Among other things, this controls task scheduling, network
>>>>>> shuffle
>>>>>>>>>>>>>>>> behavior, and time semantics.
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
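Reusing the existing global option, as suggested, would look like this in a session (illustrative):

```sql
-- Reuse the existing global option instead of a new table-specific key:
SET 'execution.runtime-mode' = 'batch';
```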
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
>>>>>>>>>>>>>>>>> Hi, guys.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have updated the FLIP.  It seems we have reached
>>> agreement.
>>>>>>>>>> Maybe
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> start the vote soon. If anyone has other questions,
>> please
>>>>>> leave
>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>>> comments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Tue, Feb 9, 2021 at 7:52 PM:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The conclusion sounds good to me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <
>>>>>> fskmine@gmail.com
>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi, Timo, Jark.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am fine with the new option name.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> wrote on Tue, Feb 9, 2021 at 5:35 PM:
>> 周二下午5:35写道:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be
>> future
>>>>>> work.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this
>> conclusion?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
>>>>>>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> My previous concern about "multi" was that a DML in
>>>>>>>>>>>>>>>>>>>>> the CLI looks like a single statement.
>>>>>>>>>>>>>>>>>>>>> But we can treat the CLI as accepting multiple
>>>>>>>>>>>>>>>>>>>>> statements from opening to closing.
>>>>>>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
>>>>>>>>>>>>>>>>>>>>> default), and we will support this config in the SQL
>>>>>>>>>>>>>>>>>>>>> CLI first and in TableEnvironment#executeMultiSql()
>>>>>>>>>>>>>>>>>>>>> in the future, right?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <
>>>>>>>> twalthr@apache.org
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should
>>>>>>>>>>>>>>>>>>>>>> not apply to regular `executeSql`. Actually, this
>>>>>>>>>>>>>>>>>>>>>> option only makes sense when executing multiple
>>>>>>>>>>>>>>>>>>>>>> statements. Once we have
>>>>>>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`, this config
>>>>>>>>>>>>>>>>>>>>>> could be considered.
>>>>>>>>>>>>>>>>>> considered.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other
>>>>>>>>>>>>>>>>>>>>>> platforms will also need this config option, which
>>>>>>>>>>>>>>>>>>>>>> is why I would like to avoid a SQL Client specific
>>>>>>>>>>>>>>>>>>>>>> option. Otherwise every platform has to come up with
>>>>>>>>>>>>>>>>>>>>>> this important config option separately.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync`
>> `table.multi-stmt-sync`?
>>> Or
>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>> opinions?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi, all.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I think it may confuse users. The main problem is
>>>>>>>>>>>>>>>>>>>>>>> that we have no means to detect conflicting
>>>>>>>>>>>>>>>>>>>>>>> configuration, e.g. users setting the option to
>>>>>>>>>>>>>>>>>>>>>>> true and using `TableResult#await` together.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> Shengkai.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
I prefer option#2 and I think this can make everyone happy.

Best,
Jark

On Mon, 1 Mar 2021 at 18:22, Shengkai Fang <fs...@gmail.com> wrote:

> Hi, everyone.
>
> After the long discussion, I am fine with both choices. But I prefer the
> second option that applies to both table modules and sql client. Just as
> Timo said, the option `table.dml-sync` can improve the SQL script
> portability. Users don't need to modify the script and execute the script
> in different platforms e.g gateway.
>
> What do you think? CC Timo, Jark, Leonard.
>
> Best,
> Shengkai.
>
> Kurt Young <yk...@gmail.com> 于2021年3月1日周一 下午5:11写道:
>
> > I'm +1 for either:
> > 1. introduce a sql client specific option, or
> > 2. Introduce a table config option and make it apply to both table
> module &
> > sql client.
> >
> > It would be the FLIP owner's call to decide.
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <tw...@apache.org> wrote:
> >
> > > We could also think about reading this config option in Table API. The
> > > effect would be to call `await()` directly in an execute call. I could
> > > also imagine this to be useful esp. when you fire a lot of insert into
> > > queries. We had the case before that users where confused that the
> > > execution happens asynchronously, such an option could prevent this to
> > > happen again.
> > >
> > > Regards,
> > > Timo
> > >
> > > On 01.03.21 05:14, Kurt Young wrote:
> > > > I also asked some users about their opinion that if we introduce some
> > > > config prefixed with "table" but doesn't
> > > > have affection with methods in Table API and SQL. All of them are
> kind
> > of
> > > > shocked by such question, asking
> > > > why we would do anything like this.
> > > >
> > > > This kind of reaction actually doesn't surprise me a lot, so I jumped
> > in
> > > > and challenged this config option even
> > > > after the FLIP had already been accepted.
> > > >
> > > > If we only have to define the execution behavior for multiple
> > statements
> > > in
> > > > SQL client, we should only introduce
> > > > a config option which would tell users it's affection scope by its
> > name.
> > > > Prefixing with "table" is definitely not a good
> > > > idea here.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <xb...@gmail.com>
> wrote:
> > > >
> > > >> Hi, all
> > > >>
> > > >> Look like there’s only one divergence about option [ table |
> > sql-client
> > > >> ].dml-sync in this thread, correct me if I’m wrong.
> > > >>
> > > >> 1. Leaving the context of this thread, from a user's perspective,
> > > >> the table.xx configurations should take effect in Table API & SQL,
> > > >> the sql-client.xx configurations should only take effect in
> > sql-client.
> > > >>   In my(the user's) opinion, other explanations are
> counterintuitive.
> > > >>
> > > >> 2.  It should be pointed out that both all existed table.xx
> > > configurations
> > > >> like table.exec.state.ttl, table.optimizer.agg-phase-strategy,
> > > >> table.local-time-zone,etc..  and the proposed sql-client.xx
> > > configurations
> > > >> like sql-client.verbose, sql-client.execution.max-table-result.rows
> > > >> comply with this convention.
> > > >>
> > > >> 3. Considering the portability to support different CLI tools
> > > (sql-client,
> > > >> sql-gateway, etc.), I prefer table.dml-sync.
> > > >>
> > > >> In addition, I think sql-client/sql-gateway/other CLI tools can be
> > > placed
> > > >> out of flink-table module even in an external project, this should
> not
> > > >> affect our conclusion.
> > > >>
> > > >>
> > > >> Hope this can help you.
> > > >>
> > > >>
> > > >> Best,
> > > >> Leonard
> > > >>
> > > >>
> > > >>
> > > >>> 在 2021年2月25日,18:51,Shengkai Fang <fs...@gmail.com> 写道:
> > > >>>
> > > >>> Hi, everyone.
> > > >>>
> > > >>> I do some summaries about the discussion about the option. If the
> > > summary
> > > >>> has errors, please correct me.
> > > >>>
> > > >>> `table.dml-sync`:
> > > >>> - take effect for `executeMultiSql` and sql client
> > > >>> - benefit: SQL script portability. One script for all platforms.
> > > >>> - drawback: Don't work for `TableEnvironment#executeSql`.
> > > >>>
> > > >>> `table.multi-dml-sync`:
> > > >>> - take effect for `executeMultiSql` and sql client
> > > >>> - benefit: SQL script portability
> > > >>> - drawback: It's confused when the sql script has one dml statement
> > but
> > > >>> need to set option `table.multi-dml-sync`
> > > >>>
> > > >>> `client.dml-sync`:
> > > >>> - take effect for sql client only
> > > >>> - benefit: clear definition.
> > > >>> - drawback: Every platform needs to define its own option. Bad SQL
> > > script
> > > >>> portability.
> > > >>>
> > > >>> Just as Jark said, I think the `table.dml-sync` is a good choice if
> > we
> > > >> can
> > > >>> extend its scope and make this option works for `executeSql`.
> > > >>> It's straightforward and users can use this option now in table
> api.
> > > The
> > > >>> drawback is the  `TableResult#await` plays the same role as the
> > option.
> > > >> I
> > > >>> don't think the drawback is really critical because many systems
> have
> > > >>> commands play the same role with the different names.
> > > >>>
> > > >>> Best,
> > > >>> Shengkai
> > > >>>
> > > >>> Timo Walther <tw...@apache.org> 于2021年2月25日周四 下午4:23写道:
> > > >>>
> > > >>>> The `table.` prefix is meant to be a general option in the table
> > > >>>> ecosystem. Not necessarily attached to Table API or SQL Client.
> > That's
> > > >>>> why SQL Client is also located in the `flink-table` module.
> > > >>>>
> > > >>>> My main concern is the SQL script portability. Declaring the
> > > sync/async
> > > >>>> behavior will happen in many SQL scripts. And users should be
> easily
> > > >>>> switch from SQL Client to some commercial product without the need
> > of
> > > >>>> changing the script again.
> > > >>>>
> > > >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync`
> > > later
> > > >>>> but that would mean introducing future confusion. An app name
> (what
> > > >>>> `sql-client` kind of is) should not be part of a config option key
> > if
> > > >>>> other apps will need the same kind of option.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Timo
> > > >>>>
> > > >>>>
> > > >>>> On 24.02.21 08:59, Jark Wu wrote:
> > > >>>>>>  From my point of view, I also prefer "sql-client.dml-sync",
> > > >>>>> because the behavior of this configuration is very clear.
> > > >>>>> Even if we introduce a new config in the future, e.g.
> > > `table.dml-sync`,
> > > >>>>> we can also deprecate the sql-client one.
> > > >>>>>
> > > >>>>> Introducing a "table."  configuration without any implementation
> > > >>>>> will confuse users a lot, as they expect it should take effect on
> > > >>>>> the Table API.
> > > >>>>>
> > > >>>>> If we want to introduce a unified "table.dml-sync" option, I
> > prefer
> > > >>>>> it should be implemented on Table API and affect all the DMLs on
> > > >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`,
> > `StatementSet`),
> > > >>>>> as I have mentioned before [1].
> > > >>>>>
> > > >>>>>> It would be very straightforward that it affects all the DMLs on
> > SQL
> > > >> CLI
> > > >>>>> and
> > > >>>>> TableEnvironment (including `executeSql`, `StatementSet`,
> > > >>>>> `Table#executeInsert`, etc.).
> > > >>>>> This can also make SQL CLI easy to support this configuration by
> > > >> passing
> > > >>>>> through to the TableEnv.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Jark
> > > >>>>>
> > > >>>>>
> > > >>>>> [1]:
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
> > > >>>>>
> > > >>>>>
> > > >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com>
> wrote:
> > > >>>>>
> > > >>>>>> If we all agree the option should only be handled by the sql
> > > >>>>>> client, then why don't we just call it `sql-client.dml-sync`? As
> > > >>>>>> you said, calling it `table.dml-sync` while it has no effect in
> > > >>>>>> `TableEnv.executeSql("INSERT INTO")` will also cause big
> > > >>>>>> confusion for users.
> > > >>>>>>
> > > >>>>>> The only concern I saw is if we introduce
> > > >>>>>> "TableEnvironment.executeMultiSql()" in the
> > > >>>>>> future, how do we control the synchronization between
> statements?
> > > TBH
> > > >> I
> > > >>>>>> don't really
> > > >>>>>> see a strong requirement for such interfaces. Right now, we
> have a
> > > >>>> pretty
> > > >>>>>> clear semantic
> > > >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if
> > they
> > > >>>> want
> > > >>>>>> to execute multiple
> > > >>>>>> sql statements. They can simulate either synced or async
> execution
> > > >> with
> > > >>>>>> this building block.
> > > >>>>>>
> > > >>>>>> This will introduce slight overhead for users, but compared to
> the
> > > >>>>>> confusion we might
> > > >>>>>> cause if we introduce such a method of our own, I think it's
> > better
> > > to
> > > >>>> wait
> > > >>>>>> for some more
> > > >>>>>> feedback.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Kurt
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <
> twalthr@apache.org>
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Kurt,
> > > >>>>>>>
> > > >>>>>>> we can also shorten it to `table.dml-sync` if that would help.
> > Then
> > > >> it
> > > >>>>>>> would confuse users that do a regular `.executeSql("INSERT
> > INTO")`
> > > >> in a
> > > >>>>>>> notebook session.
> > > >>>>>>>
> > > >>>>>>> In any case users will need to learn the semantics of this
> > option.
> > > >>>>>>> `table.multi-dml-sync` should be described as "If you are in a
> > > >>>>>>> multi-statement environment, execute DMLs synchronously.". I
> > > >>>>>>> don't have a strong opinion on shortening it to
> > > >>>>>>> `table.dml-sync`.
> > > >>>>>>>
> > > >>>>>>> Just to clarify the implementation: The option should be
> handled
> > by
> > > >> the
> > > >>>>>>> SQL Client only, but the name can be shared across platforms.
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>> Timo
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On 23.02.21 09:54, Kurt Young wrote:
> > > >>>>>>>> Sorry for the late reply, but I'm confused by
> > > >> `table.multi-dml-sync`.
> > > >>>>>>>>
> > > >>>>>>>> IIUC this config will take effect with 2 use cases:
> > > >>>>>>>> 1. SQL client, either interactive mode or executing multiple
> > > >>>> statements
> > > >>>>>>> via
> > > >>>>>>>> -f. In most cases,
> > > >>>>>>>> there will be only one INSERT INTO statement but we are
> > > controlling
> > > >>>> the
> > > >>>>>>>> sync/async behavior
> > > >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of
> > users.
> > > >>>>>>> Besides,
> > > >>>>>>>>
> > > >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future
> work,
> > we
> > > >> are
> > > >>>>>>> also
> > > >>>>>>>> not sure if we will
> > > >>>>>>>> really introduce this in the future.
> > > >>>>>>>>
> > > >>>>>>>> I would prefer to introduce this option for only sql client.
> For
> > > >>>>>>> platforms
> > > >>>>>>>> Timo mentioned which
> > > >>>>>>>> need to control such behavior, I think it's easy and flexible
> to
> > > >>>>>>> introduce
> > > >>>>>>>> one on their own.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Kurt
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <
> > fskmine@gmail.com
> > > >
> > > >>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi everyone.
> > > >>>>>>>>>
> > > >>>>>>>>> Sorry for the late response.
> > > >>>>>>>>>
> > > >>>>>>>>> For `execution.runtime-mode`, I think it's much better than
> > > >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions!
> > > >>>>>>>>>
> > > >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We
> should
> > > >>>>>> clarify
> > > >>>>>>> the
> > > >>>>>>>>> usage of the SHOW CREATE TABLE statement. It should allow
> > > >>>>>>>>> specifying a fully qualified table and only work for tables
> > > >>>>>>>>> created by SQL statements.
> > > >>>>>>>>>
> > > >>>>>>>>> I have updated the FLIP with suggestions. It seems we have
> > > reached
> > > >> a
> > > >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP.
> > > >>>>>>>>>
> > > >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment.
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>> Shengkai
> > > >>>>>>>>>
> > > >>>>>>>>> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi Ingo,
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1) I think you are right, the table path should be
> > > >> fully-qualified.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE
> > > >>>>>>>>>> only aims to print DDL for the tables registered using SQL
> > > CREATE
> > > >>>>>> TABLE
> > > >>>>>>>>>> DDL.
> > > >>>>>>>>>> If a table is registered using Table API,  e.g.
> > > >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String,
> > > DataStream)`,
> > > >>>>>>>>>> currently it's not possible to print DDL for such tables.
> > > >>>>>>>>>> I think we should point it out in the FLIP.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Jark
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <ingo@ververica.com
> >
> > > >> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE
> > > statement.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned
> > DDL
> > > >>>>>> should
> > > >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise
> > the
> > > >> DDL
> > > >>>>>>>>>> depends
> > > >>>>>>>>>>> on the current context (catalog/database), which could be
> > > >>>>>> surprising,
> > > >>>>>>>>>>> especially since "the same" table can behave differently if
> > > >> created
> > > >>>>>> in
> > > >>>>>>>>>>> different catalogs.
> > > >>>>>>>>>>> 2) How should this handle tables which cannot be fully
> > > >>>> characterized
> > > >>>>>>> by
> > > >>>>>>>>>>> properties only? I don't know if there's an example for
> this
> > > yet,
> > > >>>>>> but
> > > >>>>>>>>>>> hypothetically this is not currently a requirement, right?
> > This
> > > >>>>>> isn't
> > > >>>>>>>>> as
> > > >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific,
> but
> > if
> > > >>>> it's
> > > >>>>>>>>>>> general Flink SQL syntax we should consider this (one way
> or
> > > >>>>>> another).
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards
> > > >>>>>>>>>>> Ingo
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <
> > > twalthr@apache.org
> > > >>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi Shengkai,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> thanks for updating the FLIP.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I have one last comment for the option
> > `table.execution.mode`.
> > > >>>>>> Should
> > > >>>>>>>>>> we
> > > >>>>>>>>>>>> already use the global Flink option
> `execution.runtime-mode`
> > > >>>>>> instead?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> We are using Flink's options where possible (e.g. `
> > > >> pipeline.name`
> > > >>>>>>>>> and
> > > >>>>>>>>>>>> `parallelism.default`), why not also for batch/streaming mode?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The description of the option matches to the Blink planner
> > > >>>>>> behavior:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> ```
> > > >>>>>>>>>>>> Among other things, this controls task scheduling, network
> > > >> shuffle
> > > >>>>>>>>>>>> behavior, and time semantics.
> > > >>>>>>>>>>>> ```
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
> > > >>>>>>>>>>>>> Hi, guys.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I have updated the FLIP.  It seems we have reached
> > agreement.
> > > >>>>>> Maybe
> > > >>>>>>>>>> we
> > > >>>>>>>>>>>> can
> > > >>>>>>>>>>>>> start the vote soon. If anyone has other questions,
> please
> > > >> leave
> > > >>>>>>>>> your
> > > >>>>>>>>>>>>> comments.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi guys,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> The conclusion sounds good to me.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <
> > > >> fskmine@gmail.com
> > > >>>>>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi, Timo, Jark.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> I am fine with the new option name.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日
> 周二下午5:35写道:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be
> future
> > > >> work.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this
> conclusion?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
> > > >>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI
> > > looks
> > > >>>>>> like
> > > >>>>>>>>>>>>>> single
> > > >>>>>>>>>>>>>>>>> statement.
> > > >>>>>>>>>>>>>>>>> But we can treat the CLI as accepting multiple
> > > >>>>>>>>>>>>>>>>> statements from opening to closing.
> > > >>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
> > > >>>>>> default),
> > > >>>>>>>>>> and
> > > >>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>> support this config
> > > >>>>>>>>>>>>>>>>> in SQL CLI first, will support it in
> > > >>>>>>>>>>>>>> TableEnvironment#executeMultiSql()
> > > >>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>> the future, right?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>> Jark
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <
> > > >>>> twalthr@apache.org
> > > >>>>>>>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi everyone,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should
> > not
> > > >>>>>> apply
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes
> only
> > > >> sense
> > > >>>>>>>>>> when
> > > >>>>>>>>>>>>>>>>>> executing multi statements. Once we have a
> > > >>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config
> could
> > > be
> > > >>>>>>>>>>>>>> considered.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other
> > platforms
> > > >>>> will
> > > >>>>>>>>>> also
> > > >>>>>>>>>>>>>>> need
> > > >>>>>>>>>>>>>>>>>> to have this config option, which is why I would
> like
> > to
> > > >>>>>>>>> avoid a
> > > >>>>>>>>>>> SQL
> > > >>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has
> > to
> > > >> come
> > > >>>>>>>>> up
> > > >>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>>> this important config option separately.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync`
> `table.multi-stmt-sync`?
> > Or
> > > >>>>>> other
> > > >>>>>>>>>>>>>>> opinions?
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>>>>>>>> Timo
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > > >>>>>>>>>>>>>>>>>>> Hi, all.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I think it may confuse users. The main problem is
> > > >>>>>>>>>>>>>>>>>>> that we have no means to detect conflicting
> > > >>>>>>>>>>>>>>>>>>> configuration, e.g. users set the option to true and
> > > >>>>>>>>>>>>>>>>>>> use `TableResult#await` together.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>> Shengkai.
> > > >>>>>>>>>>>>>>>>>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, everyone.

After the long discussion, I am fine with both choices, but I prefer the
second option, which applies to both the table module and the sql client.
Just as Timo said, the option `table.dml-sync` can improve SQL script
portability: users don't need to modify the script to execute it on
different platforms, e.g. the gateway.
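To illustrate the portability argument, a script along these lines (a sketch only: it assumes the `table.dml-sync` name is adopted, which is still under discussion, and the table and source names are made up) could run unchanged on the SQL client, a gateway, or any other platform honoring the option:

```sql
-- Hypothetical script assuming the proposed `table.dml-sync` option.
SET 'table.dml-sync' = 'true';

-- With the option set, the second INSERT would only be submitted after
-- the first job finishes.
INSERT INTO sink_a SELECT * FROM source_a;
INSERT INTO sink_b SELECT * FROM source_b;
```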

What do you think? CC Timo, Jark, Leonard.

Best,
Shengkai.

Kurt Young <yk...@gmail.com> 于2021年3月1日周一 下午5:11写道:

> I'm +1 for either:
> 1. introduce a sql client specific option, or
> 2. Introduce a table config option and make it apply to both table module &
> sql client.
>
> It would be the FLIP owner's call to decide.
>
> Best,
> Kurt
>
>
> On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <tw...@apache.org> wrote:
>
> > We could also think about reading this config option in the Table API.
> > The effect would be to call `await()` directly in an execute call. I
> > could also imagine this to be useful, especially when you fire a lot of
> > INSERT INTO queries. We had cases before where users were confused that
> > the execution happens asynchronously; such an option could prevent this
> > from happening again.
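The semantics Timo describes can be modeled with a pure-JDK toy (this is NOT Flink code; the `dmlSync` flag stands in for the hypothetical `table.dml-sync` option): `executeSql` returns right after submission, unless the flag makes it block as if `TableResult#await()` were called internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model of the behavior under discussion: executeSql submits a job and
// returns; when the hypothetical dml-sync flag is set, it blocks until the
// job finishes, as if TableResult#await() were called inside the execute call.
class DmlSyncSketch {
    static final List<String> LOG = new ArrayList<>();

    static void executeSql(String stmt, boolean dmlSync) {
        LOG.add(stmt + " submitted");
        CompletableFuture<Void> job =
                CompletableFuture.runAsync(() -> { /* the running job */ });
        if (dmlSync) {
            job.join(); // block until the job finishes, like await()
            LOG.add(stmt + " finished");
        }
    }

    public static void main(String[] args) {
        // In sync mode the second DML only starts after the first finished.
        executeSql("INSERT INTO t1", true);
        executeSql("INSERT INTO t2", true);
        System.out.println(LOG);
        // prints [INSERT INTO t1 submitted, INSERT INTO t1 finished,
        //         INSERT INTO t2 submitted, INSERT INTO t2 finished]
    }
}
```

With `dmlSync` set to false the method would return immediately after submission, which is exactly the asynchronous behavior that confused users.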
> >
> > Regards,
> > Timo
> >
> > On 01.03.21 05:14, Kurt Young wrote:
> > > I also asked some users for their opinion on introducing a config
> > > prefixed with "table" that doesn't affect methods in the Table API and
> > > SQL. All of them were kind of shocked by such a question, asking why we
> > > would do anything like this.
> > >
> > > This kind of reaction actually doesn't surprise me a lot, so I jumped
> in
> > > and challenged this config option even
> > > after the FLIP had already been accepted.
> > >
> > > If we only have to define the execution behavior for multiple
> > > statements in the SQL client, we should only introduce a config option
> > > whose name tells users its scope of effect. Prefixing it with "table"
> > > is definitely not a good idea here.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <xb...@gmail.com> wrote:
> > >
> > >> Hi, all
> > >>
> > >> Looks like there's only one divergence in this thread, about the
> > >> option [ table | sql-client ].dml-sync. Correct me if I'm wrong.
> > >>
> > >> 1. Leaving the context of this thread aside, from a user's
> > >> perspective, the table.xx configurations should take effect in Table
> > >> API & SQL, and the sql-client.xx configurations should only take
> > >> effect in the sql-client. In my (the user's) opinion, other
> > >> explanations are counterintuitive.
> > >>
> > >> 2. It should be pointed out that both the existing table.xx
> > >> configurations like table.exec.state.ttl,
> > >> table.optimizer.agg-phase-strategy, table.local-time-zone, etc., and
> > >> the proposed sql-client.xx configurations like sql-client.verbose and
> > >> sql-client.execution.max-table-result.rows comply with this
> > >> convention.
> > >>
> > >> 3. Considering the portability to support different CLI tools
> > (sql-client,
> > >> sql-gateway, etc.), I prefer table.dml-sync.
> > >>
> > >> In addition, I think sql-client/sql-gateway/other CLI tools can be
> > placed
> > >> out of flink-table module even in an external project, this should not
> > >> affect our conclusion.
> > >>
> > >>
> > >> Hope this can help you.
> > >>
> > >>
> > >> Best,
> > >> Leonard
> > >>
> > >>
> > >>
> > >>> 在 2021年2月25日,18:51,Shengkai Fang <fs...@gmail.com> 写道:
> > >>>
> > >>> Hi, everyone.
> > >>>
> > >>> I do some summaries about the discussion about the option. If the
> > summary
> > >>> has errors, please correct me.
> > >>>
> > >>> `table.dml-sync`:
> > >>> - take effect for `executeMultiSql` and sql client
> > >>> - benefit: SQL script portability. One script for all platforms.
> > >>> - drawback: Don't work for `TableEnvironment#executeSql`.
> > >>>
> > >>> `table.multi-dml-sync`:
> > >>> - take effect for `executeMultiSql` and sql client
> > >>> - benefit: SQL script portability
> > >>> - drawback: It's confused when the sql script has one dml statement
> but
> > >>> need to set option `table.multi-dml-sync`
> > >>>
> > >>> `client.dml-sync`:
> > >>> - take effect for sql client only
> > >>> - benefit: clear definition.
> > >>> - drawback: Every platform needs to define its own option. Bad SQL
> > script
> > >>> portability.
> > >>>
> > >>> Just as Jark said, I think the `table.dml-sync` is a good choice if
> we
> > >> can
> > >>> extend its scope and make this option works for `executeSql`.
> > >>> It's straightforward and users can use this option now in table api.
> > The
> > >>> drawback is the  `TableResult#await` plays the same role as the
> option.
> > >> I
> > >>> don't think the drawback is really critical because many systems have
> > >>> commands play the same role with the different names.
> > >>>
> > >>> Best,
> > >>> Shengkai
> > >>>
> > >>> Timo Walther <tw...@apache.org> 于2021年2月25日周四 下午4:23写道:
> > >>>
> > >>>> The `table.` prefix is meant to be a general option in the table
> > >>>> ecosystem. Not necessarily attached to Table API or SQL Client.
> That's
> > >>>> why SQL Client is also located in the `flink-table` module.
> > >>>>
> > >>>> My main concern is the SQL script portability. Declaring the
> > sync/async
> > >>>> behavior will happen in many SQL scripts. And users should be easily
> > >>>> switch from SQL Client to some commercial product without the need
> of
> > >>>> changing the script again.
> > >>>>
> > >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync`
> > later
> > >>>> but that would mean introducing future confusion. An app name (what
> > >>>> `sql-client` kind of is) should not be part of a config option key
> if
> > >>>> other apps will need the same kind of option.
> > >>>>
> > >>>> Regards,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> On 24.02.21 08:59, Jark Wu wrote:
> > >>>>>>  From my point of view, I also prefer "sql-client.dml-sync",
> > >>>>> because the behavior of this configuration is very clear.
> > >>>>> Even if we introduce a new config in the future, e.g.
> > `table.dml-sync`,
> > >>>>> we can also deprecate the sql-client one.
> > >>>>>
> > >>>>> Introducing a "table."  configuration without any implementation
> > >>>>> will confuse users a lot, as they expect it should take effect on
> > >>>>> the Table API.
> > >>>>>
> > >>>>> If we want to introduce an unified "table.dml-sync" option, I
> prefer
> > >>>>> it should be implemented on Table API and affect all the DMLs on
> > >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`,
> `StatementSet`),
> > >>>>> as I have mentioned before [1].
> > >>>>>
> > >>>>>> It would be very straightforward that it affects all the DMLs on
> SQL
> > >> CLI
> > >>>>> and
> > >>>>> TableEnvironment (including `executeSql`, `StatementSet`,
> > >>>>> `Table#executeInsert`, etc.).
> > >>>>> This can also make SQL CLI easy to support this configuration by
> > >> passing
> > >>>>> through to the TableEnv.
> > >>>>>
> > >>>>> Best,
> > >>>>> Jark
> > >>>>>
> > >>>>>
> > >>>>> [1]:
> > >>>>>
> > >>>>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
> > >>>>>
> > >>>>>
> > >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com> wrote:
> > >>>>>
> > >>>>>> If we all agree the option should only be handled by sql client,
> > then
> > >>>> why
> > >>>>>> don't we
> > >>>>>> just call it `sql-client.dml-sync`? As you said, calling it
> > >>>>>> `table.dml-sync` but has no
> > >>>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause
> a
> > >> big
> > >>>>>> confusion for
> > >>>>>> users.
> > >>>>>>
> > >>>>>> The only concern I saw is if we introduce
> > >>>>>> "TableEnvironment.executeMultiSql()" in the
> > >>>>>> future, how do we control the synchronization between statements?
> > TBH
> > >> I
> > >>>>>> don't really
> > >>>>>> see a strong requirement for such interfaces. Right now, we have a
> > >>>> pretty
> > >>>>>> clear semantic
> > >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if
> they
> > >>>> want
> > >>>>>> to execute multiple
> > >>>>>> sql statements. They can simulate either synced or async execution
> > >> with
> > >>>>>> this building block.
> > >>>>>>
> > >>>>>> This will introduce slight overhead for users, but compared to the
> > >>>>>> confusion we might
> > >>>>>> cause if we introduce such a method of our own, I think it's
> better
> > to
> > >>>> wait
> > >>>>>> for some more
> > >>>>>> feedback.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Kurt
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <tw...@apache.org>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> Hi Kurt,
> > >>>>>>>
> > >>>>>>> we can also shorten it to `table.dml-sync` if that would help.
> Then
> > >> it
> > >>>>>>> would confuse users that do a regular `.executeSql("INSERT
> INTO")`
> > >> in a
> > >>>>>>> notebook session.
> > >>>>>>>
> > >>>>>>> In any case users will need to learn the semantics of this
> option.
> > >>>>>>> `table.multi-dml-sync` should be described as "If a you are in a
> > >> multi
> > >>>>>>> statement environment, execute DMLs synchrounous.". I don't have
> a
> > >>>>>>> strong opinion on shortening it to `table.dml-sync`.
> > >>>>>>>
> > >>>>>>> Just to clarify the implementation: The option should be handled
> by
> > >> the
> > >>>>>>> SQL Client only, but the name can be shared accross platforms.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Timo
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 23.02.21 09:54, Kurt Young wrote:
> > >>>>>>>> Sorry for the late reply, but I'm confused by
> > >> `table.multi-dml-sync`.
> > >>>>>>>>
> > >>>>>>>> IIUC this config will take effect with 2 use cases:
> > >>>>>>>> 1. SQL client, either interactive mode or executing multiple
> > >>>> statements
> > >>>>>>> via
> > >>>>>>>> -f. In most cases,
> > >>>>>>>> there will be only one INSERT INTO statement but we are
> > controlling
> > >>>> the
> > >>>>>>>> sync/async behavior
> > >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of
> users.
> > >>>>>>> Besides,
> > >>>>>>>>
> > >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work,
> we
> > >> are
> > >>>>>>> also
> > >>>>>>>> not sure if we will
> > >>>>>>>> really introduce this in the future.
> > >>>>>>>>
> > >>>>>>>> I would prefer to introduce this option for only sql client. For
> > >>>>>>> platforms
> > >>>>>>>> Timo mentioned which
> > >>>>>>>> need to control such behavior, I think it's easy and flexible to
> > >>>>>>> introduce
> > >>>>>>>> one on their own.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Kurt
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <
> fskmine@gmail.com
> > >
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi everyone.
> > >>>>>>>>>
> > >>>>>>>>> Sorry for the late response.
> > >>>>>>>>>
> > >>>>>>>>> For `execution.runtime-mode`, I think it's much better than
> > >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions!
> > >>>>>>>>>
> > >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should
> > >>>>>> clarify
> > >>>>>>> the
> > >>>>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed
> > to
> > >>>>>>> specify
> > >>>>>>>>> the table that is fully qualified and only works for the table
> > that
> > >>>> is
> > >>>>>>>>> created by the sql statements.
> > >>>>>>>>>
> > >>>>>>>>> I have updated the FLIP with suggestions. It seems we have
> > reached
> > >> a
> > >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP.
> > >>>>>>>>>
> > >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Shengkai
> > >>>>>>>>>
> > >>>>>>>>> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Ingo,
> > >>>>>>>>>>
> > >>>>>>>>>> 1) I think you are right, the table path should be
> > >> fully-qualified.
> > >>>>>>>>>>
> > >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE
> > >>>>>>>>>> only aims to print DDL for the tables registered using SQL
> > CREATE
> > >>>>>> TABLE
> > >>>>>>>>>> DDL.
> > >>>>>>>>>> If a table is registered using Table API,  e.g.
> > >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String,
> > DataStream)`,
> > >>>>>>>>>> currently it's not possible to print DDL for such tables.
> > >>>>>>>>>> I think we should point it out in the FLIP.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Jark
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com>
> > >> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi all,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE
> > statement.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned
> DDL
> > >>>>>> should
> > >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise
> the
> > >> DDL
> > >>>>>>>>>> depends
> > >>>>>>>>>>> on the current context (catalog/database), which could be
> > >>>>>> surprising,
> > >>>>>>>>>>> especially since "the same" table can behave differently if
> > >> created
> > >>>>>> in
> > >>>>>>>>>>> different catalogs.
> > >>>>>>>>>>> 2) How should this handle tables which cannot be fully
> > >>>> characterized
> > >>>>>>> by
> > >>>>>>>>>>> properties only? I don't know if there's an example for this
> > yet,
> > >>>>>> but
> > >>>>>>>>>>> hypothetically this is not currently a requirement, right?
> This
> > >>>>>> isn't
> > >>>>>>>>> as
> > >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, but
> if
> > >>>> it's
> > >>>>>>>>>>> general Flink SQL syntax we should consider this (one way or
> > >>>>>> another).
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards
> > >>>>>>>>>>> Ingo
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <
> > twalthr@apache.org
> > >>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Shengkai,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> thanks for updating the FLIP.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I have one last comment for the option
> `table.execution.mode`.
> > >>>>>> Should
> > >>>>>>>>>> we
> > >>>>>>>>>>>> already use the global Flink option `execution.runtime-mode`
> > >>>>>> instead?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> We are using Flink's options where possible (e.g. `
> > >> pipeline.name`
> > >>>>>>>>> and
> > >>>>>>>>>>>> `parallism.default`) why not also for batch/streaming mode?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The description of the option matches to the Blink planner
> > >>>>>> behavior:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ```
> > >>>>>>>>>>>> Among other things, this controls task scheduling, network
> > >> shuffle
> > >>>>>>>>>>>> behavior, and time semantics.
> > >>>>>>>>>>>> ```
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards,
> > >>>>>>>>>>>> Timo
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
> > >>>>>>>>>>>>> Hi, guys.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I have updated the FLIP.  It seems we have reached
> agreement.
> > >>>>>> Maybe
> > >>>>>>>>>> we
> > >>>>>>>>>>>> can
> > >>>>>>>>>>>>> start the vote soon. If anyone has other questions, please
> > >> leave
> > >>>>>>>>> your
> > >>>>>>>>>>>>> comments.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi guys,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The conclusion sounds good to me.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <
> > >> fskmine@gmail.com
> > >>>>>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi, Timo, Jark.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I am fine with the new option name.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future
> > >> work.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>> Timo
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
> > >>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> My previous concern about "multi" is that a DML in the
> > >>>>>>>>>>>>>>>>> CLI looks like a single statement.
> > >>>>>>>>>>>>>>>>> But we can treat the CLI as a multi-line session
> > >>>>>>>>>>>>>>>>> accepting statements from opening to closing.
> > >>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
> > >>>>>> default),
> > >>>>>>>>>> and
> > >>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>> support this config
> > >>>>>>>>>>>>>>>>> in SQL CLI first, will support it in
> > >>>>>>>>>>>>>> TableEnvironment#executeMultiSql()
> > >>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>> the future, right?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Jark
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <
> > >>>> twalthr@apache.org
> > >>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should
> not
> > >>>>>> apply
> > >>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option only makes
> > >>>>>>>>>>>>>>>>>> sense when
> > >>>>>>>>>>>>>>>>>> executing multi statements. Once we have a
> > >>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could
> > be
> > >>>>>>>>>>>>>> considered.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other
> platforms
> > >>>> will
> > >>>>>>>>>> also
> > >>>>>>>>>>>>>>> need
> > >>>>>>>>>>>>>>>>>> to have this config option, which is why I would like
> to
> > >>>>>>>>> avoid a
> > >>>>>>>>>>> SQL
> > >>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has
> to
> > >> come
> > >>>>>>>>> up
> > >>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>> this important config option separately.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`?
> Or
> > >>>>>> other
> > >>>>>>>>>>>>>>> opinions?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>> Timo
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > >>>>>>>>>>>>>>>>>>> Hi, all.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I think it may confuse users. The main problem is that
> > >>>>>>>>>>>>>>>>>>> we have no means to detect conflicting configuration,
> > >>>>>>>>>>>>>>>>>>> e.g. users set the option to true and use
> > >>>>>>>>>>>>>>>>>>> `TableResult#await` together.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>> Shengkai.
> > >>>>>>>>>>>>>>>>>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Kurt Young <yk...@gmail.com>.
I'm +1 for either:
1. introduce a sql-client-specific option, or
2. introduce a table config option and make it apply to both the table module &
the sql client.

It would be the FLIP owner's call to decide.

Best,
Kurt


On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <tw...@apache.org> wrote:

> We could also think about reading this config option in Table API. The
> effect would be to call `await()` directly in an execute call. I could
> also imagine this to be useful esp. when you fire a lot of insert into
> queries. We had the case before that users were confused that the
> execution happens asynchronously; such an option could prevent this from
> happening again.
>
> Regards,
> Timo
>
> On 01.03.21 05:14, Kurt Young wrote:
> > I also asked some users for their opinion on introducing a config
> > prefixed with "table" that doesn't
> > affect any methods in the Table API and SQL. All of them were kind of
> > shocked by such a question, asking
> > why we would do anything like this.
> >
> > This kind of reaction actually doesn't surprise me a lot, so I jumped in
> > and challenged this config option even
> > after the FLIP had already been accepted.
> >
> > If we only have to define the execution behavior for multiple statements
> > in the SQL client, we should only introduce
> > a config option whose name tells users its scope of effect.
> > Prefixing it with "table" is definitely not a good
> > idea here.
> >
> > Best,
> > Kurt
> >
> >
> > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <xb...@gmail.com> wrote:
> >
> >> Hi, all
> >>
> >> Looks like there’s only one divergence about the option [ table | sql-client
> >> ].dml-sync in this thread, correct me if I’m wrong.
> >>
> >> 1. Leaving the context of this thread, from a user's perspective,
> >> the table.xx configurations should take effect in Table API & SQL,
> >> the sql-client.xx configurations should only take effect in sql-client.
> >>   In my (the user's) opinion, other explanations are counterintuitive.
> >>
> >> 2.  It should be pointed out that all existing table.xx
> >> configurations
> >> like table.exec.state.ttl, table.optimizer.agg-phase-strategy,
> >> table.local-time-zone, etc., and the proposed sql-client.xx
> configurations
> >> like sql-client.verbose, sql-client.execution.max-table-result.rows
> >> comply with this convention.
> >>
> >> 3. Considering the portability to support different CLI tools
> (sql-client,
> >> sql-gateway, etc.), I prefer table.dml-sync.
> >>
> >> In addition, I think sql-client/sql-gateway/other CLI tools can be placed
> >> outside of the flink-table module, even in an external project; this should not
> >> affect our conclusion.
> >>
> >>
> >> Hope this can help you.
> >>
> >>
> >> Best,
> >> Leonard
> >>
> >>
> >>
> >>> On Feb 25, 2021, at 18:51, Shengkai Fang <fs...@gmail.com> wrote:
> >>>
> >>> Hi, everyone.
> >>>
> >>> Here is a summary of the discussion about the option. If the
> summary
> >>> has errors, please correct me.
> >>>
> >>> `table.dml-sync`:
> >>> - take effect for `executeMultiSql` and sql client
> >>> - benefit: SQL script portability. One script for all platforms.
> >>> - drawback: Doesn't work for `TableEnvironment#executeSql`.
> >>>
> >>> `table.multi-dml-sync`:
> >>> - take effect for `executeMultiSql` and sql client
> >>> - benefit: SQL script portability
> >>> - drawback: It's confusing when the sql script has only one dml statement
> >>> but users still need to set the option `table.multi-dml-sync`
> >>>
> >>> `client.dml-sync`:
> >>> - take effect for sql client only
> >>> - benefit: clear definition.
> >>> - drawback: Every platform needs to define its own option. Bad SQL
> script
> >>> portability.
> >>>
> >>> Just as Jark said, I think the `table.dml-sync` is a good choice if we
> >> can
> >>> extend its scope and make this option work for `executeSql`.
> >>> It's straightforward and users can use this option now in the table api.
> >>> The drawback is that `TableResult#await` plays the same role as the option.
> >>> I don't think the drawback is really critical because many systems have
> >>> commands that play the same role under different names.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> Timo Walther <tw...@apache.org> wrote on Thu, Feb 25, 2021 at 4:23 PM:
> >>>
> >>>> The `table.` prefix is meant to be a general option in the table
> >>>> ecosystem. Not necessarily attached to Table API or SQL Client. That's
> >>>> why SQL Client is also located in the `flink-table` module.
> >>>>
> >>>> My main concern is the SQL script portability. Declaring the
> sync/async
> >>>> behavior will happen in many SQL scripts. And users should be able to
> >>>> easily switch from SQL Client to some commercial product without
> >>>> needing to change the script again.
> >>>>
> >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync`
> later
> >>>> but that would mean introducing future confusion. An app name (what
> >>>> `sql-client` kind of is) should not be part of a config option key if
> >>>> other apps will need the same kind of option.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 24.02.21 08:59, Jark Wu wrote:
> >>>>>>  From my point of view, I also prefer "sql-client.dml-sync",
> >>>>> because the behavior of this configuration is very clear.
> >>>>> Even if we introduce a new config in the future, e.g.
> `table.dml-sync`,
> >>>>> we can also deprecate the sql-client one.
> >>>>>
> >>>>> Introducing a "table."  configuration without any implementation
> >>>>> will confuse users a lot, as they expect it should take effect on
> >>>>> the Table API.
> >>>>>
> >>>>> If we want to introduce a unified "table.dml-sync" option, I prefer
> >>>>> it should be implemented on Table API and affect all the DMLs on
> >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
> >>>>> as I have mentioned before [1].
> >>>>>
> >>>>>> It would be very straightforward that it affects all the DMLs on SQL
> >> CLI
> >>>>> and
> >>>>> TableEnvironment (including `executeSql`, `StatementSet`,
> >>>>> `Table#executeInsert`, etc.).
> >>>>> This can also make SQL CLI easy to support this configuration by
> >> passing
> >>>>> through to the TableEnv.
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>>
> >>>>> [1]:
> >>>>>
> >>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
> >>>>>
> >>>>>
> >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com> wrote:
> >>>>>
> >>>>>> If we all agree the option should only be handled by sql client,
> then
> >>>> why
> >>>>>> don't we
> >>>>>> just call it `sql-client.dml-sync`? As you said, calling it
> >>>>>> `table.dml-sync` while it has no
> >>>>>> effect on `TableEnv.executeSql("INSERT INTO")` will also cause big
> >>>>>> confusion for users.
> >>>>>>
> >>>>>> The only concern I saw is if we introduce
> >>>>>> "TableEnvironment.executeMultiSql()" in the
> >>>>>> future, how do we control the synchronization between statements?
> TBH
> >> I
> >>>>>> don't really
> >>>>>> see a strong requirement for such interfaces. Right now, we have a
> >>>> pretty
> >>>>>> clear semantics
> >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if they
> >>>> want
> >>>>>> to execute multiple
> >>>>>> sql statements. They can simulate either synced or async execution
> >> with
> >>>>>> this building block.
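
[Editor's note: Kurt's "building block" argument can be sketched abstractly. The snippet below is a plain-Python model of the semantics only, not the Flink API: `pool.submit` stands in for `TableEnv.executeSql("INSERT ...")`, which returns a handle immediately, and `handle.result()` stands in for `TableResult#await()`.]

```python
from concurrent.futures import ThreadPoolExecutor

def run_statements(statements, dml_sync, log):
    # Toy model of a multi-statement client: pool.submit stands in for
    # executeSql("INSERT ..."), handle.result() for TableResult#await().
    with ThreadPoolExecutor(max_workers=4) as pool:
        handles = []
        for stmt in statements:
            log.append(f"submitted {stmt}")
            handle = pool.submit(lambda s=stmt: log.append(f"finished {s}"))
            if dml_sync:
                handle.result()  # sync: wait before submitting the next DML
            handles.append(handle)
        for handle in handles:   # async: all submitted up front, wait at the end
            handle.result()
```

With `dml_sync=True` each statement finishes before the next one is submitted; with `dml_sync=False` the jobs may finish in any order. That ordering difference is exactly the behavior the proposed option is meant to control, and it is simulated here purely with the submit/await building blocks.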
> >>>>>>
> >>>>>> This will introduce slight overhead for users, but compared to the
> >>>>>> confusion we might
> >>>>>> cause if we introduce such a method of our own, I think it's better
> to
> >>>> wait
> >>>>>> for some more
> >>>>>> feedback.
> >>>>>>
> >>>>>> Best,
> >>>>>> Kurt
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <tw...@apache.org>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hi Kurt,
> >>>>>>>
> >>>>>>> we can also shorten it to `table.dml-sync` if that would help. But then
> >>>>>>> it would confuse users that do a regular `.executeSql("INSERT INTO")`
> >>>>>>> in a notebook session.
> >>>>>>>
> >>>>>>> In any case users will need to learn the semantics of this option.
> >>>>>>> `table.multi-dml-sync` should be described as "If you are in a
> >>>>>>> multi-statement environment, execute DMLs synchronously." I don't have a
> >>>>>>> strong opinion on shortening it to `table.dml-sync`.
> >>>>>>>
> >>>>>>> Just to clarify the implementation: The option should be handled by
> >> the
> >>>>>>> SQL Client only, but the name can be shared across platforms.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>> On 23.02.21 09:54, Kurt Young wrote:
> >>>>>>>> Sorry for the late reply, but I'm confused by
> >> `table.multi-dml-sync`.
> >>>>>>>>
> >>>>>>>> IIUC this config will take effect with 2 use cases:
> >>>>>>>> 1. SQL client, either interactive mode or executing multiple
> >>>> statements
> >>>>>>> via
> >>>>>>>> -f. In most cases,
> >>>>>>>> there will be only one INSERT INTO statement but we are
> controlling
> >>>> the
> >>>>>>>> sync/async behavior
> >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users.
> >>>>>>> Besides,
> >>>>>>>>
> >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we
> >> are
> >>>>>>> also
> >>>>>>>> not sure if we will
> >>>>>>>> really introduce this in the future.
> >>>>>>>>
> >>>>>>>> I would prefer to introduce this option only for the sql client. For
> >>>>>>> platforms
> >>>>>>>> Timo mentioned which
> >>>>>>>> need to control such behavior, I think it's easy and flexible to
> >>>>>>> introduce
> >>>>>>>> one on their own.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Kurt
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fskmine@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi everyone.
> >>>>>>>>>
> >>>>>>>>> Sorry for the late response.
> >>>>>>>>>
> >>>>>>>>> For `execution.runtime-mode`, I think it's much better than
> >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions!
> >>>>>>>>>
> >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should
> >>>>>> clarify
> >>>>>>> the
> >>>>>>>>> usage of the SHOW CREATE TABLE statement. It should allow
> >>>>>>>>> specifying a fully qualified table and should only work for
> >>>>>>>>> tables created by sql statements.
> >>>>>>>>>
> >>>>>>>>> I have updated the FLIP with suggestions. It seems we have
> reached
> >> a
> >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP.
> >>>>>>>>>
> >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Shengkai
> >>>>>>>>>
> >>>>>>>>> Jark Wu <im...@gmail.com> wrote on Mon, Feb 15, 2021 at 10:50 PM:
> >>>>>>>>>
> >>>>>>>>>> Hi Ingo,
> >>>>>>>>>>
> >>>>>>>>>> 1) I think you are right, the table path should be
> >> fully-qualified.
> >>>>>>>>>>
> >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE
> >>>>>>>>>> only aims to print DDL for the tables registered using SQL
> CREATE
> >>>>>> TABLE
> >>>>>>>>>> DDL.
> >>>>>>>>>> If a table is registered using Table API,  e.g.
> >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String,
> DataStream)`,
> >>>>>>>>>> currently it's not possible to print DDL for such tables.
> >>>>>>>>>> I think we should point it out in the FLIP.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jark
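
[Editor's note: to make Ingo's fully-qualified-identifier point concrete, here is a rough sketch of what a fully-qualified `SHOW CREATE TABLE` rendering could look like. The function name, signature, and output layout are invented for illustration only; they are not the FLIP's actual design.]

```python
def show_create_table(catalog, database, table, columns, options):
    # Emit the identifier fully qualified so the printed DDL does not
    # depend on the session's current catalog/database.
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    opts = ",\n  ".join(f"'{k}' = '{v}'" for k, v in sorted(options.items()))
    return (f"CREATE TABLE `{catalog}`.`{database}`.`{table}` (\n"
            f"  {cols}\n) WITH (\n  {opts}\n)")

ddl = show_create_table(
    "hive", "default", "orders",
    [("id", "BIGINT"), ("amount", "DOUBLE")],
    {"connector": "filesystem", "format": "csv"})
```

The key point is the `catalog.database.table` prefix: copying such DDL into a session with a different current catalog would still refer to the same table.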
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com>
> >> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE
> statement.
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL
> >>>>>> should
> >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise the
> >> DDL
> >>>>>>>>>> depends
> >>>>>>>>>>> on the current context (catalog/database), which could be
> >>>>>> surprising,
> >>>>>>>>>>> especially since "the same" table can behave differently if
> >> created
> >>>>>> in
> >>>>>>>>>>> different catalogs.
> >>>>>>>>>>> 2) How should this handle tables which cannot be fully
> >>>> characterized
> >>>>>>> by
> >>>>>>>>>>> properties only? I don't know if there's an example for this
> yet,
> >>>>>> but
> >>>>>>>>>>> hypothetically this is not currently a requirement, right? This
> >>>>>> isn't
> >>>>>>>>> as
> >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if
> >>>> it's
> >>>>>>>>>>> general Flink SQL syntax we should consider this (one way or
> >>>>>> another).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Regards
> >>>>>>>>>>> Ingo
> >>>>>>>>>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
We could also think about reading this config option in Table API. The 
effect would be to call `await()` directly in an execute call. I could 
also imagine this to be useful esp. when you fire a lot of insert into 
queries. We had the case before that users were confused that the
execution happens asynchronously; such an option could prevent this from
happening again.
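
[Editor's note: the effect Timo describes — the execute call consulting the config and awaiting internally — can be modeled in a few lines. This is a toy stand-in with invented class names and a plain dict for the config, not the Flink implementation.]

```python
class TableResult:
    def __init__(self):
        self.awaited = False

    def await_(self):          # stands in for TableResult#await()
        self.awaited = True
        return self

class TableEnv:
    def __init__(self, config):
        self.config = config

    def execute_sql(self, stmt):
        result = TableResult()
        # If the sync option is set, await() is called directly inside
        # the execute call, so an INSERT INTO blocks until the job is done.
        if self.config.get("table.dml-sync", False):
            result.await_()
        return result

r = TableEnv({"table.dml-sync": True}).execute_sql("INSERT INTO t SELECT 1")
```

Firing many INSERT INTOs in a loop would then behave synchronously without the user ever calling `await()` explicitly, which addresses the async-execution confusion mentioned above.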

Regards,
Timo

>>>>>>>>>>>>
>>>>>>>>>>>> We are using Flink's options where possible (e.g. `
>> pipeline.name`
>>>>>>>>> and
>>>>>>>>>>>> `parallism.default`) why not also for batch/streaming mode?
>>>>>>>>>>>>
>>>>>>>>>>>> The description of the option matches to the Blink planner
>>>>>> behavior:
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> Among other things, this controls task scheduling, network
>> shuffle
>>>>>>>>>>>> behavior, and time semantics.
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
>>>>>>>>>>>>> Hi, guys.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have updated the FLIP.  It seems we have reached agreement.
>>>>>> Maybe
>>>>>>>>>> we
>>>>>>>>>>>> can
>>>>>>>>>>>>> start the vote soon. If anyone has other questions, please
>> leave
>>>>>>>>> your
>>>>>>>>>>>>> comments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>
>>>>>>>>>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The conclusion sounds good to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <
>> fskmine@gmail.com
>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, Timo, Jark.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am fine with the new option name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future
>> work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
>>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks
>>>>>> like
>>>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>> statement.
>>>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements
>>>> from
>>>>>>>>>>>>>> opening
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> closing.
>>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
>>>>>> default),
>>>>>>>>>> and
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>> support this config
>>>>>>>>>>>>>>>>> in SQL CLI first, will support it in
>>>>>>>>>>>>>> TableEnvironment#executeMultiSql()
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> the future, right?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <
>>>> twalthr@apache.org
>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not
>>>>>> apply
>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only
>> sense
>>>>>>>>>> when
>>>>>>>>>>>>>>>>>> executing multi statements. Once we have a
>>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be
>>>>>>>>>>>>>> considered.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms
>>>> will
>>>>>>>>>> also
>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>> to have this config option, which is why I would like to
>>>>>>>>> avoid a
>>>>>>>>>>> SQL
>>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to
>> come
>>>>>>>>> up
>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> this important config option separately.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or
>>>>>> other
>>>>>>>>>>>>>>> opinions?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
>>>>>>>>>>>>>>>>>>> Hi, all.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is
>> we
>>>>>>>>>> have
>>>>>>>>>>> no
>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the
>>>>>>>>> option
>>>>>>>>>>>>>> true
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> use `TableResult#await` together.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Shengkai.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Kurt Young <yk...@gmail.com>.
I also asked some users for their opinion on introducing a config prefixed
with "table" that doesn't affect any
methods in the Table API or SQL. All of them were rather shocked by the
question, asking
why we would do anything like this.

This kind of reaction doesn't really surprise me, so I jumped in
and challenged this config option even
after the FLIP had already been accepted.

If we only need to define the execution behavior for multiple statements in
the SQL client, we should introduce
a config option whose name tells users its scope of effect.
Prefixing it with "table" is definitely not a good
idea here.

Best,
Kurt


On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <xb...@gmail.com> wrote:

> Hi, all
>
> Look like there’s only one divergence about option [ table | sql-client
> ].dml-sync in this thread, correct me if I’m wrong.
>
> 1. Leaving the context of this thread, from a user's perspective,
> the table.xx configurations should take effect in Table API & SQL,
> the sql-client.xx configurations should only take effect in sql-client.
>  In my(the user's) opinion, other explanations are counterintuitive.
>
> 2.  It should be pointed out that both all existed table.xx configurations
> like table.exec.state.ttl, table.optimizer.agg-phase-strategy,
> table.local-time-zone,etc..  and the proposed sql-client.xx configurations
> like sql-client.verbose, sql-client.execution.max-table-result.rows
> comply with this convention.
>
> 3. Considering the portability to support different CLI tools (sql-client,
> sql-gateway, etc.), I prefer table.dml-sync.
>
> In addition, I think sql-client/sql-gateway/other CLI tools can be placed
> out of flink-table module even in an external project, this should not
> affect our conclusion.
>
>
> Hope this can help you.
>
>
> Best,
> Leonard
>
>
>
> > 在 2021年2月25日,18:51,Shengkai Fang <fs...@gmail.com> 写道:
> >
> > Hi, everyone.
> >
> > I do some summaries about the discussion about the option. If the summary
> > has errors, please correct me.
> >
> > `table.dml-sync`:
> > - take effect for `executeMultiSql` and sql client
> > - benefit: SQL script portability. One script for all platforms.
> > - drawback: Don't work for `TableEnvironment#executeSql`.
> >
> > `table.multi-dml-sync`:
> > - take effect for `executeMultiSql` and sql client
> > - benefit: SQL script portability
> > - drawback: It's confused when the sql script has one dml statement but
> > need to set option `table.multi-dml-sync`
> >
> > `client.dml-sync`:
> > - take effect for sql client only
> > - benefit: clear definition.
> > - drawback: Every platform needs to define its own option. Bad SQL script
> > portability.
> >
> > Just as Jark said, I think the `table.dml-sync` is a good choice if we
> can
> > extend its scope and make this option works for `executeSql`.
> > It's straightforward and users can use this option now in table api.  The
> > drawback is the  `TableResult#await` plays the same role as the option.
> I
> > don't think the drawback is really critical because many systems have
> > commands play the same role with the different names.
> >
> > Best,
> > Shengkai
> >
> > Timo Walther <tw...@apache.org> 于2021年2月25日周四 下午4:23写道:
> >
> >> The `table.` prefix is meant to be a general option in the table
> >> ecosystem. Not necessarily attached to Table API or SQL Client. That's
> >> why SQL Client is also located in the `flink-table` module.
> >>
> >> My main concern is the SQL script portability. Declaring the sync/async
> >> behavior will happen in many SQL scripts. And users should be easily
> >> switch from SQL Client to some commercial product without the need of
> >> changing the script again.
> >>
> >> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later
> >> but that would mean introducing future confusion. An app name (what
> >> `sql-client` kind of is) should not be part of a config option key if
> >> other apps will need the same kind of option.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 24.02.21 08:59, Jark Wu wrote:
> >>>> From my point of view, I also prefer "sql-client.dml-sync",
> >>> because the behavior of this configuration is very clear.
> >>> Even if we introduce a new config in the future, e.g. `table.dml-sync`,
> >>> we can also deprecate the sql-client one.
> >>>
> >>> Introducing a "table."  configuration without any implementation
> >>> will confuse users a lot, as they expect it should take effect on
> >>> the Table API.
> >>>
> >>> If we want to introduce an unified "table.dml-sync" option, I prefer
> >>> it should be implemented on Table API and affect all the DMLs on
> >>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
> >>> as I have mentioned before [1].
> >>>
> >>>> It would be very straightforward that it affects all the DMLs on SQL
> CLI
> >>> and
> >>> TableEnvironment (including `executeSql`, `StatementSet`,
> >>> `Table#executeInsert`, etc.).
> >>> This can also make SQL CLI easy to support this configuration by
> passing
> >>> through to the TableEnv.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> [1]:
> >>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
> >>>
> >>>
> >>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com> wrote:
> >>>
> >>>> If we all agree the option should only be handled by sql client, then
> >> why
> >>>> don't we
> >>>> just call it `sql-client.dml-sync`? As you said, calling it
> >>>> `table.dml-sync` but has no
> >>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a
> big
> >>>> confusion for
> >>>> users.
> >>>>
> >>>> The only concern I saw is if we introduce
> >>>> "TableEnvironment.executeMultiSql()" in the
> >>>> future, how do we control the synchronization between statements? TBH
> I
> >>>> don't really
> >>>> see a strong requirement for such interfaces. Right now, we have a
> >> pretty
> >>>> clear semantic
> >>>> of `TableEnv.executeSql`, and it's very convenient for users if they
> >> want
> >>>> to execute multiple
> >>>> sql statements. They can simulate either synced or async execution
> with
> >>>> this building block.
> >>>>
> >>>> This will introduce slight overhead for users, but compared to the
> >>>> confusion we might
> >>>> cause if we introduce such a method of our own, I think it's better to
> >> wait
> >>>> for some more
> >>>> feedback.
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>
> >>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <tw...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Kurt,
> >>>>>
> >>>>> we can also shorten it to `table.dml-sync` if that would help. Then
> it
> >>>>> would confuse users that do a regular `.executeSql("INSERT INTO")`
> in a
> >>>>> notebook session.
> >>>>>
> >>>>> In any case users will need to learn the semantics of this option.
> >>>>> `table.multi-dml-sync` should be described as "If a you are in a
> multi
> >>>>> statement environment, execute DMLs synchrounous.". I don't have a
> >>>>> strong opinion on shortening it to `table.dml-sync`.
> >>>>>
> >>>>> Just to clarify the implementation: The option should be handled by
> the
> >>>>> SQL Client only, but the name can be shared accross platforms.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> On 23.02.21 09:54, Kurt Young wrote:
> >>>>>> Sorry for the late reply, but I'm confused by
> `table.multi-dml-sync`.
> >>>>>>
> >>>>>> IIUC this config will take effect with 2 use cases:
> >>>>>> 1. SQL client, either interactive mode or executing multiple
> >> statements
> >>>>> via
> >>>>>> -f. In most cases,
> >>>>>> there will be only one INSERT INTO statement but we are controlling
> >> the
> >>>>>> sync/async behavior
> >>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users.
> >>>>> Besides,
> >>>>>>
> >>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we
> are
> >>>>> also
> >>>>>> not sure if we will
> >>>>>> really introduce this in the future.
> >>>>>>
> >>>>>> I would prefer to introduce this option for only sql client. For
> >>>>> platforms
> >>>>>> Timo mentioned which
> >>>>>> need to control such behavior, I think it's easy and flexible to
> >>>>> introduce
> >>>>>> one on their own.
> >>>>>>
> >>>>>> Best,
> >>>>>> Kurt
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fs...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Hi everyone.
> >>>>>>>
> >>>>>>> Sorry for the late response.
> >>>>>>>
> >>>>>>> For `execution.runtime-mode`, I think it's much better than
> >>>>>>> `table.execution.mode`. Thanks for Timo's suggestions!
> >>>>>>>
> >>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should
> >>>> clarify
> >>>>> the
> >>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to
> >>>>> specify
> >>>>>>> the table that is fully qualified and only works for the table that
> >> is
> >>>>>>> created by the sql statements.
> >>>>>>>
> >>>>>>> I have updated the FLIP with suggestions. It seems we have reached
> a
> >>>>>>> consensus, I'd like to start a formal vote for the FLIP.
> >>>>>>>
> >>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Shengkai
> >>>>>>>
> >>>>>>> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
> >>>>>>>
> >>>>>>>> Hi Ingo,
> >>>>>>>>
> >>>>>>>> 1) I think you are right, the table path should be
> fully-qualified.
> >>>>>>>>
> >>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE
> >>>>>>>> only aims to print DDL for the tables registered using SQL CREATE
> >>>> TABLE
> >>>>>>>> DDL.
> >>>>>>>> If a table is registered using Table API,  e.g.
> >>>>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
> >>>>>>>> currently it's not possible to print DDL for such tables.
> >>>>>>>> I think we should point it out in the FLIP.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com>
> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I have a couple questions about the SHOW CREATE TABLE statement.
> >>>>>>>>>
> >>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL
> >>>> should
> >>>>>>>>> always have the table identifier fully-qualified. Otherwise the
> DDL
> >>>>>>>> depends
> >>>>>>>>> on the current context (catalog/database), which could be
> >>>> surprising,
> >>>>>>>>> especially since "the same" table can behave differently if
> created
> >>>> in
> >>>>>>>>> different catalogs.
> >>>>>>>>> 2) How should this handle tables which cannot be fully
> >> characterized
> >>>>> by
> >>>>>>>>> properties only? I don't know if there's an example for this yet,
> >>>> but
> >>>>>>>>> hypothetically this is not currently a requirement, right? This
> >>>> isn't
> >>>>>>> as
> >>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if
> >> it's
> >>>>>>>>> general Flink SQL syntax we should consider this (one way or
> >>>> another).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>> Ingo
> >>>>>>>>>
> >>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <twalthr@apache.org
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>
> >>>>>>>>>> thanks for updating the FLIP.
> >>>>>>>>>>
> >>>>>>>>>> I have one last comment for the option `table.execution.mode`.
> >>>> Should
> >>>>>>>> we
> >>>>>>>>>> already use the global Flink option `execution.runtime-mode`
> >>>> instead?
> >>>>>>>>>>
> >>>>>>>>>> We are using Flink's options where possible (e.g. `
> pipeline.name`
> >>>>>>> and
> >>>>>>>>>> `parallism.default`) why not also for batch/streaming mode?
> >>>>>>>>>>
> >>>>>>>>>> The description of the option matches to the Blink planner
> >>>> behavior:
> >>>>>>>>>>
> >>>>>>>>>> ```
> >>>>>>>>>> Among other things, this controls task scheduling, network
> shuffle
> >>>>>>>>>> behavior, and time semantics.
> >>>>>>>>>> ```
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Timo
> >>>>>>>>>>
> >>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
> >>>>>>>>>>> Hi, guys.
> >>>>>>>>>>>
> >>>>>>>>>>> I have updated the FLIP.  It seems we have reached agreement.
> >>>> Maybe
> >>>>>>>> we
> >>>>>>>>>> can
> >>>>>>>>>>> start the vote soon. If anyone has other questions, please
> leave
> >>>>>>> your
> >>>>>>>>>>> comments.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Shengkai
> >>>>>>>>>>>
> >>>>>>>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi guys,
> >>>>>>>>>>>>
> >>>>>>>>>>>> The conclusion sounds good to me.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <
> fskmine@gmail.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, Timo, Jark.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am fine with the new option name.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future
> work.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
> >>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks
> >>>> like
> >>>>>>>>>>>> single
> >>>>>>>>>>>>>>> statement.
> >>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements
> >> from
> >>>>>>>>>>>> opening
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> closing.
> >>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
> >>>> default),
> >>>>>>>> and
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>> support this config
> >>>>>>>>>>>>>>> in SQL CLI first, will support it in
> >>>>>>>>>>>> TableEnvironment#executeMultiSql()
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> the future, right?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <
> >> twalthr@apache.org
> >>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not
> >>>> apply
> >>>>>>>> to
> >>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only
> sense
> >>>>>>>> when
> >>>>>>>>>>>>>>>> executing multi statements. Once we have a
> >>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be
> >>>>>>>>>>>> considered.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms
> >> will
> >>>>>>>> also
> >>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>> to have this config option, which is why I would like to
> >>>>>>> avoid a
> >>>>>>>>> SQL
> >>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to
> come
> >>>>>>> up
> >>>>>>>>> with
> >>>>>>>>>>>>>>>> this important config option separately.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or
> >>>> other
> >>>>>>>>>>>>> opinions?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> >>>>>>>>>>>>>>>>> Hi, all.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is
> we
> >>>>>>>> have
> >>>>>>>>> no
> >>>>>>>>>>>>>> means
> >>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the
> >>>>>>> option
> >>>>>>>>>>>> true
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> use `TableResult#await` together.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Shengkai.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Best regards!
> >>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Leonard Xu <xb...@gmail.com>.
Hi, all

It looks like there’s only one point of divergence in this thread, the option [ table | sql-client ].dml-sync; correct me if I’m wrong.

1. Leaving aside the context of this thread, from a user's perspective
the table.xx configurations should take effect in the Table API & SQL, and
the sql-client.xx configurations should only take effect in the SQL client.
In my (the user's) opinion, any other interpretation is counterintuitive.

2. It should be pointed out that all existing table.xx configurations (e.g. table.exec.state.ttl, table.optimizer.agg-phase-strategy, table.local-time-zone) and the proposed sql-client.xx configurations (e.g. sql-client.verbose, sql-client.execution.max-table-result.rows)
already comply with this convention.

3. Considering portability across different CLI tools (sql-client, sql-gateway, etc.), I prefer table.dml-sync.

In addition, I think sql-client/sql-gateway/other CLI tools could be placed outside the flink-table module, or even in an external project; this should not affect our conclusion.
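
To make the portability argument concrete, a multi-statement SQL script using the option might look like the sketch below. Note the option name (table.dml-sync) is still under discussion, and the table names are purely illustrative, so treat this as a hypothetical example rather than a final syntax:

-- Hypothetical: execute each DML synchronously before starting the next.
SET 'table.dml-sync' = 'true';

INSERT INTO dwd_orders SELECT * FROM ods_orders;

-- With the option above, the client waits for the first INSERT to finish
-- before submitting this one, which may depend on its output.
INSERT INTO dws_daily_stats
SELECT order_date, COUNT(*) FROM dwd_orders GROUP BY order_date;

A table-prefixed name means this exact script could run unchanged against the SQL client, a gateway, or a commercial platform, which is the portability benefit discussed above.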


Hope this can help you.


Best,
Leonard



> 在 2021年2月25日,18:51,Shengkai Fang <fs...@gmail.com> 写道:
> 
> Hi, everyone.
> 
> I do some summaries about the discussion about the option. If the summary
> has errors, please correct me.
> 
> `table.dml-sync`:
> - take effect for `executeMultiSql` and sql client
> - benefit: SQL script portability. One script for all platforms.
> - drawback: Don't work for `TableEnvironment#executeSql`.
> 
> `table.multi-dml-sync`:
> - take effect for `executeMultiSql` and sql client
> - benefit: SQL script portability
> - drawback: It's confused when the sql script has one dml statement but
> need to set option `table.multi-dml-sync`
> 
> `client.dml-sync`:
> - take effect for sql client only
> - benefit: clear definition.
> - drawback: Every platform needs to define its own option. Bad SQL script
> portability.
> 
> Just as Jark said, I think the `table.dml-sync` is a good choice if we can
> extend its scope and make this option works for `executeSql`.
> It's straightforward and users can use this option now in table api.  The
> drawback is the  `TableResult#await` plays the same role as the option.  I
> don't think the drawback is really critical because many systems have
> commands play the same role with the different names.
> 
> Best,
> Shengkai
> 
> Timo Walther <tw...@apache.org> 于2021年2月25日周四 下午4:23写道:
> 
>> The `table.` prefix is meant to be a general option in the table
>> ecosystem. Not necessarily attached to Table API or SQL Client. That's
>> why SQL Client is also located in the `flink-table` module.
>> 
>> My main concern is the SQL script portability. Declaring the sync/async
>> behavior will happen in many SQL scripts. And users should be easily
>> switch from SQL Client to some commercial product without the need of
>> changing the script again.
>> 
>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later
>> but that would mean introducing future confusion. An app name (what
>> `sql-client` kind of is) should not be part of a config option key if
>> other apps will need the same kind of option.
>> 
>> Regards,
>> Timo
>> 
>> 
>> On 24.02.21 08:59, Jark Wu wrote:
>>>> From my point of view, I also prefer "sql-client.dml-sync",
>>> because the behavior of this configuration is very clear.
>>> Even if we introduce a new config in the future, e.g. `table.dml-sync`,
>>> we can also deprecate the sql-client one.
>>> 
>>> Introducing a "table."  configuration without any implementation
>>> will confuse users a lot, as they expect it should take effect on
>>> the Table API.
>>> 
>>> If we want to introduce an unified "table.dml-sync" option, I prefer
>>> it should be implemented on Table API and affect all the DMLs on
>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
>>> as I have mentioned before [1].
>>> 
>>>> It would be very straightforward that it affects all the DMLs on SQL CLI
>>> and
>>> TableEnvironment (including `executeSql`, `StatementSet`,
>>> `Table#executeInsert`, etc.).
>>> This can also make SQL CLI easy to support this configuration by passing
>>> through to the TableEnv.
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> [1]:
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
>>> 
>>> 
>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com> wrote:
>>> 
>>>> If we all agree the option should only be handled by sql client, then
>> why
>>>> don't we
>>>> just call it `sql-client.dml-sync`? As you said, calling it
>>>> `table.dml-sync` but has no
>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big
>>>> confusion for
>>>> users.
>>>> 
>>>> The only concern I saw is if we introduce
>>>> "TableEnvironment.executeMultiSql()" in the
>>>> future, how do we control the synchronization between statements? TBH I
>>>> don't really
>>>> see a strong requirement for such interfaces. Right now, we have a
>> pretty
>>>> clear semantic
>>>> of `TableEnv.executeSql`, and it's very convenient for users if they
>> want
>>>> to execute multiple
>>>> sql statements. They can simulate either synced or async execution with
>>>> this building block.
>>>> 
>>>> This will introduce slight overhead for users, but compared to the
>>>> confusion we might
>>>> cause if we introduce such a method of our own, I think it's better to
>> wait
>>>> for some more
>>>> feedback.
>>>> 
>>>> Best,
>>>> Kurt
>>>> 
>>>> 
>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <tw...@apache.org>
>> wrote:
>>>> 
>>>>> Hi Kurt,
>>>>> 
>>>>> we can also shorten it to `table.dml-sync` if that would help. Then it
>>>>> would confuse users that do a regular `.executeSql("INSERT INTO")` in a
>>>>> notebook session.
>>>>> 
>>>>> In any case users will need to learn the semantics of this option.
>>>>> `table.multi-dml-sync` should be described as "If a you are in a multi
>>>>> statement environment, execute DMLs synchrounous.". I don't have a
>>>>> strong opinion on shortening it to `table.dml-sync`.
>>>>> 
>>>>> Just to clarify the implementation: The option should be handled by the
>>>>> SQL Client only, but the name can be shared accross platforms.
>>>>> 
>>>>> Regards,
>>>>> Timo
>>>>> 
>>>>> 
>>>>> On 23.02.21 09:54, Kurt Young wrote:
>>>>>> Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.
>>>>>> 
>>>>>> IIUC this config will take effect with 2 use cases:
>>>>>> 1. SQL client, either interactive mode or executing multiple
>> statements
>>>>> via
>>>>>> -f. In most cases,
>>>>>> there will be only one INSERT INTO statement but we are controlling
>> the
>>>>>> sync/async behavior
>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users.
>>>>> Besides,
>>>>>> 
>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we are
>>>>> also
>>>>>> not sure if we will
>>>>>> really introduce this in the future.
>>>>>> 
>>>>>> I would prefer to introduce this option for only sql client. For
>>>>> platforms
>>>>>> Timo mentioned which
>>>>>> need to control such behavior, I think it's easy and flexible to
>>>>> introduce
>>>>>> one on their own.
>>>>>> 
>>>>>> Best,
>>>>>> Kurt
>>>>>> 
>>>>>> 
>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fs...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>>> Hi everyone.
>>>>>>> 
>>>>>>> Sorry for the late response.
>>>>>>> 
>>>>>>> For `execution.runtime-mode`, I think it's much better than
>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions!
>>>>>>> 
>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should
>>>> clarify
>>>>> the
>>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to
>>>>> specify
>>>>>>> the table that is fully qualified and only works for the table that
>> is
>>>>>>> created by the sql statements.
>>>>>>> 
>>>>>>> I have updated the FLIP with suggestions. It seems we have reached a
>>>>>>> consensus, I'd like to start a formal vote for the FLIP.
>>>>>>> 
>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Shengkai
>>>>>>> 
>>>>>>> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
>>>>>>> 
>>>>>>>> Hi Ingo,
>>>>>>>> 
>>>>>>>> 1) I think you are right, the table path should be fully-qualified.
>>>>>>>> 
>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE
>>>>>>>> only aims to print DDL for the tables registered using SQL CREATE
>>>> TABLE
>>>>>>>> DDL.
>>>>>>>> If a table is registered using Table API,  e.g.
>>>>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
>>>>>>>> currently it's not possible to print DDL for such tables.
>>>>>>>> I think we should point it out in the FLIP.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE statement.
>>>>>>>>> 
>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL
>>>> should
>>>>>>>>> always have the table identifier fully-qualified. Otherwise the DDL
>>>>>>>> depends
>>>>>>>>> on the current context (catalog/database), which could be
>>>> surprising,
>>>>>>>>> especially since "the same" table can behave differently if created
>>>> in
>>>>>>>>> different catalogs.
>>>>>>>>> 2) How should this handle tables which cannot be fully
>> characterized
>>>>> by
>>>>>>>>> properties only? I don't know if there's an example for this yet,
>>>> but
>>>>>>>>> hypothetically this is not currently a requirement, right? This
>>>> isn't
>>>>>>> as
>>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if
>> it's
>>>>>>>>> general Flink SQL syntax we should consider this (one way or
>>>> another).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> Ingo
>>>>>>>>> 
>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Shengkai,
>>>>>>>>>> 
>>>>>>>>>> thanks for updating the FLIP.
>>>>>>>>>> 
>>>>>>>>>> I have one last comment for the option `table.execution.mode`.
>>>> Should
>>>>>>>> we
>>>>>>>>>> already use the global Flink option `execution.runtime-mode`
>>>> instead?
>>>>>>>>>> 
>>>>>>>>>> We are using Flink's options where possible (e.g. `pipeline.name`
>>>>>>> and
>>>>>>>>>> `parallelism.default`) why not also for batch/streaming mode?
>>>>>>>>>> 
>>>>>>>>>> The description of the option matches to the Blink planner
>>>> behavior:
>>>>>>>>>> 
>>>>>>>>>> ```
>>>>>>>>>> Among other things, this controls task scheduling, network shuffle
>>>>>>>>>> behavior, and time semantics.
>>>>>>>>>> ```
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>> 
>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
>>>>>>>>>>> Hi, guys.
>>>>>>>>>>> 
>>>>>>>>>>> I have updated the FLIP.  It seems we have reached agreement.
>>>> Maybe
>>>>>>>> we
>>>>>>>>>> can
>>>>>>>>>>> start the vote soon. If anyone has other questions, please leave
>>>>>>> your
>>>>>>>>>>> comments.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Shengkai
>>>>>>>>>>> 
>>>>>>>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>> 
>>>>>>>>>>>> The conclusion sounds good to me.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fskmine@gmail.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi, Timo, Jark.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am fine with the new option name.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks
>>>> like
>>>>>>>>>>>> single
>>>>>>>>>>>>>>> statement.
>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements
>> from
>>>>>>>>>>>> opening
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> closing.
>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
>>>> default),
>>>>>>>> and
>>>>>>>>>>>> we
>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>> support this config
>>>>>>>>>>>>>>> in SQL CLI first, will support it in
>>>>>>>>>>>> TableEnvironment#executeMultiSql()
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> the future, right?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <
>> twalthr@apache.org
>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not
>>>> apply
>>>>>>>> to
>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense
>>>>>>>> when
>>>>>>>>>>>>>>>> executing multi statements. Once we have a
>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be
>>>>>>>>>>>> considered.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms
>> will
>>>>>>>> also
>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>> to have this config option, which is why I would like to
>>>>>>> avoid a
>>>>>>>>> SQL
>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to come
>>>>>>> up
>>>>>>>>> with
>>>>>>>>>>>>>>>> this important config option separately.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or
>>>> other
>>>>>>>>>>>>> opinions?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
>>>>>>>>>>>>>>>>> Hi, all.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is  we
>>>>>>>> have
>>>>>>>>> no
>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the
>>>>>>> option
>>>>>>>>>>>> true
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> use `TableResult#await` together.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Shengkai.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Best regards!
>>>>>>>>>>>> Rui Li
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, everyone.

Here is a summary of the discussion about the option. If the summary
has errors, please correct me.

`table.dml-sync`:
- takes effect for `executeMultiSql` and the SQL client
- benefit: SQL script portability. One script for all platforms.
- drawback: doesn't work for `TableEnvironment#executeSql`.

`table.multi-dml-sync`:
- takes effect for `executeMultiSql` and the SQL client
- benefit: SQL script portability
- drawback: it's confusing when the SQL script has only one DML statement
but users still need to set the option `table.multi-dml-sync`

`client.dml-sync`:
- takes effect for the SQL client only
- benefit: clear definition.
- drawback: every platform needs to define its own option. Bad SQL script
portability.

Just as Jark said, I think `table.dml-sync` is a good choice if we can
extend its scope and make the option work for `executeSql`.
It's straightforward, and users can already use the option in the Table API.
The drawback is that `TableResult#await` plays the same role as the option. I
don't think this drawback is really critical, because many systems have
commands that play the same role under different names.

Best,
Shengkai

Timo Walther <tw...@apache.org> 于2021年2月25日周四 下午4:23写道:

> The `table.` prefix is meant to be a general option in the table
> ecosystem. Not necessarily attached to Table API or SQL Client. That's
> why SQL Client is also located in the `flink-table` module.
>
> My main concern is the SQL script portability. Declaring the sync/async
> behavior will happen in many SQL scripts. And users should be easily
> switch from SQL Client to some commercial product without the need of
> changing the script again.
>
> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later
> but that would mean introducing future confusion. An app name (what
> `sql-client` kind of is) should not be part of a config option key if
> other apps will need the same kind of option.
>
> Regards,
> Timo
>
>
> On 24.02.21 08:59, Jark Wu wrote:
> > From my point of view, I also prefer "sql-client.dml-sync",
> > because the behavior of this configuration is very clear.
> > Even if we introduce a new config in the future, e.g. `table.dml-sync`,
> > we can also deprecate the sql-client one.
> >
> > Introducing a "table."  configuration without any implementation
> > will confuse users a lot, as they expect it should take effect on
> > the Table API.
> >
> > If we want to introduce an unified "table.dml-sync" option, I prefer
> > it should be implemented on Table API and affect all the DMLs on
> > Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
> > as I have mentioned before [1].
> >
> >> It would be very straightforward that it affects all the DMLs on SQL CLI
> > and
> > TableEnvironment (including `executeSql`, `StatementSet`,
> > `Table#executeInsert`, etc.).
> > This can also make SQL CLI easy to support this configuration by passing
> > through to the TableEnv.
> >
> > Best,
> > Jark
> >
> >
> > [1]:
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
> >
> >
> > On Wed, 24 Feb 2021 at 10:39, Kurt Young <yk...@gmail.com> wrote:
> >
> >> If we all agree the option should only be handled by sql client, then
> why
> >> don't we
> >> just call it `sql-client.dml-sync`? As you said, calling it
> >> `table.dml-sync` but has no
> >> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big
> >> confusion for
> >> users.
> >>
> >> The only concern I saw is if we introduce
> >> "TableEnvironment.executeMultiSql()" in the
> >> future, how do we control the synchronization between statements? TBH I
> >> don't really
> >> see a strong requirement for such interfaces. Right now, we have a
> pretty
> >> clear semantic
> >> of `TableEnv.executeSql`, and it's very convenient for users if they
> want
> >> to execute multiple
> >> sql statements. They can simulate either synced or async execution with
> >> this building block.
> >>
> >> This will introduce slight overhead for users, but compared to the
> >> confusion we might
> >> cause if we introduce such a method of our own, I think it's better to
> wait
> >> for some more
> >> feedback.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <tw...@apache.org>
> wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> we can also shorten it to `table.dml-sync` if that would help. Then it
> >>> would confuse users that do a regular `.executeSql("INSERT INTO")` in a
> >>> notebook session.
> >>>
> >>> In any case users will need to learn the semantics of this option.
> >>> `table.multi-dml-sync` should be described as "If you are in a multi
> >>> statement environment, execute DMLs synchronously". I don't have a
> >>> strong opinion on shortening it to `table.dml-sync`.
> >>>
> >>> Just to clarify the implementation: The option should be handled by the
> >>> SQL Client only, but the name can be shared across platforms.
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 23.02.21 09:54, Kurt Young wrote:
> >>>> Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.
> >>>>
> >>>> IIUC this config will take effect with 2 use cases:
> >>>> 1. SQL client, either interactive mode or executing multiple
> statements
> >>> via
> >>>> -f. In most cases,
> >>>> there will be only one INSERT INTO statement but we are controlling
> the
> >>>> sync/async behavior
> >>>> with "*multi-dml*-sync". I think this will confuse a lot of users.
> >>> Besides,
> >>>>
> >>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we are
> >>> also
> >>>> not sure if we will
> >>>> really introduce this in the future.
> >>>>
> >>>> I would prefer to introduce this option for only sql client. For
> >>> platforms
> >>>> Timo mentioned which
> >>>> need to control such behavior, I think it's easy and flexible to
> >>> introduce
> >>>> one on their own.
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
The `table.` prefix is meant to be a general option in the table 
ecosystem. Not necessarily attached to Table API or SQL Client. That's 
why SQL Client is also located in the `flink-table` module.

My main concern is the SQL script portability. Declaring the sync/async 
behavior will happen in many SQL scripts. And users should be able to
switch easily from SQL Client to some commercial product without needing
to change the script again.
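For example, a portable script could declare the behavior once via a shared
option key and then run unchanged on any platform that honors it. This is a
sketch only: the option name is the proposal under discussion, not a released
Flink feature, and the table names are made up.

```sql
-- Portable declaration: only the (proposed) option key must be understood
-- by the executing platform, whether SQL Client or a vendor product.
SET 'table.dml-sync' = 'true';

-- With the option enabled, each DML below runs to completion before the
-- next statement starts.
INSERT INTO daily_stats SELECT dt, COUNT(*) FROM orders GROUP BY dt;
INSERT INTO weekly_stats SELECT wk, COUNT(*) FROM orders GROUP BY wk;
```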

Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later 
but that would mean introducing future confusion. An app name (what 
`sql-client` kind of is) should not be part of a config option key if 
other apps will need the same kind of option.

Regards,
Timo


On 24.02.21 08:59, Jark Wu wrote:
> From my point of view, I also prefer "sql-client.dml-sync",
> because the behavior of this configuration is very clear.
> Even if we introduce a new config in the future, e.g. `table.dml-sync`,
> we can also deprecate the sql-client one.
> 
> Introducing a "table."  configuration without any implementation
> will confuse users a lot, as they expect it should take effect on
> the Table API.
> 
> If we want to introduce an unified "table.dml-sync" option, I prefer
> it should be implemented on Table API and affect all the DMLs on
> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
> as I have mentioned before [1].
> 
>> It would be very straightforward that it affects all the DMLs on SQL CLI
> and
> TableEnvironment (including `executeSql`, `StatementSet`,
> `Table#executeInsert`, etc.).
> This can also make SQL CLI easy to support this configuration by passing
> through to the TableEnv.
> 
> Best,
> Jark
> 
> 
> [1]:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html
> 
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
From my point of view, I also prefer "sql-client.dml-sync",
because the behavior of this configuration is very clear.
Even if we introduce a new config in the future, e.g. `table.dml-sync`,
we can also deprecate the sql-client one.

Introducing a "table." configuration without any implementation
will confuse users a lot, as they would expect it to take effect on
the Table API.

If we want to introduce a unified "table.dml-sync" option, I prefer
it should be implemented on Table API and affect all the DMLs on
Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
as I have mentioned before [1].

> It would be very straightforward that it affects all the DMLs on SQL CLI
and
TableEnvironment (including `executeSql`, `StatementSet`,
`Table#executeInsert`, etc.).
This can also make SQL CLI easy to support this configuration by passing
through to the TableEnv.

Best,
Jark


[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html


> > > with "*multi-dml*-sync". I think this will confuse a lot of users.
> > Besides,
> > >
> > > 2. TableEnvironment#executeMultiSql(), but this is future work, we are
> > also
> > > not sure if we will
> > > really introduce this in the future.
> > >
> > > I would prefer to introduce this option for only sql client. For
> > platforms
> > > Timo mentioned which
> > > need to control such behavior, I think it's easy and flexible to
> > introduce
> > > one on their own.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fs...@gmail.com>
> > wrote:
> > >
> > >> Hi everyone.
> > >>
> > >> Sorry for the late response.
> > >>
> > >> For `execution.runtime-mode`, I think it's much better than
> > >> `table.execution.mode`. Thanks for Timo's suggestions!
> > >>
> > >> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should
> clarify
> > the
> > >> usage of the SHOW CREATE TABLE statements. It should be allowed to
> > specify
> > >> the table that is fully qualified and only works for the table that is
> > >> created by the sql statements.
> > >>
> > >> I have updated the FLIP with suggestions. It seems we have reached a
> > >> consensus, I'd like to start a formal vote for the FLIP.
> > >>
> > >> Please vote +1 to approve the FLIP, or -1 with a comment.
> > >>
> > >> Best,
> > >> Shengkai
> > >>
> > >> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
> > >>
> > >>> Hi Ingo,
> > >>>
> > >>> 1) I think you are right, the table path should be fully-qualified.
> > >>>
> > >>> 2) I think this is also a good point. The SHOW CREATE TABLE
> > >>> only aims to print DDL for the tables registered using SQL CREATE
> TABLE
> > >>> DDL.
> > >>> If a table is registered using Table API,  e.g.
> > >>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
> > >>> currently it's not possible to print DDL for such tables.
> > >>> I think we should point it out in the FLIP.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>>
> > >>>
> > >>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> I have a couple questions about the SHOW CREATE TABLE statement.
> > >>>>
> > >>>> 1) Contrary to the example in the FLIP I think the returned DDL
> should
> > >>>> always have the table identifier fully-qualified. Otherwise the DDL
> > >>> depends
> > >>>> on the current context (catalog/database), which could be
> surprising,
> > >>>> especially since "the same" table can behave differently if created
> in
> > >>>> different catalogs.
> > >>>> 2) How should this handle tables which cannot be fully characterized
> > by
> > >>>> properties only? I don't know if there's an example for this yet,
> but
> > >>>> hypothetically this is not currently a requirement, right? This
> isn't
> > >> as
> > >>>> much of a problem if this syntax is SQL-client-specific, but if it's
> > >>>> general Flink SQL syntax we should consider this (one way or
> another).
> > >>>>
> > >>>>
> > >>>> Regards
> > >>>> Ingo
> > >>>>
> > >>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org>
> > >> wrote:
> > >>>>
> > >>>>> Hi Shengkai,
> > >>>>>
> > >>>>> thanks for updating the FLIP.
> > >>>>>
> > >>>>> I have one last comment for the option `table.execution.mode`.
> Should
> > >>> we
> > >>>>> already use the global Flink option `execution.runtime-mode`
> instead?
> > >>>>>
> > >>>>> We are using Flink's options where possible (e.g. `pipeline.name`
> > >> and
> > >>>>> `parallism.default`) why not also for batch/streaming mode?
> > >>>>>
> > >>>>> The description of the option matches to the Blink planner
> behavior:
> > >>>>>
> > >>>>> ```
> > >>>>> Among other things, this controls task scheduling, network shuffle
> > >>>>> behavior, and time semantics.
> > >>>>> ```
> > >>>>>
> > >>>>> Regards,
> > >>>>> Timo
> > >>>>>
> > >>>>> On 10.02.21 06:30, Shengkai Fang wrote:
> > >>>>>> Hi, guys.
> > >>>>>>
> > >>>>>> I have updated the FLIP.  It seems we have reached agreement.
> Maybe
> > >>> we
> > >>>>> can
> > >>>>>> start the vote soon. If anyone has other questions, please leave
> > >> your
> > >>>>>> comments.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Shengkai
> > >>>>>>
> > >>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> > >>>>>>
> > >>>>>>> Hi guys,
> > >>>>>>>
> > >>>>>>> The conclusion sounds good to me.
> > >>>>>>>
> > >>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi, Timo, Jark.
> > >>>>>>>>
> > >>>>>>>> I am fine with the new option name.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Shengkai
> > >>>>>>>>
> > >>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> > >>>>>>>>
> > >>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > >>>>>>>>>
> > >>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Timo
> > >>>>>>>>>
> > >>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
> > >>>>>>>>>> I'm fine with `table.multi-dml-sync`.
> > >>>>>>>>>>
> > >>>>>>>>>> My previous concern about "multi" is that DML in CLI looks
> like
> > >>>>>>> single
> > >>>>>>>>>> statement.
> > >>>>>>>>>> But we can treat CLI as a multi-line accepting statements from
> > >>>>>>> opening
> > >>>>>>>> to
> > >>>>>>>>>> closing.
> > >>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > >>>>>>>>>>
> > >>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by
> default),
> > >>> and
> > >>>>>>> we
> > >>>>>>>>> will
> > >>>>>>>>>> support this config
> > >>>>>>>>>> in SQL CLI first, will support it in
> > >>>>>>> TableEnvironment#executeMultiSql()
> > >>>>>>>>> in
> > >>>>>>>>>> the future, right?
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Jark
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <twalthr@apache.org
> >
> > >>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not
> apply
> > >>> to
> > >>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense
> > >>> when
> > >>>>>>>>>>> executing multi statements. Once we have a
> > >>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be
> > >>>>>>> considered.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Maybe we can find a better generic name? Other platforms will
> > >>> also
> > >>>>>>>> need
> > >>>>>>>>>>> to have this config option, which is why I would like to
> > >> avoid a
> > >>>> SQL
> > >>>>>>>>>>> Client specific option. Otherwise every platform has to come
> > >> up
> > >>>> with
> > >>>>>>>>>>> this important config option separately.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or
> other
> > >>>>>>>> opinions?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regards,
> > >>>>>>>>>>> Timo
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > >>>>>>>>>>>> Hi, all.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I think it may cause user confused. The main problem is  we
> > >>> have
> > >>>> no
> > >>>>>>>>> means
> > >>>>>>>>>>>> to detect the conflict configuration, e.g. users set the
> > >> option
> > >>>>>>> true
> > >>>>>>>>> and
> > >>>>>>>>>>>> use `TableResult#await` together.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Shengkai.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Best regards!
> > >>>>>>> Rui Li
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Kurt Young <yk...@gmail.com>.
If we all agree the option should only be handled by the sql client, then why
don't we just call it `sql-client.dml-sync`? As you said, calling it
`table.dml-sync` while it has no effect on `TableEnv.executeSql("INSERT
INTO")` will also cause big confusion for
users.

The only concern I see is that if we introduce
"TableEnvironment.executeMultiSql()" in the
future, how do we control the synchronization between statements? TBH I
don't really
see a strong requirement for such interfaces. Right now, we have pretty
clear semantics
for `TableEnv.executeSql`, and it's very convenient for users who want
to execute multiple
sql statements. They can simulate either synced or async execution with
this building block.
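As a minimal sketch of that point (assuming an existing `TableEnvironment`
named `tEnv`; the sink and source names are invented, and exception
handling is omitted), sync vs. async execution can be chosen per statement
with today's API:

```java
// Sketch only: assumes a Flink TableEnvironment `tEnv` is already set up.

// Synchronous: wait for the first job to finish before continuing.
TableResult r1 = tEnv.executeSql("INSERT INTO sink1 SELECT * FROM src");
r1.await();  // blocks until the job terminates

// Asynchronous: submit the second job and return without waiting.
tEnv.executeSql("INSERT INTO sink2 SELECT * FROM src");
```

If the second statement depended on the first, the `await()` call is what
enforces the ordering between them.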

This will introduce slight overhead for users, but compared to the
confusion we might
cause if we introduce such a method of our own, I think it's better to wait
for some more
feedback.

Best,
Kurt


On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <tw...@apache.org> wrote:

> Hi Kurt,
>
> we can also shorten it to `table.dml-sync` if that would help. Then it
> would confuse users that do a regular `.executeSql("INSERT INTO")` in a
> notebook session.
>
> In any case users will need to learn the semantics of this option.
> `table.multi-dml-sync` should be described as "If a you are in a multi
> statement environment, execute DMLs synchrounous.". I don't have a
> strong opinion on shortening it to `table.dml-sync`.
>
> Just to clarify the implementation: The option should be handled by the
> SQL Client only, but the name can be shared accross platforms.
>
> Regards,
> Timo
>
>
> On 23.02.21 09:54, Kurt Young wrote:
> > Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.
> >
> > IIUC this config will take effect with 2 use cases:
> > 1. SQL client, either interactive mode or executing multiple statements
> via
> > -f. In most cases,
> > there will be only one INSERT INTO statement but we are controlling the
> > sync/async behavior
> > with "*multi-dml*-sync". I think this will confuse a lot of users.
> Besides,
> >
> > 2. TableEnvironment#executeMultiSql(), but this is future work, we are
> also
> > not sure if we will
> > really introduce this in the future.
> >
> > I would prefer to introduce this option for only sql client. For
> platforms
> > Timo mentioned which
> > need to control such behavior, I think it's easy and flexible to
> introduce
> > one on their own.
> >
> > Best,
> > Kurt
> >
> >
> > On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fs...@gmail.com>
> wrote:
> >
> >> Hi everyone.
> >>
> >> Sorry for the late response.
> >>
> >> For `execution.runtime-mode`, I think it's much better than
> >> `table.execution.mode`. Thanks for Timo's suggestions!
> >>
> >> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify
> the
> >> usage of the SHOW CREATE TABLE statements. It should be allowed to
> specify
> >> the table that is fully qualified and only works for the table that is
> >> created by the sql statements.
> >>
> >> I have updated the FLIP with suggestions. It seems we have reached a
> >> consensus, I'd like to start a formal vote for the FLIP.
> >>
> >> Please vote +1 to approve the FLIP, or -1 with a comment.
> >>
> >> Best,
> >> Shengkai
> >>
> >> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
> >>
> >>> Hi Ingo,
> >>>
> >>> 1) I think you are right, the table path should be fully-qualified.
> >>>
> >>> 2) I think this is also a good point. The SHOW CREATE TABLE
> >>> only aims to print DDL for the tables registered using SQL CREATE TABLE
> >>> DDL.
> >>> If a table is registered using Table API,  e.g.
> >>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
> >>> currently it's not possible to print DDL for such tables.
> >>> I think we should point it out in the FLIP.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>>
> >>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I have a couple questions about the SHOW CREATE TABLE statement.
> >>>>
> >>>> 1) Contrary to the example in the FLIP I think the returned DDL should
> >>>> always have the table identifier fully-qualified. Otherwise the DDL
> >>> depends
> >>>> on the current context (catalog/database), which could be surprising,
> >>>> especially since "the same" table can behave differently if created in
> >>>> different catalogs.
> >>>> 2) How should this handle tables which cannot be fully characterized
> by
> >>>> properties only? I don't know if there's an example for this yet, but
> >>>> hypothetically this is not currently a requirement, right? This isn't
> >> as
> >>>> much of a problem if this syntax is SQL-client-specific, but if it's
> >>>> general Flink SQL syntax we should consider this (one way or another).
> >>>>
> >>>>
> >>>> Regards
> >>>> Ingo
> >>>>
> >>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Shengkai,
> >>>>>
> >>>>> thanks for updating the FLIP.
> >>>>>
> >>>>> I have one last comment for the option `table.execution.mode`. Should
> >>> we
> >>>>> already use the global Flink option `execution.runtime-mode` instead?
> >>>>>
> >>>>> We are using Flink's options where possible (e.g. `pipeline.name`
> >> and
> >>>>> `parallism.default`) why not also for batch/streaming mode?
> >>>>>
> >>>>> The description of the option matches to the Blink planner behavior:
> >>>>>
> >>>>> ```
> >>>>> Among other things, this controls task scheduling, network shuffle
> >>>>> behavior, and time semantics.
> >>>>> ```
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>> On 10.02.21 06:30, Shengkai Fang wrote:
> >>>>>> Hi, guys.
> >>>>>>
> >>>>>> I have updated the FLIP.  It seems we have reached agreement. Maybe
> >>> we
> >>>>> can
> >>>>>> start the vote soon. If anyone has other questions, please leave
> >> your
> >>>>>> comments.
> >>>>>>
> >>>>>> Best,
> >>>>>> Shengkai
> >>>>>>
> >>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> >>>>>>
> >>>>>>> Hi guys,
> >>>>>>>
> >>>>>>> The conclusion sounds good to me.
> >>>>>>>
> >>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi, Timo, Jark.
> >>>>>>>>
> >>>>>>>> I am fine with the new option name.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Shengkai
> >>>>>>>>
> >>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> >>>>>>>>
> >>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> >>>>>>>>>
> >>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
> >>>>>>>>>> I'm fine with `table.multi-dml-sync`.
> >>>>>>>>>>
> >>>>>>>>>> My previous concern about "multi" is that DML in CLI looks like
> >>>>>>> single
> >>>>>>>>>> statement.
> >>>>>>>>>> But we can treat CLI as a multi-line accepting statements from
> >>>>>>> opening
> >>>>>>>> to
> >>>>>>>>>> closing.
> >>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
> >>>>>>>>>>
> >>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by default),
> >>> and
> >>>>>>> we
> >>>>>>>>> will
> >>>>>>>>>> support this config
> >>>>>>>>>> in SQL CLI first, will support it in
> >>>>>>> TableEnvironment#executeMultiSql()
> >>>>>>>>> in
> >>>>>>>>>> the future, right?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jark
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>
> >>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not apply
> >>> to
> >>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense
> >>> when
> >>>>>>>>>>> executing multi statements. Once we have a
> >>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be
> >>>>>>> considered.
> >>>>>>>>>>>
> >>>>>>>>>>> Maybe we can find a better generic name? Other platforms will
> >>> also
> >>>>>>>> need
> >>>>>>>>>>> to have this config option, which is why I would like to
> >> avoid a
> >>>> SQL
> >>>>>>>>>>> Client specific option. Otherwise every platform has to come
> >> up
> >>>> with
> >>>>>>>>>>> this important config option separately.
> >>>>>>>>>>>
> >>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> >>>>>>>> opinions?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Timo
> >>>>>>>>>>>
> >>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> >>>>>>>>>>>> Hi, all.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think it may cause user confused. The main problem is  we
> >>> have
> >>>> no
> >>>>>>>>> means
> >>>>>>>>>>>> to detect the conflict configuration, e.g. users set the
> >> option
> >>>>>>> true
> >>>>>>>>> and
> >>>>>>>>>>>> use `TableResult#await` together.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Shengkai.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards!
> >>>>>>> Rui Li
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi Kurt,

we can also shorten it to `table.dml-sync` if that would help. But then it 
might confuse users that do a regular `.executeSql("INSERT INTO")` in a 
notebook session.

In any case users will need to learn the semantics of this option. 
`table.multi-dml-sync` should be described as "If you are in a multi 
statement environment, execute DMLs synchronously.". I don't have a 
strong opinion on shortening it to `table.dml-sync`.
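As an illustration of the "multi statement environment" described above
(a hypothetical statements file passed to the client via `-f`; the option
name `table.multi-dml-sync` is still being debated in this thread):

```sql
SET 'table.multi-dml-sync' = 'true';

-- The client waits for this job to finish ...
INSERT INTO t1 SELECT * FROM src;
-- ... before submitting this one.
INSERT INTO t2 SELECT * FROM t1;
```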

Just to clarify the implementation: The option should be handled by the 
SQL Client only, but the name can be shared across platforms.

Regards,
Timo


On 23.02.21 09:54, Kurt Young wrote:
> Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.
> 
> IIUC this config will take effect with 2 use cases:
> 1. SQL client, either interactive mode or executing multiple statements via
> -f. In most cases,
> there will be only one INSERT INTO statement but we are controlling the
> sync/async behavior
> with "*multi-dml*-sync". I think this will confuse a lot of users. Besides,
> 
> 2. TableEnvironment#executeMultiSql(), but this is future work, we are also
> not sure if we will
> really introduce this in the future.
> 
> I would prefer to introduce this option for only sql client. For platforms
> Timo mentioned which
> need to control such behavior, I think it's easy and flexible to introduce
> one on their own.
> 
> Best,
> Kurt
> 
> 
> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fs...@gmail.com> wrote:
> 
>> Hi everyone.
>>
>> Sorry for the late response.
>>
>> For `execution.runtime-mode`, I think it's much better than
>> `table.execution.mode`. Thanks for Timo's suggestions!
>>
>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify the
>> usage of the SHOW CREATE TABLE statements. It should be allowed to specify
>> the table that is fully qualified and only works for the table that is
>> created by the sql statements.
>>
>> I have updated the FLIP with suggestions. It seems we have reached a
>> consensus, I'd like to start a formal vote for the FLIP.
>>
>> Please vote +1 to approve the FLIP, or -1 with a comment.
>>
>> Best,
>> Shengkai
>>
>> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
>>
>>> Hi Ingo,
>>>
>>> 1) I think you are right, the table path should be fully-qualified.
>>>
>>> 2) I think this is also a good point. The SHOW CREATE TABLE
>>> only aims to print DDL for the tables registered using SQL CREATE TABLE
>>> DDL.
>>> If a table is registered using Table API,  e.g.
>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
>>> currently it's not possible to print DDL for such tables.
>>> I think we should point it out in the FLIP.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>>
>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a couple questions about the SHOW CREATE TABLE statement.
>>>>
>>>> 1) Contrary to the example in the FLIP I think the returned DDL should
>>>> always have the table identifier fully-qualified. Otherwise the DDL
>>> depends
>>>> on the current context (catalog/database), which could be surprising,
>>>> especially since "the same" table can behave differently if created in
>>>> different catalogs.
>>>> 2) How should this handle tables which cannot be fully characterized by
>>>> properties only? I don't know if there's an example for this yet, but
>>>> hypothetically this is not currently a requirement, right? This isn't
>> as
>>>> much of a problem if this syntax is SQL-client-specific, but if it's
>>>> general Flink SQL syntax we should consider this (one way or another).
>>>>
>>>>
>>>> Regards
>>>> Ingo
>>>>
>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org>
>> wrote:
>>>>
>>>>> Hi Shengkai,
>>>>>
>>>>> thanks for updating the FLIP.
>>>>>
>>>>> I have one last comment for the option `table.execution.mode`. Should
>>> we
>>>>> already use the global Flink option `execution.runtime-mode` instead?
>>>>>
>>>>> We are using Flink's options where possible (e.g. `pipeline.name`
>> and
>>>>> `parallism.default`) why not also for batch/streaming mode?
>>>>>
>>>>> The description of the option matches to the Blink planner behavior:
>>>>>
>>>>> ```
>>>>> Among other things, this controls task scheduling, network shuffle
>>>>> behavior, and time semantics.
>>>>> ```
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>> On 10.02.21 06:30, Shengkai Fang wrote:
>>>>>> Hi, guys.
>>>>>>
>>>>>> I have updated the FLIP.  It seems we have reached agreement. Maybe
>>> we
>>>>> can
>>>>>> start the vote soon. If anyone has other questions, please leave
>> your
>>>>>> comments.
>>>>>>
>>>>>> Best,
>>>>>> Shengkai
>>>>>>
>>>>>> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> The conclusion sounds good to me.
>>>>>>>
>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi, Timo, Jark.
>>>>>>>>
>>>>>>>> I am fine with the new option name.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Shengkai
>>>>>>>>
>>>>>>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
>>>>>>>>
>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
>>>>>>>>>
>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Timo
>>>>>>>>>
>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote:
>>>>>>>>>> I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>
>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks like
>>>>>>> single
>>>>>>>>>> statement.
>>>>>>>>>> But we can treat CLI as a multi-line accepting statements from
>>>>>>> opening
>>>>>>>> to
>>>>>>>>>> closing.
>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`.
>>>>>>>>>>
>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by default),
>>> and
>>>>>>> we
>>>>>>>>> will
>>>>>>>>>> support this config
>>>>>>>>>> in SQL CLI first, will support it in
>>>>>>> TableEnvironment#executeMultiSql()
>>>>>>>>> in
>>>>>>>>>> the future, right?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jark
>>>>>>>>>>
>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not apply
>>> to
>>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense
>>> when
>>>>>>>>>>> executing multi statements. Once we have a
>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be
>>>>>>> considered.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we can find a better generic name? Other platforms will
>>> also
>>>>>>>> need
>>>>>>>>>>> to have this config option, which is why I would like to
>> avoid a
>>>> SQL
>>>>>>>>>>> Client specific option. Otherwise every platform has to come
>> up
>>>> with
>>>>>>>>>>> this important config option separately.
>>>>>>>>>>>
>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
>>>>>>>> opinions?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
>>>>>>>>>>>> Hi, all.
>>>>>>>>>>>>
>>>>>>>>>>>> I think it may cause user confused. The main problem is  we
>>> have
>>>> no
>>>>>>>>> means
>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the
>> option
>>>>>>> true
>>>>>>>>> and
>>>>>>>>>>>> use `TableResult#await` together.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Shengkai.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards!
>>>>>>> Rui Li
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Kurt Young <yk...@gmail.com>.
Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.

IIUC this config will take effect in 2 use cases:
1. SQL client, either in interactive mode or when executing multiple
statements via -f. In most cases
there will be only one INSERT INTO statement, yet we would be controlling the
sync/async behavior
with "*multi-dml*-sync". I think this will confuse a lot of users.

2. TableEnvironment#executeMultiSql(), but this is future work, we are also
not sure if we will
really introduce this in the future.

I would prefer to introduce this option only for the sql client. For platforms
Timo mentioned that
need to control such behavior, I think it's easy and flexible for them to
introduce
one on their own.

Best,
Kurt


On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <fs...@gmail.com> wrote:

> Hi everyone.
>
> Sorry for the late response.
>
> For `execution.runtime-mode`, I think it's much better than
> `table.execution.mode`. Thanks for Timo's suggestions!
>
> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify the
> usage of the SHOW CREATE TABLE statements. It should be allowed to specify
> the table that is fully qualified and only works for the table that is
> created by the sql statements.
>
> I have updated the FLIP with suggestions. It seems we have reached a
> consensus, I'd like to start a formal vote for the FLIP.
>
> Please vote +1 to approve the FLIP, or -1 with a comment.
>
> Best,
> Shengkai
>
> Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:
>
> > Hi Ingo,
> >
> > 1) I think you are right, the table path should be fully-qualified.
> >
> > 2) I think this is also a good point. The SHOW CREATE TABLE
> > only aims to print DDL for the tables registered using SQL CREATE TABLE
> > DDL.
> > If a table is registered using Table API,  e.g.
> > `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
> > currently it's not possible to print DDL for such tables.
> > I think we should point it out in the FLIP.
> >
> > Best,
> > Jark
> >
> >
> >
> > On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:
> >
> > > Hi all,
> > >
> > > I have a couple questions about the SHOW CREATE TABLE statement.
> > >
> > > 1) Contrary to the example in the FLIP I think the returned DDL should
> > > always have the table identifier fully-qualified. Otherwise the DDL
> > depends
> > > on the current context (catalog/database), which could be surprising,
> > > especially since "the same" table can behave differently if created in
> > > different catalogs.
> > > 2) How should this handle tables which cannot be fully characterized by
> > > properties only? I don't know if there's an example for this yet, but
> > > hypothetically this is not currently a requirement, right? This isn't
> as
> > > much of a problem if this syntax is SQL-client-specific, but if it's
> > > general Flink SQL syntax we should consider this (one way or another).
> > >
> > >
> > > Regards
> > > Ingo
> > >
> > > On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org>
> wrote:
> > >
> > > > Hi Shengkai,
> > > >
> > > > thanks for updating the FLIP.
> > > >
> > > > I have one last comment for the option `table.execution.mode`. Should
> > we
> > > > already use the global Flink option `execution.runtime-mode` instead?
> > > >
> > > > We are using Flink's options where possible (e.g. `pipeline.name`
> and
> > > > `parallism.default`) why not also for batch/streaming mode?
> > > >
> > > > The description of the option matches to the Blink planner behavior:
> > > >
> > > > ```
> > > > Among other things, this controls task scheduling, network shuffle
> > > > behavior, and time semantics.
> > > > ```
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > On 10.02.21 06:30, Shengkai Fang wrote:
> > > > > Hi, guys.
> > > > >
> > > > > I have updated the FLIP.  It seems we have reached agreement. Maybe
> > we
> > > > can
> > > > > start the vote soon. If anyone has other questions, please leave
> your
> > > > > comments.
> > > > >
> > > > > Best,
> > > > > Shengkai
> > > > >
> > > > > Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> > > > >
> > > > >> Hi guys,
> > > > >>
> > > > >> The conclusion sounds good to me.
> > > > >>
> > > > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com>
> > > wrote:
> > > > >>
> > > > >>> Hi, Timo, Jark.
> > > > >>>
> > > > >>> I am fine with the new option name.
> > > > >>>
> > > > >>> Best,
> > > > >>> Shengkai
> > > > >>>
> > > > >>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> > > > >>>
> > > > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > > > >>>>
> > > > >>>> @Rui, Shengkai: Are you also fine with this conclusion?
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Timo
> > > > >>>>
> > > > >>>> On 09.02.21 10:14, Jark Wu wrote:
> > > > >>>>> I'm fine with `table.multi-dml-sync`.
> > > > >>>>>
> > > > >>>>> My previous concern about "multi" is that DML in CLI looks like
> > > > >> single
> > > > >>>>> statement.
> > > > >>>>> But we can treat CLI as a multi-line accepting statements from
> > > > >> opening
> > > > >>> to
> > > > >>>>> closing.
> > > > >>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > > > >>>>>
> > > > >>>>> So the conclusion is `table.multi-dml-sync` (false by default),
> > and
> > > > >> we
> > > > >>>> will
> > > > >>>>> support this config
> > > > >>>>> in SQL CLI first, will support it in
> > > > >> TableEnvironment#executeMultiSql()
> > > > >>>> in
> > > > >>>>> the future, right?
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Jark
> > > > >>>>>
> > > > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
> > > > >> wrote:
> > > > >>>>>
> > > > >>>>>> Hi everyone,
> > > > >>>>>>
> > > > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply
> > to
> > > > >>>>>> regular `executeSql`. Actually, this option makes only sense
> > when
> > > > >>>>>> executing multi statements. Once we have a
> > > > >>>>>> `TableEnvironment.executeMultiSql()` this config could be
> > > > >> considered.
> > > > >>>>>>
> > > > >>>>>> Maybe we can find a better generic name? Other platforms will
> > also
> > > > >>> need
> > > > >>>>>> to have this config option, which is why I would like to
> avoid a
> > > SQL
> > > > >>>>>> Client specific option. Otherwise every platform has to come
> up
> > > with
> > > > >>>>>> this important config option separately.
> > > > >>>>>>
> > > > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> > > > >>> opinions?
> > > > >>>>>>
> > > > >>>>>> Regards,
> > > > >>>>>> Timo
> > > > >>>>>>
> > > > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > > > >>>>>>> Hi, all.
> > > > >>>>>>>
> > > > >>>>>>> I think it may cause user confused. The main problem is  we
> > have
> > > no
> > > > >>>> means
> > > > >>>>>>> to detect the conflict configuration, e.g. users set the
> option
> > > > >> true
> > > > >>>> and
> > > > >>>>>>> use `TableResult#await` together.
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Shengkai.
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Best regards!
> > > > >> Rui Li
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi everyone.

Sorry for the late response.

For `execution.runtime-mode`, I think it's much better than
`table.execution.mode`. Thanks to Timo for the suggestion!

For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify the
usage of the SHOW CREATE TABLE statement: the returned DDL should use the
fully qualified table identifier, and the statement should only work for
tables created via SQL DDL.

I have updated the FLIP with these suggestions. Since it seems we have
reached a consensus, I'd like to start a formal vote for the FLIP.

Please vote +1 to approve the FLIP, or -1 with a comment.

Best,
Shengkai

Jark Wu <im...@gmail.com> 于2021年2月15日周一 下午10:50写道:

> Hi Ingo,
>
> 1) I think you are right, the table path should be fully-qualified.
>
> 2) I think this is also a good point. The SHOW CREATE TABLE
> only aims to print DDL for the tables registered using SQL CREATE TABLE
> DDL.
> If a table is registered using Table API,  e.g.
> `StreamTableEnvironment#createTemporaryView(String, DataStream)`,
> currently it's not possible to print DDL for such tables.
> I think we should point it out in the FLIP.
>
> Best,
> Jark
>
>
>
> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:
>
> > Hi all,
> >
> > I have a couple questions about the SHOW CREATE TABLE statement.
> >
> > 1) Contrary to the example in the FLIP I think the returned DDL should
> > always have the table identifier fully-qualified. Otherwise the DDL
> depends
> > on the current context (catalog/database), which could be surprising,
> > especially since "the same" table can behave differently if created in
> > different catalogs.
> > 2) How should this handle tables which cannot be fully characterized by
> > properties only? I don't know if there's an example for this yet, but
> > hypothetically this is not currently a requirement, right? This isn't as
> > much of a problem if this syntax is SQL-client-specific, but if it's
> > general Flink SQL syntax we should consider this (one way or another).
> >
> >
> > Regards
> > Ingo
> >
> > On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org> wrote:
> >
> > > Hi Shengkai,
> > >
> > > thanks for updating the FLIP.
> > >
> > > I have one last comment for the option `table.execution.mode`. Should
> we
> > > already use the global Flink option `execution.runtime-mode` instead?
> > >
> > > We are using Flink's options where possible (e.g. `pipeline.name` and
> > > `parallism.default`) why not also for batch/streaming mode?
> > >
> > > The description of the option matches to the Blink planner behavior:
> > >
> > > ```
> > > Among other things, this controls task scheduling, network shuffle
> > > behavior, and time semantics.
> > > ```
> > >
> > > Regards,
> > > Timo
> > >
> > > On 10.02.21 06:30, Shengkai Fang wrote:
> > > > Hi, guys.
> > > >
> > > > I have updated the FLIP.  It seems we have reached agreement. Maybe
> we
> > > can
> > > > start the vote soon. If anyone has other questions, please leave your
> > > > comments.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> > > >
> > > >> Hi guys,
> > > >>
> > > >> The conclusion sounds good to me.
> > > >>
> > > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com>
> > wrote:
> > > >>
> > > >>> Hi, Timo, Jark.
> > > >>>
> > > >>> I am fine with the new option name.
> > > >>>
> > > >>> Best,
> > > >>> Shengkai
> > > >>>
> > > >>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> > > >>>
> > > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > > >>>>
> > > >>>> @Rui, Shengkai: Are you also fine with this conclusion?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Timo
> > > >>>>
> > > >>>> On 09.02.21 10:14, Jark Wu wrote:
> > > >>>>> I'm fine with `table.multi-dml-sync`.
> > > >>>>>
> > > >>>>> My previous concern about "multi" is that DML in CLI looks like
> > > >> single
> > > >>>>> statement.
> > > >>>>> But we can treat CLI as a multi-line accepting statements from
> > > >> opening
> > > >>> to
> > > >>>>> closing.
> > > >>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > > >>>>>
> > > >>>>> So the conclusion is `table.multi-dml-sync` (false by default),
> and
> > > >> we
> > > >>>> will
> > > >>>>> support this config
> > > >>>>> in SQL CLI first, will support it in
> > > >> TableEnvironment#executeMultiSql()
> > > >>>> in
> > > >>>>> the future, right?
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Jark
> > > >>>>>
> > > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
> > > >> wrote:
> > > >>>>>
> > > >>>>>> Hi everyone,
> > > >>>>>>
> > > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply
> to
> > > >>>>>> regular `executeSql`. Actually, this option makes only sense
> when
> > > >>>>>> executing multi statements. Once we have a
> > > >>>>>> `TableEnvironment.executeMultiSql()` this config could be
> > > >> considered.
> > > >>>>>>
> > > >>>>>> Maybe we can find a better generic name? Other platforms will
> also
> > > >>> need
> > > >>>>>> to have this config option, which is why I would like to avoid a
> > SQL
> > > >>>>>> Client specific option. Otherwise every platform has to come up
> > with
> > > >>>>>> this important config option separately.
> > > >>>>>>
> > > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> > > >>> opinions?
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Timo
> > > >>>>>>
> > > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > > >>>>>>> Hi, all.
> > > >>>>>>>
> > > >>>>>>> I think it may cause user confused. The main problem is  we
> have
> > no
> > > >>>> means
> > > >>>>>>> to detect the conflict configuration, e.g. users set the option
> > > >> true
> > > >>>> and
> > > >>>>>>> use `TableResult#await` together.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Shengkai.
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Best regards!
> > > >> Rui Li
> > > >>
> > > >
> > >
> > >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi Ingo,

1) I think you are right, the table path should be fully-qualified.

2) I think this is also a good point. The SHOW CREATE TABLE
only aims to print DDL for the tables registered using SQL CREATE TABLE
DDL.
If a table is registered using the Table API, e.g.
`StreamTableEnvironment#createTemporaryView(String, DataStream)`,
currently it's not possible to print DDL for such tables.
I think we should point it out in the FLIP.

Best,
Jark



On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <in...@ververica.com> wrote:

> Hi all,
>
> I have a couple questions about the SHOW CREATE TABLE statement.
>
> 1) Contrary to the example in the FLIP I think the returned DDL should
> always have the table identifier fully-qualified. Otherwise the DDL depends
> on the current context (catalog/database), which could be surprising,
> especially since "the same" table can behave differently if created in
> different catalogs.
> 2) How should this handle tables which cannot be fully characterized by
> properties only? I don't know if there's an example for this yet, but
> hypothetically this is not currently a requirement, right? This isn't as
> much of a problem if this syntax is SQL-client-specific, but if it's
> general Flink SQL syntax we should consider this (one way or another).
>
>
> Regards
> Ingo
>
> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org> wrote:
>
> > Hi Shengkai,
> >
> > thanks for updating the FLIP.
> >
> > I have one last comment for the option `table.execution.mode`. Should we
> > already use the global Flink option `execution.runtime-mode` instead?
> >
> > We are using Flink's options where possible (e.g. `pipeline.name` and
> > `parallism.default`) why not also for batch/streaming mode?
> >
> > The description of the option matches to the Blink planner behavior:
> >
> > ```
> > Among other things, this controls task scheduling, network shuffle
> > behavior, and time semantics.
> > ```
> >
> > Regards,
> > Timo
> >
> > On 10.02.21 06:30, Shengkai Fang wrote:
> > > Hi, guys.
> > >
> > > I have updated the FLIP.  It seems we have reached agreement. Maybe we
> > can
> > > start the vote soon. If anyone has other questions, please leave your
> > > comments.
> > >
> > > Best,
> > > Shengkai
> > >
> > > Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> > >
> > >> Hi guys,
> > >>
> > >> The conclusion sounds good to me.
> > >>
> > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com>
> wrote:
> > >>
> > >>> Hi, Timo, Jark.
> > >>>
> > >>> I am fine with the new option name.
> > >>>
> > >>> Best,
> > >>> Shengkai
> > >>>
> > >>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> > >>>
> > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > >>>>
> > >>>> @Rui, Shengkai: Are you also fine with this conclusion?
> > >>>>
> > >>>> Thanks,
> > >>>> Timo
> > >>>>
> > >>>> On 09.02.21 10:14, Jark Wu wrote:
> > >>>>> I'm fine with `table.multi-dml-sync`.
> > >>>>>
> > >>>>> My previous concern about "multi" is that DML in CLI looks like
> > >> single
> > >>>>> statement.
> > >>>>> But we can treat CLI as a multi-line accepting statements from
> > >> opening
> > >>> to
> > >>>>> closing.
> > >>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > >>>>>
> > >>>>> So the conclusion is `table.multi-dml-sync` (false by default), and
> > >> we
> > >>>> will
> > >>>>> support this config
> > >>>>> in SQL CLI first, will support it in
> > >> TableEnvironment#executeMultiSql()
> > >>>> in
> > >>>>> the future, right?
> > >>>>>
> > >>>>> Best,
> > >>>>> Jark
> > >>>>>
> > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
> > >> wrote:
> > >>>>>
> > >>>>>> Hi everyone,
> > >>>>>>
> > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to
> > >>>>>> regular `executeSql`. Actually, this option makes only sense when
> > >>>>>> executing multi statements. Once we have a
> > >>>>>> `TableEnvironment.executeMultiSql()` this config could be
> > >> considered.
> > >>>>>>
> > >>>>>> Maybe we can find a better generic name? Other platforms will also
> > >>> need
> > >>>>>> to have this config option, which is why I would like to avoid a
> SQL
> > >>>>>> Client specific option. Otherwise every platform has to come up
> with
> > >>>>>> this important config option separately.
> > >>>>>>
> > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> > >>> opinions?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > >>>>>>> Hi, all.
> > >>>>>>>
> > >>>>>>> I think it may cause user confused. The main problem is  we have
> no
> > >>>> means
> > >>>>>>> to detect the conflict configuration, e.g. users set the option
> > >> true
> > >>>> and
> > >>>>>>> use `TableResult#await` together.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Shengkai.
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Best regards!
> > >> Rui Li
> > >>
> > >
> >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Ingo Bürk <in...@ververica.com>.
Hi all,

I have a couple of questions about the SHOW CREATE TABLE statement.

1) Contrary to the example in the FLIP, I think the returned DDL should
always have the table identifier fully qualified. Otherwise the DDL depends
on the current context (catalog/database), which could be surprising,
especially since "the same" table can behave differently if created in
different catalogs.
2) How should this handle tables which cannot be fully characterized by
properties only? I don't know if there's an example for this yet, but
hypothetically this is not currently a requirement, right? This isn't as
much of a problem if this syntax is SQL-client-specific, but if it's
general Flink SQL syntax we should consider this (one way or another).


Regards
Ingo
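
To make point 1 concrete, here is a small, hypothetical sketch (plain
Python, not Flink code; all names are illustrative) of how a printed DDL
could always embed the fully qualified `catalog.database.table` identifier,
so the output never depends on the session's current catalog/database:

```python
# Hypothetical illustration of point 1: the DDL returned by
# SHOW CREATE TABLE should always render the fully qualified identifier
# (catalog.database.table) instead of relying on the current context.
def qualified_identifier(catalog, database, table):
    # Back-quote each part, since identifiers may need escaping.
    return ".".join(f"`{part}`" for part in (catalog, database, table))

def render_create_table(catalog, database, table, columns):
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (f"CREATE TABLE {qualified_identifier(catalog, database, table)} (\n"
            f"  {cols}\n)")

ddl = render_create_table("hive", "db1", "users",
                          [("id", "BIGINT"), ("name", "STRING")])
print(ddl)
```

With this rendering, copying the DDL into a session pointed at a different
default catalog still recreates the table in the intended location.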

On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <tw...@apache.org> wrote:

> Hi Shengkai,
>
> thanks for updating the FLIP.
>
> I have one last comment for the option `table.execution.mode`. Should we
> already use the global Flink option `execution.runtime-mode` instead?
>
> We are using Flink's options where possible (e.g. `pipeline.name` and
> `parallism.default`) why not also for batch/streaming mode?
>
> The description of the option matches to the Blink planner behavior:
>
> ```
> Among other things, this controls task scheduling, network shuffle
> behavior, and time semantics.
> ```
>
> Regards,
> Timo
>
> On 10.02.21 06:30, Shengkai Fang wrote:
> > Hi, guys.
> >
> > I have updated the FLIP.  It seems we have reached agreement. Maybe we
> can
> > start the vote soon. If anyone has other questions, please leave your
> > comments.
> >
> > Best,
> > Shengkai
> >
> > Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> >
> >> Hi guys,
> >>
> >> The conclusion sounds good to me.
> >>
> >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com> wrote:
> >>
> >>> Hi, Timo, Jark.
> >>>
> >>> I am fine with the new option name.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> >>>
> >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> >>>>
> >>>> @Rui, Shengkai: Are you also fine with this conclusion?
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>> On 09.02.21 10:14, Jark Wu wrote:
> >>>>> I'm fine with `table.multi-dml-sync`.
> >>>>>
> >>>>> My previous concern about "multi" is that DML in CLI looks like
> >> single
> >>>>> statement.
> >>>>> But we can treat CLI as a multi-line accepting statements from
> >> opening
> >>> to
> >>>>> closing.
> >>>>> Thus, I'm fine with `table.multi-dml-sync`.
> >>>>>
> >>>>> So the conclusion is `table.multi-dml-sync` (false by default), and
> >> we
> >>>> will
> >>>>> support this config
> >>>>> in SQL CLI first, will support it in
> >> TableEnvironment#executeMultiSql()
> >>>> in
> >>>>> the future, right?
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to
> >>>>>> regular `executeSql`. Actually, this option makes only sense when
> >>>>>> executing multi statements. Once we have a
> >>>>>> `TableEnvironment.executeMultiSql()` this config could be
> >> considered.
> >>>>>>
> >>>>>> Maybe we can find a better generic name? Other platforms will also
> >>> need
> >>>>>> to have this config option, which is why I would like to avoid a SQL
> >>>>>> Client specific option. Otherwise every platform has to come up with
> >>>>>> this important config option separately.
> >>>>>>
> >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> >>> opinions?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> >>>>>>> Hi, all.
> >>>>>>>
> >>>>>>> I think it may cause user confused. The main problem is  we have no
> >>>> means
> >>>>>>> to detect the conflict configuration, e.g. users set the option
> >> true
> >>>> and
> >>>>>>> use `TableResult#await` together.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Shengkai.
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >> --
> >> Best regards!
> >> Rui Li
> >>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi Shengkai,

thanks for updating the FLIP.

I have one last comment on the option `table.execution.mode`. Should we
already use the global Flink option `execution.runtime-mode` instead?

We are using Flink's options where possible (e.g. `pipeline.name` and
`parallelism.default`), so why not also for the batch/streaming mode?

The description of the option matches the Blink planner behavior:

```
Among other things, this controls task scheduling, network shuffle 
behavior, and time semantics.
```

Regards,
Timo

On 10.02.21 06:30, Shengkai Fang wrote:
> Hi, guys.
> 
> I have updated the FLIP.  It seems we have reached agreement. Maybe we can
> start the vote soon. If anyone has other questions, please leave your
> comments.
> 
> Best,
> Shengkai
> 
> Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:
> 
>> Hi guys,
>>
>> The conclusion sounds good to me.
>>
>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com> wrote:
>>
>>> Hi, Timo, Jark.
>>>
>>> I am fine with the new option name.
>>>
>>> Best,
>>> Shengkai
>>>
>>> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
>>>
>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
>>>>
>>>> @Rui, Shengkai: Are you also fine with this conclusion?
>>>>
>>>> Thanks,
>>>> Timo
>>>>
>>>> On 09.02.21 10:14, Jark Wu wrote:
>>>>> I'm fine with `table.multi-dml-sync`.
>>>>>
>>>>> My previous concern about "multi" is that DML in CLI looks like
>> single
>>>>> statement.
>>>>> But we can treat CLI as a multi-line accepting statements from
>> opening
>>> to
>>>>> closing.
>>>>> Thus, I'm fine with `table.multi-dml-sync`.
>>>>>
>>>>> So the conclusion is `table.multi-dml-sync` (false by default), and
>> we
>>>> will
>>>>> support this config
>>>>> in SQL CLI first, will support it in
>> TableEnvironment#executeMultiSql()
>>>> in
>>>>> the future, right?
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to
>>>>>> regular `executeSql`. Actually, this option makes only sense when
>>>>>> executing multi statements. Once we have a
>>>>>> `TableEnvironment.executeMultiSql()` this config could be
>> considered.
>>>>>>
>>>>>> Maybe we can find a better generic name? Other platforms will also
>>> need
>>>>>> to have this config option, which is why I would like to avoid a SQL
>>>>>> Client specific option. Otherwise every platform has to come up with
>>>>>> this important config option separately.
>>>>>>
>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
>>> opinions?
>>>>>>
>>>>>> Regards,
>>>>>> Timo
>>>>>>
>>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
>>>>>>> Hi, all.
>>>>>>>
>>>>>>> I think it may cause user confused. The main problem is  we have no
>>>> means
>>>>>>> to detect the conflict configuration, e.g. users set the option
>> true
>>>> and
>>>>>>> use `TableResult#await` together.
>>>>>>>
>>>>>>> Best,
>>>>>>> Shengkai.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards!
>> Rui Li
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, guys.

I have updated the FLIP.  It seems we have reached agreement. Maybe we can
start the vote soon. If anyone has other questions, please leave your
comments.

Best,
Shengkai

Rui Li <li...@gmail.com>于2021年2月9日 周二下午7:52写道:

> Hi guys,
>
> The conclusion sounds good to me.
>
> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com> wrote:
>
> > Hi, Timo, Jark.
> >
> > I am fine with the new option name.
> >
> > Best,
> > Shengkai
> >
> > Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
> >
> > > Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > >
> > > @Rui, Shengkai: Are you also fine with this conclusion?
> > >
> > > Thanks,
> > > Timo
> > >
> > > On 09.02.21 10:14, Jark Wu wrote:
> > > > I'm fine with `table.multi-dml-sync`.
> > > >
> > > > My previous concern about "multi" is that DML in CLI looks like
> single
> > > > statement.
> > > > But we can treat CLI as a multi-line accepting statements from
> opening
> > to
> > > > closing.
> > > > Thus, I'm fine with `table.multi-dml-sync`.
> > > >
> > > > So the conclusion is `table.multi-dml-sync` (false by default), and
> we
> > > will
> > > > support this config
> > > > in SQL CLI first, will support it in
> TableEnvironment#executeMultiSql()
> > > in
> > > > the future, right?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org>
> wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I understand Rui's concerns. `table.dml-sync` should not apply to
> > > >> regular `executeSql`. Actually, this option makes only sense when
> > > >> executing multi statements. Once we have a
> > > >> `TableEnvironment.executeMultiSql()` this config could be
> considered.
> > > >>
> > > >> Maybe we can find a better generic name? Other platforms will also
> > need
> > > >> to have this config option, which is why I would like to avoid a SQL
> > > >> Client specific option. Otherwise every platform has to come up with
> > > >> this important config option separately.
> > > >>
> > > >> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> > opinions?
> > > >>
> > > >> Regards,
> > > >> Timo
> > > >>
> > > >> On 09.02.21 08:50, Shengkai Fang wrote:
> > > >>> Hi, all.
> > > >>>
> > > >>> I think it may cause user confused. The main problem is  we have no
> > > means
> > > >>> to detect the conflict configuration, e.g. users set the option
> true
> > > and
> > > >>> use `TableResult#await` together.
> > > >>>
> > > >>> Best,
> > > >>> Shengkai.
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Rui Li <li...@gmail.com>.
Hi guys,

The conclusion sounds good to me.

On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <fs...@gmail.com> wrote:

> Hi, Timo, Jark.
>
> I am fine with the new option name.
>
> Best,
> Shengkai
>
> Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:
>
> > Yes, `TableEnvironment#executeMultiSql()` can be future work.
> >
> > @Rui, Shengkai: Are you also fine with this conclusion?
> >
> > Thanks,
> > Timo
> >
> > On 09.02.21 10:14, Jark Wu wrote:
> > > I'm fine with `table.multi-dml-sync`.
> > >
> > > My previous concern about "multi" is that DML in CLI looks like single
> > > statement.
> > > But we can treat CLI as a multi-line accepting statements from opening
> to
> > > closing.
> > > Thus, I'm fine with `table.multi-dml-sync`.
> > >
> > > So the conclusion is `table.multi-dml-sync` (false by default), and we
> > will
> > > support this config
> > > in SQL CLI first, will support it in TableEnvironment#executeMultiSql()
> > in
> > > the future, right?
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I understand Rui's concerns. `table.dml-sync` should not apply to
> > >> regular `executeSql`. Actually, this option makes only sense when
> > >> executing multi statements. Once we have a
> > >> `TableEnvironment.executeMultiSql()` this config could be considered.
> > >>
> > >> Maybe we can find a better generic name? Other platforms will also
> need
> > >> to have this config option, which is why I would like to avoid a SQL
> > >> Client specific option. Otherwise every platform has to come up with
> > >> this important config option separately.
> > >>
> > >> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other
> opinions?
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >> On 09.02.21 08:50, Shengkai Fang wrote:
> > >>> Hi, all.
> > >>>
> > >>> I think it may cause user confused. The main problem is  we have no
> > means
> > >>> to detect the conflict configuration, e.g. users set the option true
> > and
> > >>> use `TableResult#await` together.
> > >>>
> > >>> Best,
> > >>> Shengkai.
> > >>>
> > >>
> > >>
> > >
> >
> >
>


-- 
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, Timo, Jark.

I am fine with the new option name.

Best,
Shengkai

Timo Walther <tw...@apache.org>于2021年2月9日 周二下午5:35写道:

> Yes, `TableEnvironment#executeMultiSql()` can be future work.
>
> @Rui, Shengkai: Are you also fine with this conclusion?
>
> Thanks,
> Timo
>
> On 09.02.21 10:14, Jark Wu wrote:
> > I'm fine with `table.multi-dml-sync`.
> >
> > My previous concern about "multi" is that DML in CLI looks like single
> > statement.
> > But we can treat CLI as a multi-line accepting statements from opening to
> > closing.
> > Thus, I'm fine with `table.multi-dml-sync`.
> >
> > So the conclusion is `table.multi-dml-sync` (false by default), and we
> will
> > support this config
> > in SQL CLI first, will support it in TableEnvironment#executeMultiSql()
> in
> > the future, right?
> >
> > Best,
> > Jark
> >
> > On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org> wrote:
> >
> >> Hi everyone,
> >>
> >> I understand Rui's concerns. `table.dml-sync` should not apply to
> >> regular `executeSql`. Actually, this option makes only sense when
> >> executing multi statements. Once we have a
> >> `TableEnvironment.executeMultiSql()` this config could be considered.
> >>
> >> Maybe we can find a better generic name? Other platforms will also need
> >> to have this config option, which is why I would like to avoid a SQL
> >> Client specific option. Otherwise every platform has to come up with
> >> this important config option separately.
> >>
> >> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other opinions?
> >>
> >> Regards,
> >> Timo
> >>
> >> On 09.02.21 08:50, Shengkai Fang wrote:
> >>> Hi, all.
> >>>
> >>> I think it may cause user confused. The main problem is  we have no
> means
> >>> to detect the conflict configuration, e.g. users set the option true
> and
> >>> use `TableResult#await` together.
> >>>
> >>> Best,
> >>> Shengkai.
> >>>
> >>
> >>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Yes, `TableEnvironment#executeMultiSql()` can be future work.

@Rui, Shengkai: Are you also fine with this conclusion?

Thanks,
Timo

On 09.02.21 10:14, Jark Wu wrote:
> I'm fine with `table.multi-dml-sync`.
> 
> My previous concern about "multi" is that DML in CLI looks like single
> statement.
> But we can treat CLI as a multi-line accepting statements from opening to
> closing.
> Thus, I'm fine with `table.multi-dml-sync`.
> 
> So the conclusion is `table.multi-dml-sync` (false by default), and we will
> support this config
> in SQL CLI first, will support it in TableEnvironment#executeMultiSql() in
> the future, right?
> 
> Best,
> Jark
> 
> On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org> wrote:
> 
>> Hi everyone,
>>
>> I understand Rui's concerns. `table.dml-sync` should not apply to
>> regular `executeSql`. Actually, this option makes only sense when
>> executing multi statements. Once we have a
>> `TableEnvironment.executeMultiSql()` this config could be considered.
>>
>> Maybe we can find a better generic name? Other platforms will also need
>> to have this config option, which is why I would like to avoid a SQL
>> Client specific option. Otherwise every platform has to come up with
>> this important config option separately.
>>
>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other opinions?
>>
>> Regards,
>> Timo
>>
>> On 09.02.21 08:50, Shengkai Fang wrote:
>>> Hi, all.
>>>
>>> I think it may cause user confused. The main problem is  we have no means
>>> to detect the conflict configuration, e.g. users set the option true and
>>> use `TableResult#await` together.
>>>
>>> Best,
>>> Shengkai.
>>>
>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
I'm fine with `table.multi-dml-sync`.

My previous concern about "multi" was that a DML statement in the CLI looks
like a single statement.
But we can treat the CLI as a multi-statement session, accepting statements
from opening to closing.
Thus, I'm fine with `table.multi-dml-sync`.

So the conclusion is `table.multi-dml-sync` (false by default), and we will
support this config in the SQL CLI first and in
`TableEnvironment#executeMultiSql()` in the future, right?

Best,
Jark

On Tue, 9 Feb 2021 at 16:37, Timo Walther <tw...@apache.org> wrote:

> Hi everyone,
>
> I understand Rui's concerns. `table.dml-sync` should not apply to
> regular `executeSql`. Actually, this option makes only sense when
> executing multi statements. Once we have a
> `TableEnvironment.executeMultiSql()` this config could be considered.
>
> Maybe we can find a better generic name? Other platforms will also need
> to have this config option, which is why I would like to avoid a SQL
> Client specific option. Otherwise every platform has to come up with
> this important config option separately.
>
> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other opinions?
>
> Regards,
> Timo
>
> On 09.02.21 08:50, Shengkai Fang wrote:
> > Hi, all.
> >
> > I think it may cause user confused. The main problem is  we have no means
> > to detect the conflict configuration, e.g. users set the option true and
> > use `TableResult#await` together.
> >
> > Best,
> > Shengkai.
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

I understand Rui's concerns. `table.dml-sync` should not apply to
regular `executeSql`. Actually, this option only makes sense when
executing multiple statements. Once we have a
`TableEnvironment.executeMultiSql()` this config could be considered.

Maybe we can find a better generic name? Other platforms will also need 
to have this config option, which is why I would like to avoid a SQL 
Client specific option. Otherwise every platform has to come up with 
this important config option separately.

Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Or other opinions?

Regards,
Timo
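
As an illustration of the intended semantics, the difference between the two
modes can be modeled with a toy scheduler (plain Python; the option name is
taken from the thread, the rest is an assumption-level sketch, not Flink's
implementation):

```python
import threading
import time

# Toy model of `table.multi-dml-sync`. Each "job" records when it starts
# and ends. With sync=True the client waits for the current job before
# submitting the next one, so the second statement can safely depend on
# the result of the first (Rui's concern earlier in the thread).
def run_job(name, duration, log):
    log.append(("start", name))
    time.sleep(duration)
    log.append(("end", name))

def execute_multi_dml(jobs, sync):
    log, threads = [], []
    for name, duration in jobs:
        t = threading.Thread(target=run_job, args=(name, duration, log))
        t.start()
        if sync:
            t.join()  # table.multi-dml-sync = true: block per statement
        threads.append(t)
    for t in threads:
        t.join()      # async mode: only wait at the very end
    return log

log = execute_multi_dml([("insert_a", 0.05), ("insert_b", 0.0)], sync=True)
print(log)
```

In sync mode the log is strictly sequential; with `sync=False` the two
inserts would overlap and their ordering would no longer be guaranteed.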

On 09.02.21 08:50, Shengkai Fang wrote:
> Hi, all.
> 
> I think it may cause user confused. The main problem is  we have no means
> to detect the conflict configuration, e.g. users set the option true and
> use `TableResult#await` together.
> 
> Best,
> Shengkai.
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, all.

I think it may confuse users. The main problem is that we have no means
to detect conflicting configurations, e.g. when users set the option to
true and use `TableResult#await` together.

Best,
Shengkai.

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Rui Li <li...@gmail.com>.
Hi Jark,

I agree it's more consistent if the Table API also respects this config. But on
the other hand, it might make the `executeSql` API a little trickier to
use, because now DDL, DQL and DML all behave differently from one another:

   - DDL: always sync
   - DQL: always async
   - DML: can be sync or async according to the config

So I slightly prefer to apply this config only to the SQL Client. API users
can always easily achieve sync or async behavior in their code. And the
config option is just meant to give SQL Client users a chance to do the
same thing. But let's hear more opinions from other folks.
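The three-way behavior listed above can be sketched as a small decision function (illustrative names, not Flink's API; the boolean flag stands in for the proposed config option):

```java
// Statement categories as discussed: DDL always sync, DQL always async,
// DML governed by the proposed config option.
enum StatementKind { DDL, DQL, DML }

class SubmissionMode {
    // Returns true if the client should block until the statement finishes.
    static boolean shouldBlock(StatementKind kind, boolean dmlSync) {
        switch (kind) {
            case DDL:
                return true;      // DDL: always sync
            case DQL:
                return false;     // DQL: always async
            default:
                return dmlSync;   // DML: sync or async according to the config
        }
    }
}
```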

On Tue, Feb 9, 2021 at 10:21 AM Jark Wu <im...@gmail.com> wrote:

> Hi Rui,
>
> That's a good point. From the naming of the option, I prefer to get sync
> behavior.
> It would be very straightforward that it affects all the DMLs on SQL CLI
> and
> TableEnvironment (including `executeSql`, `StatementSet`,
> `Table#executeInsert`, etc.).
> This can also make SQL CLI easy to support this configuration by passing
> through to the TableEnv.
>
> Best,
> Jark
>
> On Tue, 9 Feb 2021 at 10:07, Rui Li <li...@gmail.com> wrote:
>
>> Hi,
>>
>> Glad to see we have reached consensus on option #2. +1 to it.
>>
>> Regarding the name, I'm fine with `table.dml-async`. But I wonder whether
>> this config also applies to table API. E.g. if a user
>> sets table.dml-async=false and calls TableEnvironment::executeSql to run a
>> DML, will he get sync behavior?
>>
>> On Mon, Feb 8, 2021 at 11:28 PM Jark Wu <im...@gmail.com> wrote:
>>
>>> Ah, I just forgot the option name.
>>>
>>> I'm also fine with `table.dml-async`.
>>>
>>> What do you think @Rui Li <li...@gmail.com> @Shengkai Fang
>>> <fs...@gmail.com> ?
>>>
>>> Best,
>>> Jark
>>>
>>> On Mon, 8 Feb 2021 at 23:06, Timo Walther <tw...@apache.org> wrote:
>>>
>>>> Great to hear that. Can someone update the FLIP a final time before we
>>>> start a vote?
>>>>
>>>> We should quickly discuss how we would like to name the config option
>>>> for the async/sync mode. I heared voices internally that are strongly
>>>> against calling it "detach" due to historical reasons with a Flink job
>>>> detach mode. How about `table.dml-async`?
>>>>
>>>> Thanks,
>>>> Timo
>>>>
>>>>
>>>> On 08.02.21 15:55, Jark Wu wrote:
>>>> > Thanks Timo,
>>>> >
>>>> > I'm +1 for option#2 too.
>>>> >
>>>> > I think we have addressed all the concerns and can start a vote.
>>>> >
>>>> > Best,
>>>> > Jark
>>>> >
>>>> > On Mon, 8 Feb 2021 at 22:19, Timo Walther <tw...@apache.org> wrote:
>>>> >
>>>> >> Hi Jark,
>>>> >>
>>>> >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>>>> >>
>>>> >> So let's stick to the config option approach.
>>>> >>
>>>> >> However, I strongly believe that we should not use the
>>>> batch/streaming
>>>> >> mode for deriving semantics. This discussion is similar to time
>>>> function
>>>> >> discussion. We should not derive sync/async submission behavior from
>>>> a
>>>> >> flag that should only influence runtime operators and the incremental
>>>> >> computation. Statements for bounded streams should have the same
>>>> >> semantics in batch mode.
>>>> >>
>>>> >> I think your proposed option 2) is a good tradeoff. For the following
>>>> >> reasons:
>>>> >>
>>>> >> pros:
>>>> >> - by default, batch and streaming behave exactly the same
>>>> >> - SQL Client CLI behavior does not change compared to 1.12 and
>>>> remains
>>>> >> async for batch and streaming
>>>> >> - consistent with the async Table API behavior
>>>> >>
>>>> >> con:
>>>> >> - batch files are not 100% SQL compliant by default
>>>> >>
>>>> >> The last item might not be an issue since we can expect that users
>>>> have
>>>> >> long-running jobs and prefer async execution in most cases.
>>>> >>
>>>> >> Regards,
>>>> >> Timo
>>>> >>
>>>> >>
>>>> >> On 08.02.21 14:15, Jark Wu wrote:
>>>> >>> Hi Timo,
>>>> >>>
>>>> >>> Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;...
>>>> END;`.
>>>> >>> Because it makes submitting streaming jobs very verbose, every
>>>> INSERT
>>>> >> INTO
>>>> >>> and STATEMENT SET must be wrapped in the ASYNC clause which is
>>>> >>> not user-friendly and not backward-compatible.
>>>> >>>
>>>> >>> I agree we will have unified behavior but this is at the cost of
>>>> hurting
>>>> >>> our main users.
>>>> >>> I'm worried that end users can't understand the technical decision,
>>>> and
>>>> >>> they would
>>>> >>> feel streaming is harder to use.
>>>> >>>
>>>> >>> If we want to have an unified behavior, and let users decide what's
>>>> the
>>>> >>> desirable behavior, I prefer to have a config option. A Flink
>>>> cluster can
>>>> >>> be set to async, then
>>>> >>> users don't need to wrap every DML in an ASYNC clause. This is the
>>>> least
>>>> >>> intrusive
>>>> >>> way to the users.
>>>> >>>
>>>> >>>
>>>> >>> Personally, I'm fine with following options in priority:
>>>> >>>
>>>> >>> 1) sync for batch DML and async for streaming DML
>>>> >>> ==> only breaks batch behavior, but makes both happy
>>>> >>>
>>>> >>> 2) async for both batch and streaming DML, and can be set to sync
>>>> via a
>>>> >>> configuration.
>>>> >>> ==> compatible, and provides flexible configurable behavior
>>>> >>>
>>>> >>> 3) sync for both batch and streaming DML, and can be
>>>> >>>       set to async via a configuration.
>>>> >>> ==> +0 for this, because it breaks all the compatibility, esp. our
>>>> main
>>>> >>> users.
>>>> >>>
>>>> >>> Best,
>>>> >>> Jark
>>>> >>>
>>>> >>> On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org>
>>>> wrote:
>>>> >>>
>>>> >>>> Hi Jark, Hi Rui,
>>>> >>>>
>>>> >>>> 1) How should we execute statements in CLI and in file? Should
>>>> there be
>>>> >>>> a difference?
>>>> >>>> So it seems we have consensus here with unified behavior. Even
>>>> though
>>>> >>>> this means we are breaking existing batch INSERT INTOs that were
>>>> >>>> asynchronous before.
>>>> >>>>
>>>> >>>> 2) Should we have different behavior for batch and streaming?
>>>> >>>> I think also batch users prefer async behavior because usually even
>>>> >>>> those pipelines take some time to execute. But we should
>>>> stick to
>>>> >>>> standard SQL blocking semantics.
>>>> >>>>
>>>> >>>> What are your opinions on making async explicit in SQL via `BEGIN
>>>> ASYNC;
>>>> >>>> ... END;`? This would allow us to really have unified semantics
>>>> because
>>>> >>>> batch and streaming would behave the same?
>>>> >>>>
>>>> >>>> Regards,
>>>> >>>> Timo
>>>> >>>>
>>>> >>>>
>>>> >>>> On 07.02.21 04:46, Rui Li wrote:
>>>> >>>>> Hi Timo,
>>>> >>>>>
>>>> >>>>> I agree with Jark that we should provide consistent experience
>>>> >> regarding
>>>> >>>>> SQL CLI and files. Some systems even allow users to execute SQL
>>>> files
>>>> >> in
>>>> >>>>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to
>>>> support that
>>>> >>>> in
>>>> >>>>> the future, it's a little tricky to decide whether that should be
>>>> >> treated
>>>> >>>>> as CLI or file.
>>>> >>>>>
>>>> >>>>> I actually prefer a config option and let users decide what's the
>>>> >>>>> desirable behavior. But if we have agreed not to use options, I'm
>>>> also
>>>> >>>> fine
>>>> >>>>> with Alternative #1.
>>>> >>>>>
>>>> >>>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>>> Hi Timo,
>>>> >>>>>>
>>>> >>>>>> 1) How should we execute statements in CLI and in file? Should
>>>> there
>>>> >> be
>>>> >>>> a
>>>> >>>>>> difference?
>>>> >>>>>> I do think we should unify the behavior of CLI and SQL files. SQL
>>>> >> files
>>>> >>>> can
>>>> >>>>>> be thought of as a shortcut of
>>>> >>>>>> "start CLI" => "copy content of SQL files" => "paste content in
>>>> CLI".
>>>> >>>>>> Actually, we already did this in kafka_e2e.sql [1].
>>>> >>>>>> I think it's hard for users to understand why SQL files behave
>>>> >>>> differently
>>>> >>>>>> from CLI, all the other systems don't have such a difference.
>>>> >>>>>>
>>>> >>>>>> If we distinguish SQL files and CLI, should there be a
>>>> difference in
>>>> >>>> JDBC
>>>> >>>>>> driver and UI platform?
>>>> >>>>>> Personally, they all should have consistent behavior.
>>>> >>>>>>
>>>> >>>>>> 2) Should we have different behavior for batch and streaming?
>>>> >>>>>> I think we all agree streaming users prefer async execution,
>>>> otherwise
>>>> >>>> it's
>>>> >>>>>> weird and difficult to use if the
>>>> >>>>>> submit script or CLI never exits. On the other hand, batch SQL
>>>> users
>>>> >>>> are
>>>> >>>>>> used to SQL statements being
>>>> >>>>>> executed in a blocking fashion.
>>>> >>>>>>
>>>> >>>>>> Either unified async execution or unified sync execution, will
>>>> hurt
>>>> >> one
>>>> >>>>>> side of the streaming
>>>> >>>>>> batch users. In order to make both sides happy, I think we can
>>>> have
>>>> >>>>>> different behavior for batch and streaming.
>>>> >>>>>> There are many essential differences between batch and stream
>>>> >> systems, I
>>>> >>>>>> think it's normal to have some
>>>> >>>>>> different behaviors, and the behavior doesn't break the unified
>>>> batch
>>>> >>>>>> stream semantics.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Thus, I'm +1 to Alternative 1:
>>>> >>>>>> We consider batch/streaming mode and block for batch INSERT INTO
>>>> and
>>>> >>>> async
>>>> >>>>>> for streaming INSERT INTO/STATEMENT SET.
>>>> >>>>>> And this behavior is consistent across CLI and files.
>>>> >>>>>>
>>>> >>>>>> Best,
>>>> >>>>>> Jark
>>>> >>>>>>
>>>> >>>>>> [1]:
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>
>>>> >>
>>>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
>>>> >>>>>>
>>>> >>>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>>> Hi Jark,
>>>> >>>>>>>
>>>> >>>>>>> thanks for the summary. I hope we can also find a good long-term
>>>> >>>>>>> solution on the async/sync execution behavior topic.
>>>> >>>>>>>
>>>> >>>>>>> It should be discussed in a bigger round because it is (similar
>>>> to
>>>> >> the
>>>> >>>>>>> time function discussion) related to batch-streaming unification
>>>> >> where
>>>> >>>>>>> we should stick to the SQL standard to some degree but also
>>>> need to
>>>> >>>> come
>>>> >>>>>>> up with good streaming semantics.
>>>> >>>>>>>
>>>> >>>>>>> Let me summarize the problem again to hear opinions:
>>>> >>>>>>>
>>>> >>>>>>> - Batch SQL users are used to execute SQL files sequentially
>>>> (from
>>>> >> top
>>>> >>>>>>> to bottom).
>>>> >>>>>>> - Batch SQL users are used to SQL statements being executed
>>>> in a blocking fashion.
>>>> >>>>>>> One after the other. Esp. when moving around data with INSERT
>>>> INTO.
>>>> >>>>>>> - Streaming users prefer async execution because unbounded
>>>> streams are
>>>> >>>>>>> more frequent than bounded streams.
>>>> >>>>>>> - We decided to make the Flink Table API async because in a
>>>> >> programming
>>>> >>>>>>> language it is easy to call `.await()` on the result to make it
>>>> >>>> blocking.
>>>> >>>>>>> - INSERT INTO statements in the current SQL Client
>>>> implementation are
>>>> >>>>>>> always submitted asynchronously.
>>>> >>>>>>> - Other clients such as the Ververica platform allow only one
>>>> INSERT
>>>> >> INTO
>>>> >>>>>>> or a STATEMENT SET at the end of a file that will run
>> asynchronously.
>>>> >>>>>>>
>>>> >>>>>>> Questions:
>>>> >>>>>>>
>>>> >>>>>>> - How should we execute statements in CLI and in file? Should
>>>> there
>>>> >> be
>>>> >>>> a
>>>> >>>>>>> difference?
>>>> >>>>>>> - Should we have different behavior for batch and streaming?
>>>> >>>>>>> - Shall we solve parts with a config option or is it better to
>>>> make
>>>> >> it
>>>> >>>>>>> explicit in the SQL job definition because it influences the
>>>> >> semantics
>>>> >>>>>>> of multiple INSERT INTOs?
>>>> >>>>>>>
>>>> >>>>>>> Let me summarize my opinion at the moment:
>>>> >>>>>>>
>>>> >>>>>>> - SQL files should always be executed blocking by default.
>>>> Because
>>>> >> they
>>>> >>>>>>> could potentially contain a long list of INSERT INTO
>>>> statements. This
>>>> >>>>>>> would be SQL standard compliant.
>>>> >>>>>>> - If we allow async execution, we should make this explicit in
>>>> the
>>>> >> SQL
>>>> >>>>>>> file via `BEGIN ASYNC; ... END;`.
>>>> >>>>>>> - In the CLI, we always execute async to maintain the old
>>>> behavior.
>>>> >> We
>>>> >>>>>>> can also assume that people are only using the CLI to fire
>>>> statements
>>>> >>>>>>> and close the CLI afterwards.
>>>> >>>>>>>
>>>> >>>>>>> Alternative 1:
>>>> >>>>>>> - We consider batch/streaming mode and block for batch INSERT
>>>> INTO
>>>> >> and
>>>> >>>>>>> async for streaming INSERT INTO/STATEMENT SET
>>>> >>>>>>>
>>>> >>>>>>> What do others think?
>>>> >>>>>>>
>>>> >>>>>>> Regards,
>>>> >>>>>>> Timo
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> On 05.02.21 04:03, Jark Wu wrote:
>>>> >>>>>>>> Hi all,
>>>> >>>>>>>>
>>>> >>>>>>>> After an offline discussion with Timo and Kurt, we have
>>>> reached some
>>>> >>>>>>>> consensus.
>>>> >>>>>>>> Please correct me if I am wrong or missed anything.
>>>> >>>>>>>>
>>>> >>>>>>>> 1) We will introduce "table.planner" and "table.execution-mode"
>>>> >>>> instead
>>>> >>>>>>> of
>>>> >>>>>>>> "sql-client" prefix,
>>>> >>>>>>>> and add `TableEnvironment.create(Configuration)` interface.
>>>> These 2
>>>> >>>>>>> options
>>>> >>>>>>>> can only be used
>>>> >>>>>>>> for tableEnv initialization. If used after initialization,
>>>> Flink
>>>> >>>> should
>>>> >>>>>>>> throw an exception. We may
>>>> >>>>>>>> support dynamically switching the planner in the future.
>>>> >>>>>>>>
>>>> >>>>>>>> 2) We will have only one parser,
>>>> >>>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a
>>>> string
>>>> >>>>>>>> statement, and returns a list of Operation. It will first use
>>>> regex
>>>> >> to
>>>> >>>>>>>> match some special statement,
>>>> >>>>>>>>      e.g. SET, ADD JAR, others will be delegated to the
>>>> underlying
>>>> >>>> Calcite
>>>> >>>>>>>> parser. The Parser can
>>>> >>>>>>>> have different implementations, e.g. HiveParser.
>>>> >>>>>>>>
>>>> >>>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink
>>>> dialect.
>>>> >>>> But
>>>> >>>>>>> we
>>>> >>>>>>>> can allow
>>>> >>>>>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
>>>> >>>>>>>>
>>>> >>>>>>>> 4) We don't have a conclusion for async/sync execution
>>>> behavior yet.
>>>> >>>>>>>>
>>>> >>>>>>>> Best,
>>>> >>>>>>>> Jark
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Hi Ingo,
>>>> >>>>>>>>>
>>>> >>>>>>>>> Since we have supported the WITH syntax and SET command since
>>>> v1.9
>>>> >>>>>>> [1][2],
>>>> >>>>>>>>> and
>>>> >>>>>>>>> we have never received such complaints, I think it's fine for
>>>> such
>>>> >>>>>>>>> differences.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
>>>> >>>>>> requires
>>>> >>>>>>>>> string literal keys[3],
>>>> >>>>>>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
>>>> >>>>>>>>>
>>>> >>>>>>>>> Best,
>>>> >>>>>>>>> Jark
>>>> >>>>>>>>>
>>>> >>>>>>>>> [1]:
>>>> >>>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>
>>>> >>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>>>> >>>>>>>>> [2]:
>>>> >>>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>
>>>> >>
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>>>> >>>>>>>>> [3]:
>>>> >>>>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>>>> >>>>>>>>> [4]:
>>>> >>>>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>>>> >>>>>>>>> (search "set mapred.reduce.tasks=32")
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com>
>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Hi,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> regarding the (un-)quoted question, compatibility is of
>>>> course an
>>>> >>>>>>>>>> important
>>>> >>>>>>>>>> argument, but in terms of consistency I'd find it a bit
>>>> surprising
>>>> >>>>>> that
>>>> >>>>>>>>>> WITH handles it differently than SET, and I wonder if that
>>>> could
>>>> >>>>>> cause
>>>> >>>>>>>>>> friction for developers when writing their SQL.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Regards
>>>> >>>>>>>>>> Ingo
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com>
>>>> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> Hi all,
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Regarding "One Parser", I think it's not possible for now
>>>> because
>>>> >>>>>>>>>> Calcite
>>>> >>>>>>>>>>> parser can't parse
>>>> >>>>>>>>>>> special characters (e.g. "-") unless quoting them as string
>>>> >>>>>> literals.
>>>> >>>>>>>>>>> That's why the WITH option
>>>> >>>>>>>>>>> key are string literals not identifiers.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>>>> >>>>>>>>>>> /local/my-home/test.jar
>>>> >>>>>>>>>>> have the same
>>>> >>>>>>>>>>> problems. That's why we propose two parser, one splits
>>>> lines into
>>>> >>>>>>>>>> multiple
>>>> >>>>>>>>>>> statements and match special
>>>> >>>>>>>>>>> command through regex which is light-weight, and delegate
>>>> other
>>>> >>>>>>>>>> statements
>>>> >>>>>>>>>>> to the other parser which is Calcite parser.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Note: we should stick on the unquoted SET
>>>> >>>>>>> table.exec.mini-batch.enabled
>>>> >>>>>>>>>> =
>>>> >>>>>>>>>>> true syntax,
>>>> >>>>>>>>>>> both for backward-compatibility and easy-to-use, and all the
>>>> >> other
>>>> >>>>>>>>>> systems
>>>> >>>>>>>>>>> don't have quotes on the key.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
>>>> >>>>>>>>>>> if we want to use "table.planner", I think we should explain
>>>> >>>> clearly
>>>> >>>>>>>>>> what's
>>>> >>>>>>>>>>> the scope it can be used in documentation.
>>>> >>>>>>>>>>> Otherwise, there will be users complaining why the planner
>>>> >> doesn't
>>>> >>>>>>>>>> change
>>>> >>>>>>>>>>> when setting the configuration on TableEnv.
>>>> >>>>>>>>>>> It would be better to throw an exception to indicate to users
>>>> >>>>>>>>>>> that it's not allowed to change the planner after the TableEnv
>>>> >>>>>>>>>>> is initialized.
>>>> >>>>>>>>>>> However, it seems not easy to implement.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best,
>>>> >>>>>>>>>>> Jark
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <
>>>> godfreyhe@gmail.com>
>>>> >>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Hi everyone,
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Regarding "table.planner" and "table.execution-mode"
>>>> >>>>>>>>>>>> If we define that those two options are just used to
>>>> initialize
>>>> >>>> the
>>>> >>>>>>>>>>>> TableEnvironment, +1 for introducing table options instead
>>>> of
>>>> >>>>>>>>>> sql-client
>>>> >>>>>>>>>>>> options.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Regarding "the sql client, we will maintain two parsers",
>>>> I want
>>>> >>>> to
>>>> >>>>>>>>>> give
>>>> >>>>>>>>>>>> more inputs:
>>>> >>>>>>>>>>>> We want to introduce sql-gateway into the Flink project
>>>> (see
>>>> >>>>>> FLIP-24
>>>> >>>>>>> &
>>>> >>>>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the
>>>> CLI
>>>> >>>>>> client
>>>> >>>>>>>>>> and
>>>> >>>>>>>>>>>> the gateway service will communicate through Rest API. The
>>>> " ADD
>>>> >>>>>> JAR
>>>> >>>>>>>>>>>> /local/path/jar " will be executed in the CLI client
>>>> machine. So
>>>> >>>>>> when
>>>> >>>>>>>>>> we
>>>> >>>>>>>>>>>> submit a sql file which contains multiple statements, the
>>>> CLI
>>>> >>>>>> client
>>>> >>>>>>>>>>> needs
>>>> >>>>>>>>>>>> to pick out the "ADD JAR" line, and also statements need
>>>> to be
>>>> >>>>>>>>>> submitted
>>>> >>>>>>>>>>> or
>>>> >>>>>>>>>>>> executed one by one to make sure the result is correct.
>>>> The sql
>>>> >>>>>> file
>>>> >>>>>>>>>> may
>>>> >>>>>>>>>>> be
>>>> >>>>>>>>>>>> look like:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> SET xxx=yyy;
>>>> >>>>>>>>>>>> create table my_table ...;
>>>> >>>>>>>>>>>> create table my_sink ...;
>>>> >>>>>>>>>>>> ADD JAR /local/path/jar1;
>>>> >>>>>>>>>>>> create function my_udf as com....MyUdf;
>>>> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>> >>>>>>>>>>>> REMOVE JAR /local/path/jar1;
>>>> >>>>>>>>>>>> drop function my_udf;
>>>> >>>>>>>>>>>> ADD JAR /local/path/jar2;
>>>> >>>>>>>>>>>> create function my_udf as com....MyUdf2;
>>>> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> The lines need to be split into multiple statements
>>>> first in
>>>> >>>> the
>>>> >>>>>>>>>> CLI
>>>> >>>>>>>>>>>> client, there are two approaches:
>>>> >>>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
>>>> >> splits
>>>> >>>>>>> the
>>>> >>>>>>>>>>>> lines and tells which lines are "ADD JAR".
>>>> >>>>>>>>>>>> pro: there is only one parser
>>>> >>>>>>>>>>>> cons: It's a little heavy that the CLI client depends on
>>>> the
>>>> >>>>>>>>>> sql-parser,
>>>> >>>>>>>>>>>> because the CLI client is just a simple tool which
>>>> receives the
>>>> >>>>>> user
>>>> >>>>>>>>>>>> commands and displays the result. The non "ADD JAR"
>>>> command will
>>>> >>>> be
>>>> >>>>>>>>>>> parsed
>>>> >>>>>>>>>>>> twice.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> 2. The CLI client splits the lines into multiple
>>>> statements and
>>>> >>>>>> finds
>>>> >>>>>>>>>> the
>>>> >>>>>>>>>>>> ADD JAR command through regex matching.
>>>> >>>>>>>>>>>> pro: The CLI client is very light-weight.
>>>> >>>>>>>>>>>> cons: there are two parsers.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> (personally, I prefer the second option)
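The second option could look roughly like the following self-contained sketch (illustrative only; a production splitter would also have to ignore semicolons inside string literals and comments):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Light-weight client-side splitter: breaks a script on semicolons and
// recognizes ADD/REMOVE JAR lines via regex; everything else would be
// delegated to the server-side Calcite parser.
class ClientSideSplitter {
    private static final Pattern JAR_COMMAND =
            Pattern.compile("(?i)^(ADD|REMOVE)\\s+JAR\\s+\\S+$");

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String raw : script.split(";")) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    static boolean isJarCommand(String statement) {
        return JAR_COMMAND.matcher(statement.trim()).matches();
    }
}
```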
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them
>>>> both.
>>>> >>>>>>>>>>>> For default dialect, we support SHOW JARS, but if we
>>>> switch to
>>>> >>>> hive
>>>> >>>>>>>>>>>> dialect, LIST JARS is also supported.
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> [1]
>>>> >>>>>>>>>>>
>>>> >>>>>>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>> >>>>>>>>>>>> [2]
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>> Godfrey
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Hi guys,
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent
>>>> with
>>>> >>>>>> other
>>>> >>>>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion
>>>> about
>>>> >>>>>> REMOVE
>>>> >>>>>>>>>> vs
>>>> >>>>>>>>>>>>> DELETE though.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as
>>>> I
>>>> >> know,
>>>> >>>>>>>>>> most
>>>> >>>>>>>>>>>>> users who are requesting these features are previously
>>>> hive
>>>> >>>> users.
>>>> >>>>>>>>>> So I
>>>> >>>>>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
>>>> >>>>>> REMOVE/DELETE
>>>> >>>>>>>>>>> JARS
>>>> >>>>>>>>>>>>> as synonyms? It's just like lots of systems accept both
>>>> EXIT
>>>> >> and
>>>> >>>>>>>>>> QUIT
>>>> >>>>>>>>>>> as
>>>> >>>>>>>>>>>>> the command to terminate the program. So if that's not
>>>> hard to
>>>> >>>>>>>>>> achieve,
>>>> >>>>>>>>>>>> and
>>>> >>>>>>>>>>>>> will make users happier, I don't see a reason why we must
>>>> >> choose
>>>> >>>>>> one
>>>> >>>>>>>>>>> over
>>>> >>>>>>>>>>>>> the other.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <
>>>> >> twalthr@apache.org
>>>> >>>>>
>>>> >>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Hi everyone,
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can
>>>> >> discuss
>>>> >>>>>>>>>> the
>>>> >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to
>>>> determine
>>>> >>>> how
>>>> >>>>>>>>>> we
>>>> >>>>>>>>>>>>>> proceed with this in the near future.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> 1) "whether the table environment has the ability to
>>>> update
>>>> >>>>>>>>>> itself"
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think
>>>> that we
>>>> >>>>>>>>>> should
>>>> >>>>>>>>>>>>>> support
>>>> >>>>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
>>>> >>>>>>>>>>>>>> "old")`. Instead I'm proposing to support
>>>> >>>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner
>>>> and
>>>> >>>>>>>>>> execution
>>>> >>>>>>>>>>>>>> mode are read immediately and a subsequent changes to
>>>> these
>>>> >>>>>>>>>> options
>>>> >>>>>>>>>>>> will
>>>> >>>>>>>>>>>>>> have no effect. We are doing it similar in `new
>>>> >>>>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
>>>> >>>>>>>>>> ConfigOption's
>>>> >>>>>>>>>>>>>> must not be SQL Client specific but can be part of the
>>>> core
>>>> >>>> table
>>>> >>>>>>>>>>> code
>>>> >>>>>>>>>>>>>> base. Many users would like to get a 100% preconfigured
>>>> >>>>>>>>>> environment
>>>> >>>>>>>>>>>> from
>>>> >>>>>>>>>>>>>> just Configuration. And this is not possible right now.
>>>> We can
>>>> >>>>>>>>>> solve
>>>> >>>>>>>>>>>>>> both use cases in one change.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> I remember we had some discussion about this and decided
>>>> that
>>>> >> we
>>>> >>>>>>>>>>> would
>>>> >>>>>>>>>>>>>> like to maintain only one parser. In the end it is "One
>>>> Flink
>>>> >>>>>> SQL"
>>>> >>>>>>>>>>>> where
>>>> >>>>>>>>>>>>>> commands influence each other also with respect to
>>>> keywords.
>>>> >> It
>>>> >>>>>>>>>>> should
>>>> >>>>>>>>>>>>>> be fine to include the SQL Client commands in the Flink
>>>> >> parser.
>>>> >>>>>> Of
>>>> >>>>>>>>>>>>>> cource the table environment would not be able to handle
>>>> the
>>>> >>>>>>>>>>>> `Operation`
>>>> >>>>>>>>>>>>>> instance that would be the result but we can introduce
>>>> hooks
>>>> >> to
>>>> >>>>>>>>>>> handle
>>>> >>>>>>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We
>>>> should
>>>> >>>>>>>>>> further
>>>> >>>>>>>>>>>>>> discuss whether we introduce a special SQL clause for
>>>> wrapping
>>>> >>>>>>>>>> async
>>>> >>>>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming
>>>> >>>> queries
>>>> >>>>>>>>>> we
>>>> >>>>>>>>>>>>>> need to be careful and should force users to either "one
>>>> >> INSERT
>>>> >>>>>>>>>> INTO"
>>>> >>>>>>>>>>>> or
>>>> >>>>>>>>>>>>>> "one STATEMENT SET".
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the
>>>> >> commands
>>>> >>>>>>>>>> more
>>>> >>>>>>>>>>>>>> with the remaining commands should be our goal. We just
>>>> had a
>>>> >>>>>>>>>> MODULE
>>>> >>>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But
>>>> it is
>>>> >>>> true
>>>> >>>>>>>>>>> that
>>>> >>>>>>>>>>>>>> JARs are not part of the catalog which is why I would
>>>> not use
>>>> >>>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the
>>>> English
>>>> >>>>>>>>>>> language.
>>>> >>>>>>>>>>>>>> Take a look at the Java collection API as another
>>>> example.
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> 6) "Most of the commands should belong to the table
>>>> >> environment"
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to
>>>> >>>>>>>>>> understand.
>>>> >>>>>>>>>>> It
>>>> >>>>>>>>>>>>>> is good to see that most commends will be available in
>>>> >>>>>>>>>>>> TableEnvironment.
>>>> >>>>>>>>>>>>>> However, I would also support SET and RESET for
>>>> consistency.
>>>> >>>>>>>>>> Again,
>>>> >>>>>>>>>>>> from
>>>> >>>>>>>>>>>>>> an architectural point of view, if we would allow some
>>>> kind of
>>>> >>>>>>>>>>>>>> `Operation` hook in table environment, we could check
>>>> for SQL
>>>> >>>>>>>>>> Client
>>>> >>>>>>>>>>>>>> specific options and forward to regular
>>>> >>>>>>>>>>> `TableConfig.getConfiguration`
>>>> >>>>>>>>>>>>>> otherwise. What do you think?
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> Regards,
>>>> >>>>>>>>>>>>>> Timo
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>>>> >>>>>>>>>>>>>>> Hi Timo,
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> I will respond some of the questions:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> 1) SQL client specific options
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends
>>>> on
>>>> >> where
>>>> >>>>>>>>>> the
>>>> >>>>>>>>>>>>>>> configuration takes effect.
>>>> >>>>>>>>>>>>>>> If it is a table configuration, we should make clear
>>>> what's
>>>> >> the
>>>> >>>>>>>>>>>>> behavior
>>>> >>>>>>>>>>>>>>> when users change
>>>> >>>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
>>>> >>>>>>>>>>>>>> `sql-client.execution.mode`
>>>> >>>>>>>>>>>>>>> are something special
>>>> >>>>>>>>>>>>>>> that can't be changed after TableEnvironment has been
>>>> >>>>>>>>>> initialized.
>>>> >>>>>>>>>>>> You
>>>> >>>>>>>>>>>>>> can
>>>> >>>>>>>>>>>>>>> see
>>>> >>>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`
>>>> method
>>>> >> to
>>>> >>>>>>>>>>>> override
>>>> >>>>>>>>>>>>>>> configuration after
>>>> >>>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Therefore, I think it would be better to still use
>>>> >>>>>>>>>>>>> `sql-client.planner`
>>>> >>>>>>>>>>>>>>> and `sql-client.execution.mode`.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> 2) Execution file
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> From my point of view, there is a big difference
>>>> between
>>>> >>>>>>>>>>>>>>> `sql-client.job.detach` and
>>>> >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
>>>> >>>>>>>>>> `sql-client.job.detach`
>>>> >>>>>>>>>>>> will
>>>> >>>>>>>>>>>>>>> affect every single DML statement
>>>> >>>>>>>>>>>>>>> in the terminal, not only the statements in SQL files. I
>>>> >> think
>>>> >>>>>>>>>> the
>>>> >>>>>>>>>>>>> single
>>>> >>>>>>>>>>>>>>> DML statement in the interactive
>>>> >>>>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
>>>> >>>>>>>>>>>>>>> tEnv#executeMultiSql.
>>>> >>>>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
>>>> >>>>>>>>>>>>> `table.multi-sql-async`.
>>>> >>>>>>>>>>>>>>> I just find that runtime provides a configuration called
>>>> >>>>>>>>>>>>>>> "execution.attached" [1] which is false by default
>>>> >>>>>>>>>>>>>>> which specifies if the pipeline is submitted in
>>>> attached or
>>>> >>>>>>>>>>> detached
>>>> >>>>>>>>>>>>>> mode.
>>>> >>>>>>>>>>>>>>> It provides exactly the same
>>>> >>>>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you
>>>> think
>>>> >>>>>>>>>> about
>>>> >>>>>>>>>>>> using
>>>> >>>>>>>>>>>>>>> this option?
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> If we also want to support this config in
>>>> TableEnvironment, I
>>>> >>>>>>>>>> think
>>>> >>>>>>>>>>>> it
>>>> >>>>>>>>>>>>>>> should also affect the DML execution
>>>> >>>>>>>>>>>>>>>       of `tEnv#executeSql()`, not only DMLs in
>>>> >>>>>>>>>>> `tEnv#executeMultiSql()`.
>>>> >>>>>>>>>>>>>>> Therefore, the behavior may look like this:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")
>>>> >>>>>>>>>>>>>>>   ==> async by default
>>>> >>>>>>>>>>>>>>> tableResult.await()   ==> manually block until finish
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> tEnv.getConfig().getConfiguration()
>>>> >>>>>>>>>>>>>>>   .setString("execution.attached", "true")
>>>> >>>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")
>>>> >>>>>>>>>>>>>>>   ==> sync, don't need to wait on the TableResult
>>>> >>>>>>>>>>>>>>> tEnv.executeMultiSql(
>>>> >>>>>>>>>>>>>>> """
>>>> >>>>>>>>>>>>>>> CREATE TABLE ....   ==> always sync
>>>> >>>>>>>>>>>>>>> INSERT INTO ...     ==> sync, because we set the configuration above
>>>> >>>>>>>>>>>>>>> SET execution.attached = false;
>>>> >>>>>>>>>>>>>>> INSERT INTO ...     ==> async
>>>> >>>>>>>>>>>>>>> """)
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
>>>> >>>>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two
>>>> >>>>>>>>>>>>>>> separate topics. As Shengkai mentioned above, the SQL CLI
>>>> >>>>>>>>>>>>>>> only depends on `TableEnvironment#executeSql()` to support
>>>> >>>>>>>>>>>>>>> multi-line statements. I'm fine with making
>>>> >>>>>>>>>>>>>>> `executeMultiSql()` clearer, but I don't want it to block
>>>> >>>>>>>>>>>>>>> this FLIP; maybe we can discuss this in another thread.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>> Jark
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> [1]:
>>>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fskmine@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Hi, Timo.
>>>> >>>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts
>>>> >> about
>>>> >>>>>>>>>> your
>>>> >>>>>>>>>>>>>>>> feedback.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the
>>>> >>>>>>>>>>>>>>>> table environment has the ability to update itself. Let's
>>>> >>>>>>>>>>>>>>>> take a simple program as an example.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> ```
>>>> >>>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> tEnv.executeSql("...");
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> ```
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> If we regard this option as a table option, users don't
>>>> >>>>>>>>>>>>>>>> have to create another table environment manually. In
>>>> >>>>>>>>>>>>>>>> that case, tEnv needs to check whether the current mode
>>>> >>>>>>>>>>>>>>>> and planner are the same as before when executeSql or
>>>> >>>>>>>>>>>>>>>> explainSql is called. I don't think that's easy work for
>>>> >>>>>>>>>>>>>>>> the table environment, especially if users have a
>>>> >>>>>>>>>>>>>>>> StreamExecutionEnvironment but set the old planner and
>>>> >>>>>>>>>>>>>>>> batch mode. But if we make this option a sql client
>>>> >>>>>>>>>>>>>>>> option, users only use the SET command to change the
>>>> >>>>>>>>>>>>>>>> setting, and we can rebuild a new table environment when
>>>> >>>>>>>>>>>>>>>> the SET succeeds.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the
>>>> >>>>>>>>>>>>>>>> implementation before continuing this topic. In the sql
>>>> >>>>>>>>>>>>>>>> client, we will maintain two parsers. The first parser
>>>> >>>>>>>>>>>>>>>> (the client parser) will only match the sql client
>>>> >>>>>>>>>>>>>>>> commands. If the client parser can't parse the statement,
>>>> >>>>>>>>>>>>>>>> we will leverage the power of the table environment to
>>>> >>>>>>>>>>>>>>>> execute it. According to our blueprint,
>>>> >>>>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
>>>> >>>>>>>>>>>>>>>> Therefore, TableEnvironment#executeMultiSql is
>>>> >>>>>>>>>>>>>>>> out-of-scope for this FLIP.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> But if we need to introduce
>>>> >>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` in the future, I think
>>>> >>>>>>>>>>>>>>>> it's OK to use the option `table.multi-sql-async` rather
>>>> >>>>>>>>>>>>>>>> than the option `sql-client.job.detach`. But we think the
>>>> >>>>>>>>>>>>>>>> name is not suitable because it is confusing for others.
>>>> >>>>>>>>>>>>>>>> When setting the option to false, we just mean it will
>>>> >>>>>>>>>>>>>>>> block the execution of INSERT INTO statements, not DDL or
>>>> >>>>>>>>>>>>>>>> others (other sql statements are always executed
>>>> >>>>>>>>>>>>>>>> synchronously). So how about `table.job.async`? It only
>>>> >>>>>>>>>>>>>>>> works for the sql-client and executeMultiSql. If we set
>>>> >>>>>>>>>>>>>>>> this value to false, the table environment will not
>>>> >>>>>>>>>>>>>>>> return the result until the job finishes.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE
>>>> >>>>>>>>>>>>>>>> JAR and LIST JAR, because HIVE also uses these commands
>>>> >>>>>>>>>>>>>>>> to add the jar into the classpath or delete the jar. If
>>>> >>>>>>>>>>>>>>>> we use such commands, it can reduce our work for hive
>>>> >>>>>>>>>>>>>>>> compatibility.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is that the jars
>>>> >>>>>>>>>>>>>>>> are not maintained by the Catalog. If we really need to
>>>> >>>>>>>>>>>>>>>> keep consistent with SQL grammar, maybe we should use
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>>>> >>>>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>>>> >>>>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>>>> >>>>>>>>>>>>>>>> consistent.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong
>>>> >>>>>>>>>>>>>>>> to the table environment. In the Summary section, I use
>>>> >>>>>>>>>>>>>>>> the <NOTE> tag to identify which commands should belong
>>>> >>>>>>>>>>>>>>>> to the sql client and which should belong to the table
>>>> >>>>>>>>>>>>>>>> environment. I have also added a new section about
>>>> >>>>>>>>>>>>>>>> implementation details in the FLIP.
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>> Shengkai
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> wrote on Tue, Feb 2,
>>>> >>>>>>>>>>>>>>>> 2021 at 6:43 PM:
>>>> >>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give
>>>> >>>>>>>>>>>>>>>>> the SQL Client a very good update and make it production
>>>> >>>>>>>>>>>>>>>>> ready.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Here is some feedback from my side:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> 1) SQL client specific options
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
>>>> >>>>>>>>>>>>>>>>> `sql-client.execution.mode` are SQL Client specific.
>>>> >>>>>>>>>>>>>>>>> Similar to `StreamExecutionEnvironment` and
>>>> >>>>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added
>>>> >>>>>>>>>>>>>>>>> recently, we should offer a possibility for
>>>> >>>>>>>>>>>>>>>>> TableEnvironment. How about we offer
>>>> >>>>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add
>>>> >>>>>>>>>>>>>>>>> `table.planner` and `table.execution-mode` to
>>>> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> 2) Execution file
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1],
>>>> >>>>>>>>>>>>>>>>> including the mailing list thread at that time? Could
>>>> >>>>>>>>>>>>>>>>> you further elaborate how the multi-statement execution
>>>> >>>>>>>>>>>>>>>>> should work for a unified batch/streaming story?
>>>> >>>>>>>>>>>>>>>>> According to our past discussions, each line in an
>>>> >>>>>>>>>>>>>>>>> execution file should be executed blocking, which means
>>>> >>>>>>>>>>>>>>>>> a streaming query needs a statement set to execute
>>>> >>>>>>>>>>>>>>>>> multiple INSERT INTO statements, correct? We should also
>>>> >>>>>>>>>>>>>>>>> offer this functionality in
>>>> >>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>>>> >>>>>>>>>>>>>>>>> `sql-client.job.detach` is SQL Client specific needs to
>>>> >>>>>>>>>>>>>>>>> be determined; it could also be a general
>>>> >>>>>>>>>>>>>>>>> `table.multi-sql-async` option?
>>>> >>>>>>>>>>>>>>>>>
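For reference, the statement set mentioned above would be written in a SQL
file roughly like the following sketch (the table names are illustrative, not
from the FLIP):

```sql
-- A sketch of a statement set in a SQL file: the two INSERTs below are
-- planned and submitted together as a single job instead of two.
BEGIN STATEMENT SET;

INSERT INTO sink_a SELECT * FROM source_table;
INSERT INTO sink_b SELECT * FROM source_table;

END;
```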
>>>> >>>>>>>>>>>>>>>>> 3) DELETE JAR
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE"
>>>> >>>>>>>>>>>>>>>>> sounds like one is actively deleting the JAR at the
>>>> >>>>>>>>>>>>>>>>> corresponding path.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> 4) LIST JAR
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> This should be `SHOW JARS`, in line with other SQL
>>>> >>>>>>>>>>>>>>>>> commands such as `SHOW CATALOGS`, `SHOW TABLES`, etc. [2].
>>>> >>>>>>>>>>>>>>>>>
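For reference, the jar commands being debated here would look roughly like
this in a SQL Client session (the path is a placeholder; DELETE/LIST is the
Hive-compatible spelling, REMOVE/SHOW the spelling proposed in this thread):

```sql
-- A sketch of the jar management commands discussed above.
ADD JAR '/path/to/my-udf.jar';     -- add a jar to the session classpath
SHOW JARS;                         -- proposed spelling of LIST JAR
REMOVE JAR '/path/to/my-udf.jar';  -- proposed spelling of DELETE JAR
```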
>>>> >>>>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> We should keep the details in sync with
>>>> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
>>>> >>>>>>>>>>>>>>>>> confusion about differently named ExplainDetails. I
>>>> >>>>>>>>>>>>>>>>> would vote for `ESTIMATED_COST` instead of `COST`. I'm
>>>> >>>>>>>>>>>>>>>>> sure the original author had a reason to call it that
>>>> >>>>>>>>>>>>>>>>> way.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> 6) Implementation details
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> It would be nice to understand how we plan to implement
>>>> >>>>>>>>>>>>>>>>> the given features. Most of the commands and config
>>>> >>>>>>>>>>>>>>>>> options should go into TableEnvironment and SqlParser
>>>> >>>>>>>>>>>>>>>>> directly, correct? This way users have a unified way of
>>>> >>>>>>>>>>>>>>>>> using Flink SQL. TableEnvironment would provide a
>>>> >>>>>>>>>>>>>>>>> similar user experience in notebooks or interactive
>>>> >>>>>>>>>>>>>>>>> programs to that of the SQL Client.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> [1]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>> >>>>>>>>>>>>>>>>> [2]
>>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Regards,
>>>> >>>>>>>>>>>>>>>>> Timo
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>> >>>>>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better than
>>>> >>>>>>>>>>>>>>>>>> `UNSET`.
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Tue, Feb 2,
>>>> >>>>>>>>>>>>>>>>>> 2021 at 4:44 PM:
>>>> >>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Hi, Jingsong.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
>>>> >>>>>>>>>>>>>>>>>>> `RESET` is already supported in the current sql
>>>> >>>>>>>>>>>>>>>>>>> client. Our proposal just extends its grammar and
>>>> >>>>>>>>>>>>>>>>>>> allows users to reset specified keys.
>>>> >>>>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set a key back to
>>>> >>>>>>>>>>>>>>>>>>> its default value [1]. I think it is more friendly for
>>>> >>>>>>>>>>>>>>>>>>> batch users.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>> Shengkai
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
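For reference, the extended SET/RESET grammar under discussion would look
roughly like this (the key and value are illustrative, and the exact quoting
rules were still open at this point in the thread):

```sql
-- A sketch of the SET/RESET grammar discussed above.
SET table.planner = blink;   -- set a property for the current session
SET;                         -- list all properties
RESET table.planner;         -- proposed extension: reset one key to its default
RESET;                       -- reset all properties
```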
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> wrote on Tue, Feb 2,
>>>> >>>>>>>>>>>>>>>>>>> 2021 at 1:56 PM:
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> Thanks for the proposal. Yes, the sql-client is too
>>>> >>>>>>>>>>>>>>>>>>>> outdated. +1 for improving it.
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> About "SET" and "RESET": why not "SET" and "UNSET"?
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>> Jingsong
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li
>>>> >>>>>>>>>>>>>>>>>>>> <lirui.fudan@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes
>>>> >>>>>>>>>>>>>>>>>>>>> look good to me.
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang
>>>> >>>>>>>>>>>>>>>>>>>>> <fskmine@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Hi, Rui.
>>>> >>>>>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> The main changes:
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> # The -f parameter has no restriction on the
>>>> >>>>>>>>>>>>>>>>>>>>>> statement type.
>>>> >>>>>>>>>>>>>>>>>>>>>> Sometimes, users use a pipe to redirect the result
>>>> >>>>>>>>>>>>>>>>>>>>>> of queries for debugging when submitting a job with
>>>> >>>>>>>>>>>>>>>>>>>>>> the -f parameter. It's much more convenient than
>>>> >>>>>>>>>>>>>>>>>>>>>> writing INSERT INTO statements.
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach`.
>>>> >>>>>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in batch
>>>> >>>>>>>>>>>>>>>>>>>>>> mode. Users can set this option to false and the
>>>> >>>>>>>>>>>>>>>>>>>>>> client will not process the next job until the
>>>> >>>>>>>>>>>>>>>>>>>>>> current job finishes. The default value of this
>>>> >>>>>>>>>>>>>>>>>>>>>> option is false, which means the client will
>>>> >>>>>>>>>>>>>>>>>>>>>> execute the next job once the current job is
>>>> >>>>>>>>>>>>>>>>>>>>>> submitted.
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29,
>>>> >>>>>>>>>>>>>>>>>>>>>> 2021 at 4:52 PM:
>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and
>>>> >>>>>>>>>>>>>>>>>>>>>>> hive have different implications, and we should
>>>> >>>>>>>>>>>>>>>>>>>>>>> clarify the behavior. For example, if the client
>>>> >>>>>>>>>>>>>>>>>>>>>>> just submits the job and exits, what happens if
>>>> >>>>>>>>>>>>>>>>>>>>>>> the file contains two INSERT statements? I don't
>>>> >>>>>>>>>>>>>>>>>>>>>>> think we should treat them as a statement set,
>>>> >>>>>>>>>>>>>>>>>>>>>>> because users should explicitly write BEGIN
>>>> >>>>>>>>>>>>>>>>>>>>>>> STATEMENT SET in that case. And the client
>>>> >>>>>>>>>>>>>>>>>>>>>>> shouldn't asynchronously submit the two jobs,
>>>> >>>>>>>>>>>>>>>>>>>>>>> because the 2nd may depend on the 1st, right?
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang
>>>> >>>>>>>>>>>>>>>>>>>>>>> <fskmine@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi Rui,
>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
>>>> >>>>>>>>>>>>>>>>>>>>>>>> suggestions.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the
>>>> >>>>>>>>>>>>>>>>>>>>>>>> SET command. In the implementation, it will just
>>>> >>>>>>>>>>>>>>>>>>>>>>>> put the key-value pair into the `Configuration`,
>>>> >>>>>>>>>>>>>>>>>>>>>>>> which will be used to generate the table config.
>>>> >>>>>>>>>>>>>>>>>>>>>>>> If hive supports reading the setting from the
>>>> >>>>>>>>>>>>>>>>>>>>>>>> table config, users are able to set the
>>>> >>>>>>>>>>>>>>>>>>>>>>>> hive-related settings.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>> For suggestion 2: The -f parameter will submit
>>>> >>>>>>>>>>>>>>>>>>>>>>>> the job and exit. If the queries never end, users
>>>> >>>>>>>>>>>>>>>>>>>>>>>> have to cancel the job by themselves, which is
>>>> >>>>>>>>>>>>>>>>>>>>>>>> not reliable (people may forget their jobs). In
>>>> >>>>>>>>>>>>>>>>>>>>>>>> most cases, queries are used to analyze the data,
>>>> >>>>>>>>>>>>>>>>>>>>>>>> and users should use queries in the interactive
>>>> >>>>>>>>>>>>>>>>>>>>>>>> mode.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29,
>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2021 at 3:18 PM:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> discussion. I think it covers a lot of useful
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> features which will dramatically improve the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> usability of our SQL Client. I have two
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> questions regarding the FLIP.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> configurations via the SET command? A connector
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> may have its own configurations and we don't
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> have a way to dynamically change such
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> configurations in SQL Client. For example, users
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> may want to be able to change the hive conf when
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> using the hive connector [1].
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> SQL files specified with the -f option? Hive
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> supports a similar -f option but allows queries
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> in the file. A common use case is to run some
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> query and redirect the results to a file, so I
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> think flink users would like to do the same,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> >>>> https://issues.apache.org/jira/browse/FLINK-20590
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> <liuyang0704@gmail.com> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. I have some
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> additional suggestions:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionContext to StreamTableEnvironment for
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> both streaming and batch sql.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way results are retrieved: the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> sql client currently collects the results
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> locally all at once using accumulators, which
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> may cause memory issues in the JM or locally
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> for big query results. Accumulators are only
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> suitable for testing purposes. We may change to
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> use SelectTableSink, which is based on
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> CollectSinkOperatorCoordinator.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> gateway from FLIP-91? It seems that FLIP has
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> not moved forward for a long time. Providing a
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> long-running service out of the box to
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> facilitate sql submission is necessary.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Jan 28, 2021 at 8:54 PM:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-163: SQL Client Improvements.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> of the sql client. For example, users cannot
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> register tables proposed by FLIP-95.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - use the -i parameter to specify a sql file
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>   to initialize the table environment, and
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>   deprecate the YAML file;
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit a sql file, and deprecate
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>   the '-u' parameter;
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
>>>> >>>>>>>>>> FLIP-163[1].
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> *With kind regards
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> ------------------------------------------------------------
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese
>>>> Academy
>>>> >>>> of
>>>> >>>>>>>>>>>>> Science
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
>>>> >>>> liuyang0704@gmail.com
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> --
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best regards!
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Rui Li
>> --
>> Best regards!
>> Rui Li
>>
>

-- 
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi Rui,

That's a good point. From the naming of the option, I prefer the sync
behavior.
It would be very straightforward that it affects all the DMLs in the SQL CLI
and TableEnvironment (including `executeSql`, `StatementSet`,
`Table#executeInsert`, etc.).
This would also make it easy for the SQL CLI to support this configuration
by passing it through to the TableEnv.

Best,
Jark
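The behavior Jark describes could be sketched in a SQL Client session roughly
as follows (the option name was still under discussion at this point, and the
table names are illustrative):

```sql
-- A sketch of how a single async/sync option would affect every DML,
-- both in the SQL CLI and through TableEnvironment#executeSql.
SET table.dml-async = false;
INSERT INTO daily_report SELECT * FROM orders;  -- blocks until the job finishes

SET table.dml-async = true;
INSERT INTO kafka_sink SELECT * FROM orders;    -- returns once the job is submitted
```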

On Tue, 9 Feb 2021 at 10:07, Rui Li <li...@gmail.com> wrote:

> Hi,
>
> Glad to see we have reached consensus on option #2. +1 to it.
>
> Regarding the name, I'm fine with `table.dml-async`. But I wonder whether
> this config also applies to table API. E.g. if a user
> sets table.dml-async=false and calls TableEnvironment::executeSql to run a
> DML, will he get sync behavior?
>
> On Mon, Feb 8, 2021 at 11:28 PM Jark Wu <im...@gmail.com> wrote:
>
>> Ah, I just forgot the option name.
>>
>> I'm also fine with `table.dml-async`.
>>
>> What do you think @Rui Li <li...@gmail.com> @Shengkai Fang
>> <fs...@gmail.com> ?
>>
>> Best,
>> Jark
>>
>> On Mon, 8 Feb 2021 at 23:06, Timo Walther <tw...@apache.org> wrote:
>>
>>> Great to hear that. Can someone update the FLIP a final time before we
>>> start a vote?
>>>
>>> We should quickly discuss how we would like to name the config option
>>> for the async/sync mode. I heard voices internally that are strongly
>>> against calling it "detach" due to historical reasons with a Flink job
>>> detach mode. How about `table.dml-async`?
>>>
>>> Thanks,
>>> Timo
>>>
>>>
>>> On 08.02.21 15:55, Jark Wu wrote:
>>> > Thanks Timo,
>>> >
>>> > I'm +1 for option#2 too.
>>> >
>>> > I think we have addressed all the concerns and can start a vote.
>>> >
>>> > Best,
>>> > Jark
>>> >
>>> > On Mon, 8 Feb 2021 at 22:19, Timo Walther <tw...@apache.org> wrote:
>>> >
>>> >> Hi Jark,
>>> >>
>>> >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>>> >>
>>> >> So let's stick to the config option approach.
>>> >>
>>> >> However, I strongly believe that we should not use the batch/streaming
>>> >> mode for deriving semantics. This discussion is similar to the time
>>> >> function discussion. We should not derive sync/async submission
>>> >> behavior from a
>>> >> flag that should only influence runtime operators and the incremental
>>> >> computation. Statements for bounded streams should have the same
>>> >> semantics in batch mode.
>>> >>
>>> >> I think your proposed option 2) is a good tradeoff. For the following
>>> >> reasons:
>>> >>
>>> >> pros:
>>> >> - by default, batch and streaming behave exactly the same
>>> >> - SQL Client CLI behavior does not change compared to 1.12 and remains
>>> >> async for batch and streaming
>>> >> - consistent with the async Table API behavior
>>> >>
>>> >> con:
>>> >> - batch files are not 100% SQL compliant by default
>>> >>
>>> >> The last item might not be an issue since we can expect that users
>>> have
>>> >> long-running jobs and prefer async execution in most cases.
>>> >>
>>> >> Regards,
>>> >> Timo
>>> >>
>>> >>
>>> >> On 08.02.21 14:15, Jark Wu wrote:
>>> >>> Hi Timo,
>>> >>>
>>> >>> Actually, I'm not in favor of the explicit syntax `BEGIN ASYNC; ... END;`,
>>> >>> because it makes submitting streaming jobs very verbose: every INSERT
>>> >>> INTO and STATEMENT SET must be wrapped in the ASYNC clause, which is
>>> >>> not user-friendly and not backward-compatible.
>>> >>>
>>> >>> I agree we would have unified behavior, but this is at the cost of
>>> >>> hurting our main users. I'm worried that end users won't understand
>>> >>> the technical decision, and they would feel streaming is harder to use.
>>> >>>
>>> >>> If we want to have a unified behavior and let users decide what the
>>> >>> desirable behavior is, I prefer to have a config option. A Flink
>>> >>> cluster can be set to async; then users don't need to wrap every DML
>>> >>> in an ASYNC clause. This is the least intrusive way for users.
>>> >>>
>>> >>>
>>> >>> Personally, I'm fine with following options in priority:
>>> >>>
>>> >>> 1) sync for batch DML and async for streaming DML
>>> >>> ==> only breaks batch behavior, but makes both happy
>>> >>>
>>> >>> 2) async for both batch and streaming DML, and can be set to sync
>>> via a
>>> >>> configuration.
>>> >>> ==> compatible, and provides flexible configurable behavior
>>> >>>
>>> >>> 3) sync for both batch and streaming DML, and can be
>>> >>>       set to async via a configuration.
>>> >>> ==> +0 for this, because it breaks all the compatibility, esp. our
>>> main
>>> >>> users.
>>> >>>
>>> >>> Best,
>>> >>> Jark
>>> >>>
>>> >>> On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org>
>>> wrote:
>>> >>>
>>> >>>> Hi Jark, Hi Rui,
>>> >>>>
>>> >>>> 1) How should we execute statements in CLI and in file? Should
>>> there be
>>> >>>> a difference?
>>> >>>> So it seems we have consensus here with unified behavior. Even
>>> though
>>> >>>> this means we are breaking existing batch INSERT INTOs that were
>>> >>>> asynchronous before.
>>> >>>>
>>> >>>> 2) Should we have different behavior for batch and streaming?
>>> >>>> I think also batch users prefer async behavior because usually even
>>> >>>> those pipelines take some time to execute. But we should stick
>>> to
>>> >>>> standard SQL blocking semantics.
>>> >>>>
>>> >>>> What are your opinions on making async explicit in SQL via `BEGIN
>>> ASYNC;
>>> >>>> ... END;`? This would allow us to really have unified semantics
>>> because
>>> >>>> batch and streaming would behave the same.
>>> >>>>
>>> >>>> Regards,
>>> >>>> Timo
>>> >>>>
>>> >>>>
>>> >>>> On 07.02.21 04:46, Rui Li wrote:
>>> >>>>> Hi Timo,
>>> >>>>>
>>> >>>>> I agree with Jark that we should provide consistent experience
>>> >> regarding
>>> >>>>> SQL CLI and files. Some systems even allow users to execute SQL
>>> files
>>> >> in
>>> >>>>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support
>>> that
>>> >>>> in
>>> >>>>> the future, it's a little tricky to decide whether that should be
>>> >> treated
>>> >>>>> as CLI or file.
>>> >>>>>
>>> >>>>> I actually prefer a config option and let users decide what's the
>>> >>>>> desirable behavior. But if we have agreed not to use options, I'm
>>> also
>>> >>>> fine
>>> >>>>> with Alternative #1.
>>> >>>>>
>>> >>>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
>>> >>>>>
>>> >>>>>> Hi Timo,
>>> >>>>>>
>>> >>>>>> 1) How should we execute statements in CLI and in file? Should
>>> there
>>> >> be
>>> >>>> a
>>> >>>>>> difference?
>>> >>>>>> I do think we should unify the behavior of CLI and SQL files. SQL
>>> >> files
>>> >>>> can
>>> >>>>>> be thought of as a shortcut of
>>> >>>>>> "start CLI" => "copy content of SQL files" => "paste content in
>>> CLI".
>>> >>>>>> Actually, we already did this in kafka_e2e.sql [1].
>>> >>>>>> I think it's hard for users to understand why SQL files behave
>>> >>>> differently
>>> >>>>>> from the CLI; none of the other systems have such a difference.
>>> >>>>>>
>>> >>>>>> If we distinguish SQL files and CLI, should there be a difference
>>> in
>>> >>>> JDBC
>>> >>>>>> driver and UI platform?
>>> >>>>>> Personally, they all should have consistent behavior.
>>> >>>>>>
>>> >>>>>> 2) Should we have different behavior for batch and streaming?
>>> >>>>>> I think we all agree streaming users prefer async execution,
>>> otherwise
>>> >>>> it's
>>> >>>>>> weird and difficult to use if the
>>> >>>>>> submit script or CLI never exits. On the other hand, batch SQL
>>> users
>>> >>>> are
>>> >>>>>> used to SQL statements being
>>> >>>>>> executed in a blocking fashion.
>>> >>>>>>
>>> >>>>>> Either unified async execution or unified sync execution, will
>>> hurt
>>> >> one
>>> >>>>>> side of the streaming
>>> >>>>>> batch users. In order to make both sides happy, I think we can
>>> have
>>> >>>>>> different behavior for batch and streaming.
>>> >>>>>> There are many essential differences between batch and stream
>>> >> systems, I
>>> >>>>>> think it's normal to have some
>>> >>>>>> different behaviors, and the behavior doesn't break the unified
>>> batch
>>> >>>>>> stream semantics.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Thus, I'm +1 to Alternative 1:
>>> >>>>>> We consider batch/streaming mode and block for batch INSERT INTO
>>> and
>>> >>>> async
>>> >>>>>> for streaming INSERT INTO/STATEMENT SET.
>>> >>>>>> And this behavior is consistent across CLI and files.
>>> >>>>>>
>>> >>>>>> Best,
>>> >>>>>> Jark
>>> >>>>>>
>>> >>>>>> [1]:
>>> >>>>>>
>>> >>>>>>
>>> >>>>
>>> >>
>>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
>>> >>>>>>
>>> >>>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org>
>>> wrote:
>>> >>>>>>
>>> >>>>>>> Hi Jark,
>>> >>>>>>>
>>> >>>>>>> thanks for the summary. I hope we can also find a good long-term
>>> >>>>>>> solution on the async/sync execution behavior topic.
>>> >>>>>>>
>>> >>>>>>> It should be discussed in a bigger round because it is (similar
>>> to
>>> >> the
>>> >>>>>>> time function discussion) related to batch-streaming unification
>>> >> where
>>> >>>>>>> we should stick to the SQL standard to some degree but also need
>>> to
>>> >>>> come
>>> >>>>>>> up with good streaming semantics.
>>> >>>>>>>
>>> >>>>>>> Let me summarize the problem again to hear opinions:
>>> >>>>>>>
>>> >>>>>>> - Batch SQL users are used to execute SQL files sequentially
>>> (from
>>> >> top
>>> >>>>>>> to bottom).
>>> >>>>>>> - Batch SQL users are used to SQL statements being executed
>>> blocking.
>>> >>>>>>> One after the other. Esp. when moving around data with INSERT
>>> INTO.
>>> >>>>>>> - Streaming users prefer async execution because unbounded
>>> streams are
>>> >>>>>>> more frequent than bounded streams.
>>> >>>>>>> - We decided to make the Flink Table API async because in a
>>> >> programming
>>> >>>>>>> language it is easy to call `.await()` on the result to make it
>>> >>>> blocking.
>>> >>>>>>> - INSERT INTO statements in the current SQL Client
>>> implementation are
>>> >>>>>>> always submitted asynchronously.
>>> >>>>>>> - Other clients, such as the Ververica platform, allow only one INSERT
>>> >> INTO
>>> >>>>>>> or a STATEMENT SET at the end of a file that will run
>>> >> asynchronously.
>>> >>>>>>>
>>> >>>>>>> Questions:
>>> >>>>>>>
>>> >>>>>>> - How should we execute statements in CLI and in file? Should
>>> there
>>> >> be
>>> >>>> a
>>> >>>>>>> difference?
>>> >>>>>>> - Should we have different behavior for batch and streaming?
>>> >>>>>>> - Shall we solve parts with a config option or is it better to
>>> make
>>> >> it
>>> >>>>>>> explicit in the SQL job definition because it influences the
>>> >> semantics
>>> >>>>>>> of multiple INSERT INTOs?
>>> >>>>>>>
>>> >>>>>>> Let me summarize my opinion at the moment:
>>> >>>>>>>
>>> >>>>>>> - SQL files should always be executed blocking by default.
>>> Because
>>> >> they
>>> >>>>>>> could potentially contain a long list of INSERT INTO statements.
>>> This
>>> >>>>>>> would be SQL standard compliant.
>>> >>>>>>> - If we allow async execution, we should make this explicit in
>>> the
>>> >> SQL
>>> >>>>>>> file via `BEGIN ASYNC; ... END;`.
>>> >>>>>>> - In the CLI, we always execute async to maintain the old
>>> behavior.
>>> >> We
>>> >>>>>>> can also assume that people are only using the CLI to fire
>>> statements
>>> >>>>>>> and close the CLI afterwards.
>>> >>>>>>>
>>> >>>>>>> Alternative 1:
>>> >>>>>>> - We consider batch/streaming mode and block for batch INSERT
>>> INTO
>>> >> and
>>> >>>>>>> async for streaming INSERT INTO/STATEMENT SET
>>> >>>>>>>
>>> >>>>>>> What do others think?
>>> >>>>>>>
>>> >>>>>>> Regards,
>>> >>>>>>> Timo
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On 05.02.21 04:03, Jark Wu wrote:
>>> >>>>>>>> Hi all,
>>> >>>>>>>>
>>> >>>>>>>> After an offline discussion with Timo and Kurt, we have reached
>>> some
>>> >>>>>>>> consensus.
>>> >>>>>>>> Please correct me if I am wrong or missed anything.
>>> >>>>>>>>
>>> >>>>>>>> 1) We will introduce "table.planner" and "table.execution-mode"
>>> >>>> instead
>>> >>>>>>> of
>>> >>>>>>>> "sql-client" prefix,
>>> >>>>>>>> and add `TableEnvironment.create(Configuration)` interface.
>>> These 2
>>> >>>>>>> options
>>> >>>>>>>> can only be used
>>> >>>>>>>> for tableEnv initialization. If used after initialization, Flink
>>> >>>> should
>>> >>>>>>>> throw an exception. We may
>>> >>>>>>>> support dynamically switching the planner in the future.
>>> >>>>>>>>
>>> >>>>>>>> 2) We will have only one parser,
>>> >>>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a
>>> string
>>> >>>>>>>> statement, and returns a list of Operation. It will first use
>>> regex
>>> >> to
>>> >>>>>>>> match some special statements,
>>> >>>>>>>>      e.g. SET, ADD JAR, others will be delegated to the
>>> underlying
>>> >>>> Calcite
>>> >>>>>>>> parser. The Parser can
>>> >>>>>>>> have different implementations, e.g. HiveParser.
>>> >>>>>>>>
>>> >>>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink
>>> dialect.
>>> >>>> But
>>> >>>>>>> we
>>> >>>>>>>> can allow
>>> >>>>>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
>>> >>>>>>>>
>>> >>>>>>>> 4) We don't have a conclusion for async/sync execution behavior
>>> yet.
>>> >>>>>>>>
>>> >>>>>>>> Best,
>>> >>>>>>>> Jark
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Hi Ingo,
>>> >>>>>>>>>
>>> >>>>>>>>> Since we have supported the WITH syntax and SET command since
>>> v1.9
>>> >>>>>>> [1][2],
>>> >>>>>>>>> and
>>> >>>>>>>>> we have never received such complaints, I think it's fine for
>>> such
>>> >>>>>>>>> differences.
>>> >>>>>>>>>
>>> >>>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
>>> >>>>>> requires
>>> >>>>>>>>> string literal keys[3],
>>> >>>>>>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
>>> >>>>>>>>>
>>> >>>>>>>>> Best,
>>> >>>>>>>>> Jark
>>> >>>>>>>>>
>>> >>>>>>>>> [1]:
>>> >>>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>
>>> >>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>>> >>>>>>>>> [2]:
>>> >>>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>
>>> >>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>>> >>>>>>>>> [3]:
>>> >>>>>>>
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>>> >>>>>>>>> [4]:
>>> >>>>>>>
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>>> >>>>>>>>> (search "set mapred.reduce.tasks=32")
>>> >>>>>>>>>
>>> >>>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com>
>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Hi,
>>> >>>>>>>>>>
>>> >>>>>>>>>> regarding the (un-)quoted question, compatibility is of
>>> course an
>>> >>>>>>>>>> important
>>> >>>>>>>>>> argument, but in terms of consistency I'd find it a bit
>>> surprising
>>> >>>>>> that
>>> >>>>>>>>>> WITH handles it differently than SET, and I wonder if that
>>> could
>>> >>>>>> cause
>>> >>>>>>>>>> friction for developers when writing their SQL.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> Regards
>>> >>>>>>>>>> Ingo
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com>
>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Hi all,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Regarding "One Parser", I think it's not possible for now
>>> because
>>> >>>>>>>>>> Calcite
>>> >>>>>>>>>>> parser can't parse
>>> >>>>>>>>>>> special characters (e.g. "-") unless quoting them as string
>>> >>>>>> literals.
>>> >>>>>>>>>>> That's why the WITH option
>>> >>>>>>>>>>> keys are string literals, not identifiers.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>>> >>>>>>>>>>> /local/my-home/test.jar
>>> >>>>>>>>>>> have the same
>>> >>>>>>>>>>> problems. That's why we propose two parsers: one splits lines
>>> >>>>>>>>>>> into multiple statements and matches special commands through a
>>> >>>>>>>>>>> regex, which is light-weight, and delegates other statements to
>>> >>>>>>>>>>> the other parser, which is the Calcite parser.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Note: we should stick to the unquoted SET
>>> >>>>>>> table.exec.mini-batch.enabled
>>> >>>>>>>>>> =
>>> >>>>>>>>>>> true syntax,
>>> >>>>>>>>>>> both for backward compatibility and ease of use, and all the
>>> >> other
>>> >>>>>>>>>> systems
>>> >>>>>>>>>>> don't have quotes on the key.
>>> >>>>>>>>>>>
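[Editor's note: why unquoted SET keys need a light-weight front-end parser can be seen in miniature below. This is an assumed sketch, not the actual SQL Client code; `SetCommandParser` and its method are invented names.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Calcite reads "-" as an operator, so "table.exec.mini-batch.enabled" is not a
// valid identifier there, while a simple regex accepts dots and hyphens directly.
public class SetCommandParser {

    // Matches: SET <key> = <value> with optional whitespace and trailing ';'.
    private static final Pattern SET_CMD = Pattern.compile(
            "(?i)^\\s*SET\\s+([A-Za-z][A-Za-z0-9._-]*)\\s*=\\s*(\\S+?)\\s*;?\\s*$");

    /** Returns {key, value} if the line is a SET command, null otherwise. */
    static String[] parse(String line) {
        Matcher m = SET_CMD.matcher(line);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] kv = parse("SET table.exec.mini-batch.enabled = true;");
        System.out.println(kv[0] + " -> " + kv[1]); // table.exec.mini-batch.enabled -> true
        System.out.println(parse("SELECT 1"));      // null: delegate to the Calcite parser
    }
}
```

Anything the regex does not recognize falls through to the full SQL parser, which keeps the front end minimal.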
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
>>> >>>>>>>>>>> if we want to use "table.planner", I think we should explain
>>> >>>> clearly
>>> >>>>>>>>>> what's
>>> >>>>>>>>>>> the scope it can be used in documentation.
>>> >>>>>>>>>>> Otherwise, there will be users complaining why the planner
>>> >> doesn't
>>> >>>>>>>>>> change
>>> >>>>>>>>>>> when setting the configuration on TableEnv.
>>> >>>>>>>>>>> It would be better to throw an exception to indicate to users it's
>>> not
>>> >>>>>>>>>> allowed to
>>> >>>>>>>>>>> change planner after TableEnv is initialized.
>>> >>>>>>>>>>> However, it seems not easy to implement.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Best,
>>> >>>>>>>>>>> Jark
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <godfreyhe@gmail.com
>>> >
>>> >>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Hi everyone,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Regarding "table.planner" and "table.execution-mode"
>>> >>>>>>>>>>>> If we define that those two options are just used to
>>> initialize
>>> >>>> the
>>> >>>>>>>>>>>> TableEnvironment, +1 for introducing table options instead
>>> of
>>> >>>>>>>>>> sql-client
>>> >>>>>>>>>>>> options.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I
>>> want
>>> >>>> to
>>> >>>>>>>>>> give
>>> >>>>>>>>>>>> more inputs:
>>> >>>>>>>>>>>> We want to introduce sql-gateway into the Flink project (see
>>> >>>>>> FLIP-24
>>> >>>>>>> &
>>> >>>>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the
>>> CLI
>>> >>>>>> client
>>> >>>>>>>>>> and
>>> >>>>>>>>>>>> the gateway service will communicate through Rest API. The
>>> " ADD
>>> >>>>>> JAR
>>> >>>>>>>>>>>> /local/path/jar " will be executed in the CLI client
>>> machine. So
>>> >>>>>> when
>>> >>>>>>>>>> we
>>> >>>>>>>>>>>> submit a sql file which contains multiple statements, the
>>> CLI
>>> >>>>>> client
>>> >>>>>>>>>>> needs
>>> >>>>>>>>>>>> to pick out the "ADD JAR" line, and also statements need to
>>> be
>>> >>>>>>>>>> submitted
>>> >>>>>>>>>>> or
>>> >>>>>>>>>>>> executed one by one to make sure the result is correct. The
>>> sql
>>> >>>>>> file
>>> >>>>>>>>>> may
>>> >>>>>>>>>>>> look like:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> SET xxx=yyy;
>>> >>>>>>>>>>>> create table my_table ...;
>>> >>>>>>>>>>>> create table my_sink ...;
>>> >>>>>>>>>>>> ADD JAR /local/path/jar1;
>>> >>>>>>>>>>>> create function my_udf as com....MyUdf;
>>> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>> >>>>>>>>>>>> REMOVE JAR /local/path/jar1;
>>> >>>>>>>>>>>> drop function my_udf;
>>> >>>>>>>>>>>> ADD JAR /local/path/jar2;
>>> >>>>>>>>>>>> create function my_udf as com....MyUdf2;
>>> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> The lines need to be split into multiple statements
>>> first in
>>> >>>> the
>>> >>>>>>>>>> CLI
>>> >>>>>>>>>>>> client, there are two approaches:
>>> >>>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
>>> >> splits
>>> >>>>>>> the
>>> >>>>>>>>>>>> lines and tells which lines are "ADD JAR".
>>> >>>>>>>>>>>> pro: there is only one parser
>>> >>>>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
>>> >>>>>>>>>> sql-parser,
>>> >>>>>>>>>>>> because the CLI client is just a simple tool which receives
>>> the
>>> >>>>>> user
>>> >>>>>>>>>>>> commands and displays the result. The non "ADD JAR" command
>>> will
>>> >>>> be
>>> >>>>>>>>>>> parsed
>>> >>>>>>>>>>>> twice.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2. The CLI client splits the lines into multiple statements
>>> and
>>> >>>>>> finds
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>> ADD JAR command through regex matching.
>>> >>>>>>>>>>>> pro: The CLI client is very light-weight.
>>> >>>>>>>>>>>> cons: there are two parsers.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> (personally, I prefer the second option)
>>> >>>>>>>>>>>>
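[Editor's note: the second option, splitting statements client-side and regex-matching the jar commands, could look roughly like this. Purely illustrative; `ClientStatementRouter` and its methods are invented names, and the naive semicolon split deliberately ignores quoted strings and comments.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: split the script into statements, handle ADD/REMOVE JAR in the
// client, forward everything else to the table environment.
public class ClientStatementRouter {

    private static final Pattern JAR_CMD =
            Pattern.compile("(?is)^\\s*(ADD|REMOVE)\\s+JAR\\s+(\\S+)\\s*$");

    // Naive split on ';' -- a real implementation must respect string literals
    // and comments, which may themselves contain semicolons.
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            if (!part.trim().isEmpty()) {
                statements.add(part.trim());
            }
        }
        return statements;
    }

    // Decides who executes the statement: the client or the table environment.
    static String route(String statement) {
        return JAR_CMD.matcher(statement).matches() ? "client" : "table-environment";
    }

    public static void main(String[] args) {
        String script = "ADD JAR /local/path/jar1;\n"
                + "create function my_udf as com.example.MyUdf;\n"
                + "insert into my_sink select my_udf(x) from t;";
        for (String statement : split(script)) {
            System.out.println(route(statement) + " <- " + statement);
        }
    }
}
```

This keeps the CLI light-weight at the cost of a second, trivial parser, which is the trade-off debated above.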
>>> >>>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them
>>> both.
>>> >>>>>>>>>>>> For default dialect, we support SHOW JARS, but if we switch
>>> to
>>> >>>> hive
>>> >>>>>>>>>>>> dialect, LIST JARS is also supported.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> [1]
>>> >>>>>>>>>>>
>>> >>>>>>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>> >>>>>>>>>>>> [2]
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Best,
>>> >>>>>>>>>>>> Godfrey
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Thu, Feb 4, 2021 at 10:40 AM:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> Hi guys,
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent
>>> with
>>> >>>>>> other
>>> >>>>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion
>>> about
>>> >>>>>> REMOVE
>>> >>>>>>>>>> vs
>>> >>>>>>>>>>>>> DELETE though.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I
>>> >> know,
>>> >>>>>>>>>> most
>>> >>>>>>>>>>>>> users who are requesting these features are previously hive
>>> >>>> users.
>>> >>>>>>>>>> So I
>>> >>>>>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
>>> >>>>>> REMOVE/DELETE
>>> >>>>>>>>>>> JARS
>>> >>>>>>>>>>>>> as synonyms? It's just like lots of systems accept both
>>> EXIT
>>> >> and
>>> >>>>>>>>>> QUIT
>>> >>>>>>>>>>> as
>>> >>>>>>>>>>>>> the command to terminate the program. So if that's not
>>> hard to
>>> >>>>>>>>>> achieve,
>>> >>>>>>>>>>>> and
>>> >>>>>>>>>>>>> will make users happier, I don't see a reason why we must
>>> >> choose
>>> >>>>>> one
>>> >>>>>>>>>>> over
>>> >>>>>>>>>>>>> the other.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <
>>> >> twalthr@apache.org
>>> >>>>>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Hi everyone,
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can
>>> >> discuss
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to
>>> determine
>>> >>>> how
>>> >>>>>>>>>> we
>>> >>>>>>>>>>>>>> proceed with this in the near future.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> 1) "whether the table environment has the ability to
>>> update
>>> >>>>>>>>>> itself"
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that
>>> we
>>> >>>>>>>>>> should
>>> >>>>>>>>>>>>>> support
>>> >>>>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
>>> >>>>>>>>>>>>>> "old")`. Instead I'm proposing to support
>>> >>>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
>>> >>>>>>>>>> execution
>>> >>>>>>>>>>>>>> mode are read immediately and a subsequent changes to
>>> these
>>> >>>>>>>>>> options
>>> >>>>>>>>>>>> will
>>> >>>>>>>>>>>>>> have no effect. We are doing it similar in `new
>>> >>>>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
>>> >>>>>>>>>> ConfigOption's
>>> >>>>>>>>>>>>>> must not be SQL Client specific but can be part of the
>>> core
>>> >>>> table
>>> >>>>>>>>>>> code
>>> >>>>>>>>>>>>>> base. Many users would like to get a 100% preconfigured
>>> >>>>>>>>>> environment
>>> >>>>>>>>>>>> from
>>> >>>>>>>>>>>>>> just Configuration. And this is not possible right now.
>>> We can
>>> >>>>>>>>>> solve
>>> >>>>>>>>>>>>>> both use cases in one change.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> I remember we had some discussion about this and decided
>>> that
>>> >> we
>>> >>>>>>>>>>> would
>>> >>>>>>>>>>>>>> like to maintain only one parser. In the end it is "One
>>> Flink
>>> >>>>>> SQL"
>>> >>>>>>>>>>>> where
>>> >>>>>>>>>>>>>> commands influence each other also with respect to
>>> keywords.
>>> >> It
>>> >>>>>>>>>>> should
>>> >>>>>>>>>>>>>> be fine to include the SQL Client commands in the Flink
>>> >> parser.
>>> >>>>>> Of
>>> >>>>>>>>>>>> course the table environment would not be able to handle
>>> the
>>> >>>>>>>>>>>> `Operation`
>>> >>>>>>>>>>>>>> instance that would be the result but we can introduce
>>> hooks
>>> >> to
>>> >>>>>>>>>>> handle
>>> >>>>>>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We
>>> should
>>> >>>>>>>>>> further
>>> >>>>>>>>>>>>>> discuss whether we introduce a special SQL clause for
>>> wrapping
>>> >>>>>>>>>> async
>>> >>>>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming
>>> >>>> queries
>>> >>>>>>>>>> we
>>> >>>>>>>>>>>>>> need to be careful and should force users to either "one
>>> >> INSERT
>>> >>>>>>>>>> INTO"
>>> >>>>>>>>>>>> or
>>> >>>>>>>>>>>>>> "one STATEMENT SET".
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the
>>> >> commands
>>> >>>>>>>>>> more
>>> >>>>>>>>>>>>>> with the remaining commands should be our goal. We just
>>> had a
>>> >>>>>>>>>> MODULE
>>> >>>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it
>>> is
>>> >>>> true
>>> >>>>>>>>>>> that
>>> >>>>>>>>>>>>>> JARs are not part of the catalog which is why I would not
>>> use
>>> >>>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the
>>> English
>>> >>>>>>>>>>> language.
>>> >>>>>>>>>>>>>> Take a look at the Java collection API as another example.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> 6) "Most of the commands should belong to the table
>>> >> environment"
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to
>>> >>>>>>>>>> understand.
>>> >>>>>>>>>>> It
>>> >>>>>>>>>>>>>> is good to see that most commands will be available in
>>> >>>>>>>>>>>> TableEnvironment.
>>> >>>>>>>>>>>>>> However, I would also support SET and RESET for
>>> consistency.
>>> >>>>>>>>>> Again,
>>> >>>>>>>>>>>> from
>>> >>>>>>>>>>>>>> an architectural point of view, if we would allow some
>>> kind of
>>> >>>>>>>>>>>>>> `Operation` hook in table environment, we could check for
>>> SQL
>>> >>>>>>>>>> Client
>>> >>>>>>>>>>>>>> specific options and forward to regular
>>> >>>>>>>>>>> `TableConfig.getConfiguration`
>>> >>>>>>>>>>>>>> otherwise. What do you think?
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Regards,
>>> >>>>>>>>>>>>>> Timo
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>>> >>>>>>>>>>>>>>> Hi Timo,
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> I will respond some of the questions:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> 1) SQL client specific options
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on
>>> >> where
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>>>>> configuration takes effect.
>>> >>>>>>>>>>>>>>> If it is a table configuration, we should make clear
>>> what's
>>> >> the
>>> >>>>>>>>>>>>> behavior
>>> >>>>>>>>>>>>>>> when users change
>>> >>>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
>>> >>>>>>>>>>>>>> `sql-client.execution.mode`
>>> >>>>>>>>>>>>>>> are something special
>>> >>>>>>>>>>>>>>> that can't be changed after TableEnvironment has been
>>> >>>>>>>>>> initialized.
>>> >>>>>>>>>>>> You
>>> >>>>>>>>>>>>>> can
>>> >>>>>>>>>>>>>>> see
>>> >>>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`
>>> method
>>> >> to
>>> >>>>>>>>>>>> override
>>> >>>>>>>>>>>>>>> configuration after
>>> >>>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Therefore, I think it would be better to still use
>>> >>>>>>>>>>>>> `sql-client.planner`
>>> >>>>>>>>>>>>>>> and `sql-client.execution.mode`.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> 2) Execution file
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> From my point of view, there is a big difference between
>>> >>>>>>>>>>>>>>> `sql-client.job.detach` and
>>> >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
>>> >>>>>>>>>> `sql-client.job.detach`
>>> >>>>>>>>>>>> will
>>> >>>>>>>>>>>>>>> affect every single DML statement
>>> >>>>>>>>>>>>>>> in the terminal, not only the statements in SQL files. I
>>> >> think
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>>> single
>>> >>>>>>>>>>>>>>> DML statement in the interactive
>>> >>>>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
>>> >>>>>>>>>>>>>>> tEnv#executeMultiSql.
>>> >>>>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
>>> >>>>>>>>>>>>> `table.multi-sql-async`.
>>> >>>>>>>>>>>>>>> I just find that runtime provides a configuration called
>>> >>>>>>>>>>>>>>> "execution.attached" [1], which is false by default and
>>> >>>>>>>>>>>>>>> specifies whether the pipeline is submitted in attached
>>> or
>>> >>>>>>>>>>> detached
>>> >>>>>>>>>>>>>> mode.
>>> >>>>>>>>>>>>>>> It provides exactly the same
>>> >>>>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you
>>> think
>>> >>>>>>>>>> about
>>> >>>>>>>>>>>> using
>>> >>>>>>>>>>>>>>> this option?
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> If we also want to support this config in
>>> TableEnvironment, I
>>> >>>>>>>>>> think
>>> >>>>>>>>>>>> it
>>> >>>>>>>>>>>>>>> should also affect the DML execution
>>> >>>>>>>>>>>>>>>       of `tEnv#executeSql()`, not only DMLs in
>>> >>>>>>>>>>> `tEnv#executeMultiSql()`.
>>> >>>>>>>>>>>>>>> Therefore, the behavior may look like this:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==>
>>> >> async
>>> >>>>>>>>>> by
>>> >>>>>>>>>>>>>> default
>>> >>>>>>>>>>>>>>> tableResult.await()   ==> manually block until finish
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>
>>> >> tEnv.getConfig().getConfiguration().setString("execution.attached",
>>> >>>>>>>>>>>>>> "true")
>>> >>>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")
>>> ==>
>>> >>>> sync,
>>> >>>>>>>>>>>> don't
>>> >>>>>>>>>>>>>> need
>>> >>>>>>>>>>>>>>> to wait on the TableResult
>>> >>>>>>>>>>>>>>> tEnv.executeMultiSql(
>>> >>>>>>>>>>>>>>> """
>>> >>>>>>>>>>>>>>> CREATE TABLE ....  ==> always sync
>>> >>>>>>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration
>>> above
>>> >>>>>>>>>>>>>>> SET execution.attached = false;
>>> >>>>>>>>>>>>>>> INSERT INTO ...  => async
>>> >>>>>>>>>>>>>>> """)
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
>>> >>>>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two
>>> >> separate
>>> >>>>>>>>>>>> topics,
>>> >>>>>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
>>> >>>>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
>>> >>>>>>>>>> statements.
>>> >>>>>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't
>>> want
>>> >>>>>>>>>> it to
>>> >>>>>>>>>>>>> block
>>> >>>>>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>> Jark
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> [1]:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>
>>> >>
>>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <
>>> >> fskmine@gmail.com>
>>> >>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Hi, Timo.
>>> >>>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts
>>> >> about
>>> >>>>>>>>>> your
>>> >>>>>>>>>>>>>>>> feedback.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the
>>> >> table
>>> >>>>>>>>>>>>>> environment
>>> >>>>>>>>>>>>>>>> has the ability to update itself. Let's take a simple
>>> >> program
>>> >>>>>>>>>> as
>>> >>>>>>>>>>> an
>>> >>>>>>>>>>>>>>>> example.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> ```
>>> >>>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> tEnv.getConfig.getConfiguration.setString("table.planner",
>>> >>>>>>>>>> "old");
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> tEnv.executeSql("...");
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> ```
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> If we regard this option as a table option, users don't
>>> have
>>> >>>> to
>>> >>>>>>>>>>>> create
>>> >>>>>>>>>>>>>>>> another table environment manually. In that case, tEnv
>>> needs
>>> >>>> to
>>> >>>>>>>>>>>> check
>>> >>>>>>>>>>>>>>>> whether the current mode and planner are the same as
>>> before
>>> >>>>>>>>>> when
>>> >>>>>>>>>>>>>> executeSql
>>> >>>>>>>>>>>>>>>> or explainSql. I don't think it's easy work for the
>>> table
>>> >>>>>>>>>>>> environment,
>>> >>>>>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment
>>> but
>>> >> set
>>> >>>>>>>>>> old
>>> >>>>>>>>>>>>>> planner
>>> >>>>>>>>>>>>>>>> and batch mode. But when we make this option as a sql
>>> client
>>> >>>>>>>>>>> option,
>>> >>>>>>>>>>>>>> users
>>> >>>>>>>>>>>>>>>> only use the SET command to change the setting. We can
>>> >> rebuild
>>> >>>>>>>>>> a
>>> >>>>>>>>>>> new
>>> >>>>>>>>>>>>>> table
>>> >>>>>>>>>>>>>>>> environment when the SET succeeds.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the
>>> >> implementation
>>> >>>>>>>>>>> before
>>> >>>>>>>>>>>>>>>> continuing this topic. In the sql client, we will
>>> maintain
>>> >> two
>>> >>>>>>>>>>>>> parsers.
>>> >>>>>>>>>>>>>> The
>>> >>>>>>>>>>>>>>>> first parser(client parser) will only match the sql
>>> client
>>> >>>>>>>>>>> commands.
>>> >>>>>>>>>>>>> If
>>> >>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>> client parser can't parse the statement, we will
>>> leverage
>>> >> the
>>> >>>>>>>>>>> power
>>> >>>>>>>>>>>> of
>>> >>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>> table environment to execute. According to our
>>> blueprint,
>>> >>>>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql
>>> client.
>>> >>>>>>>>>>> Therefore,
>>> >>>>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for
>>> this
>>> >>>> FLIP.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> But if we need to introduce the
>>> >>>>>>>>>> `TableEnvironment.executeMultiSql`
>>> >>>>>>>>>>>> in
>>> >>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>> future, I think it's OK to use the option
>>> >>>>>>>>>> `table.multi-sql-async`
>>> >>>>>>>>>>>>> rather
>>> >>>>>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the
>>> name
>>> >> is
>>> >>>>>>>>>> not
>>> >>>>>>>>>>>>>> suitable
>>> >>>>>>>>>>>>>>>> because the name is confusing for others. When setting
>>> the
>>> >>>>>>>>>> option
>>> >>>>>>>>>>>>>> false, we
>>> >>>>>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
>>> >>>>>>>>>>> statement,
>>> >>>>>>>>>>>>> not
>>> >>>>>>>>>>>>>> DDL
>>> >>>>>>>>>>>>>>>> or others(other sql statements are always executed
>>> >>>>>>>>>> synchronously).
>>> >>>>>>>>>>>> So
>>> >>>>>>>>>>>>>> how
>>> >>>>>>>>>>>>>>>> about `table.job.async`? It only works for the
>>> sql-client
>>> >> and
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>>>>>> executeMultiSql. If we set this value false, the table
>>> >>>>>>>>>> environment
>>> >>>>>>>>>>>>> will
>>> >>>>>>>>>>>>>>>> return the result until the job finishes.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE
>>> JAR
>>> >> and
>>> >>>>>>>>>>> LIST
>>> >>>>>>>>>>>>> JAR
>>> >>>>>>>>>>>>>>>> because HIVE also uses these commands to add the jar
>>> into
>>> >> the
>>> >>>>>>>>>>>>> classpath
>>> >>>>>>>>>>>>>> or
>>> >>>>>>>>>>>>>>>> delete the jar. If we use  such commands, it can reduce
>>> our
>>> >>>>>>>>>> work
>>> >>>>>>>>>>> for
>>> >>>>>>>>>>>>>> hive
>>> >>>>>>>>>>>>>>>> compatibility.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are
>>> not
>>> >>>>>>>>>>>> maintained
>>> >>>>>>>>>>>>> by
>>> >>>>>>>>>>>>>>>> the Catalog. If we really needs to keep consistent with
>>> SQL
>>> >>>>>>>>>>> grammar,
>>> >>>>>>>>>>>>>> maybe
>>> >>>>>>>>>>>>>>>> we should use
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>>> >>>>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>>> >>>>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>>> >>>>>>>>>> consistent.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong
>>> to
>>> >> the
>>> >>>>>>>>>>> table
>>> >>>>>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE>
>>> tag to
>>> >>>>>>>>>>>> identify
>>> >>>>>>>>>>>>>> which
>>> >>>>>>>>>>>>>>>> commands should belong to the sql client and which
>>> commands
>>> >>>>>>>>>> should
>>> >>>>>>>>>>>>>> belong
>>> >>>>>>>>>>>>>>>> to the table environment. I also add a new section about
>>> >>>>>>>>>>>>> implementation
>>> >>>>>>>>>>>>>>>> details in the FLIP.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>> Shengkai
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> wrote on Tue, Feb 2, 2021 at 6:43 PM:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give
>>> >>>>>>>>>>>>>>>>> the SQL Client a very good update and make it production
>>> >>>>>>>>>>>>>>>>> ready.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Here is some feedback from my side:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> 1) SQL client specific options
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
>>> >>>>>>>>>>>>>>>>> `sql-client.execution.mode` are SQL Client specific.
>>> >>>>>>>>>>>>>>>>> Similar to `StreamExecutionEnvironment` and
>>> >>>>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added
>>> >>>>>>>>>>>>>>>>> recently, we should offer a possibility for
>>> >>>>>>>>>>>>>>>>> TableEnvironment. How about we offer
>>> >>>>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
>>> >>>>>>>>>>>>>>>>> `table.planner` and `table.execution-mode` to
>>> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> 2) Execution file
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1],
>>> >>>>>>>>>>>>>>>>> including the mailing list thread at that time? Could you
>>> >>>>>>>>>>>>>>>>> further elaborate how the multi-statement execution
>>> >>>>>>>>>>>>>>>>> should work for a unified batch/streaming story?
>>> >>>>>>>>>>>>>>>>> According to our past discussions, each line in an
>>> >>>>>>>>>>>>>>>>> execution file should be executed blocking, which means a
>>> >>>>>>>>>>>>>>>>> streaming query needs a statement set to execute multiple
>>> >>>>>>>>>>>>>>>>> INSERT INTO statements, correct? We should also offer
>>> >>>>>>>>>>>>>>>>> this functionality in
>>> >>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>>> >>>>>>>>>>>>>>>>> `sql-client.job.detach` is SQL Client specific needs to
>>> >>>>>>>>>>>>>>>>> be determined; it could also be a general
>>> >>>>>>>>>>>>>>>>> `table.multi-sql-async` option?
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> 3) DELETE JAR
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE"
>>> >>>>>>>>>>>>>>>>> sounds like one is actively deleting the JAR in the
>>> >>>>>>>>>>>>>>>>> corresponding path.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> 4) LIST JAR
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL
>>> >>>>>>>>>>>>>>>>> commands such as `SHOW CATALOGS`, `SHOW TABLES`, etc.
>>> >>>>>>>>>>>>>>>>> [2].
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> We should keep the details in sync with
>>> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
>>> >>>>>>>>>>>>>>>>> confusion about differently named ExplainDetails. I would
>>> >>>>>>>>>>>>>>>>> vote for `ESTIMATED_COST` instead of `COST`. I'm sure the
>>> >>>>>>>>>>>>>>>>> original author had a reason to call it that way.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> 6) Implementation details
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> It would be nice to understand how we plan to implement
>>> >>>>>>>>>>>>>>>>> the given features. Most of the commands and config
>>> >>>>>>>>>>>>>>>>> options should go into TableEnvironment and SqlParser
>>> >>>>>>>>>>>>>>>>> directly, correct? This way users have a unified way of
>>> >>>>>>>>>>>>>>>>> using Flink SQL. TableEnvironment would provide a similar
>>> >>>>>>>>>>>>>>>>> user experience in notebooks or interactive programs as
>>> >>>>>>>>>>>>>>>>> the SQL Client.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>> >>>>>>>>>>>>>>>>> [2]
>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Regards,
>>> >>>>>>>>>>>>>>>>> Timo
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>> >>>>>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better than
>>> >>>>>>>>>>>>>>>>>> `UNSET`.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Tue, Feb 2, 2021 at 4:44 PM:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Hi, Jingsong.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
>>> >>>>>>>>>>>>>>>>>>> `RESET` is supported in the current sql client now. Our
>>> >>>>>>>>>>>>>>>>>>> proposal just extends its grammar and allows users to
>>> >>>>>>>>>>>>>>>>>>> reset the specified keys.
>>> >>>>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set a key back to
>>> >>>>>>>>>>>>>>>>>>> its default value [1]. I think it is more friendly for
>>> >>>>>>>>>>>>>>>>>>> batch users.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>> Shengkai
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> wrote on Tue, Feb 2, 2021 at 1:56 PM:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Thanks for the proposal. Yes, the sql-client is too
>>> >>>>>>>>>>>>>>>>>>>> outdated. +1 for improving it.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> About "SET" and "RESET", why not "SET" and "UNSET"?
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>> Jingsong
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fudan@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes
>>> >>>>>>>>>>>>>>>>>>>>> look good to me.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskmine@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Hi, Rui.
>>> >>>>>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> The main changes:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> # The -f parameter has no restriction on the
>>> >>>>>>>>>>>>>>>>>>>>>> statement type.
>>> >>>>>>>>>>>>>>>>>>>>>> Sometimes, users use a pipe to redirect the result
>>> >>>>>>>>>>>>>>>>>>>>>> of queries for debugging when submitting a job with
>>> >>>>>>>>>>>>>>>>>>>>>> the -f parameter. It's much more convenient compared
>>> >>>>>>>>>>>>>>>>>>>>>> to writing INSERT INTO statements.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach`.
>>> >>>>>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in the batch
>>> >>>>>>>>>>>>>>>>>>>>>> mode. Users can set this option to false and the
>>> >>>>>>>>>>>>>>>>>>>>>> client will not process the next job until the
>>> >>>>>>>>>>>>>>>>>>>>>> current job finishes. The default value of this
>>> >>>>>>>>>>>>>>>>>>>>>> option is true, which means the client will execute
>>> >>>>>>>>>>>>>>>>>>>>>> the next job as soon as the current job is
>>> >>>>>>>>>>>>>>>>>>>>>> submitted.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>> Shengkai
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and
>>> >>>>>>>>>>>>>>>>>>>>>>> hive have different implications, and we should
>>> >>>>>>>>>>>>>>>>>>>>>>> clarify the behavior. For example, if the client
>>> >>>>>>>>>>>>>>>>>>>>>>> just submits the job and exits, what happens if the
>>> >>>>>>>>>>>>>>>>>>>>>>> file contains two INSERT statements? I don't think
>>> >>>>>>>>>>>>>>>>>>>>>>> we should treat them as a statement set, because
>>> >>>>>>>>>>>>>>>>>>>>>>> users should explicitly write BEGIN STATEMENT SET
>>> >>>>>>>>>>>>>>>>>>>>>>> in that case. And the client shouldn't
>>> >>>>>>>>>>>>>>>>>>>>>>> asynchronously submit the two jobs, because the 2nd
>>> >>>>>>>>>>>>>>>>>>>>>>> may depend on the 1st, right?
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskmine@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi Rui,
>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
>>> >>>>>>>>>>>>>>>>>>>>>>>> suggestions.
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the
>>> >>>>>>>>>>>>>>>>>>>>>>>> set command. In the implementation, it will just
>>> >>>>>>>>>>>>>>>>>>>>>>>> put the key-value pair into the `Configuration`,
>>> >>>>>>>>>>>>>>>>>>>>>>>> which will be used to generate the table config.
>>> >>>>>>>>>>>>>>>>>>>>>>>> If hive supports reading the setting from the
>>> >>>>>>>>>>>>>>>>>>>>>>>> table config, users are able to set the
>>> >>>>>>>>>>>>>>>>>>>>>>>> hive-related settings.
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> For suggestion 2: The -f parameter will submit the
>>> >>>>>>>>>>>>>>>>>>>>>>>> job and exit. If the queries never end, users have
>>> >>>>>>>>>>>>>>>>>>>>>>>> to cancel the job by themselves, which is not
>>> >>>>>>>>>>>>>>>>>>>>>>>> reliable (people may forget their jobs). In most
>>> >>>>>>>>>>>>>>>>>>>>>>>> cases, queries are used to analyze the data. Users
>>> >>>>>>>>>>>>>>>>>>>>>>>> should use queries in the interactive mode.
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 3:18 PM:
>>> >>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion.
>>> >>>>>>>>>>>>>>>>>>>>>>>>> I think it covers a lot of useful features which
>>> >>>>>>>>>>>>>>>>>>>>>>>>> will dramatically improve the usability of our
>>> >>>>>>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the
>>> >>>>>>>>>>>>>>>>>>>>>>>>> FLIP.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
>>> >>>>>>>>>>>>>>>>>>>>>>>>> configurations via the SET command? A connector
>>> >>>>>>>>>>>>>>>>>>>>>>>>> may have its own configurations and we don't have
>>> >>>>>>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations
>>> >>>>>>>>>>>>>>>>>>>>>>>>> in SQL Client. For example, users may want to be
>>> >>>>>>>>>>>>>>>>>>>>>>>>> able to change hive conf when using the hive
>>> >>>>>>>>>>>>>>>>>>>>>>>>> connector [1].
>>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in
>>> >>>>>>>>>>>>>>>>>>>>>>>>> SQL files specified with the -f option? Hive
>>> >>>>>>>>>>>>>>>>>>>>>>>>> supports a similar -f option but allows queries
>>> >>>>>>>>>>>>>>>>>>>>>>>>> in the file. And a common use case is to run some
>>> >>>>>>>>>>>>>>>>>>>>>>>>> query and redirect the results to a file. So I
>>> >>>>>>>>>>>>>>>>>>>>>>>>> think maybe flink users would like to do the
>>> >>>>>>>>>>>>>>>>>>>>>>>>> same, especially in batch scenarios.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <liuyang0704@gmail.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> additional suggestions:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionContext to StreamTableEnvironment for
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> both streaming and batch sql.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way results are retrieved: the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> sql client collects the results locally all at
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> once using accumulators at present, which may
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> cause memory issues in the JM or locally for a
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> big query result. Accumulator is only suitable
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> for testing purposes. We may change to use
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> SelectTableSink, which is based on
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> CollectSinkOperatorCoordinator.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> which is in FLIP-91? Seems that this FLIP has
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> not moved forward for a long time. Providing a
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> long running service out of the box to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> facilitate sql submission is necessary.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-163: SQL Client Improvements.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> of the sql client. For example, users can not
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> register the table proposed by FLIP-95.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - use the -i parameter to specify a sql file to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> initialize the table environment, and deprecate
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the YAML file;
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit a sql file and deprecate the
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> '-u' parameter;
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> - support the statement set syntax;
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-163 [1].
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> *With kind regards
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>> ------------------------------------------------------------
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese
>>> Academy
>>> >>>> of
>>> >>>>>>>>>>>>> Science
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
>>> >>>> liuyang0704@gmail.com
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
>>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best regards!
>>> >>>>>>>>>>>>>>>>>>>>>>>>> Rui Li
> --
> Best regards!
> Rui Li
>
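For reference, the -i/-f parameters and the statement set syntax discussed
above combine roughly as in the following sketch. This is an illustration
only, not the final grammar (which is specified in FLIP-163); the file
names, connectors, and table names are placeholders chosen for the example:

```sql
-- init.sql, passed via the -i parameter: set up the table environment
CREATE TABLE source_table (id INT, name STRING) WITH ('connector' = 'datagen');
CREATE TABLE sink_table (id INT, name STRING) WITH ('connector' = 'blackhole');

-- job.sql, passed via the -f parameter: multiple INSERT INTO statements are
-- only submitted as a single job when explicitly wrapped in a statement set
BEGIN STATEMENT SET;
INSERT INTO sink_table SELECT id, name FROM source_table;
INSERT INTO sink_table SELECT id, UPPER(name) FROM source_table;
END;
```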

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Rui Li <li...@gmail.com>.
Hi,

Glad to see we have reached consensus on option #2. +1 to it.

Regarding the name, I'm fine with `table.dml-async`. But I wonder whether
this config also applies to the Table API. E.g. if a user
sets table.dml-async=false and calls TableEnvironment::executeSql to run a
DML statement, will they get sync behavior?
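To make the two behaviors concrete, a SQL script would react to the option
roughly as sketched below. This assumes the proposed, not-yet-final
`table.dml-async` name; `source_table` and `sink_table` are placeholders:

```sql
-- async: the INSERT returns as soon as the job is submitted
SET 'table.dml-async' = 'true';
INSERT INTO sink_table SELECT * FROM source_table;

-- sync: the next statement only starts after the job above finishes,
-- matching standard SQL blocking semantics for batch scripts
SET 'table.dml-async' = 'false';
INSERT INTO sink_table SELECT * FROM source_table;
```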

On Mon, Feb 8, 2021 at 11:28 PM Jark Wu <im...@gmail.com> wrote:

> Ah, I just forgot the option name.
>
> I'm also fine with `table.dml-async`.
>
> What do you think @Rui Li <li...@gmail.com> @Shengkai Fang
> <fs...@gmail.com> ?
>
> Best,
> Jark
>
> On Mon, 8 Feb 2021 at 23:06, Timo Walther <tw...@apache.org> wrote:
>
>> Great to hear that. Can someone update the FLIP a final time before we
>> start a vote?
>>
>> We should quickly discuss how we would like to name the config option
>> for the async/sync mode. I heard voices internally that are strongly
>> against calling it "detach" due to historical reasons with a Flink job
>> detach mode. How about `table.dml-async`?
>>
>> Thanks,
>> Timo
>>
>>
>> On 08.02.21 15:55, Jark Wu wrote:
>> > Thanks Timo,
>> >
>> > I'm +1 for option#2 too.
>> >
>> > I think we have addressed all the concerns and can start a vote.
>> >
>> > Best,
>> > Jark
>> >
>> > On Mon, 8 Feb 2021 at 22:19, Timo Walther <tw...@apache.org> wrote:
>> >
>> >> Hi Jark,
>> >>
>> >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>> >>
>> >> So let's stick to the config option approach.
>> >>
>> >> However, I strongly believe that we should not use the batch/streaming
>> >> mode for deriving semantics. This discussion is similar to the time
>> >> function discussion. We should not derive sync/async submission
>> >> behavior from a
>> >> flag that should only influence runtime operators and the incremental
>> >> computation. Statements for bounded streams should have the same
>> >> semantics in batch mode.
>> >>
>> >> I think your proposed option 2) is a good tradeoff. For the following
>> >> reasons:
>> >>
>> >> pros:
>> >> - by default, batch and streaming behave exactly the same
>> >> - SQL Client CLI behavior does not change compared to 1.12 and remains
>> >> async for batch and streaming
>> >> - consistent with the async Table API behavior
>> >>
>> >> con:
>> >> - batch files are not 100% SQL compliant by default
>> >>
>> >> The last item might not be an issue since we can expect that users have
>> >> long-running jobs and prefer async execution in most cases.
>> >>
>> >> Regards,
>> >> Timo
>> >>
>> >>
>> >> On 08.02.21 14:15, Jark Wu wrote:
>> >>> Hi Timo,
>> >>>
>> >>> Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`.
>> >>> Because it makes submitting streaming jobs very verbose, every INSERT
>> >> INTO
>> >>> and STATEMENT SET must be wrapped in the ASYNC clause which is
>> >>> not user-friendly and not backward-compatible.
>> >>>
>> >>> I agree we will have unified behavior but this is at the cost of
>> hurting
>> >>> our main users.
>> >>> I'm worried that end users can't understand the technical decision,
>> and
>> >>> they would
>> >>> feel streaming is harder to use.
>> >>>
>> >>> If we want to have an unified behavior, and let users decide what's
>> the
>> >>> desirable behavior, I prefer to have a config option. A Flink cluster
>> can
>> >>> be set to async, then
>> >>> users don't need to wrap every DML in an ASYNC clause. This is the
>> least
>> >>> intrusive
>> >>> way to the users.
>> >>>
>> >>>
>> >>> Personally, I'm fine with following options in priority:
>> >>>
>> >>> 1) sync for batch DML and async for streaming DML
>> >>> ==> only breaks batch behavior, but makes both happy
>> >>>
>> >>> 2) async for both batch and streaming DML, and can be set to sync via
>> a
>> >>> configuration.
>> >>> ==> compatible, and provides flexible configurable behavior
>> >>>
>> >>> 3) sync for both batch and streaming DML, and can be
>> >>>       set to async via a configuration.
>> >>> ==> +0 for this, because it breaks all the compatibility, esp. our
>> main
>> >>> users.
>> >>>
>> >>> Best,
>> >>> Jark
>> >>>
>> >>> On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org> wrote:
>> >>>
>> >>>> Hi Jark, Hi Rui,
>> >>>>
>> >>>> 1) How should we execute statements in CLI and in file? Should there
>> be
>> >>>> a difference?
>> >>>> So it seems we have consensus here with unified bahavior. Even though
>> >>>> this means we are breaking existing batch INSERT INTOs that were
>> >>>> asynchronous before.
>> >>>>
>> >>>> 2) Should we have different behavior for batch and streaming?
>> >>>> I think also batch users prefer async behavior because usually even
>> >>>> those pipelines take some time to execute. But we should stick to
>> >>>> standard SQL blocking semantics.
>> >>>>
>> >>>> What are your opinions on making async explicit in SQL via `BEGIN
>> ASYNC;
>> >>>> ... END;`? This would allow us to really have unified semantics
>> because
>> >>>> batch and streaming would behave the same?
>> >>>>
>> >>>> Regards,
>> >>>> Timo
>> >>>>
>> >>>>
>> >>>> On 07.02.21 04:46, Rui Li wrote:
>> >>>>> Hi Timo,
>> >>>>>
>> >>>>> I agree with Jark that we should provide consistent experience
>> >> regarding
>> >>>>> SQL CLI and files. Some systems even allow users to execute SQL
>> files
>> >> in
>> >>>>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support
>> that
>> >>>> in
>> >>>>> the future, it's a little tricky to decide whether that should be
>> >> treated
>> >>>>> as CLI or file.
>> >>>>>
>> >>>>> I actually prefer a config option and let users decide what's the
>> >>>>> desirable behavior. But if we have agreed not to use options, I'm
>> also
>> >>>> fine
>> >>>>> with Alternative #1.
>> >>>>>
>> >>>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Hi Timo,
>> >>>>>>
>> >>>>>> 1) How should we execute statements in CLI and in file? Should
>> there
>> >> be
>> >>>> a
>> >>>>>> difference?
>> >>>>>> I do think we should unify the behavior of CLI and SQL files. SQL
>> >> files
>> >>>> can
>> >>>>>> be thought of as a shortcut of
>> >>>>>> "start CLI" => "copy content of SQL files" => "past content in
>> CLI".
>> >>>>>> Actually, we already did this in kafka_e2e.sql [1].
>> >>>>>> I think it's hard for users to understand why SQL files behave
>> >>>> differently
>> >>>>>> from CLI, all the other systems don't have such a difference.
>> >>>>>>
>> >>>>>> If we distinguish SQL files and CLI, should there be a difference
>> in
>> >>>> JDBC
>> >>>>>> driver and UI platform?
>> >>>>>> Personally, they all should have consistent behavior.
>> >>>>>>
>> >>>>>> 2) Should we have different behavior for batch and streaming?
>> >>>>>> I think we all agree streaming users prefer async execution;
>> >>>>>> otherwise it's weird and difficult to use if the submit script or
>> >>>>>> CLI never exits. On the other hand, batch SQL users are used to SQL
>> >>>>>> statements being executed in a blocking way.
>> >>>>>>
>> >>>>>> Either unified async execution or unified sync execution, will hurt
>> >> one
>> >>>>>> side of the streaming
>> >>>>>> batch users. In order to make both sides happy, I think we can have
>> >>>>>> different behavior for batch and streaming.
>> >>>>>> There are many essential differences between batch and stream
>> >> systems, I
>> >>>>>> think it's normal to have some
>> >>>>>> different behaviors, and the behavior doesn't break the unified
>> batch
>> >>>>>> stream semantics.
>> >>>>>>
>> >>>>>>
>> >>>>>> Thus, I'm +1 to Alternative 1:
>> >>>>>> We consider batch/streaming mode and block for batch INSERT INTO
>> and
>> >>>> async
>> >>>>>> for streaming INSERT INTO/STATEMENT SET.
>> >>>>>> And this behavior is consistent across CLI and files.
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Jark
>> >>>>>>
>> >>>>>> [1]:
>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
>> >>>>>>
>> >>>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org>
>> wrote:
>> >>>>>>
>> >>>>>>> Hi Jark,
>> >>>>>>>
>> >>>>>>> thanks for the summary. I hope we can also find a good long-term
>> >>>>>>> solution on the async/sync execution behavior topic.
>> >>>>>>>
>> >>>>>>> It should be discussed in a bigger round because it is (similar to
>> >> the
>> >>>>>>> time function discussion) related to batch-streaming unification
>> >> where
>> >>>>>>> we should stick to the SQL standard to some degree but also need
>> to
>> >>>> come
>> >>>>>>> up with good streaming semantics.
>> >>>>>>>
>> >>>>>>> Let me summarize the problem again to hear opinions:
>> >>>>>>>
>> >>>>>>> - Batch SQL users are used to execute SQL files sequentially (from
>> >> top
>> >>>>>>> to bottom).
>> >>>>>>> - Batch SQL users are used to SQL statements being executed
>> blocking.
>> >>>>>>> One after the other. Esp. when moving around data with INSERT
>> INTO.
>> >>>>>>> - Streaming users prefer async execution because unbounded
>> >>>>>>> streams are more frequent than bounded streams.
>> >>>>>>> - We decided to make the Flink Table API async because in a
>> >>>>>>> programming language it is easy to call `.await()` on the result
>> >>>>>>> to make it blocking.
>> >>>>>>> - INSERT INTO statements in the current SQL Client implementation
>> >>>>>>> are always submitted asynchronously.
>> >>>>>>> - Other clients such as the Ververica platform allow only one
>> >>>>>>> INSERT INTO or a STATEMENT SET at the end of a file, which will
>> >>>>>>> run asynchronously.
>> >>>>>>>
>> >>>>>>> Questions:
>> >>>>>>>
>> >>>>>>> - How should we execute statements in CLI and in file? Should
>> there
>> >> be
>> >>>> a
>> >>>>>>> difference?
>> >>>>>>> - Should we have different behavior for batch and streaming?
>> >>>>>>> - Shall we solve parts with a config option or is it better to
>> make
>> >> it
>> >>>>>>> explicit in the SQL job definition because it influences the
>> >> semantics
>> >>>>>>> of multiple INSERT INTOs?
>> >>>>>>>
>> >>>>>>> Let me summarize my opinion at the moment:
>> >>>>>>>
>> >>>>>>> - SQL files should always be executed blocking by default. Because
>> >> they
>> >>>>>>> could potentially contain a long list of INSERT INTO statements.
>> This
>> >>>>>>> would be SQL standard compliant.
>> >>>>>>> - If we allow async execution, we should make this explicit in the
>> >> SQL
>> >>>>>>> file via `BEGIN ASYNC; ... END;`.
>> >>>>>>> - In the CLI, we always execute async to maintain the old
>> behavior.
>> >> We
>> >>>>>>> can also assume that people are only using the CLI to fire
>> statements
>> >>>>>>> and close the CLI afterwards.
>> >>>>>>>
>> >>>>>>> Alternative 1:
>> >>>>>>> - We consider batch/streaming mode and block for batch INSERT INTO
>> >> and
>> >>>>>>> async for streaming INSERT INTO/STATEMENT SET
>> >>>>>>>
>> >>>>>>> What do others think?
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> Timo
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 05.02.21 04:03, Jark Wu wrote:
>> >>>>>>>> Hi all,
>> >>>>>>>>
>> >>>>>>>> After an offline discussion with Timo and Kurt, we have reached
>> some
>> >>>>>>>> consensus.
>> >>>>>>>> Please correct me if I am wrong or missed anything.
>> >>>>>>>>
>> >>>>>>>> 1) We will introduce "table.planner" and "table.execution-mode"
>> >>>> instead
>> >>>>>>> of
>> >>>>>>>> "sql-client" prefix,
>> >>>>>>>> and add `TableEnvironment.create(Configuration)` interface.
>> These 2
>> >>>>>>> options
>> >>>>>>>> can only be used
>> >>>>>>>> for tableEnv initialization. If used after initialization, Flink
>> >>>> should
>> >>>>>>>> throw an exception. We may
>> >>>>>>>> support dynamically switching the planner in the future.
>> >>>>>>>>
>> >>>>>>>> 2) We will have only one parser,
>> >>>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a
>> string
>> >>>>>>>> statement, and returns a list of Operation. It will first use
>> regex
>> >> to
>> >>>>>>>> match some special statements,
>> >>>>>>>>      e.g. SET, ADD JAR; others will be delegated to the
>> underlying
>> >>>> Calcite
>> >>>>>>>> parser. The Parser can
>> >>>>>>>> have different implementations, e.g. HiveParser.
>> >>>>>>>>
>> >>>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink
>> dialect.
>> >>>> But
>> >>>>>>> we
>> >>>>>>>> can allow
>> >>>>>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
>> >>>>>>>>
>> >>>>>>>> 4) We don't have a conclusion for async/sync execution behavior
>> yet.
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Jark
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Ingo,
>> >>>>>>>>>
>> >>>>>>>>> Since we have supported the WITH syntax and SET command since
>> v1.9
>> >>>>>>> [1][2],
>> >>>>>>>>> and
>> >>>>>>>>> we have never received such complaints, I think it's fine for
>> such
>> >>>>>>>>> differences.
>> >>>>>>>>>
>> >>>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
>> >>>>>> requires
>> >>>>>>>>> string literal keys[3],
>> >>>>>>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Jark
>> >>>>>>>>>
>> >>>>>>>>> [1]:
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>> >>>>>>>>> [2]:
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>> >>>>>>>>> [3]:
>> >>>>>>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>> >>>>>>>>> [4]:
>> >>>>>>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>> >>>>>>>>> (search "set mapred.reduce.tasks=32")
>> >>>>>>>>>
>> >>>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com>
>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> regarding the (un-)quoted question, compatibility is of course
>> an
>> >>>>>>>>>> important
>> >>>>>>>>>> argument, but in terms of consistency I'd find it a bit
>> surprising
>> >>>>>> that
>> >>>>>>>>>> WITH handles it differently than SET, and I wonder if that
>> could
>> >>>>>> cause
>> >>>>>>>>>> friction for developers when writing their SQL.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Regards
>> >>>>>>>>>> Ingo
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi all,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regarding "One Parser", I think it's not possible for now
>> because
>> >>>>>>>>>> Calcite
>> >>>>>>>>>>> parser can't parse
>> >>>>>>>>>>> special characters (e.g. "-") unless quoting them as string
>> >>>>>> literals.
>> >>>>>>>>>>> That's why the WITH option
>> >>>>>>>>>>> keys are string literals, not identifiers.
>> >>>>>>>>>>>
>> >>>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>> >>>>>>>>>>> /local/my-home/test.jar
>> >>>>>>>>>>> have the same
>> >>>>>>>>>>> problems. That's why we propose two parsers: one splits lines
>> into
>> >>>>>>>>>> multiple
>> >>>>>>>>>>> statements and matches special
>> >>>>>>>>>>> commands through regex, which is light-weight, and delegates
>> other
>> >>>>>>>>>> statements
>> >>>>>>>>>>> to the other parser, which is the Calcite parser.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Note: we should stick to the unquoted SET
>> >>>>>>> table.exec.mini-batch.enabled
>> >>>>>>>>>> =
>> >>>>>>>>>>> true syntax,
>> >>>>>>>>>>> both for backward-compatibility and easy-to-use, and all the
>> >> other
>> >>>>>>>>>> systems
>> >>>>>>>>>>> don't have quotes on the key.
>> >>>>>>>>>>>
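[Editorial aside: as a purely illustrative sketch of the "light-weight regex" idea discussed above — the pattern and function below are assumptions for illustration, not Flink's actual parser code:]

```python
import re

# Hypothetical sketch: a light-weight regex recognizing the unquoted
# `SET key = value;` form, where keys may contain characters like '-'
# that Calcite cannot parse as identifiers.
SET_PATTERN = re.compile(
    r"^\s*SET\s+(?P<key>[A-Za-z][A-Za-z0-9_.\-]*)\s*=\s*(?P<value>[^;]+?)\s*;?\s*$",
    re.IGNORECASE,
)

def parse_set(statement: str):
    """Return (key, value) if the statement is a SET command, else None."""
    match = SET_PATTERN.match(statement)
    if match is None:
        return None
    return match.group("key"), match.group("value")

print(parse_set("SET table.exec.mini-batch.enabled = true;"))
# -> ('table.exec.mini-batch.enabled', 'true')
print(parse_set("SELECT * FROM t;"))
# -> None
```

Any statement the matcher does not recognize would simply fall through to the full SQL parser.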
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
>> >>>>>>>>>>> if we want to use "table.planner", I think we should explain
>> >>>> clearly
>> >>>>>>>>>> what's
>> >>>>>>>>>>> the scope it can be used in documentation.
>> >>>>>>>>>>> Otherwise, there will be users complaining why the planner
>> >> doesn't
>> >>>>>>>>>> change
>> >>>>>>>>>>> when setting the configuration on TableEnv.
>> >>>>>>>>>>> It would be better to throw an exception to indicate to users that it's
>> not
>> >>>>>>>>>> allowed to
>> >>>>>>>>>>> change the planner after TableEnv is initialized.
>> >>>>>>>>>>> However, it seems not easy to implement.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Jark
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
>> >>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi everyone,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regarding "table.planner" and "table.execution-mode"
>> >>>>>>>>>>>> If we define that those two options are just used to
>> initialize
>> >>>> the
>> >>>>>>>>>>>> TableEnvironment, +1 for introducing table options instead of
>> >>>>>>>>>> sql-client
>> >>>>>>>>>>>> options.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I
>> want
>> >>>> to
>> >>>>>>>>>> give
>> >>>>>>>>>>>> more inputs:
>> >>>>>>>>>>>> We want to introduce sql-gateway into the Flink project (see
>> >>>>>> FLIP-24
>> >>>>>>> &
>> >>>>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the
>> CLI
>> >>>>>> client
>> >>>>>>>>>> and
>> >>>>>>>>>>>> the gateway service will communicate through Rest API. The "
>> ADD
>> >>>>>> JAR
>> >>>>>>>>>>>> /local/path/jar " will be executed in the CLI client
>> machine. So
>> >>>>>> when
>> >>>>>>>>>> we
>> >>>>>>>>>>>> submit a sql file which contains multiple statements, the CLI
>> >>>>>> client
>> >>>>>>>>>>> needs
>> >>>>>>>>>>>> to pick out the "ADD JAR" line, and also statements need to
>> be
>> >>>>>>>>>> submitted
>> >>>>>>>>>>> or
>> >>>>>>>>>>>> executed one by one to make sure the result is correct. The
>> sql
>> >>>>>> file
>> >>>>>>>>>> may
>> >>>>>>>>>>> be
>> >>>>>>>>>>>> look like:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> SET xxx=yyy;
>> >>>>>>>>>>>> create table my_table ...;
>> >>>>>>>>>>>> create table my_sink ...;
>> >>>>>>>>>>>> ADD JAR /local/path/jar1;
>> >>>>>>>>>>>> create function my_udf as com....MyUdf;
>> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>> >>>>>>>>>>>> REMOVE JAR /local/path/jar1;
>> >>>>>>>>>>>> drop function my_udf;
>> >>>>>>>>>>>> ADD JAR /local/path/jar2;
>> >>>>>>>>>>>> create function my_udf as com....MyUdf2;
>> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The lines need to be split into multiple statements first
>> in
>> >>>> the
>> >>>>>>>>>> CLI
>> >>>>>>>>>>>> client, there are two approaches:
>> >>>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
>> >> splits
>> >>>>>>> the
>> >>>>>>>>>>>> lines and tells which lines are "ADD JAR".
>> >>>>>>>>>>>> pro: there is only one parser
>> >>>>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
>> >>>>>>>>>> sql-parser,
>> >>>>>>>>>>>> because the CLI client is just a simple tool which receives
>> the
>> >>>>>> user
>> >>>>>>>>>>>> commands and displays the result. The non "ADD JAR" command
>> will
>> >>>> be
>> >>>>>>>>>>> parsed
>> >>>>>>>>>>>> twice.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2. The CLI client splits the lines into multiple statements
>> and
>> >>>>>> finds
>> >>>>>>>>>> the
>> >>>>>>>>>>>> ADD JAR command through regex matching.
>> >>>>>>>>>>>> pro: The CLI client is very light-weight.
>> >>>>>>>>>>>> cons: there are two parsers.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> (personally, I prefer the second option)
>> >>>>>>>>>>>>
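[Editorial aside: the second option above — splitting the script in the CLI and regex-matching "ADD JAR" — can be sketched roughly as follows. This is an assumed illustration (names and the naive splitter are not Flink code):]

```python
import re

# Hypothetical sketch of option 2: the CLI splits a script into statements
# and handles "ADD JAR <path>" locally, delegating everything else.
ADD_JAR_PATTERN = re.compile(r"^\s*ADD\s+JAR\s+(?P<path>\S+)\s*$", re.IGNORECASE)

def split_statements(script: str):
    """Naively split on ';' (a real splitter must respect string literals)."""
    return [s.strip() for s in script.split(";") if s.strip()]

def classify(statement: str):
    match = ADD_JAR_PATTERN.match(statement)
    if match:
        return ("ADD_JAR", match.group("path"))  # handled locally by the CLI
    return ("DELEGATE", statement)               # sent to the gateway service

script = """
CREATE TABLE my_table (a INT);
ADD JAR /local/path/jar1;
INSERT INTO my_sink SELECT * FROM my_table
"""
for kind, payload in map(classify, split_statements(script)):
    print(kind, payload)
```

The cost of this approach, as noted above, is that non-"ADD JAR" statements are effectively parsed twice: once by the regex splitter and once by the Calcite parser.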
>> >>>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them
>> both.
>> >>>>>>>>>>>> For default dialect, we support SHOW JARS, but if we switch
>> to
>> >>>> hive
>> >>>>>>>>>>>> dialect, LIST JARS is also supported.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1]
>> >>>>>>>>>>>
>> >>>>>>>
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>> >>>>>>>>>>>> [2]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Best,
>> >>>>>>>>>>>> Godfrey
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Thursday, Feb 4, 2021 at 10:40 AM:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi guys,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent
>> with
>> >>>>>> other
>> >>>>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion about
>> >>>>>> REMOVE
>> >>>>>>>>>> vs
>> >>>>>>>>>>>>> DELETE though.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I
>> >> know,
>> >>>>>>>>>> most
>> >>>>>>>>>>>>> users who are requesting these features are previously hive
>> >>>> users.
>> >>>>>>>>>> So I
>> >>>>>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
>> >>>>>> REMOVE/DELETE
>> >>>>>>>>>>> JARS
>> >>>>>>>>>>>>> as synonyms? It's just like lots of systems accept both EXIT
>> >> and
>> >>>>>>>>>> QUIT
>> >>>>>>>>>>> as
>> >>>>>>>>>>>>> the command to terminate the program. So if that's not hard
>> to
>> >>>>>>>>>> achieve,
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>> will make users happier, I don't see a reason why we must
>> >> choose
>> >>>>>> one
>> >>>>>>>>>>> over
>> >>>>>>>>>>>>> the other.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <
>> >> twalthr@apache.org
>> >>>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Hi everyone,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can
>> >> discuss
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to
>> determine
>> >>>> how
>> >>>>>>>>>> we
>> >>>>>>>>>>>>>> proceed with this in the near future.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 1) "whether the table environment has the ability to update
>> >>>>>>>>>> itself"
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that
>> we
>> >>>>>>>>>> should
>> >>>>>>>>>>>>>> support
>> >>>>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
>> >>>>>>>>>>>>>> "old")`. Instead I'm proposing to support
>> >>>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
>> >>>>>>>>>> execution
>> >>>>>>>>>>>>>> mode are read immediately and a subsequent changes to these
>> >>>>>>>>>> options
>> >>>>>>>>>>>> will
>> >>>>>>>>>>>>>> have no effect. We are doing it similar in `new
>> >>>>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
>> >>>>>>>>>> ConfigOption's
>> >>>>>>>>>>>>>> must not be SQL Client specific but can be part of the core
>> >>>> table
>> >>>>>>>>>>> code
>> >>>>>>>>>>>>>> base. Many users would like to get a 100% preconfigured
>> >>>>>>>>>> environment
>> >>>>>>>>>>>> from
>> >>>>>>>>>>>>>> just Configuration. And this is not possible right now. We
>> can
>> >>>>>>>>>> solve
>> >>>>>>>>>>>>>> both use cases in one change.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I remember we had some discussion about this and decided
>> that
>> >> we
>> >>>>>>>>>>> would
>> >>>>>>>>>>>>>> like to maintain only one parser. In the end it is "One
>> Flink
>> >>>>>> SQL"
>> >>>>>>>>>>>> where
>> >>>>>>>>>>>>>> commands influence each other also with respect to
>> keywords.
>> >> It
>> >>>>>>>>>>> should
>> >>>>>>>>>>>>>> be fine to include the SQL Client commands in the Flink
>> >> parser.
>> >>>>>> Of
>> >>>>>>>> course the table environment would not be able to handle
>> the
>> >>>>>>>>>>>> `Operation`
>> >>>>>>>>>>>>>> instance that would be the result but we can introduce
>> hooks
>> >> to
>> >>>>>>>>>>> handle
>> >>>>>>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We
>> should
>> >>>>>>>>>> further
>> >>>>>>>>>>>>>> discuss whether we introduce a special SQL clause for
>> wrapping
>> >>>>>>>>>> async
>> >>>>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming
>> >>>> queries
>> >>>>>>>>>> we
>> >>>>>>>>>>>>>> need to be careful and should force users to either "one
>> >> INSERT
>> >>>>>>>>>> INTO"
>> >>>>>>>>>>>> or
>> >>>>>>>>>>>>>> "one STATEMENT SET".
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the
>> >> commands
>> >>>>>>>>>> more
>> >>>>>>>>>>>>>> with the remaining commands should be our goal. We just
>> had a
>> >>>>>>>>>> MODULE
>> >>>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it
>> is
>> >>>> true
>> >>>>>>>>>>> that
>> >>>>>>>>>>>>>> JARs are not part of the catalog which is why I would not
>> use
>> >>>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the
>> English
>> >>>>>>>>>>> language.
>> >>>>>>>>>>>>>> Take a look at the Java collection API as another example.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 6) "Most of the commands should belong to the table
>> >> environment"
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to
>> >>>>>>>>>> understand.
>> >>>>>>>>>>> It
>> >>>>>>>>>>>>>> is good to see that most commands will be available in
>> >>>>>>>>>>>> TableEnvironment.
>> >>>>>>>>>>>>>> However, I would also support SET and RESET for
>> consistency.
>> >>>>>>>>>> Again,
>> >>>>>>>>>>>> from
>> >>>>>>>>>>>>>> an architectural point of view, if we would allow some
>> kind of
>> >>>>>>>>>>>>>> `Operation` hook in table environment, we could check for
>> SQL
>> >>>>>>>>>> Client
>> >>>>>>>>>>>>>> specific options and forward to regular
>> >>>>>>>>>>> `TableConfig.getConfiguration`
>> >>>>>>>>>>>>>> otherwise. What do you think?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>> >>>>>>>>>>>>>>> Hi Timo,
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I will respond some of the questions:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> 1) SQL client specific options
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on
>> >> where
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>>> configuration takes effect.
>> >>>>>>>>>>>>>>> If it is a table configuration, we should make clear
>> what's
>> >> the
>> >>>>>>>>>>>>> behavior
>> >>>>>>>>>>>>>>> when users change
>> >>>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
>> >>>>>>>>>>>>>> `sql-client.execution.mode`
>> >>>>>>>>>>>>>>> are something special
>> >>>>>>>>>>>>>>> that can't be changed after TableEnvironment has been
>> >>>>>>>>>> initialized.
>> >>>>>>>>>>>> You
>> >>>>>>>>>>>>>> can
>> >>>>>>>>>>>>>>> see
>> >>>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`
>> method
>> >> to
>> >>>>>>>>>>>> override
>> >>>>>>>>>>>>>>> configuration after
>> >>>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Therefore, I think it would be better to still use
>> >>>>>>>>>>>>> `sql-client.planner`
>> >>>>>>>>>>>>>>> and `sql-client.execution.mode`.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> 2) Execution file
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> From my point of view, there is a big difference between
>> >>>>>>>>>>>>>>> `sql-client.job.detach` and
>> >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
>> >>>>>>>>>> `sql-client.job.detach`
>> >>>>>>>>>>>> will
>> >>>>>>>>>>>>>>> affect every single DML statement
>> >>>>>>>>>>>>>>> in the terminal, not only the statements in SQL files. I
>> >> think
>> >>>>>>>>>> the
>> >>>>>>>>>>>>> single
>> >>>>>>>>>>>>>>> DML statement in the interactive
>> >>>>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
>> >>>>>>>>>>>>>>> tEnv#executeMultiSql.
>> >>>>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
>> >>>>>>>>>>>>> `table.multi-sql-async`.
>> >>>>>>>>>>>>>>> I just find that runtime provides a configuration called
>> >>>>>>>>>>>>>>> "execution.attached" [1] which is false by default
>> >>>>>>>>>>>>>>> which specifies if the pipeline is submitted in attached
>> or
>> >>>>>>>>>>> detached
>> >>>>>>>>>>>>>> mode.
>> >>>>>>>>>>>>>>> It provides exactly the same
>> >>>>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you
>> think
>> >>>>>>>>>> about
>> >>>>>>>>>>>> using
>> >>>>>>>>>>>>>>> this option?
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> If we also want to support this config in
>> TableEnvironment, I
>> >>>>>>>>>> think
>> >>>>>>>>>>>> it
>> >>>>>>>>>>>>>>> should also affect the DML execution
>> >>>>>>>>>>>>>>>       of `tEnv#executeSql()`, not only DMLs in
>> >>>>>>>>>>> `tEnv#executeMultiSql()`.
>> >>>>>>>>>>>>>>> Therefore, the behavior may look like this:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==>
>> >> async
>> >>>>>>>>>> by
>> >>>>>>>>>>>>>> default
>> >>>>>>>>>>>>>>> tableResult.await()   ==> manually block until finish
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>
>> >> tEnv.getConfig().getConfiguration().setString("execution.attached",
>> >>>>>>>>>>>>>> "true")
>> >>>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==>
>> >>>> sync,
>> >>>>>>>>>>>> don't
>> >>>>>>>>>>>>>> need
>> >>>>>>>>>>>>>>> to wait on the TableResult
>> >>>>>>>>>>>>>>> tEnv.executeMultiSql(
>> >>>>>>>>>>>>>>> """
>> >>>>>>>>>>>>>>> CREATE TABLE ....  ==> always sync
>> >>>>>>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration
>> above
>> >>>>>>>>>>>>>>> SET execution.attached = false;
>> >>>>>>>>>>>>>>> INSERT INTO ...  => async
>> >>>>>>>>>>>>>>> """)
>> >>>>>>>>>>>>>>>
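[Editorial aside: the proposed semantics above — DDL always synchronous, INSERT INTO honoring an `execution.attached` flag that SET can flip mid-script — can be modeled with this toy sketch (assumptions only, not the actual `executeMultiSql` implementation):]

```python
# Toy model of the semantics sketched above: DDL runs synchronously,
# while each INSERT INTO honors the current "execution.attached" value,
# which SET statements can change mid-script.
def execute_multi_sql(statements, config):
    log = []
    for stmt in statements:
        upper = stmt.strip().upper()
        if upper.startswith("SET "):
            key, value = (part.strip() for part in stmt.split("SET", 1)[1].split("="))
            config[key] = value.lower() == "true"
            log.append(("sync", stmt))
        elif upper.startswith("INSERT"):
            # attached = blocking; detached (the default) = async submission
            mode = "sync" if config.get("execution.attached", False) else "async"
            log.append((mode, stmt))
        else:  # DDL such as CREATE TABLE is always executed synchronously
            log.append(("sync", stmt))
    return log
```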
>> >>>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
>> >>>>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two
>> >> separate
>> >>>>>>>>>>>> topics,
>> >>>>>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
>> >>>>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
>> >>>>>>>>>> statements.
>> >>>>>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't
>> want
>> >>>>>>>>>> it to
>> >>>>>>>>>>>>> block
>> >>>>>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>> Jark
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> [1]:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <
>> >> fskmine@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Hi, Timo.
>> >>>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts
>> >> about
>> >>>>>>>>>> your
>> >>>>>>>>>>>>>>>> feedback.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the
>> >> table
>> >>>>>>>>>>>>>> environment
>> >>>>>>>>>>>>>>>> has the ability to update itself. Let's take a simple
>> >> program
>> >>>>>>>>>> as
>> >>>>>>>>>>> an
>> >>>>>>>>>>>>>>>> example.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> ```
>> >>>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> tEnv.getConfig.getConfiguration.setString("table.planner",
>> >>>>>>>>>> "old");
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> tEnv.executeSql("...");
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> ```
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> If we regard this option as a table option, users don't
>> have
>> >>>> to
>> >>>>>>>>>>>> create
>> >>>>>>>>>>>>>>>> another table environment manually. In that case, tEnv
>> needs
>> >>>> to
>> >>>>>>>>>>>> check
>> >>>>>>>>>>>>>>>> whether the current mode and planner are the same as
>> before
>> >>>>>>>>>> when calling
>> >>>>>>>>>>>>>> executeSql
>> >>>>>>>>>>>>>>>> or explainSql. I don't think it's easy work for the table
>> >>>>>>>>>>>> environment,
>> >>>>>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but
>> >> set
>> >>>>>>>>>> old
>> >>>>>>>>>>>>>> planner
>> >>>>>>>>>>>>>>>> and batch mode. But when we make this option as a sql
>> client
>> >>>>>>>>>>> option,
>> >>>>>>>>>>>>>> users
>> >>>>>>>>>>>>>>>> only use the SET command to change the setting. We can
>> >> rebuild
>> >>>>>>>>>> a
>> >>>>>>>>>>> new
>> >>>>>>>>>>>>>> table
>> >>>>>>>>>>>>>>>> environment when set successes.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the
>> >> implementation
>> >>>>>>>>>>> before
>> >>>>>>>>>>>>>>>> continuing this topic. In the sql client, we will
>> maintain
>> >> two
>> >>>>>>>>>>>>> parsers.
>> >>>>>>>>>>>>>> The
>> >>>>>>>>>>>>>>>> first parser(client parser) will only match the sql
>> client
>> >>>>>>>>>>> commands.
>> >>>>>>>>>>>>> If
>> >>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>> client parser can't parse the statement, we will leverage
>> >> the
>> >>>>>>>>>>> power
>> >>>>>>>>>>>> of
>> >>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>> table environment to execute. According to our blueprint,
>> >>>>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
>> >>>>>>>>>>> Therefore,
>> >>>>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this
>> >>>> FLIP.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> But if we need to introduce the
>> >>>>>>>>>> `TableEnvironment.executeMultiSql`
>> >>>>>>>>>>>> in
>> >>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>> future, I think it's OK to use the option
>> >>>>>>>>>> `table.multi-sql-async`
>> >>>>>>>>>>>>> rather
>> >>>>>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the
>> name
>> >> is
>> >>>>>>>>>> not
>> >>>>>>>>>>>>>> suitable
>> >>>>>>>>>>>>>>>> because the name is confusing for others. When setting
>> the
>> >>>>>>>>>> option
>> >>>>>>>>>>>>>> false, we
>> >>>>>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
>> >>>>>>>>>>> statement,
>> >>>>>>>>>>>>> not
>> >>>>>>>>>>>>>> DDL
>> >>>>>>>>>>>>>>>> or others(other sql statements are always executed
>> >>>>>>>>>> synchronously).
>> >>>>>>>>>>>> So
>> >>>>>>>>>>>>>> how
>> >>>>>>>>>>>>>>>> about `table.job.async`? It only works for the sql-client
>> >> and
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>>>> executeMultiSql. If we set this value false, the table
>> >>>>>>>>>> environment
>> >>>>>>>>>>>>> will
>> >>>>>>>>>>>>>>>> return the result until the job finishes.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE
>> JAR
>> >> and
>> >>>>>>>>>>> LIST
>> >>>>>>>>>>>>> JAR
>> >>>>>>>>>>>>>>>> because HIVE also uses these commands to add the jar into
>> >> the
>> >>>>>>>>>>>>> classpath
>> >>>>>>>>>>>>>> or
>> >>>>>>>>>>>>>>>> delete the jar. If we use such commands, it can reduce
>> our
>> >>>>>>>>>> work
>> >>>>>>>>>>> for
>> >>>>>>>>>>>>>> hive
>> >>>>>>>>>>>>>>>> compatibility.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are
>> not
>> >>>>>>>>>>>> maintained
>> >>>>>>>>>>>>> by
>> >>>>>>>>>>>>>>>> the Catalog. If we really need to keep consistent with
>> SQL
>> >>>>>>>>>>> grammar,
>> >>>>>>>>>>>>>> maybe
>> >>>>>>>>>>>>>>>> we should use
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>> >>>>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>> >>>>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
>> >>>>>>>>>>>>>>>>
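[Editorial aside: the idea raised earlier in the thread of accepting the Hive-style spellings as synonyms could be a simple normalization step before dispatch. A minimal illustrative sketch — command names assumed, not Flink code:]

```python
# Map Hive-dialect spellings to the canonical Flink commands.
SYNONYMS = {
    "LIST JARS": "SHOW JARS",
    "LIST JAR": "SHOW JARS",
    "DELETE JAR": "REMOVE JAR",
}

def normalize(command: str) -> str:
    """Rewrite a known synonym to its canonical form, preserving arguments."""
    stripped = command.strip()
    upper = stripped.upper()
    for alias, canonical in SYNONYMS.items():
        if upper == alias or upper.startswith(alias + " "):
            return canonical + stripped[len(alias):]
    return stripped

print(normalize("list jars"))                # -> SHOW JARS
print(normalize("DELETE JAR /tmp/udf.jar"))  # -> REMOVE JAR /tmp/udf.jar
```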
>> >>>>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>> >>>>>>>>>> consistent.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong
>> to
>> >> the
>> >>>>>>>>>>> table
>> >>>>>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE>
>> tag to
>> >>>>>>>>>>>> identify
>> >>>>>>>>>>>>>> which
>> >>>>>>>>>>>>>>>> commands should belong to the sql client and which
>> commands
>> >>>>>>>>>> should
>> >>>>>>>>>>>>>> belong
>> >>>>>>>>>>>>>>>> to the table environment. I also add a new section about
>> >>>>>>>>>>>>> implementation
>> >>>>>>>>>>>>>>>> details in the FLIP.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>> Shengkai
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Timo Walther <tw...@apache.org> wrote on Tuesday, Feb 2, 2021 at 6:43 PM:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give
>> the
>> >>>>>>>>>> SQL
>> >>>>>>>>>>>>> Client
>> >>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>> very good update and make it production ready.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Here is some feedback from my side:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 1) SQL client specific options
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
>> >>>>>>>>>>>>> `sql-client.execution.mode`
>> >>>>>>>>>>>>>>>>> are SQL Client specific. Similar to
>> >>>>>>>>>> `StreamExecutionEnvironment`
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added
>> recently,
>> >> we
>> >>>>>>>>>>>> should
>> >>>>>>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we
>> >> offer
>> >>>>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
>> >>>>>>>>>>> `table.planner`
>> >>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> `table.execution-mode` to
>> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 2) Execution file
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1]
>> >> including
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>> mailing
>> >>>>>>>>>>>>>>>>> list thread at that time? Could you further elaborate
>> how
>> >> the
>> >>>>>>>>>>>>>>>>> multi-statement execution should work for a unified
>> >>>>>>>>>>> batch/streaming
>> >>>>>>>>>>>>>>>>> story? According to our past discussions, each line in
>> an
>> >>>>>>>>>>> execution
>> >>>>>>>>>>>>>> file
>> >>>>>>>>>>>>>>>>> should be executed blocking which means a streaming
>> query
>> >>>>>>>>>> needs a
>> >>>>>>>>>>>>>>> statement set to execute multiple INSERT INTO statements,
>> >>>>>>>>>> correct?
>> >>>>>>>>>>>> We
>> >>>>>>>>>>>>>>>>> should also offer this functionality in
>> >>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>> >>>>>>>>>>>> `sql-client.job.detach`
>> >>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>> SQL Client specific needs to be determined, it could
>> also
>> >> be
>> >>>> a
>> >>>>>>>>>>>>> general
>> >>>>>>>>>>>>>>>>> `table.multi-sql-async` option?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 3) DELETE JAR
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE"
>> >> sounds
>> >>>>>>>>>> like
>> >>>>>>>>>>>> one
>> >>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 4) LIST JAR
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL
>> commands
>> >>>>>>>>>> such
>> >>>>>>>>>>> as
>> >>>>>>>>>>>>>> `SHOW
>> >>>>>>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> We should keep the details in sync with
>> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
>> >>>> confusion
>> >>>>>>>>>>>> about
>> >>>>>>>>>>>>>>>>> differently named ExplainDetails. I would vote for
>> >>>>>>>>>>> `ESTIMATED_COST`
>> >>>>>>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a
>> >> reason
>> >>>>>>>>>> why
>> >>>>>>>>>>> to
>> >>>>>>>>>>>>>> call
>> >>>>>>>>>>>>>>>>> it that way.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 6) Implementation details
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> It would be nice to understand how we plan to implement
>> the
>> >>>>>>>>>> given
>> >>>>>>>>>>>>>>>>> features. Most of the commands and config options
>> should go
>> >>>>>>>>>> into
>> >>>>>>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This
>> way
>> >>>>>>>>>> users
>> >>>>>>>>>>>>> have a
>> >>>>>>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
>> >>>>>>>>>> provide a
>> >>>>>>>>>>>>>> similar
>> >>>>>>>>>>>>>>>>> user experience in notebooks or interactive programs
>> than
>> >> the
>> >>>>>>>>>> SQL
>> >>>>>>>>>>>>>> Client.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> >>>>>>>>>>>>>>>>> [2]
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>> >>>>>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better
>> >>>> than
>> >>>>>>>>>>>>> `UNSET`.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Tuesday, Feb 2, 2021
>> at 4:44 PM:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Hi, Jingsong.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
>> >>>>>>>>>> `RESET`
>> >>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>> supported in the current sql client now. Our proposal
>> >> just
>> >>>>>>>>>>>> extends
>> >>>>>>>>>>>>>> its
>> >>>>>>>>>>>>>>>>>>> grammar and allow users to reset the specified keys.
>> >>>>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to
>> the
>> >>>>>>>>>> default
>> >>>>>>>>>>>>>>>>> value[1].
>> >>>>>>>>>>>>>>>>>>> I think it is more friendly for batch users.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>> Shengkai
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>
>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>> >>>>>>>>>>>>>>>>>>>
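For illustration, a session using the extended `RESET` grammar discussed here would look roughly like the following sketch (the option key is only an example):

```sql
-- set a session option to a non-default value
SET table.exec.mini-batch.enabled = true;

-- proposed extension: reset only the given key to its default value
RESET table.exec.mini-batch.enabled;

-- existing behavior: reset all session options
RESET;
```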
>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
>>
>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
>> improving it.
>>
>> About "SET" and "RESET", why not "SET" and "UNSET"?
>>
>> Best,
>> Jingsong
>>
>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fudan@gmail.com> wrote:
>>
>> Thanks Shengkai for the update! The proposed changes look good to me.
>>
>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskmine@gmail.com> wrote:
>>
>> Hi, Rui.
>> You are right. I have already modified the FLIP.
>>
>> The main changes:
>>
>> # The -f parameter has no restriction on the statement type.
>> Sometimes users use a pipe to redirect the result of queries for
>> debugging when submitting a job with the -f parameter. That is much
>> more convenient than writing INSERT INTO statements.
>>
>> # Add a new sql client option `sql-client.job.detach`.
>> Users prefer to execute jobs one by one in batch mode. Users can set
>> this option to false, and the client will not process the next job
>> until the current job finishes. The default value of this option is
>> true, which means the client will execute the next job as soon as the
>> current job is submitted.
>>
>> Best,
>> Shengkai
>>
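As a sketch of how the option above would be used in a batch script (the table names are made up, and the option name was still under discussion in this thread):

```sql
-- block on each job before running the next one
SET sql-client.job.detach = false;

INSERT INTO warehouse_sink SELECT * FROM staging_a;
INSERT INTO warehouse_sink SELECT * FROM staging_b;  -- starts only after the first job finishes
```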
>> Rui Li <lirui.fudan@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>
>> Hi Shengkai,
>>
>> Regarding #2, maybe the -f options in flink and hive have different
>> implications, and we should clarify the behavior. For example, if the
>> client just submits the job and exits, what happens if the file
>> contains two INSERT statements? I don't think we should treat them as a
>> statement set, because users should explicitly write BEGIN STATEMENT
>> SET in that case. And the client shouldn't asynchronously submit the
>> two jobs, because the 2nd may depend on the 1st, right?
>>
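The explicit statement set mentioned above groups multiple inserts into a single job; a sketch (the sink and source names are made up):

```sql
BEGIN STATEMENT SET;

INSERT INTO sink_a SELECT id, name FROM source_t;
INSERT INTO sink_b SELECT id, COUNT(*) FROM source_t GROUP BY id;

END;
```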
>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskmine@gmail.com> wrote:
>>
>> Hi Rui,
>> Thanks for your feedback. I agree with your suggestions.
>>
>> For suggestion 1: Yes, we plan to strengthen the SET command. In the
>> implementation, it will just put the key-value pair into the
>> `Configuration`, which will be used to generate the table config. If
>> hive supports reading these settings from the table config, users are
>> able to set the hive-related settings.
>>
>> For suggestion 2: The -f parameter will submit the job and exit. If the
>> queries never end, users have to cancel the jobs by themselves, which
>> is not reliable (people may forget their jobs). In most cases, queries
>> are used to analyze the data, so users should run queries in the
>> interactive mode.
>>
>> Best,
>> Shengkai
>>
>> Rui Li <lirui.fudan@gmail.com> 于2021年1月29日周五 下午3:18写道:
>>
>> Thanks Shengkai for bringing up this discussion. I think it covers a
>> lot of useful features which will dramatically improve the usability of
>> our SQL Client. I have two questions regarding the FLIP.
>>
>> 1. Do you think we can let users set arbitrary configurations via the
>> SET command? A connector may have its own configurations and we don't
>> have a way to dynamically change such configurations in SQL Client. For
>> example, users may want to be able to change hive conf when using the
>> hive connector [1].
>> 2. Any reason why we have to forbid queries in SQL files specified with
>> the -f option? Hive supports a similar -f option but allows queries in
>> the file. And a common use case is to run some query and redirect the
>> results to a file. So I think maybe flink users would like to do the
>> same, especially in batch scenarios.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>
>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <liuyang0704@gmail.com>
>> wrote:
>>
>> Hi Shengkai,
>>
>> Glad to see this improvement. And I have some additional suggestions:
>>
>> #1. Unify the TableEnvironment in ExecutionContext to
>> StreamTableEnvironment for both streaming and batch sql.
>> #2. Improve the way of result retrieval: the sql client collects the
>> results locally all at once using accumulators at present, which may
>> cause memory issues in the JM or locally for big query results.
>> Accumulators are only suitable for testing purposes. We may change to
>> use SelectTableSink, which is based on CollectSinkOperatorCoordinator.
>> #3. Do we need to consider the Flink SQL gateway which is in FLIP-91?
>> Seems that this FLIP has not moved forward for a long time. Providing a
>> long-running service out of the box to facilitate sql submission is
>> necessary.
>>
>> What do you think of these?
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>
>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
>>
>> Hi devs,
>>
>> Jark and I want to start a discussion about FLIP-163: SQL Client
>> Improvements.
>>
>> Many users have complained about the problems of the sql client. For
>> example, users can not register the table proposed by FLIP-95.
>>
>> The main changes in this FLIP:
>>
>> - use the -i parameter to specify the sql file that initializes the
>>   table environment, and deprecate the YAML file;
>> - add -f to submit a sql file, and deprecate the '-u' parameter;
>> - add more interactive commands, e.g. ADD JAR;
>> - support the statement set syntax;
>>
>> For more detailed changes, please refer to FLIP-163 [1].
>>
>> Look forward to your feedback.
>>
>> Best,
>> Shengkai
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>
>> --
>>
>> *With kind regards
>> ------------------------------------------------------------
>> Sebastian Liu 刘洋
>> Institute of Computing Technology, Chinese Academy of Science
>> Mobile\WeChat: +86—15201613655
>> E-mail: liuyang0704@gmail.com <liuyang0704@gmail.com>
>> QQ: 3239559*
>>
>> --
>> Best regards!
>> Rui Li

-- 
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Ah, I just forgot the option name.

I'm also fine with `table.dml-async`.

What do you think @Rui Li <li...@gmail.com> @Shengkai Fang
<fs...@gmail.com> ?

Best,
Jark

On Mon, 8 Feb 2021 at 23:06, Timo Walther <tw...@apache.org> wrote:

> Great to hear that. Can someone update the FLIP a final time before we
> start a vote?
>
> We should quickly discuss how we would like to name the config option
> for the async/sync mode. I heard voices internally that are strongly
> against calling it "detach" due to historical reasons with a Flink job
> detach mode. How about `table.dml-async`?
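Assuming the option kept the name proposed here (the final name was still up for a vote at this point), usage would look like this sketch:

```sql
-- sketch only: option name as proposed in this thread
SET table.dml-async = false;  -- block on each INSERT INTO

INSERT INTO sink_t SELECT * FROM source_t;
```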
>
> Thanks,
> Timo
>
>
> On 08.02.21 15:55, Jark Wu wrote:
> > Thanks Timo,
> >
> > I'm +1 for option#2 too.
> >
> > I think we have addressed all the concerns and can start a vote.
> >
> > Best,
> > Jark
> >
> > On Mon, 8 Feb 2021 at 22:19, Timo Walther <tw...@apache.org> wrote:
> >
> >> Hi Jark,
> >>
> >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
> >>
> >> So let's stick to the config option approach.
> >>
> >> However, I strongly believe that we should not use the batch/streaming
> >> mode for deriving semantics. This discussion is similar to the time function
> >> discussion. We should not derive sync/async submission behavior from a
> >> flag that should only influence runtime operators and the incremental
> >> computation. Statements for bounded streams should have the same
> >> semantics in batch mode.
> >>
> >> I think your proposed option 2) is a good tradeoff. For the following
> >> reasons:
> >>
> >> pros:
> >> - by default, batch and streaming behave exactly the same
> >> - SQL Client CLI behavior does not change compared to 1.12 and remains
> >> async for batch and streaming
> >> - consistent with the async Table API behavior
> >>
> >> con:
> >> - batch files are not 100% SQL compliant by default
> >>
> >> The last item might not be an issue since we can expect that users have
> >> long-running jobs and prefer async execution in most cases.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 08.02.21 14:15, Jark Wu wrote:
> >>> Hi Timo,
> >>>
> >>> Actually, I'm not in favor of the explicit syntax `BEGIN ASYNC; ... END;`,
> >>> because it makes submitting streaming jobs very verbose: every INSERT
> >>> INTO and STATEMENT SET must be wrapped in the ASYNC clause, which is
> >>> not user-friendly and not backward-compatible.
> >>>
> >>> I agree we will have unified behavior, but this is at the cost of
> >>> hurting our main users. I'm worried that end users can't understand
> >>> the technical decision, and they would feel streaming is harder to
> >>> use.
> >>>
> >>> If we want to have unified behavior and let users decide what the
> >>> desirable behavior is, I prefer to have a config option. A Flink
> >>> cluster can be set to async; then users don't need to wrap every DML
> >>> statement in an ASYNC clause. This is the least intrusive way for
> >>> users.
> >>>
> >>>
> >>> Personally, I'm fine with the following options, in priority order:
> >>>
> >>> 1) sync for batch DML and async for streaming DML
> >>> ==> only breaks batch behavior, but makes both happy
> >>>
> >>> 2) async for both batch and streaming DML, and can be set to sync via a
> >>> configuration.
> >>> ==> compatible, and provides flexible configurable behavior
> >>>
> >>> 3) sync for both batch and streaming DML, and can be set to async via
> >>> a configuration.
> >>> ==> +0 for this, because it breaks all the compatibility, esp. for
> >>> our main users.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org> wrote:
> >>>
> >>>> Hi Jark, Hi Rui,
> >>>>
> >>>> 1) How should we execute statements in CLI and in file? Should there
> >>>> be a difference?
> >>>> So it seems we have consensus here with unified behavior, even
> >>>> though this means we are breaking existing batch INSERT INTOs that
> >>>> were asynchronous before.
> >>>>
> >>>> 2) Should we have different behavior for batch and streaming?
> >>>> I think batch users also prefer async behavior because usually even
> >>>> those pipelines take some time to execute. But we should stick to
> >>>> standard SQL blocking semantics.
> >>>>
> >>>> What are your opinions on making async explicit in SQL via
> >>>> `BEGIN ASYNC; ... END;`? This would allow us to really have unified
> >>>> semantics, because batch and streaming would behave the same.
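As a sketch, the `BEGIN ASYNC; ... END;` idea floated above would wrap the asynchronous part of a script explicitly (hypothetical syntax from this proposal only, not an adopted feature):

```sql
BEGIN ASYNC;
INSERT INTO sink_t SELECT * FROM source_t;
END;
```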
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 07.02.21 04:46, Rui Li wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>> I agree with Jark that we should provide a consistent experience
> >>>>> regarding the SQL CLI and files. Some systems even allow users to
> >>>>> execute SQL files in the CLI, e.g. the "SOURCE" command in MySQL.
> >>>>> If we want to support that in the future, it's a little tricky to
> >>>>> decide whether that should be treated as CLI or file.
> >>>>> as CLI or file.
> >>>>>
> >>>>> I actually prefer a config option and letting users decide what
> >>>>> the desirable behavior is. But if we have agreed not to use
> >>>>> options, I'm also fine with Alternative #1.
> >>>>>
> >>>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Timo,
> >>>>>>
> >>>>>> 1) How should we execute statements in CLI and in file? Should
> >>>>>> there be a difference?
> >>>>>> I do think we should unify the behavior of CLI and SQL files. SQL
> >>>>>> files can be thought of as a shortcut for
> >>>>>> "start CLI" => "copy content of SQL files" => "paste content in CLI".
> >>>>>> Actually, we already did this in kafka_e2e.sql [1].
> >>>>>> I think it's hard for users to understand why SQL files behave
> >>>>>> differently from the CLI; all the other systems don't have such a
> >>>>>> difference.
> >>>>>>
> >>>>>> If we distinguish SQL files and the CLI, should there be a
> >>>>>> difference in the JDBC driver and UI platforms? Personally, I
> >>>>>> think they all should have consistent behavior.
> >>>>>>
> >>>>>> 2) Should we have different behavior for batch and streaming?
> >>>>>> I think we all agree streaming users prefer async execution,
> >>>>>> otherwise it's weird and difficult to use if the submit script or
> >>>>>> CLI never exits. On the other hand, batch SQL users are used to
> >>>>>> SQL statements being executed in a blocking fashion.
> >>>>>>
> >>>>>> Either unified async execution or unified sync execution will
> >>>>>> hurt one side of the streaming/batch users. In order to make both
> >>>>>> sides happy, I think we can have different behavior for batch and
> >>>>>> streaming. There are many essential differences between batch and
> >>>>>> stream systems; I think it's normal to have some different
> >>>>>> behaviors, and this behavior doesn't break the unified
> >>>>>> batch/stream semantics.
> >>>>>>
> >>>>>>
> >>>>>> Thus, I'm +1 to Alternative 1:
> >>>>>> We consider batch/streaming mode and block for batch INSERT INTO
> >>>>>> and async for streaming INSERT INTO/STATEMENT SET.
> >>>>>> And this behavior is consistent across CLI and files.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> [1]:
> >>>>>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
> >>>>>>
> >>>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi Jark,
> >>>>>>>
> >>>>>>> thanks for the summary. I hope we can also find a good long-term
> >>>>>>> solution on the async/sync execution behavior topic.
> >>>>>>>
> >>>>>>> It should be discussed in a bigger round because it is (similar
> >>>>>>> to the time function discussion) related to batch-streaming
> >>>>>>> unification, where we should stick to the SQL standard to some
> >>>>>>> degree but also need to come up with good streaming semantics.
> >>>>>>>
> >>>>>>> Let me summarize the problem again to hear opinions:
> >>>>>>>
> >>>>>>> - Batch SQL users are used to executing SQL files sequentially
> >>>>>>> (from top to bottom).
> >>>>>>> - Batch SQL users are used to SQL statements being executed
> >>>>>>> blocking, one after the other, esp. when moving around data with
> >>>>>>> INSERT INTO.
> >>>>>>> - Streaming users prefer async execution because unbounded
> >>>>>>> streams are more frequent than bounded streams.
> >>>>>>> - We decided to make the Flink Table API async because in a
> >>>>>>> programming language it is easy to call `.await()` on the result
> >>>>>>> to make it blocking.
> >>>>>>> - INSERT INTO statements in the current SQL Client implementation
> >>>>>>> are always submitted asynchronously.
> >>>>>>> - Other clients, such as the Ververica platform, allow only one
> >>>>>>> INSERT INTO or a STATEMENT SET at the end of a file, which will
> >>>>>>> run asynchronously.
> >>>>>>>
> >>>>>>> Questions:
> >>>>>>>
> >>>>>>> - How should we execute statements in the CLI and in a file?
> >>>>>>> Should there be a difference?
> >>>>>>> - Should we have different behavior for batch and streaming?
> >>>>>>> - Shall we solve parts of this with a config option, or is it
> >>>>>>> better to make it explicit in the SQL job definition because it
> >>>>>>> influences the semantics of multiple INSERT INTOs?
> >>>>>>>
> >>>>>>> Let me summarize my opinion at the moment:
> >>>>>>>
> >>>>>>> - SQL files should always be executed blocking by default,
> >>>>>>> because they could potentially contain a long list of INSERT INTO
> >>>>>>> statements. This would be SQL standard compliant.
> >>>>>>> - If we allow async execution, we should make this explicit in
> >>>>>>> the SQL file via `BEGIN ASYNC; ... END;`.
> >>>>>>> - In the CLI, we always execute async to maintain the old
> >>>>>>> behavior. We can also assume that people are only using the CLI
> >>>>>>> to fire statements and close the CLI afterwards.
> >>>>>>>
> >>>>>>> Alternative 1:
> >>>>>>> - We consider batch/streaming mode and block for batch INSERT
> >>>>>>> INTO and async for streaming INSERT INTO/STATEMENT SET.
> >>>>>>>
> >>>>>>> What do others think?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 05.02.21 04:03, Jark Wu wrote:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> After an offline discussion with Timo and Kurt, we have reached
> >>>>>>>> some consensus. Please correct me if I am wrong or missed
> >>>>>>>> anything.
> >>>>>>>>
> >>>>>>>> 1) We will introduce "table.planner" and "table.execution-mode"
> >>>>>>>> instead of the "sql-client" prefix, and add a
> >>>>>>>> `TableEnvironment.create(Configuration)` interface. These 2
> >>>>>>>> options can only be used for tableEnv initialization. If used
> >>>>>>>> after initialization, Flink should throw an exception. We may
> >>>>>>>> support dynamically switching the planner in the future.
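As a sketch of consensus point 1), an initialization script could set the two options before the TableEnvironment is created (the option names are as proposed here and the values are examples; both may differ in the final FLIP):

```sql
-- usable only before the TableEnvironment is initialized;
-- changing them afterwards should throw an exception
SET table.planner = blink;
SET table.execution-mode = batch;
```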
> >>>>>>>>
> >>>>>>>> 2) We will have only one parser,
> >>>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a
> >>>>>>>> string statement and returns a list of Operations. It will first
> >>>>>>>> use regex to match some special statements, e.g. SET and ADD
> >>>>>>>> JAR; others will be delegated to the underlying Calcite parser.
> >>>>>>>> The Parser can have different implementations, e.g. HiveParser.
> >>>>>>>>
> >>>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for the Flink
> >>>>>>>> dialect. But we can allow DELETE JAR, LIST JAR in the Hive
> >>>>>>>> dialect through HiveParser.
> >>>>>>>>
> >>>>>>>> 4) We don't have a conclusion on the async/sync execution
> >>>>>>>> behavior yet.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Ingo,
> >>>>>>>>>
> >>>>>>>>> Since we have supported the WITH syntax and the SET command
> >>>>>>>>> since v1.9 [1][2], and we have never received such complaints,
> >>>>>>>>> I think such differences are fine.
> >>>>>>>>>
> >>>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
> >>>>>>>>> requires string-literal keys [3], and SET <key>=<value> doesn't
> >>>>>>>>> allow quoted keys [4].
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jark
> >>>>>>>>>
> >>>>>>>>> [1]:
> >>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> >>>>>>>>> [2]:
> >>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> >>>>>>>>> [3]:
> >>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> >>>>>>>>> [4]:
> >>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> >>>>>>>>> (search "set mapred.reduce.tasks=32")
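To make the quoting difference concrete, a small sketch (the connector options are examples only):

```sql
-- WITH options: keys are string literals, so they are quoted
CREATE TABLE t (id INT) WITH (
  'connector' = 'kafka',
  'scan.startup.mode' = 'earliest-offset'
);

-- SET: the key is unquoted, as in other systems
SET table.exec.mini-batch.enabled = true;
```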
> >>>>>>>>>
> >>>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> regarding the (un-)quoted question, compatibility is of course
> >>>>>>>>>> an important argument, but in terms of consistency I'd find it
> >>>>>>>>>> a bit surprising that WITH handles it differently than SET,
> >>>>>>>>>> and I wonder if that could cause friction for developers when
> >>>>>>>>>> writing their SQL.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> Ingo
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding "One Parser", I think it's not possible for now
> because
> >>>>>>>>>> Calcite
> >>>>>>>>>>> parser can't parse
> >>>>>>>>>>> special characters (e.g. "-") unless quoting them as string
> >>>>>> literals.
> >>>>>>>>>>> That's why the WITH option
> >>>>>>>>>>> key are string literals not identifiers.
> >>>>>>>>>>>
> >>>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
> >>>>>>>>>>> /local/my-home/test.jar
> >>>>>>>>>>> have the same
> >>>>>>>>>>> problems. That's why we propose two parser, one splits lines
> into
> >>>>>>>>>> multiple
> >>>>>>>>>>> statements and match special
> >>>>>>>>>>> command through regex which is light-weight, and delegate other
> >>>>>>>>>> statements
> >>>>>>>>>>> to the other parser which is Calcite parser.
> >>>>>>>>>>>
> >>>>>>>>>>> Note: we should stick on the unquoted SET
> >>>>>>> table.exec.mini-batch.enabled
> >>>>>>>>>> =
> >>>>>>>>>>> true syntax,
> >>>>>>>>>>> both for backward-compatibility and easy-to-use, and all the
> >> other
> >>>>>>>>>> systems
> >>>>>>>>>>> don't have quotes on the key.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
> >>>>>>>>>>> if we want to use "table.planner", I think we should explain
> >>>> clearly
> >>>>>>>>>> what's
> >>>>>>>>>>> the scope it can be used in documentation.
> >>>>>>>>>>> Otherwise, there will be users complaining why the planner
> >> doesn't
> >>>>>>>>>> change
> >>>>>>>>>>> when setting the configuration on TableEnv.
> >>>>>>>>>>> Would be better throwing an exception to indicate users it's
> now
> >>>>>>>>>> allowed to
> >>>>>>>>>>> change planner after TableEnv is initialized.
> >>>>>>>>>>> However, it seems not easy to implement.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Jark
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regarding "table.planner" and "table.execution-mode"
> >>>>>>>>>>>> If we define that those two options are just used to
> initialize
> >>>> the
> >>>>>>>>>>>> TableEnvironment, +1 for introducing table options instead of
> >>>>>>>>>> sql-client
> >>>>>>>>>>>> options.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I
> want
> >>>> to
> >>>>>>>>>> give
> >>>>>>>>>>>> more inputs:
> >>>>>>>>>>>> We want to introduce sql-gateway into the Flink project (see
> >>>>>> FLIP-24
> >>>>>>> &
> >>>>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI
> >>>>>> client
> >>>>>>>>>> and
> >>>>>>>>>>>> the gateway service will communicate through Rest API. The "
> ADD
> >>>>>> JAR
> >>>>>>>>>>>> /local/path/jar " will be executed in the CLI client machine.
> So
> >>>>>> when
> >>>>>>>>>> we
> >>>>>>>>>>>> submit a sql file which contains multiple statements, the CLI
> >>>>>> client
> >>>>>>>>>>> needs
> >>>>>>>>>>>> to pick out the "ADD JAR" line, and also statements need to be
> >>>>>>>>>> submitted
> >>>>>>>>>>> or
> >>>>>>>>>>>> executed one by one to make sure the result is correct. The
> sql
> >>>>>> file
> >>>>>>>>>> may
> >>>>>>>>>>> be
> >>>>>>>>>>>> look like:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SET xxx=yyy;
> >>>>>>>>>>>> create table my_table ...;
> >>>>>>>>>>>> create table my_sink ...;
> >>>>>>>>>>>> ADD JAR /local/path/jar1;
> >>>>>>>>>>>> create function my_udf as com....MyUdf;
> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>>>>> REMOVE JAR /local/path/jar1;
> >>>>>>>>>>>> drop function my_udf;
> >>>>>>>>>>>> ADD JAR /local/path/jar2;
> >>>>>>>>>>>> create function my_udf as com....MyUdf2;
> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>>>>>
> >>>>>>>>>>>> The lines need to be splitted into multiple statements first
> in
> >>>> the
> >>>>>>>>>> CLI
> >>>>>>>>>>>> client, there are two approaches:
> >>>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
> >> splits
> >>>>>>> the
> >>>>>>>>>>>> lines and tells which lines are "ADD JAR".
> >>>>>>>>>>>> pro: there is only one parser
> >>>>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
> >>>>>>>>>> sql-parser,
> >>>>>>>>>>>> because the CLI client is just a simple tool which receives
> the
> >>>>>> user
> >>>>>>>>>>>> commands and displays the result. The non "ADD JAR" command
> will
> >>>> be
> >>>>>>>>>>> parsed
> >>>>>>>>>>>> twice.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. The CLI client splits the lines into multiple statements and
> >>>>>>>>>>>> finds the ADD JAR commands through regex matching.
> >>>>>>>>>>>> pro: The CLI client is very light-weight.
> >>>>>>>>>>>> cons: there are two parsers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (Personally, I prefer the second option.)
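For illustration, the second option could be sketched roughly as below. This is a hypothetical example, not Flink's actual implementation; a production splitter would also need to handle semicolons inside string literals and comments:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of option 2: split a script on semicolons and detect ADD/REMOVE JAR
// commands with a regex, so the CLI client can handle them locally and
// forward everything else to the gateway.
public class ClientStatementSplitter {

    private static final Pattern JAR_COMMAND =
            Pattern.compile("^(ADD|REMOVE)\\s+JAR\\s+(\\S+)$", Pattern.CASE_INSENSITIVE);

    /** Naive split on ';' (ignores semicolons inside string literals). */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String raw : script.split(";")) {
            String stmt = raw.trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }

    /** Returns the jar path if the statement is ADD/REMOVE JAR, else null. */
    public static String jarPath(String statement) {
        Matcher m = JAR_COMMAND.matcher(statement.trim());
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String script =
                "SET a=b;\nADD JAR /local/path/jar1;\nINSERT INTO sink SELECT * FROM src;";
        for (String stmt : split(script)) {
            System.out.println((jarPath(stmt) != null ? "client  : " : "gateway : ") + stmt);
        }
    }
}
```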
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them
> >>>>>>>>>>>> both. For the default dialect, we support SHOW JARS, but if we
> >>>>>>>>>>>> switch to the hive dialect, LIST JARS is also supported.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >>>>>>>>>>>> [2]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Godfrey
> >>>>>>>>>>>>
> >>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi guys,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with
> >>>>>>>>>>>>> other commands than LIST JARS. I don't have a strong opinion
> >>>>>>>>>>>>> about REMOVE vs DELETE though.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I
> >>>>>>>>>>>>> know, most users who are requesting these features were
> >>>>>>>>>>>>> previously hive users. So I wonder whether we can support both
> >>>>>>>>>>>>> LIST/SHOW JARS and REMOVE/DELETE JARS as synonyms? It's just
> >>>>>>>>>>>>> like lots of systems accepting both EXIT and QUIT as the
> >>>>>>>>>>>>> command to terminate the program. So if that's not hard to
> >>>>>>>>>>>>> achieve, and will make users happier, I don't see a reason why
> >>>>>>>>>>>>> we must choose one over the other.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <
> >> twalthr@apache.org
> >>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can
> >> discuss
> >>>>>>>>>> the
> >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to
> determine
> >>>> how
> >>>>>>>>>> we
> >>>>>>>>>>>>>> proceed with this in the near future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1) "whether the table environment has the ability to update
> >>>>>>>>>> itself"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that we
> >>>>>>>>>> should
> >>>>>>>>>>>>>> support
> >>>>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>>>>>>>>>> "old")`. Instead I'm proposing to support
> >>>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
> >>>>>>>>>> execution
> >>>>>>>>>>>>>> mode are read immediately and a subsequent changes to these
> >>>>>>>>>> options
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>> have no effect. We are doing it similar in `new
> >>>>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
> >>>>>>>>>> ConfigOption's
> >>>>>>>>>>>>>> must not be SQL Client specific but can be part of the core
> >>>> table
> >>>>>>>>>>> code
> >>>>>>>>>>>>>> base. Many users would like to get a 100% preconfigured
> >>>>>>>>>> environment
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>> just Configuration. And this is not possible right now. We
> can
> >>>>>>>>>> solve
> >>>>>>>>>>>>>> both use cases in one change.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I remember we had some discussion about this and decided that
> >>>>>>>>>>>>>> we would like to maintain only one parser. In the end it is
> >>>>>>>>>>>>>> "One Flink SQL" where commands influence each other also with
> >>>>>>>>>>>>>> respect to keywords. It should be fine to include the SQL
> >>>>>>>>>>>>>> Client commands in the Flink parser. Of course the table
> >>>>>>>>>>>>>> environment would not be able to handle the `Operation`
> >>>>>>>>>>>>>> instance that would result, but we can introduce hooks to
> >>>>>>>>>>>>>> handle those `Operation`s. Or we could introduce parser
> >>>>>>>>>>>>>> extensions.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We should
> >>>>>>>>>>>>>> further discuss whether we introduce a special SQL clause for
> >>>>>>>>>>>>>> wrapping async behavior or use a config option. Esp. for
> >>>>>>>>>>>>>> streaming queries we need to be careful and should force
> >>>>>>>>>>>>>> users to either "one INSERT INTO" or "one STATEMENT SET".
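For illustration, a statement set as discussed in this thread would wrap multiple INSERTs like this (sketch; table names are just examples):

```sql
BEGIN STATEMENT SET;
INSERT INTO sink_a SELECT id, name FROM source_table;
INSERT INTO sink_b SELECT id, COUNT(*) FROM source_table GROUP BY id;
END;
```

Both INSERT statements would then be compiled into a single job rather than submitted independently.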
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the
> >> commands
> >>>>>>>>>> more
> >>>>>>>>>>>>>> with the remaining commands should be our goal. We just had
> a
> >>>>>>>>>> MODULE
> >>>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is
> >>>> true
> >>>>>>>>>>> that
> >>>>>>>>>>>>>> JARs are not part of the catalog which is why I would not
> use
> >>>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> >>>>>>>>>>> language.
> >>>>>>>>>>>>>> Take a look at the Java collection API as another example.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 6) "Most of the commands should belong to the table
> >> environment"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for updating the FLIP; this makes things easier to
> >>>>>>>>>>>>>> understand. It is good to see that most commands will be
> >>>>>>>>>>>>>> available in TableEnvironment.
> >>>>>>>>>>>>>> However, I would also support SET and RESET for consistency.
> >>>>>>>>>> Again,
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>> an architectural point of view, if we would allow some kind
> of
> >>>>>>>>>>>>>> `Operation` hook in table environment, we could check for
> SQL
> >>>>>>>>>> Client
> >>>>>>>>>>>>>> specific options and forward to regular
> >>>>>>>>>>> `TableConfig.getConfiguration`
> >>>>>>>>>>>>>> otherwise. What do you think?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
> >>>>>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I will respond some of the questions:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on
> >> where
> >>>>>>>>>> the
> >>>>>>>>>>>>>>> configuration takes effect.
> >>>>>>>>>>>>>>> If it is a table configuration, we should make clear what's
> >> the
> >>>>>>>>>>>>> behavior
> >>>>>>>>>>>>>>> when users change
> >>>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
> >>>>>>>>>>>>>> `sql-client.execution.mode`
> >>>>>>>>>>>>>>> are something special
> >>>>>>>>>>>>>>> that can't be changed after TableEnvironment has been
> >>>>>>>>>> initialized.
> >>>>>>>>>>>> You
> >>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>> see
> >>>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method
> >> to
> >>>>>>>>>>>> override
> >>>>>>>>>>>>>>> configuration after
> >>>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Therefore, I think it would be better to still use
> >>>>>>>>>>>>> `sql-client.planner`
> >>>>>>>>>>>>>>> and `sql-client.execution.mode`.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2) Execution file
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> From my point of view, there is a big difference between
> >>>>>>>>>>>>>>> `sql-client.job.detach` and
> >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`: `sql-client.job.detach`
> >>>>>>>>>>>>>>> will affect every single DML statement in the terminal, not
> >>>>>>>>>>>>>>> only the statements in SQL files. I think a single DML
> >>>>>>>>>>>>>>> statement in the interactive terminal is something like
> >>>>>>>>>>>>>>> tEnv#executeSql() instead of tEnv#executeMultiSql(). So I
> >>>>>>>>>>>>>>> don't like the "multi" and "sql" keywords in
> >>>>>>>>>>>>>>> `table.multi-sql-async`. I just found that the runtime
> >>>>>>>>>>>>>>> provides a configuration called "execution.attached" [1],
> >>>>>>>>>>>>>>> which is false by default and specifies whether the pipeline
> >>>>>>>>>>>>>>> is submitted in attached or detached mode. It provides
> >>>>>>>>>>>>>>> exactly the same functionality as `sql-client.job.detach`.
> >>>>>>>>>>>>>>> What do you think about using this option?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If we also want to support this config in
> TableEnvironment, I
> >>>>>>>>>> think
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>>> should also affect the DML execution
> >>>>>>>>>>>>>>>       of `tEnv#executeSql()`, not only DMLs in
> >>>>>>>>>>> `tEnv#executeMultiSql()`.
> >>>>>>>>>>>>>>> Therefore, the behavior may look like this:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==>
> >> async
> >>>>>>>>>> by
> >>>>>>>>>>>>>> default
> >>>>>>>>>>>>>>> tableResult.await()   ==> manually block until finish
> >>>>>>>>>>>>>>>
> >>>>>>>>>>
> >> tEnv.getConfig().getConfiguration().setString("execution.attached",
> >>>>>>>>>>>>>> "true")
> >>>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==>
> >>>> sync,
> >>>>>>>>>>>> don't
> >>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>> to wait on the TableResult
> >>>>>>>>>>>>>>> tEnv.executeMultiSql(
> >>>>>>>>>>>>>>> """
> >>>>>>>>>>>>>>> CREATE TABLE ....  ==> always sync
> >>>>>>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration
> above
> >>>>>>>>>>>>>>> SET execution.attached = false;
> >>>>>>>>>>>>>>> INSERT INTO ...  => async
> >>>>>>>>>>>>>>> """)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach` and
> >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` should be two separate
> >>>>>>>>>>>>>>> topics. As Shengkai mentioned above, SQL CLI only depends on
> >>>>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
> >>>>>>>>>>>>>>> statements. I'm fine with making `executeMultiSql()`
> >>>>>>>>>>>>>>> clearer, but I don't want it to block this FLIP; maybe we
> >>>>>>>>>>>>>>> can discuss this in another thread.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <
> >> fskmine@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi, Timo.
> >>>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts
> >> about
> >>>>>>>>>> your
> >>>>>>>>>>>>>>>> feedback.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the
> >> table
> >>>>>>>>>>>>>> environment
> >>>>>>>>>>>>>>>> has the ability to update itself. Let's take a simple
> >> program
> >>>>>>>>>> as
> >>>>>>>>>>> an
> >>>>>>>>>>>>>>>> example.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>>>>>> "old");
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> tEnv.executeSql("...");
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If we regard this option as a table option, users don't
> >>>>>>>>>>>>>>>> have to create another table environment manually. In that
> >>>>>>>>>>>>>>>> case, tEnv needs to check whether the current mode and
> >>>>>>>>>>>>>>>> planner are the same as before whenever executeSql or
> >>>>>>>>>>>>>>>> explainSql is called. I don't think that's easy work for
> >>>>>>>>>>>>>>>> the table environment, especially if users have a
> >>>>>>>>>>>>>>>> StreamExecutionEnvironment but set the old planner and
> >>>>>>>>>>>>>>>> batch mode. But when we make this option a sql client
> >>>>>>>>>>>>>>>> option, users only use the SET command to change the
> >>>>>>>>>>>>>>>> setting. We can rebuild a new table environment when the
> >>>>>>>>>>>>>>>> set succeeds.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the
> >>>>>>>>>>>>>>>> implementation before continuing this topic. In the sql
> >>>>>>>>>>>>>>>> client, we will maintain two parsers. The first parser (the
> >>>>>>>>>>>>>>>> client parser) will only match the sql client commands. If
> >>>>>>>>>>>>>>>> the client parser can't parse the statement, we will
> >>>>>>>>>>>>>>>> leverage the power of the table environment to execute it.
> >>>>>>>>>>>>>>>> According to our blueprint, TableEnvironment#executeSql is
> >>>>>>>>>>>>>>>> enough for the sql client. Therefore,
> >>>>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out of scope for this
> >>>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> But if we need to introduce
> >>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` in the future, I think
> >>>>>>>>>>>>>>>> it's OK to use the option `table.multi-sql-async` rather
> >>>>>>>>>>>>>>>> than the option `sql-client.job.detach`. But we think the
> >>>>>>>>>>>>>>>> name is not suitable because it is confusing for others.
> >>>>>>>>>>>>>>>> When setting the option to false, we just mean it will
> >>>>>>>>>>>>>>>> block the execution of the INSERT INTO statements, not DDL
> >>>>>>>>>>>>>>>> or others (other sql statements are always executed
> >>>>>>>>>>>>>>>> synchronously). So how about `table.job.async`? It only
> >>>>>>>>>>>>>>>> works for the sql-client and executeMultiSql. If we set
> >>>>>>>>>>>>>>>> this value to false, the table environment will not return
> >>>>>>>>>>>>>>>> the result until the job finishes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR
> >>>>>>>>>>>>>>>> and LIST JAR, because HIVE also uses these commands to add
> >>>>>>>>>>>>>>>> the jar into the classpath or delete the jar. If we use
> >>>>>>>>>>>>>>>> such commands, it can reduce our work for hive
> >>>>>>>>>>>>>>>> compatibility.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is that the jars are
> >>>>>>>>>>>>>>>> not maintained by the Catalog. If we really need to keep
> >>>>>>>>>>>>>>>> consistent with SQL grammar, maybe we should use
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
> >>>>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
> >>>>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
> >>>>>>>>>> consistent.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to
> >> the
> >>>>>>>>>>> table
> >>>>>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag
> to
> >>>>>>>>>>>> identify
> >>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>> commands should belong to the sql client and which
> commands
> >>>>>>>>>> should
> >>>>>>>>>>>>>> belong
> >>>>>>>>>>>>>>>> to the table environment. I also add a new section about
> >>>>>>>>>>>>> implementation
> >>>>>>>>>>>>>>>> details in the FLIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give
> the
> >>>>>>>>>> SQL
> >>>>>>>>>>>>> Client
> >>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> very good update and make it production ready.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
> >>>>>>>>>>>>> `sql-client.execution.mode`
> >>>>>>>>>>>>>>>>> are SQL Client specific. Similar to
> >>>>>>>>>> `StreamExecutionEnvironment`
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added
> recently,
> >> we
> >>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we
> >> offer
> >>>>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
> >>>>>>>>>>> `table.planner`
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> `table.execution-mode` to
> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2) Execution file
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1]
> >>>>>>>>>>>>>>>>> including the mailing list thread at that time? Could you
> >>>>>>>>>>>>>>>>> further elaborate how the multi-statement execution should
> >>>>>>>>>>>>>>>>> work for a unified batch/streaming story? According to our
> >>>>>>>>>>>>>>>>> past discussions, each line in an execution file should be
> >>>>>>>>>>>>>>>>> executed in a blocking fashion, which means a streaming
> >>>>>>>>>>>>>>>>> query needs a statement set to execute multiple INSERT
> >>>>>>>>>>>>>>>>> INTO statements, correct? We should also offer this
> >>>>>>>>>>>>>>>>> functionality in `TableEnvironment.executeMultiSql()`.
> >>>>>>>>>>>>>>>>> Whether `sql-client.job.detach` is SQL Client specific
> >>>>>>>>>>>>>>>>> needs to be determined; it could also be a general
> >>>>>>>>>>>>>>>>> `table.multi-sql-async` option?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 3) DELETE JAR
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE"
> >> sounds
> >>>>>>>>>> like
> >>>>>>>>>>>> one
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 4) LIST JAR
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL
> commands
> >>>>>>>>>> such
> >>>>>>>>>>> as
> >>>>>>>>>>>>>> `SHOW
> >>>>>>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> We should keep the details in sync with
> >>>>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
> >>>> confusion
> >>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>> differently named ExplainDetails. I would vote for
> >>>>>>>>>>> `ESTIMATED_COST`
> >>>>>>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a
> >> reason
> >>>>>>>>>> why
> >>>>>>>>>>> to
> >>>>>>>>>>>>>> call
> >>>>>>>>>>>>>>>>> it that way.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 6) Implementation details
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> It would be nice to understand how we plan to implement
> the
> >>>>>>>>>> given
> >>>>>>>>>>>>>>>>> features. Most of the commands and config options should
> go
> >>>>>>>>>> into
> >>>>>>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This
> way
> >>>>>>>>>> users
> >>>>>>>>>>>>> have a
> >>>>>>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
> >>>>>>>>>> provide a
> >>>>>>>>>>>>>> similar
> >>>>>>>>>>>>>>>>> user experience in notebooks or interactive programs than
> >> the
> >>>>>>>>>> SQL
> >>>>>>>>>>>>>> Client.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
> >>>>>>>>>>>>>>>>>> Sorry for the typo. I meant `RESET` is much better than
> >>>>>>>>>>>>>>>>>> `UNSET`.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二
> 下午4:44写道:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi, Jingsong.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
> >>>>>>>>>>>>>>>>>>> `RESET` is already supported in the current sql client.
> >>>>>>>>>>>>>>>>>>> Our proposal just extends its grammar and allows users
> >>>>>>>>>>>>>>>>>>> to reset specific keys.
> >>>>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set a key to its
> >>>>>>>>>>>>>>>>>>> default value [1]. I think it is more friendly for batch
> >>>>>>>>>>>>>>>>>>> users.
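For illustration, the extended RESET grammar discussed here might look like this (sketch; the key name is just an example):

```sql
SET table.planner=blink;   -- set a property
RESET table.planner;       -- proposed: reset only this key to its default
RESET;                     -- existing behavior: reset all properties
```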
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二
> >>>> 下午1:56写道:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too
> >> outdated.
> >>>>>>>>>> +1
> >>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> improving it.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and
> "UNSET"?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> >>>>>>>>>> lirui.fudan@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes
> >> look
> >>>>>>>>>>> good
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> me.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> >>>>>>>>>>>> fskmine@gmail.com
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi, Rui.
> >>>>>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The main changes:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> # -f parameter has no restriction about the
> >>>>>>>>>>>>>>>>>>>>>> statement type.
> >>>>>>>>>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result
> >>>>>>>>>>>>>>>>>>>>>> of queries to debug when submitting jobs by the -f
> >>>>>>>>>>>>>>>>>>>>>> parameter. It's much more convenient compared to
> >>>>>>>>>>>>>>>>>>>>>> writing INSERT INTO statements.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> # Add a new sql client option
> >>>>>>>>>>>>>>>>>>>>>> `sql-client.job.detach`.
> >>>>>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in the batch
> >>>>>>>>>>>>>>>>>>>>>> mode. Users can set this option to false and the
> >>>>>>>>>>>>>>>>>>>>>> client will not process the next job until the
> >>>>>>>>>>>>>>>>>>>>>> current job finishes. The default value of this
> >>>>>>>>>>>>>>>>>>>>>> option is false, which means the client will execute
> >>>>>>>>>>>>>>>>>>>>>> the next job as soon as the current job is submitted.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五
> >> 下午4:52写道:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and
> hive
> >>>>>>>>>> have
> >>>>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>>>> implications, and we should clarify the behavior.
> For
> >>>>>>>>>>>> example,
> >>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> client just submits the job and exits, what happens
> >> if
> >>>>>>>>>> the
> >>>>>>>>>>>> file
> >>>>>>>>>>>>>>>>>>>>> contains
> >>>>>>>>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should
> treat
> >>>>>>>>>> them
> >>>>>>>>>>> as
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>> statement
> >>>>>>>>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
> >>>>>>>>>> STATEMENT
> >>>>>>>>>>>> SET
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously
> submit
> >>>> the
> >>>>>>>>>>> two
> >>>>>>>>>>>>>> jobs,
> >>>>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> >>>>>>>>>>>>> fskmine@gmail.com
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi Rui,
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
> >>>>>>>>>> suggestions.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen
> >>>>>>>>>>>>>>>>>>>>>>>> the SET command. In the implementation, it will
> >>>>>>>>>>>>>>>>>>>>>>>> just put the key-value pair into the
> >>>>>>>>>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
> >>>>>>>>>>>>>>>>>>>>>>>> table config. If hive supports reading the settings
> >>>>>>>>>>>>>>>>>>>>>>>> from the table config, users are able to set
> >>>>>>>>>>>>>>>>>>>>>>>> hive-related settings.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit
> >>>>>>>>>>>>>>>>>>>>>>>> the job and exit. If the queries never end, users
> >>>>>>>>>>>>>>>>>>>>>>>> have to cancel the jobs by themselves, which is not
> >>>>>>>>>>>>>>>>>>>>>>>> reliable (people may forget their jobs). In most
> >>>>>>>>>>>>>>>>>>>>>>>> cases, queries are used to analyze the data, and
> >>>>>>>>>>>>>>>>>>>>>>>> users should use queries in the interactive mode.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五
> >>>> 下午3:18写道:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion.
> I
> >>>>>>>>>> think
> >>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> covers a
> >>>>>>>>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically
> >>>> improve
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> usability of our
> >>>>>>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the
> >> FLIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
> >>>>>>>>>>>> configurations
> >>>>>>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
> >>>>>>>>>> configurations
> >>>>>>>>>>>> and
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>> don't have
> >>>>>>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations
> in
> >>>> SQL
> >>>>>>>>>>>>> Client.
> >>>>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>>> example,
> >>>>>>>>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf
> when
> >>>>>>>>>> using
> >>>>>>>>>>>> hive
> >>>>>>>>>>>>>>>>>>>>> connector [1].
> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in
> SQL
> >>>>>>>>>> files
> >>>>>>>>>>>>>>>> specified
> >>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option
> >> but
> >>>>>>>>>>> allows
> >>>>>>>>>>>>>>>>> queries
> >>>>>>>>>>>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query
> >> and
> >>>>>>>>>>>> redirect
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would
> like
> >> to
> >>>>>>>>>> do
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> same,
> >>>>>>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>> https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>>>>>>>>>>>>>>>>>>> liuyang0704@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
> >>>>>>>>>> additional
> >>>>>>>>>>>>>>>>>>>>> suggestions:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in
> >>>>>>>>>>>>>>>>>>>>>>>>>> ExecutionContext to StreamTableEnvironment for
> >>>>>>>>>>>>>>>>>>>>>>>>>> both streaming and batch sql.
> >>>>>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way of results retrieval: at
> >>>>>>>>>>>>>>>>>>>>>>>>>> present, the sql client collects the results
> >>>>>>>>>>>>>>>>>>>>>>>>>> locally all at once using accumulators, which may
> >>>>>>>>>>>>>>>>>>>>>>>>>> cause memory issues in the JM or locally for big
> >>>>>>>>>>>>>>>>>>>>>>>>>> query results. Accumulators are only suitable for
> >>>>>>>>>>>>>>>>>>>>>>>>>> testing purposes. We may change to use
> >>>>>>>>>>>>>>>>>>>>>>>>>> SelectTableSink, which is based on
> >>>>>>>>>>>>>>>>>>>>>>>>>> CollectSinkOperatorCoordinator.
> >>>>>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway
> >>>>>>>>>>>>>>>>>>>>>>>>>> which is in FLIP-91? It seems that this FLIP has
> >>>>>>>>>>>>>>>>>>>>>>>>>> not moved forward for a long time. Providing a
> >>>>>>>>>>>>>>>>>>>>>>>>>> long running service out of the box to facilitate
> >>>>>>>>>>>>>>>>>>>>>>>>>> sql submission is necessary.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
> >>>>>>>>>>> 下午8:54写道:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
> >>>>>>>>>>> FLIP-163:SQL
> >>>>>>>>>>>>>>>> Client
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Improvements.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems
> of
> >>>> the
> >>>>>>>>>>> sql
> >>>>>>>>>>>>>>>> client.
> >>>>>>>>>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>>>>>>>>> example, users can not register the table
> >> proposed
> >>>>>>>>>> by
> >>>>>>>>>>>>>> FLIP-95.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
> >>>>>>>>>>> initialize
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> >>>>>>>>>>>> parameter;
> >>>>>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
> >>>>>>>>>> FLIP-163[1].
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> *With kind regards
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>> ------------------------------------------------------------
> >>>>>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese
> Academy
> >>>> of
> >>>>>>>>>>>>> Science
> >>>>>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
> >>>> liuyang0704@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Great to hear that. Can someone update the FLIP a final time before we 
start a vote?

We should quickly discuss how we would like to name the config option 
for the async/sync mode. I heard voices internally that are strongly 
against calling it "detach" due to historical reasons with a Flink job 
detach mode. How about `table.dml-async`?

Thanks,
Timo


On 08.02.21 15:55, Jark Wu wrote:
> Thanks Timo,
> 
> I'm +1 for option#2 too.
> 
> I think we have addressed all the concerns and can start a vote.
> 
> Best,
> Jark
> 
> On Mon, 8 Feb 2021 at 22:19, Timo Walther <tw...@apache.org> wrote:
> 
>> Hi Jark,
>>
>> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>>
>> So let's stick to the config option approach.
>>
>> However, I strongly believe that we should not use the batch/streaming
>> mode for deriving semantics. This discussion is similar to time function
>> discussion. We should not derive sync/async submission behavior from a
>> flag that should only influence runtime operators and the incremental
>> computation. Statements for bounded streams should have the same
>> semantics in batch mode.
>>
>> I think your proposed option 2) is a good tradeoff. For the following
>> reasons:
>>
>> pros:
>> - by default, batch and streaming behave exactly the same
>> - SQL Client CLI behavior does not change compared to 1.12 and remains
>> async for batch and streaming
>> - consistent with the async Table API behavior
>>
>> con:
>> - batch files are not 100% SQL compliant by default
>>
>> The last item might not be an issue since we can expect that users have
>> long-running jobs and prefer async execution in most cases.
>>
>> Regards,
>> Timo
>>
>>
>> On 08.02.21 14:15, Jark Wu wrote:
>>> Hi Timo,
>>>
>>> Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`.
>>> Because it makes submitting streaming jobs very verbose, every INSERT
>> INTO
>>> and STATEMENT SET must be wrapped in the ASYNC clause which is
>>> not user-friendly and not backward-compatible.
>>>
>>> I agree we will have unified behavior but this is at the cost of hurting
>>> our main users.
>>> I'm worried that end users can't understand the technical decision, and
>>> they would
>>> feel streaming is harder to use.
>>>
>>> If we want to have an unified behavior, and let users decide what's the
>>> desirable behavior, I prefer to have a config option. A Flink cluster can
>>> be set to async, then
>>> users don't need to wrap every DML in an ASYNC clause. This is the least
>>> intrusive
>>> way to the users.
>>>
>>>
>>> Personally, I'm fine with following options in priority:
>>>
>>> 1) sync for batch DML and async for streaming DML
>>> ==> only breaks batch behavior, but makes both happy
>>>
>>> 2) async for both batch and streaming DML, and can be set to sync via a
>>> configuration.
>>> ==> compatible, and provides flexible configurable behavior
>>>
>>> 3) sync for both batch and streaming DML, and can be
>>>       set to async via a configuration.
>>> ==> +0 for this, because it breaks all the compatibility, esp. our main
>>> users.
>>>
>>> Best,
>>> Jark
>>>
>>> On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org> wrote:
>>>
>>>> Hi Jark, Hi Rui,
>>>>
>>>> 1) How should we execute statements in CLI and in file? Should there be
>>>> a difference?
>>>> So it seems we have consensus here with unified behavior. Even though
>>>> this means we are breaking existing batch INSERT INTOs that were
>>>> asynchronous before.
>>>>
>>>> 2) Should we have different behavior for batch and streaming?
>>>> I think also batch users prefer async behavior because usually even
>>>> those pipelines take some time to execute. But we should stick to
>>>> standard SQL blocking semantics.
>>>>
>>>> What are your opinions on making async explicit in SQL via `BEGIN ASYNC;
>>>> ... END;`? This would allow us to really have unified semantics because
>>>> batch and streaming would behave the same?
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 07.02.21 04:46, Rui Li wrote:
>>>>> Hi Timo,
>>>>>
>>>>> I agree with Jark that we should provide consistent experience
>> regarding
>>>>> SQL CLI and files. Some systems even allow users to execute SQL files
>> in
>>>>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that
>>>> in
>>>>> the future, it's a little tricky to decide whether that should be
>> treated
>>>>> as CLI or file.
>>>>>
>>>>> I actually prefer a config option and let users decide what's the
>>>>> desirable behavior. But if we have agreed not to use options, I'm also
>>>> fine
>>>>> with Alternative #1.
>>>>>
>>>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
>>>>>
>>>>>> Hi Timo,
>>>>>>
>>>>>> 1) How should we execute statements in CLI and in file? Should there
>> be
>>>> a
>>>>>> difference?
>>>>>> I do think we should unify the behavior of CLI and SQL files. SQL
>> files
>>>> can
>>>>>> be thought of as a shortcut of
>>>>>> "start CLI" => "copy content of SQL files" => "paste content into CLI".
>>>>>> Actually, we already did this in kafka_e2e.sql [1].
>>>>>> I think it's hard for users to understand why SQL files behave
>>>> differently
>>>>>> from CLI, all the other systems don't have such a difference.
>>>>>>
>>>>>> If we distinguish SQL files and CLI, should there be a difference in
>>>> JDBC
>>>>>> driver and UI platform?
>>>>>> Personally, they all should have consistent behavior.
>>>>>>
>>>>>> 2) Should we have different behavior for batch and streaming?
>>>>>> I think we all agree streaming users prefer async execution, otherwise
>>>> it's
>>>>>> weird and difficult to use if the
>>>>>> submit script or CLI never exits. On the other hand, batch SQL users
>>>> are
>>>>>> used to SQL statements being
>>>>>> executed in a blocking fashion.
>>>>>>
>>>>>> Either unified async execution or unified sync execution, will hurt
>> one
>>>>>> side of the streaming
>>>>>> batch users. In order to make both sides happy, I think we can have
>>>>>> different behavior for batch and streaming.
>>>>>> There are many essential differences between batch and stream
>> systems, I
>>>>>> think it's normal to have some
>>>>>> different behaviors, and the behavior doesn't break the unified batch
>>>>>> stream semantics.
>>>>>>
>>>>>>
>>>>>> Thus, I'm +1 to Alternative 1:
>>>>>> We consider batch/streaming mode and block for batch INSERT INTO and
>>>> async
>>>>>> for streaming INSERT INTO/STATEMENT SET.
>>>>>> And this behavior is consistent across CLI and files.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> [1]:
>>>>>>
>>>>>>
>>>>
>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
>>>>>>
>>>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Jark,
>>>>>>>
>>>>>>> thanks for the summary. I hope we can also find a good long-term
>>>>>>> solution on the async/sync execution behavior topic.
>>>>>>>
>>>>>>> It should be discussed in a bigger round because it is (similar to
>> the
>>>>>>> time function discussion) related to batch-streaming unification
>> where
>>>>>>> we should stick to the SQL standard to some degree but also need to
>>>> come
>>>>>>> up with good streaming semantics.
>>>>>>>
>>>>>>> Let me summarize the problem again to hear opinions:
>>>>>>>
>>>>>>> - Batch SQL users are used to execute SQL files sequentially (from
>> top
>>>>>>> to bottom).
>>>>>>> - Batch SQL users are used to SQL statements being executed blocking.
>>>>>>> One after the other. Esp. when moving around data with INSERT INTO.
>>>>>>> - Streaming users prefer async execution because unbounded streams are
>>>>>>> more frequent than bounded streams.
>>>>>>> - We decided to make the Flink Table API async because in a
>> programming
>>>>>>> language it is easy to call `.await()` on the result to make it
>>>> blocking.
>>>>>>> - INSERT INTO statements in the current SQL Client implementation are
>>>>>>> always submitted asynchronously.
>>>>>>> - Other clients, such as the Ververica platform, allow only one INSERT
>> INTO
>>>>>>> or a STATEMENT SET at the end of a file that will run
>> asynchronously.
>>>>>>>
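The async-by-default, block-on-demand pattern summarized in the bullets above can be sketched in plain Java. Note this is purely an illustration, not Flink code: `CompletableFuture` stands in for a `TableResult`-like handle, and the class and method names are made up.

```java
import java.util.concurrent.CompletableFuture;

// Illustration only: CompletableFuture stands in for a TableResult-like handle.
// Submission returns immediately (async); the caller opts into blocking
// semantics by awaiting the handle, analogous to TableResult#await().
public class AsyncSubmitSketch {

    static CompletableFuture<String> executeStatement(String sql) {
        // The "job" runs in the background; the handle is returned at once.
        return CompletableFuture.supplyAsync(() -> "FINISHED: " + sql);
    }

    public static void main(String[] args) {
        CompletableFuture<String> handle =
                executeStatement("INSERT INTO sink SELECT * FROM src");
        // More statements could be submitted here without waiting.
        // Blocking on demand:
        System.out.println(handle.join());
    }
}
```

This is why an explicit `.await()` is cheap in a programming language but needs a dedicated syntax or option in pure SQL scripts.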
>>>>>>> Questions:
>>>>>>>
>>>>>>> - How should we execute statements in CLI and in file? Should there
>> be
>>>> a
>>>>>>> difference?
>>>>>>> - Should we have different behavior for batch and streaming?
>>>>>>> - Shall we solve parts with a config option or is it better to make
>> it
>>>>>>> explicit in the SQL job definition because it influences the
>> semantics
>>>>>>> of multiple INSERT INTOs?
>>>>>>>
>>>>>>> Let me summarize my opinion at the moment:
>>>>>>>
>>>>>>> - SQL files should always be executed blocking by default. Because
>> they
>>>>>>> could potentially contain a long list of INSERT INTO statements. This
>>>>>>> would be SQL standard compliant.
>>>>>>> - If we allow async execution, we should make this explicit in the
>> SQL
>>>>>>> file via `BEGIN ASYNC; ... END;`.
>>>>>>> - In the CLI, we always execute async to maintain the old behavior.
>> We
>>>>>>> can also assume that people are only using the CLI to fire statements
>>>>>>> and close the CLI afterwards.
>>>>>>>
>>>>>>> Alternative 1:
>>>>>>> - We consider batch/streaming mode and block for batch INSERT INTO
>> and
>>>>>>> async for streaming INSERT INTO/STATEMENT SET
>>>>>>>
>>>>>>> What do others think?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05.02.21 04:03, Jark Wu wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> After an offline discussion with Timo and Kurt, we have reached some
>>>>>>>> consensus.
>>>>>>>> Please correct me if I am wrong or missed anything.
>>>>>>>>
>>>>>>>> 1) We will introduce "table.planner" and "table.execution-mode"
>>>> instead
>>>>>>> of
>>>>>>>> "sql-client" prefix,
>>>>>>>> and add `TableEnvironment.create(Configuration)` interface. These 2
>>>>>>> options
>>>>>>>> can only be used
>>>>>>>> for tableEnv initialization. If used after initialization, Flink
>>>> should
>>>>>>>> throw an exception. We may
>>>>>>>> support dynamically switching the planner in the future.
>>>>>>>>
>>>>>>>> 2) We will have only one parser,
>>>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a string
>>>>>>>> statement, and returns a list of Operation. It will first use regex
>> to
>>>>>>>> match some special statement,
>>>>>>>>      e.g. SET, ADD JAR, others will be delegated to the underlying
>>>> Calcite
>>>>>>>> parser. The Parser can
>>>>>>>> have different implementations, e.g. HiveParser.
>>>>>>>>
>>>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect.
>>>> But
>>>>>>> we
>>>>>>>> can allow
>>>>>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
>>>>>>>>
>>>>>>>> 4) We don't have a conclusion for async/sync execution behavior yet.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Ingo,
>>>>>>>>>
>>>>>>>>> Since we have supported the WITH syntax and SET command since v1.9
>>>>>>> [1][2],
>>>>>>>>> and
>>>>>>>>> we have never received such complaints, I think it's fine for such
>>>>>>>>> differences.
>>>>>>>>>
>>>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
>>>>>> requires
>>>>>>>>> string literal keys[3],
>>>>>>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>>>>>>>>> [2]:
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>>>>>>>>> [3]:
>>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>>>>>>>>> [4]:
>>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>>>>>>>>> (search "set mapred.reduce.tasks=32")
>>>>>>>>>
>>>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> regarding the (un-)quoted question, compatibility is of course an
>>>>>>>>>> important
>>>>>>>>>> argument, but in terms of consistency I'd find it a bit surprising
>>>>>> that
>>>>>>>>>> WITH handles it differently than SET, and I wonder if that could
>>>>>> cause
>>>>>>>>>> friction for developers when writing their SQL.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Ingo
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Regarding "One Parser", I think it's not possible for now because
>>>>>>>>>> Calcite
>>>>>>>>>>> parser can't parse
>>>>>>>>>>> special characters (e.g. "-") unless quoting them as string
>>>>>> literals.
>>>>>>>>>>> That's why the WITH option
>>>>>>>>>>> key are string literals not identifiers.
>>>>>>>>>>>
>>>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>>>>>>>>>>> /local/my-home/test.jar
>>>>>>>>>>> have the same
>>>>>>>>>>> problems. That's why we propose two parsers, one splits lines into
>>>>>>>>>> multiple
>>>>>>>>>>> statements and match special
>>>>>>>>>>> command through regex which is light-weight, and delegate other
>>>>>>>>>> statements
>>>>>>>>>>> to the other parser which is Calcite parser.
>>>>>>>>>>>
>>>>>>>>>>> Note: we should stick on the unquoted SET
>>>>>>> table.exec.mini-batch.enabled
>>>>>>>>>> =
>>>>>>>>>>> true syntax,
>>>>>>>>>>> both for backward-compatibility and easy-to-use, and all the
>> other
>>>>>>>>>> systems
>>>>>>>>>>> don't have quotes on the key.
>>>>>>>>>>>
>>>>>>>>>>>
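For illustration, the light-weight matching described above could look like the following sketch (the class name and regex are hypothetical, not the actual Flink implementation). Keys such as `table.exec.mini-batch.enabled` contain '.' and '-', which the Calcite parser only accepts as quoted string literals, so the SET command is matched up front:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: match unquoted SET <key>=<value> before delegating
// anything else to the Calcite parser, so dotted/hyphenated keys work.
public class SetCommandSketch {

    private static final Pattern SET_CMD =
            Pattern.compile("(?i)^\\s*SET\\s+([\\w.\\-]+)\\s*=\\s*([^;\\s]+)\\s*;?\\s*$");

    /** Returns {key, value} for a SET command, or null for anything else. */
    static String[] parseSet(String statement) {
        Matcher m = SET_CMD.matcher(statement);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }
}
```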
>>>>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
>>>>>>>>>>> if we want to use "table.planner", I think we should explain
>>>> clearly
>>>>>>>>>> what's
>>>>>>>>>>> the scope it can be used in documentation.
>>>>>>>>>>> Otherwise, there will be users complaining why the planner
>> doesn't
>>>>>>>>>> change
>>>>>>>>>>> when setting the configuration on TableEnv.
>>>>>>>>>>> Would be better to throw an exception to indicate to users it's not
>>>>>>>>>> allowed to
>>>>>>>>>>> change planner after TableEnv is initialized.
>>>>>>>>>>> However, it seems not easy to implement.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding "table.planner" and "table.execution-mode"
>>>>>>>>>>>> If we define that those two options are just used to initialize
>>>> the
>>>>>>>>>>>> TableEnvironment, +1 for introducing table options instead of
>>>>>>>>>> sql-client
>>>>>>>>>>>> options.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I want
>>>> to
>>>>>>>>>> give
>>>>>>>>>>>> more inputs:
>>>>>>>>>>>> We want to introduce sql-gateway into the Flink project (see
>>>>>> FLIP-24
>>>>>>> &
>>>>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI
>>>>>> client
>>>>>>>>>> and
>>>>>>>>>>>> the gateway service will communicate through Rest API. The " ADD
>>>>>> JAR
>>>>>>>>>>>> /local/path/jar " will be executed in the CLI client machine. So
>>>>>> when
>>>>>>>>>> we
>>>>>>>>>>>> submit a sql file which contains multiple statements, the CLI
>>>>>> client
>>>>>>>>>>> needs
>>>>>>>>>>>> to pick out the "ADD JAR" line, and also statements need to be
>>>>>>>>>> submitted
>>>>>>>>>>> or
>>>>>>>>>>>> executed one by one to make sure the result is correct. The sql
>>>>>> file
>>>>>>>>>> may
>>>>>>>>>>> be
>>>>>>>>>>>> look like:
>>>>>>>>>>>>
>>>>>>>>>>>> SET xxx=yyy;
>>>>>>>>>>>> create table my_table ...;
>>>>>>>>>>>> create table my_sink ...;
>>>>>>>>>>>> ADD JAR /local/path/jar1;
>>>>>>>>>>>> create function my_udf as com....MyUdf;
>>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>>>>>>>> REMOVE JAR /local/path/jar1;
>>>>>>>>>>>> drop function my_udf;
>>>>>>>>>>>> ADD JAR /local/path/jar2;
>>>>>>>>>>>> create function my_udf as com....MyUdf2;
>>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>>>>>>>>
>>>>>>>>>>>> The lines need to be split into multiple statements first in
>>>> the
>>>>>>>>>> CLI
>>>>>>>>>>>> client, there are two approaches:
>>>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
>> splits
>>>>>>> the
>>>>>>>>>>>> lines and tells which lines are "ADD JAR".
>>>>>>>>>>>> pro: there is only one parser
>>>>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
>>>>>>>>>> sql-parser,
>>>>>>>>>>>> because the CLI client is just a simple tool which receives the
>>>>>> user
>>>>>>>>>>>> commands and displays the result. The non "ADD JAR" command will
>>>> be
>>>>>>>>>>> parsed
>>>>>>>>>>>> twice.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. The CLI client splits the lines into multiple statements and
>>>>>> finds
>>>>>>>>>> the
>>>>>>>>>>>> ADD JAR command through regex matching.
>>>>>>>>>>>> pro: The CLI client is very light-weight.
>>>>>>>>>>>> cons: there are two parsers.
>>>>>>>>>>>>
>>>>>>>>>>>> (personally, I prefer the second option)
>>>>>>>>>>>>
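A minimal sketch of option 2 could look as follows (names are hypothetical; a real client would also have to respect statement delimiters inside string literals, which a plain split does not):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of option 2: split the script on ';' and pick out
// ADD JAR commands with a regex; all other statements are forwarded as-is.
// Note: a naive split breaks on ';' inside string literals.
public class ClientSplitterSketch {

    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+(\\S+)$");

    static List<String> extractJars(String script) {
        List<String> jars = new ArrayList<>();
        for (String stmt : script.split(";")) {
            Matcher m = ADD_JAR.matcher(stmt.trim());
            if (m.matches()) {
                jars.add(m.group(1));
            }
        }
        return jars;
    }
}
```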
>>>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
>>>>>>>>>>>> For default dialect, we support SHOW JARS, but if we switch to
>>>> hive
>>>>>>>>>>>> dialect, LIST JARS is also supported.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>>>>>>>>>> [2]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Godfrey
>>>>>>>>>>>>
>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with
>>>>>> other
>>>>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion about
>>>>>> REMOVE
>>>>>>>>>> vs
>>>>>>>>>>>>> DELETE though.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I
>> know,
>>>>>>>>>> most
>>>>>>>>>>>>> users who are requesting these features are previously hive
>>>> users.
>>>>>>>>>> So I
>>>>>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
>>>>>> REMOVE/DELETE
>>>>>>>>>>> JARS
>>>>>>>>>>>>> as synonyms? It's just like lots of systems accept both EXIT
>> and
>>>>>>>>>> QUIT
>>>>>>>>>>> as
>>>>>>>>>>>>> the command to terminate the program. So if that's not hard to
>>>>>>>>>> achieve,
>>>>>>>>>>>> and
>>>>>>>>>>>>> will make users happier, I don't see a reason why we must
>> choose
>>>>>> one
>>>>>>>>>>> over
>>>>>>>>>>>>> the other.
>>>>>>>>>>>>>
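Accepting both verbs is cheap if commands are matched by regex anyway; a hypothetical sketch of such synonym matching (names made up for illustration) could be:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: LIST/SHOW and REMOVE/DELETE accepted as synonyms,
// just like EXIT/QUIT in many shells; one alternation per command family.
public class JarCommandSynonyms {

    private static final Pattern SHOW_JARS =
            Pattern.compile("(?i)^\\s*(SHOW|LIST)\\s+JARS\\s*;?\\s*$");
    private static final Pattern REMOVE_JAR =
            Pattern.compile("(?i)^\\s*(REMOVE|DELETE)\\s+JAR\\s+(\\S+?)\\s*;?\\s*$");

    static boolean isShowJars(String s)  { return SHOW_JARS.matcher(s).matches(); }
    static boolean isRemoveJar(String s) { return REMOVE_JAR.matcher(s).matches(); }
}
```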
>>>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <
>> twalthr@apache.org
>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can
>> discuss
>>>>>>>>>> the
>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to determine
>>>> how
>>>>>>>>>> we
>>>>>>>>>>>>>> proceed with this in the near future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) "whether the table environment has the ability to update
>>>>>>>>>> itself"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that we
>>>>>>>>>> should
>>>>>>>>>>>>>> support
>>>>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>>>>>>>>>> "old")`. Instead I'm proposing to support
>>>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
>>>>>>>>>> execution
>>>>>>>>>>>>>> mode are read immediately and a subsequent changes to these
>>>>>>>>>> options
>>>>>>>>>>>> will
>>>>>>>>>>>>>> have no effect. We are doing it similar in `new
>>>>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
>>>>>>>>>> ConfigOption's
>>>>>>>>>>>>>> must not be SQL Client specific but can be part of the core
>>>> table
>>>>>>>>>>> code
>>>>>>>>>>>>>> base. Many users would like to get a 100% preconfigured
>>>>>>>>>> environment
>>>>>>>>>>>> from
>>>>>>>>>>>>>> just Configuration. And this is not possible right now. We can
>>>>>>>>>> solve
>>>>>>>>>>>>>> both use cases in one change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I remember we had some discussion about this and decided that
>> we
>>>>>>>>>>> would
>>>>>>>>>>>>>> like to maintain only one parser. In the end it is "One Flink
>>>>>> SQL"
>>>>>>>>>>>> where
>>>>>>>>>>>>>> commands influence each other also with respect to keywords.
>> It
>>>>>>>>>>> should
>>>>>>>>>>>>>> be fine to include the SQL Client commands in the Flink
>> parser.
>>>>>>>>>>>>>> course the table environment would not be able to handle the
>>>>>>>>>>>>>> cource the table environment would not be able to handle the
>>>>>>>>>>>> `Operation`
>>>>>>>>>>>>>> instance that would be the result but we can introduce hooks
>> to
>>>>>>>>>>> handle
>>>>>>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We should
>>>>>>>>>> further
>>>>>>>>>>>>>> discuss whether we introduce a special SQL clause for wrapping
>>>>>>>>>> async
>>>>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming
>>>> queries
>>>>>>>>>> we
>>>>>>>>>>>>>> need to be careful and should force users to either "one
>> INSERT
>>>>>>>>>> INTO"
>>>>>>>>>>>> or
>>>>>>>>>>>>>> "one STATEMENT SET".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the
>> commands
>>>>>>>>>> more
>>>>>>>>>>>>>> with the remaining commands should be our goal. We just had a
>>>>>>>>>> MODULE
>>>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is
>>>> true
>>>>>>>>>>> that
>>>>>>>>>>>>>> JARs are not part of the catalog which is why I would not use
>>>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
>>>>>>>>>>> language.
>>>>>>>>>>>>>> Take a look at the Java collection API as another example.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 6) "Most of the commands should belong to the table
>> environment"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to
>>>>>>>>>> understand.
>>>>>>>>>>> It
>>>>>>>>>>>>>> is good to see that most commands will be available in
>>>>>>>>>>>> TableEnvironment.
>>>>>>>>>>>>>> However, I would also support SET and RESET for consistency.
>>>>>>>>>> Again,
>>>>>>>>>>>> from
>>>>>>>>>>>>>> an architectural point of view, if we would allow some kind of
>>>>>>>>>>>>>> `Operation` hook in table environment, we could check for SQL
>>>>>>>>>> Client
>>>>>>>>>>>>>> specific options and forward to regular
>>>>>>>>>>> `TableConfig.getConfiguration`
>>>>>>>>>>>>>> otherwise. What do you think?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>>>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will respond some of the questions:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on
>> where
>>>>>>>>>> the
>>>>>>>>>>>>>>> configuration takes effect.
>>>>>>>>>>>>>>> If it is a table configuration, we should make clear what's
>> the
>>>>>>>>>>>>> behavior
>>>>>>>>>>>>>>> when users change
>>>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
>>>>>>>>>>>>>> `sql-client.execution.mode`
>>>>>>>>>>>>>>> are something special
>>>>>>>>>>>>>>> that can't be changed after TableEnvironment has been
>>>>>>>>>> initialized.
>>>>>>>>>>>> You
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> see
>>>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method
>> to
>>>>>>>>>>>> override
>>>>>>>>>>>>>>> configuration after
>>>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Therefore, I think it would be better to still use
>>>>>>>>>>>>> `sql-client.planner`
>>>>>>>>>>>>>>> and `sql-client.execution.mode`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Execution file
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >From my point of view, there is a big difference between
>>>>>>>>>>>>>>> `sql-client.job.detach` and
>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
>>>>>>>>>> `sql-client.job.detach`
>>>>>>>>>>>> will
>>>>>>>>>>>>>>> affect every single DML statement
>>>>>>>>>>>>>>> in the terminal, not only the statements in SQL files. I
>> think
>>>>>>>>>> the
>>>>>>>>>>>>> single
>>>>>>>>>>>>>>> DML statement in the interactive
>>>>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
>>>>>>>>>>>>>>> tEnv#executeMultiSql.
>>>>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
>>>>>>>>>>>>> `table.multi-sql-async`.
>>>>>>>>>>>>>>> I just find that runtime provides a configuration called
>>>>>>>>>>>>>>> "execution.attached" [1] which is false by default
>>>>>>>>>>>>>>> which specifies if the pipeline is submitted in attached or
>>>>>>>>>>> detached
>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>> It provides exactly the same
>>>>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you think
>>>>>>>>>> about
>>>>>>>>>>>> using
>>>>>>>>>>>>>>> this option?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we also want to support this config in TableEnvironment, I
>>>>>>>>>> think
>>>>>>>>>>>> it
>>>>>>>>>>>>>>> should also affect the DML execution
>>>>>>>>>>>>>>>       of `tEnv#executeSql()`, not only DMLs in
>>>>>>>>>>> `tEnv#executeMultiSql()`.
>>>>>>>>>>>>>>> Therefore, the behavior may look like this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==>
>> async
>>>>>>>>>> by
>>>>>>>>>>>>>> default
>>>>>>>>>>>>>>> tableResult.await()   ==> manually block until finish
>>>>>>>>>>>>>>>
>>>>>>>>>>
>> tEnv.getConfig().getConfiguration().setString("execution.attached",
>>>>>>>>>>>>>> "true")
>>>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==>
>>>> sync,
>>>>>>>>>>>> don't
>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>> to wait on the TableResult
>>>>>>>>>>>>>>> tEnv.executeMultiSql(
>>>>>>>>>>>>>>> """
>>>>>>>>>>>>>>> CREATE TABLE ....  ==> always sync
>>>>>>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration above
>>>>>>>>>>>>>>> SET execution.attached = false;
>>>>>>>>>>>>>>> INSERT INTO ...  => async
>>>>>>>>>>>>>>> """)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
>>>>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two
>> separate
>>>>>>>>>>>> topics,
>>>>>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
>>>>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
>>>>>>>>>> statements.
>>>>>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
>>>>>>>>>> it to
>>>>>>>>>>>>> block
>>>>>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <
>> fskmine@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi, Timo.
>>>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts
>> about
>>>>>>>>>> your
>>>>>>>>>>>>>>>> feedback.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the
>> table
>>>>>>>>>>>>>> environment
>>>>>>>>>>>>>>>> has the ability to update itself. Let's take a simple
>> program
>>>>>>>>>> as
>>>>>>>>>>> an
>>>>>>>>>>>>>>>> example.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>>>>>> "old");
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> tEnv.executeSql("...");
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we regard this option as a table option, users don't have
>>>> to
>>>>>>>>>>>> create
>>>>>>>>>>>>>>>> another table environment manually. In that case, tEnv needs
>>>> to
>>>>>>>>>>>> check
>>>>>>>>>>>>>>>> whether the current mode and planner are the same as before
>>>>>>>>>>>>>>>> whenever executeSql or explainSql is called. I don't think
>>>>>>>>>>>>>>>> it's easy work for the table
>>>>>>>>>>>> environment,
>>>>>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but
>> set
>>>>>>>>>> old
>>>>>>>>>>>>>> planner
>>>>>>>>>>>>>>>> and batch mode. But when we make this option as a sql client
>>>>>>>>>>> option,
>>>>>>>>>>>>>> users
>>>>>>>>>>>>>>>> only use the SET command to change the setting. We can
>>>>>>>>>>>>>>>> rebuild a new table environment when the SET succeeds.
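For illustration, the SET-based approach described here could look like the following in the client (the option names are the `table.*` variants discussed later in this thread and are not final):

```sql
-- Sketch only: on a successful SET of either option, the client would
-- discard the current TableEnvironment and build a fresh one.
SET 'table.planner' = 'blink';
SET 'table.execution-mode' = 'batch';
```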
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the
>> implementation
>>>>>>>>>>> before
>>>>>>>>>>>>>>>> continuing this topic. In the sql client, we will maintain
>> two
>>>>>>>>>>>>> parsers.
>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>> first parser(client parser) will only match the sql client
>>>>>>>>>>> commands.
>>>>>>>>>>>>> If
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> client parser can't parse the statement, we will leverage
>> the
>>>>>>>>>>> power
>>>>>>>>>>>> of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> table environment to execute. According to our blueprint,
>>>>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
>>>>>>>>>>> Therefore,
>>>>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this
>>>> FLIP.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But if we need to introduce the
>>>>>>>>>> `TableEnvironment.executeMultiSql`
>>>>>>>>>>>> in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> future, I think it's OK to use the option
>>>>>>>>>> `table.multi-sql-async`
>>>>>>>>>>>>> rather
>>>>>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the name
>> is
>>>>>>>>>> not
>>>>>>>>>>>>>> suitable
>>>>>>>>>>>>>>>> because the name is confusing for others. When setting the
>>>>>>>>>> option
>>>>>>>>>>>>>> false, we
>>>>>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
>>>>>>>>>>> statement,
>>>>>>>>>>>>> not
>>>>>>>>>>>>>> DDL
>>>>>>>>>>>>>>>> or others(other sql statements are always executed
>>>>>>>>>> synchronously).
>>>>>>>>>>>> So
>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>> about `table.job.async`? It only works for the sql-client
>> and
>>>>>>>>>> the
>>>>>>>>>>>>>>>> executeMultiSql. If we set this value to false, the table
>>>>>>>>>>>>>>>> environment will not return the result until the job finishes.
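If that naming were adopted, usage could look like this (`table.job.async` was only a suggestion at this point, and the table names are placeholders):

```sql
-- Sketch only: make INSERT INTO block until the job finishes.
SET 'table.job.async' = 'false';
INSERT INTO my_sink SELECT id, name FROM my_source;
```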
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR
>> and
>>>>>>>>>>> LIST
>>>>>>>>>>>>> JAR
>>>>>>>>>>>>>>>> because HIVE also uses these commands to add the jar into
>> the
>>>>>>>>>>>>> classpath
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>> delete the jar. If we use  such commands, it can reduce our
>>>>>>>>>> work
>>>>>>>>>>> for
>>>>>>>>>>>>>> hive
>>>>>>>>>>>>>>>> compatibility.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
>>>>>>>>>>>> maintained
>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>> the Catalog. If we really need to keep consistent with SQL
>>>>>>>>>>> grammar,
>>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>> we should use
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>>>>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>>>>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
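To make the candidate spellings concrete, the Flink-dialect commands eventually agreed on in this thread, and their Hive-dialect counterparts, would look roughly like this (the jar path is a placeholder):

```sql
-- Flink dialect:
ADD JAR '/path/to/my-udf.jar';
SHOW JARS;
REMOVE JAR '/path/to/my-udf.jar';

-- Hive dialect (handled by a HiveParser): DELETE JAR, LIST JAR
```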
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>>>>>>>>>> consistent.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to
>> the
>>>>>>>>>>> table
>>>>>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
>>>>>>>>>>>> identify
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>> commands should belong to the sql client and which commands
>>>>>>>>>> should
>>>>>>>>>>>>>> belong
>>>>>>>>>>>>>>>> to the table environment. I also add a new section about
>>>>>>>>>>>>> implementation
>>>>>>>>>>>>>>>> details in the FLIP.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> wrote on Tue, Feb 2, 2021 at 6:43 PM:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
>>>>>>>>>> SQL
>>>>>>>>>>>>> Client
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> very good update and make it production ready.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is some feedback from my side:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
>>>>>>>>>>>>> `sql-client.execution.mode`
>>>>>>>>>>>>>>>>> are SQL Client specific. Similar to
>>>>>>>>>> `StreamExecutionEnvironment`
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added recently,
>> we
>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we
>> offer
>>>>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
>>>>>>>>>>> `table.planner`
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> `table.execution-mode` to
>>>>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) Execution file
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1]
>> including
>>>>>>>>>> the
>>>>>>>>>>>>>> mailing
>>>>>>>>>>>>>>>>> list thread at that time? Could you further elaborate how
>> the
>>>>>>>>>>>>>>>>> multi-statement execution should work for a unified
>>>>>>>>>>> batch/streaming
>>>>>>>>>>>>>>>>> story? According to our past discussions, each line in an
>>>>>>>>>>> execution
>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>>>> should be executed blocking which means a streaming query
>>>>>>>>>> needs a
>>>>>>>>>>>>>>>>> statement set to execute multiple INSERT INTO statement,
>>>>>>>>>> correct?
>>>>>>>>>>>> We
>>>>>>>>>>>>>>>>> should also offer this functionality in
>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>>>>>>>>>>>> `sql-client.job.detach`
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> SQL Client specific needs to be determined, it could also
>> be
>>>> a
>>>>>>>>>>>>> general
>>>>>>>>>>>>>>>>> `table.multi-sql-async` option?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 3) DELETE JAR
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE"
>> sounds
>>>>>>>>>> like
>>>>>>>>>>>> one
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 4) LIST JAR
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
>>>>>>>>>> such
>>>>>>>>>>> as
>>>>>>>>>>>>>> `SHOW
>>>>>>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We should keep the details in sync with
>>>>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
>>>> confusion
>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>> differently named ExplainDetails. I would vote for
>>>>>>>>>>> `ESTIMATED_COST`
>>>>>>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a
>>>>>>>>>>>>>>>>> reason to call it that way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 6) Implementation details
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It would be nice to understand how we plan to implement the
>>>>>>>>>> given
>>>>>>>>>>>>>>>>> features. Most of the commands and config options should go
>>>>>>>>>> into
>>>>>>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
>>>>>>>>>> users
>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
>>>>>>>>>> provide a
>>>>>>>>>>>>>> similar
>>>>>>>>>>>>>>>>> user experience in notebooks or interactive programs than
>> the
>>>>>>>>>> SQL
>>>>>>>>>>>>>> Client.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>>>>>>>>>>>>>>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>>>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better than
>>>>>>>>>>>>>>>>>> `UNSET`.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Tue, Feb 2, 2021 at 4:44 PM:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi, Jingsong.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
>>>>>>>>>>>>>>>>>>> `RESET` is supported in the current sql client now. Our
>>>>>>>>>>>>>>>>>>> proposal just extends its grammar and allows users to
>>>>>>>>>>>>>>>>>>> reset specified keys.
>>>>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
>>>>>>>>>> default
>>>>>>>>>>>>>>>>> value[1].
>>>>>>>>>>>>>>>>>>> I think it is more friendly for batch users.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
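The extended `RESET` grammar would allow something like the following (the key is just an example, and whether keys must be quoted was itself debated later in this thread):

```sql
SET 'table.exec.mini-batch.enabled' = 'true';
-- Proposed extension: reset a single key to its default value.
RESET 'table.exec.mini-batch.enabled';
-- Already supported: reset everything.
RESET;
```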
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> wrote on Tue, Feb 2, 2021 at 1:56 PM:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too
>> outdated.
>>>>>>>>>> +1
>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> improving it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> About "SET" and "RESET", why not "SET" and "UNSET"?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
>>>>>>>>>> lirui.fudan@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes
>> look
>>>>>>>>>>> good
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
>>>>>>>>>>>> fskmine@gmail.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi, Rui.
>>>>>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The main changes:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # The -f parameter has no restriction on the statement
>>>>>>>>>>>>>>>>>>>>>> type.
>>>>>>>>>>>>>>>>>>>>>> Sometimes, users use a pipe to redirect the result of
>>>>>>>>>>>>>>>>>>>>>> queries for debugging when submitting a job with the -f
>>>>>>>>>>>>>>>>>>>>>> parameter. It's much more convenient compared to
>>>>>>>>>>>>>>>>>>>>>> writing INSERT INTO statements.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach`.
>>>>>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in batch mode.
>>>>>>>>>>>>>>>>>>>>>> Users can set this option to false and the client will
>>>>>>>>>>>>>>>>>>>>>> not process the next job until the current job
>>>>>>>>>>>>>>>>>>>>>> finishes. The default value of this option is true,
>>>>>>>>>>>>>>>>>>>>>> which means the client will execute the next job as
>>>>>>>>>>>>>>>>>>>>>> soon as the current job is submitted.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
>>>>>>>>>> have
>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
>>>>>>>>>>>> example,
>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> client just submits the job and exits, what happens
>> if
>>>>>>>>>> the
>>>>>>>>>>>> file
>>>>>>>>>>>>>>>>>>>>> contains
>>>>>>>>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
>>>>>>>>>> them
>>>>>>>>>>> as
>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> statement
>>>>>>>>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
>>>>>>>>>> STATEMENT
>>>>>>>>>>>> SET
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit
>>>> the
>>>>>>>>>>> two
>>>>>>>>>>>>>> jobs,
>>>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
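For reference, the explicit syntax Rui mentions would group independent INSERTs into one job like this (table names are made up); two INSERTs where the second reads the first's sink could not be expressed this way and would need sequential, blocking execution:

```sql
BEGIN STATEMENT SET;
INSERT INTO daily_totals SELECT dt, SUM(amount) FROM orders GROUP BY dt;
INSERT INTO daily_counts SELECT dt, COUNT(*) FROM orders GROUP BY dt;
END;
```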
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
>>>>>>>>>>>>> fskmine@gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Rui,
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the
>>>>>>>>>>>>>>>>>>>>>>>> SET command. In
>>>>>>>>>>>>>>>>>>>>>>>> the implementation, it will just put the key-value
>>>>>>>>>>>>>>>>>>>>>>>> into the `Configuration`, which will be used to
>>>>>>>>>>>>>>>>>>>>>>>> generate the table config. If Hive supports reading
>>>>>>>>>>>>>>>>>>>>>>>> settings from the table config, users will be able
>>>>>>>>>>>>>>>>>>>>>>>> to set Hive-related settings.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit
>> the
>>>>>>>>>> job
>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> exit.
>>>>>>>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job
>>>>>>>>>>>>>>>>>>>>>>>> by themselves, which is not reliable (people may
>>>>>>>>>>>>>>>>>>>>>>>> forget their jobs). In most cases, queries are used
>>>>>>>>>>>>>>>>>>>>>>>> to analyze data. Users should run queries in the
>>>>>>>>>>>>>>>>>>>>>>>> interactive mode.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 3:18 PM:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
>>>>>>>>>> think
>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> covers a
>>>>>>>>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically
>>>> improve
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> usability of our
>>>>>>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the
>> FLIP.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
>>>>>>>>>>>> configurations
>>>>>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
>>>>>>>>>> configurations
>>>>>>>>>>>> and
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in
>>>> SQL
>>>>>>>>>>>>> Client.
>>>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
>>>>>>>>>> using
>>>>>>>>>>>> hive
>>>>>>>>>>>>>>>>>>>>> connector [1].
>>>>>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
>>>>>>>>>> files
>>>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option
>> but
>>>>>>>>>>> allows
>>>>>>>>>>>>>>>>> queries
>>>>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query
>> and
>>>>>>>>>>>> redirect
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> results
>>>>>>>>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like
>> to
>>>>>>>>>> do
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same,
>>>>>>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>>>>>>>>>>>>>>>>>>> liuyang0704@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>>>>> suggestions:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext
>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and
>> batch
>>>>>>>>>> sql.
>>>>>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way of result retrieval: the sql
>>>>>>>>>>>>>>>>>>>>>>>>>> client currently collects the results locally all
>>>>>>>>>>>>>>>>>>>>>>>>>> at once using accumulators, which may cause memory
>>>>>>>>>>>>>>>>>>>>>>>>>> issues in the JM or locally for big query results.
>>>>>>>>>>>>>>>>>>>>>>>>>> Accumulators are only suitable for testing
>>>>>>>>>>>>>>>>>>>>>>>>>> purposes. We may change to use SelectTableSink,
>>>>>>>>>>>>>>>>>>>>>>>>>> which is based on CollectSinkOperatorCoordinator.
>>>>>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway,
>>>>>>>>>>>>>>>>>>>>>>>>>> which is in FLIP-91? It seems that FLIP has not
>>>>>>>>>>>>>>>>>>>>>>>>>> moved forward for a long time. Providing a
>>>>>>>>>>>>>>>>>>>>>>>>>> long-running service out of the box to facilitate
>>>>>>>>>>>>>>>>>>>>>>>>>> SQL submission is necessary.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
>>>>>>>>>>> FLIP-163:SQL
>>>>>>>>>>>>>>>> Client
>>>>>>>>>>>>>>>>>>>>>>>>>>> Improvements.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of
>>>> the
>>>>>>>>>>> sql
>>>>>>>>>>>>>>>> client.
>>>>>>>>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>>>>>>>>>> example, users cannot register tables proposed by
>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP-95.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> - use the -i parameter to specify the sql file to
>>>>>>>>>>>>>>>>>>>>>>>>>>> initialize the table environment, deprecating the
>>>>>>>>>>>>>>>>>>>>>>>>>>> YAML file;
>>>>>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit a sql file, deprecating the
>>>>>>>>>>>>>>>>>>>>>>>>>>> '-u' parameter;
>>>>>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
>>>>>>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
>>>>>>>>>> FLIP-163[1].
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> *With kind regards
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy
>>>> of
>>>>>>>>>>>>> Science
>>>>>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
>>>> liuyang0704@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Thanks Timo,

I'm +1 for option#2 too.

I think we have addressed all the concerns and can start a vote.

Best,
Jark

On Mon, 8 Feb 2021 at 22:19, Timo Walther <tw...@apache.org> wrote:

> Hi Jark,
>
> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>
> So let's stick to the config option approach.
>
> However, I strongly believe that we should not use the batch/streaming
> mode for deriving semantics. This discussion is similar to time function
> discussion. We should not derive sync/async submission behavior from a
> flag that should only influence runtime operators and the incremental
> computation. Statements for bounded streams should have the same
> semantics in batch mode.
>
> I think your proposed option 2) is a good tradeoff. For the following
> reasons:
>
> pros:
> - by default, batch and streaming behave exactly the same
> - SQL Client CLI behavior does not change compared to 1.12 and remains
> async for batch and streaming
> - consistent with the async Table API behavior
>
> con:
> - batch files are not 100% SQL compliant by default
>
> The last item might not be an issue since we can expect that users have
> long-running jobs and prefer async execution in most cases.
>
> Regards,
> Timo
>
>
> On 08.02.21 14:15, Jark Wu wrote:
> > Hi Timo,
> >
> > Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`.
> > Because it makes submitting streaming jobs very verbose, every INSERT
> INTO
> > and STATEMENT SET must be wrapped in the ASYNC clause which is
> > not user-friendly and not backward-compatible.
> >
> > I agree we will have unified behavior but this is at the cost of hurting
> > our main users.
> > I'm worried that end users can't understand the technical decision, and
> > they would
> > feel streaming is harder to use.
> >
> > If we want to have an unified behavior, and let users decide what's the
> > desirable behavior, I prefer to have a config option. A Flink cluster can
> > be set to async, then
> > users don't need to wrap every DML in an ASYNC clause. This is the least
> > intrusive
> > way to the users.
> >
> >
> > Personally, I'm fine with following options in priority:
> >
> > 1) sync for batch DML and async for streaming DML
> > ==> only breaks batch behavior, but makes both happy
> >
> > 2) async for both batch and streaming DML, and can be set to sync via a
> > configuration.
> > ==> compatible, and provides flexible configurable behavior
> >
> > 3) sync for both batch and streaming DML, and can be
> >      set to async via a configuration.
> > ==> +0 for this, because it breaks all the compatibility, esp. our main
> > users.
> >
> > Best,
> > Jark
> >
> > On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org> wrote:
> >
> >> Hi Jark, Hi Rui,
> >>
> >> 1) How should we execute statements in CLI and in file? Should there be
> >> a difference?
> >> So it seems we have consensus here with unified bahavior. Even though
> >> this means we are breaking existing batch INSERT INTOs that were
> >> asynchronous before.
> >>
> >> 2) Should we have different behavior for batch and streaming?
> >> I think also batch users prefer async behavior because usually even
> >> those pipelines take some time to execute. But we should stick to
> >> standard SQL blocking semantics.
> >>
> >> What are your opinions on making async explicit in SQL via `BEGIN ASYNC;
> >> ... END;`? This would allow us to really have unified semantics because
> >> batch and streaming would behave the same?
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 07.02.21 04:46, Rui Li wrote:
> >>> Hi Timo,
> >>>
> >>> I agree with Jark that we should provide consistent experience
> regarding
> >>> SQL CLI and files. Some systems even allow users to execute SQL files
> in
> >>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that
> >> in
> >>> the future, it's a little tricky to decide whether that should be
> treated
> >>> as CLI or file.
> >>>
> >>> I actually prefer a config option and let users decide what's the
> >>> desirable behavior. But if we have agreed not to use options, I'm also
> >> fine
> >>> with Alternative #1.
> >>>
> >>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
> >>>
> >>>> Hi Timo,
> >>>>
> >>>> 1) How should we execute statements in CLI and in file? Should there
> be
> >> a
> >>>> difference?
> >>>> I do think we should unify the behavior of CLI and SQL files. SQL
> files
> >> can
> >>>> be thought of as a shortcut of
> >>>> "start CLI" => "copy content of SQL files" => "paste content in CLI".
> >>>> Actually, we already did this in kafka_e2e.sql [1].
> >>>> I think it's hard for users to understand why SQL files behave
> >> differently
> >>>> from CLI, all the other systems don't have such a difference.
> >>>>
> >>>> If we distinguish SQL files and CLI, should there be a difference in
> >> JDBC
> >>>> driver and UI platform?
> >>>> Personally, they all should have consistent behavior.
> >>>>
> >>>> 2) Should we have different behavior for batch and streaming?
> >>>> I think we all agree streaming users prefer async execution, otherwise
> >> it's
> >>>> weird and difficult to use if the
> >>>> submit script or CLI never exits. On the other hand, batch SQL users
> >> are
> >>>> used to SQL statements being
> >>>> executed in a blocking fashion.
> >>>>
> >>>> Either unified async execution or unified sync execution, will hurt
> one
> >>>> side of the streaming
> >>>> batch users. In order to make both sides happy, I think we can have
> >>>> different behavior for batch and streaming.
> >>>> There are many essential differences between batch and stream
> systems, I
> >>>> think it's normal to have some
> >>>> different behaviors, and the behavior doesn't break the unified batch
> >>>> stream semantics.
> >>>>
> >>>>
> >>>> Thus, I'm +1 to Alternative 1:
> >>>> We consider batch/streaming mode and block for batch INSERT INTO and
> >> async
> >>>> for streaming INSERT INTO/STATEMENT SET.
> >>>> And this behavior is consistent across CLI and files.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>> [1]:
> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
> >>>>
> >>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
> >>>>
> >>>>> Hi Jark,
> >>>>>
> >>>>> thanks for the summary. I hope we can also find a good long-term
> >>>>> solution on the async/sync execution behavior topic.
> >>>>>
> >>>>> It should be discussed in a bigger round because it is (similar to
> the
> >>>>> time function discussion) related to batch-streaming unification
> where
> >>>>> we should stick to the SQL standard to some degree but also need to
> >> come
> >>>>> up with good streaming semantics.
> >>>>>
> >>>>> Let me summarize the problem again to hear opinions:
> >>>>>
> >>>>> - Batch SQL users are used to execute SQL files sequentially (from
> top
> >>>>> to bottom).
> >>>>> - Batch SQL users are used to SQL statements being executed blocking.
> >>>>> One after the other. Esp. when moving around data with INSERT INTO.
> >>>>> - Streaming users prefer async execution because unbounded streams are
> >>>>> more frequent than bounded streams.
> >>>>> - We decided to make the Flink Table API async because in a
> >>>>> programming language it is easy to call `.await()` on the result to
> >>>>> make it blocking.
> >>>>> - INSERT INTO statements in the current SQL Client implementation are
> >>>>> always submitted asynchronously.
> >>>>> - Other clients, such as Ververica Platform, allow only one INSERT
> >>>>> INTO or a STATEMENT SET at the end of a file, which will run
> >>>>> asynchronously.
> >>>>>
> >>>>> Questions:
> >>>>>
> >>>>> - How should we execute statements in CLI and in file? Should there
> be
> >> a
> >>>>> difference?
> >>>>> - Should we have different behavior for batch and streaming?
> >>>>> - Shall we solve parts with a config option or is it better to make
> it
> >>>>> explicit in the SQL job definition because it influences the
> semantics
> >>>>> of multiple INSERT INTOs?
> >>>>>
> >>>>> Let me summarize my opinion at the moment:
> >>>>>
> >>>>> - SQL files should always be executed in a blocking fashion by
> >>>>> default, because they could potentially contain a long list of
> >>>>> INSERT INTO statements. This would be SQL standard compliant.
> >>>>> - If we allow async execution, we should make this explicit in the
> SQL
> >>>>> file via `BEGIN ASYNC; ... END;`.
> >>>>> - In the CLI, we always execute async to maintain the old behavior.
> We
> >>>>> can also assume that people are only using the CLI to fire statements
> >>>>> and close the CLI afterwards.
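For clarity, the `BEGIN ASYNC; ... END;` idea sketched above would have looked something like this in a SQL file (the thread later moved toward a config option instead; table names are placeholders):

```sql
-- Everything outside the block would execute blocking, SQL-standard style.
BEGIN ASYNC;
INSERT INTO alerts_sink SELECT * FROM events WHERE severity = 'HIGH';
END;
```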
> >>>>>
> >>>>> Alternative 1:
> >>>>> - We consider batch/streaming mode and block for batch INSERT INTO
> and
> >>>>> async for streaming INSERT INTO/STATEMENT SET
> >>>>>
> >>>>> What do others think?
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 05.02.21 04:03, Jark Wu wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> After an offline discussion with Timo and Kurt, we have reached some
> >>>>>> consensus.
> >>>>>> Please correct me if I am wrong or missed anything.
> >>>>>>
> >>>>>> 1) We will introduce "table.planner" and "table.execution-mode"
> >> instead
> >>>>> of
> >>>>>> "sql-client" prefix,
> >>>>>> and add `TableEnvironment.create(Configuration)` interface. These 2
> >>>>> options
> >>>>>> can only be used
> >>>>>> for tableEnv initialization. If used after initialization, Flink
> >> should
> >>>>>> throw an exception. We may be able to
> >>>>>> support dynamically switching the planner in the future.
> >>>>>>
> >>>>>> 2) We will have only one parser,
> >>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a string
> >>>>>> statement, and returns a list of Operation. It will first use regex
> to
> >>>>>> match some special statements,
> >>>>>> e.g. SET, ADD JAR; others will be delegated to the underlying
> >>>>>> Calcite parser. The Parser can
> >>>>>> have different implementations, e.g. HiveParser.
> >>>>>>
> >>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect.
> >> But
> >>>>> we
> >>>>>> can allow
> >>>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
> >>>>>>
> >>>>>> 4) We don't have a conclusion for async/sync execution behavior yet.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Ingo,
> >>>>>>>
> >>>>>>> Since we have supported the WITH syntax and SET command since v1.9
> >>>>> [1][2],
> >>>>>>> and
> >>>>>>> we have never received such complaints, I think it's fine for such
> >>>>>>> differences.
> >>>>>>>
> >>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
> >>>> requires
> >>>>>>> string literal keys[3],
> >>>>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>> [1]:
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> >>>>>>> [2]:
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> >>>>>>> [3]:
> >>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> >>>>>>> [4]:
> >>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> >>>>>>> (search "set mapred.reduce.tasks=32")
> >>>>>>>
> >>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> regarding the (un-)quoted question, compatibility is of course an
> >>>>>>>> important
> >>>>>>>> argument, but in terms of consistency I'd find it a bit surprising
> >>>> that
> >>>>>>>> WITH handles it differently than SET, and I wonder if that could
> >>>> cause
> >>>>>>>> friction for developers when writing their SQL.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Ingo
> >>>>>>>>
> >>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> Regarding "One Parser", I think it's not possible for now because
> >>>>>>>> Calcite
> >>>>>>>>> parser can't parse
> >>>>>>>>> special characters (e.g. "-") unless quoting them as string
> >>>> literals.
> >>>>>>>>> That's why the WITH option
> >>>>>>>>> keys are string literals, not identifiers.
> >>>>>>>>>
> >>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
> >>>>>>>>> /local/my-home/test.jar
> >>>>>>>>> have the same
> >>>>>>>>> problems. That's why we propose two parsers: one splits lines into
> >>>>>>>> multiple
> >>>>>>>>> statements and matches special
> >>>>>>>>> commands through regex, which is light-weight, and delegates other
> >>>>>>>> statements
> >>>>>>>>> to the other parser, which is the Calcite parser.
> >>>>>>>>>
> >>>>>>>>> Note: we should stick to the unquoted SET
> >>>>> table.exec.mini-batch.enabled
> >>>>>>>> =
> >>>>>>>>> true syntax,
> >>>>>>>>> both for backward compatibility and ease of use, and all the
> other
> >>>>>>>> systems
> >>>>>>>>> don't have quotes on the key.
> >>>>>>>>>
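The constraint described above — unquoted keys such as `table.exec.mini-batch.enabled` contain `-` and `.` and therefore cannot pass a standard SQL identifier parser — is what makes a light-weight regex front end attractive. A minimal illustration in plain Java (hypothetical sketch, not Flink's implementation; the class name and the regex are made up for this example):

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SetCommandSketch {

    // Accept keys that may contain letters, digits, '_', '.' and '-',
    // which a standard SQL identifier parser would reject unless quoted.
    private static final Pattern SET_STMT = Pattern.compile(
            "(?i)^\\s*SET\\s+([\\w.-]+)\\s*=\\s*([^\\s;]+)\\s*;?\\s*$");

    /**
     * Returns the parsed key/value pair, or null so the caller can
     * delegate the statement to the full SQL parser.
     */
    public static Map.Entry<String, String> parseSet(String statement) {
        Matcher m = SET_STMT.matcher(statement);
        if (!m.matches()) {
            return null;
        }
        return new AbstractMap.SimpleImmutableEntry<>(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv =
                parseSet("SET table.exec.mini-batch.enabled = true;");
        System.out.println(kv.getKey() + " -> " + kv.getValue());
    }
}
```

Quoted keys and quoted values are deliberately out of scope here; the point is only that a regex front end sidesteps the identifier restriction without touching the Calcite grammar.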
> >>>>>>>>>
> >>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
> >>>>>>>>> if we want to use "table.planner", I think we should explain
> >> clearly
> >>>>>>>> what's
> >>>>>>>>> the scope it can be used in documentation.
> >>>>>>>>> Otherwise, there will be users complaining why the planner
> doesn't
> >>>>>>>> change
> >>>>>>>>> when setting the configuration on TableEnv.
> >>>>>>>>> It would be better to throw an exception to indicate to users that it's not
> >>>>>>>> allowed to
> >>>>>>>>> change the planner after TableEnv is initialized.
> >>>>>>>>> However, it seems not easy to implement.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jark
> >>>>>>>>>
> >>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
> >>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> Regarding "table.planner" and "table.execution-mode"
> >>>>>>>>>> If we define that those two options are just used to initialize
> >> the
> >>>>>>>>>> TableEnvironment, +1 for introducing table options instead of
> >>>>>>>> sql-client
> >>>>>>>>>> options.
> >>>>>>>>>>
> >>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I want
> >> to
> >>>>>>>> give
> >>>>>>>>>> more inputs:
> >>>>>>>>>> We want to introduce sql-gateway into the Flink project (see
> >>>> FLIP-24
> >>>>> &
> >>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI
> >>>> client
> >>>>>>>> and
> >>>>>>>>>> the gateway service will communicate through Rest API. The " ADD
> >>>> JAR
> >>>>>>>>>> /local/path/jar " will be executed in the CLI client machine. So
> >>>> when
> >>>>>>>> we
> >>>>>>>>>> submit a sql file which contains multiple statements, the CLI
> >>>> client
> >>>>>>>>> needs
> >>>>>>>>>> to pick out the "ADD JAR" line, and also statements need to be
> >>>>>>>> submitted
> >>>>>>>>> or
> >>>>>>>>>> executed one by one to make sure the result is correct. The sql
> >>>> file
> >>>>>>>> may
> >>>>>>>>> be
> >>>>>>>>>> look like:
> >>>>>>>>>>
> >>>>>>>>>> SET xxx=yyy;
> >>>>>>>>>> create table my_table ...;
> >>>>>>>>>> create table my_sink ...;
> >>>>>>>>>> ADD JAR /local/path/jar1;
> >>>>>>>>>> create function my_udf as com....MyUdf;
> >>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>>> REMOVE JAR /local/path/jar1;
> >>>>>>>>>> drop function my_udf;
> >>>>>>>>>> ADD JAR /local/path/jar2;
> >>>>>>>>>> create function my_udf as com....MyUdf2;
> >>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>>>
> >>>>>>>>>> The lines need to be split into multiple statements first in
> >> the
> >>>>>>>> CLI
> >>>>>>>>>> client, there are two approaches:
> >>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
> splits
> >>>>> the
> >>>>>>>>>> lines and tells which lines are "ADD JAR".
> >>>>>>>>>> pro: there is only one parser
> >>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
> >>>>>>>> sql-parser,
> >>>>>>>>>> because the CLI client is just a simple tool which receives the
> >>>> user
> >>>>>>>>>> commands and displays the result. The non "ADD JAR" command will
> >> be
> >>>>>>>>> parsed
> >>>>>>>>>> twice.
> >>>>>>>>>>
> >>>>>>>>>> 2. The CLI client splits the lines into multiple statements and
> >>>> finds
> >>>>>>>> the
> >>>>>>>>>> ADD JAR command through regex matching.
> >>>>>>>>>> pro: The CLI client is very light-weight.
> >>>>>>>>>> cons: there are two parsers.
> >>>>>>>>>>
> >>>>>>>>>> (personally, I prefer the second option)
> >>>>>>>>>>
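The second approach above can be sketched in a few lines of plain Java (a hypothetical sketch, not the actual CLI client; it ignores comments and escaped quotes for brevity): split the script on semicolons that sit outside single-quoted literals, then recognize `ADD JAR` lines with a regex and delegate everything else to the SQL parser.

```java
import java.util.ArrayList;
import java.util.List;

public class StatementSplitterSketch {

    /** Splits a script on ';' characters that are outside single-quoted literals. */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : script.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote;
            }
            if (c == ';' && !inQuote) {
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) {
                    statements.add(stmt);
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) {
            statements.add(tail);
        }
        return statements;
    }

    /** Light-weight regex match; non-matching statements go to the SQL parser. */
    public static boolean isAddJar(String statement) {
        return statement.matches("(?is)^\\s*ADD\\s+JAR\\s+\\S+\\s*$");
    }

    public static void main(String[] args) {
        String script = "SET x=y;\nADD JAR /local/path/jar1;\nINSERT INTO t SELECT ';' FROM s;";
        for (String stmt : split(script)) {
            System.out.println((isAddJar(stmt) ? "[client] " : "[parser] ") + stmt);
        }
    }
}
```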
> >>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
> >>>>>>>>>> For default dialect, we support SHOW JARS, but if we switch to
> >> hive
> >>>>>>>>>> dialect, LIST JARS is also supported.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >>>>>>>>>> [2]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Godfrey
> >>>>>>>>>>
> >>>>>>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> >>>>>>>>>>
> >>>>>>>>>>> Hi guys,
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with
> >>>> other
> >>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion about
> >>>> REMOVE
> >>>>>>>> vs
> >>>>>>>>>>> DELETE though.
> >>>>>>>>>>>
> >>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I
> know,
> >>>>>>>> most
> >>>>>>>>>>> users who are requesting these features are previously hive
> >> users.
> >>>>>>>> So I
> >>>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
> >>>> REMOVE/DELETE
> >>>>>>>>> JARS
> >>>>>>>>>>> as synonyms? It's just like lots of systems accept both EXIT
> and
> >>>>>>>> QUIT
> >>>>>>>>> as
> >>>>>>>>>>> the command to terminate the program. So if that's not hard to
> >>>>>>>> achieve,
> >>>>>>>>>> and
> >>>>>>>>>>> will make users happier, I don't see a reason why we must
> choose
> >>>> one
> >>>>>>>>> over
> >>>>>>>>>>> the other.
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <
> twalthr@apache.org
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>
> >>>>>>>>>>>> some feedback regarding the open questions. Maybe we can
> discuss
> >>>>>>>> the
> >>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to determine
> >> how
> >>>>>>>> we
> >>>>>>>>>>>> proceed with this in the near future.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1) "whether the table environment has the ability to update
> >>>>>>>> itself"
> >>>>>>>>>>>>
> >>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that we
> >>>>>>>> should
> >>>>>>>>>>>> support
> >>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>>>>>>>> "old")`. Instead I'm proposing to support
> >>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
> >>>>>>>> execution
> >>>>>>>>>>>> mode are read immediately and a subsequent changes to these
> >>>>>>>> options
> >>>>>>>>>> will
> >>>>>>>>>>>> have no effect. We are doing it similar in `new
> >>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
> >>>>>>>> ConfigOption's
> >>>>>>>>>>>> must not be SQL Client specific but can be part of the core
> >> table
> >>>>>>>>> code
> >>>>>>>>>>>> base. Many users would like to get a 100% preconfigured
> >>>>>>>> environment
> >>>>>>>>>> from
> >>>>>>>>>>>> just Configuration. And this is not possible right now. We can
> >>>>>>>> solve
> >>>>>>>>>>>> both use cases in one change.
> >>>>>>>>>>>>
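The "read immediately, subsequent changes have no effect" semantics described above can be illustrated with a plain-Java toy (no Flink dependency; the class, the option key handling, and the default value "blink" are made up for this example):

```java
import java.util.HashMap;
import java.util.Map;

public class EnvCreationSketch {

    private final String planner;                 // frozen at creation time
    private final Map<String, String> mutableConf; // still freely mutable later

    private EnvCreationSketch(Map<String, String> conf) {
        // The planner option is read exactly once, here.
        this.planner = conf.getOrDefault("table.planner", "blink");
        this.mutableConf = new HashMap<>(conf);
    }

    public static EnvCreationSketch create(Map<String, String> conf) {
        return new EnvCreationSketch(conf);
    }

    /** Other options stay mutable; only creation-time options are frozen. */
    public Map<String, String> getConfiguration() {
        return mutableConf;
    }

    public String getPlanner() {
        return planner;
    }

    public static void main(String[] args) {
        EnvCreationSketch env = create(new HashMap<>());
        // Setting the option after initialization no longer changes the planner:
        env.getConfiguration().put("table.planner", "old");
        System.out.println(env.getPlanner());
    }
}
```

Whether a later write should be silently ignored (as here) or rejected with an exception is exactly the open question discussed in this thread.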
> >>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
> >>>>>>>>>>>>
> >>>>>>>>>>>> I remember we had some discussion about this and decided that
> we
> >>>>>>>>> would
> >>>>>>>>>>>> like to maintain only one parser. In the end it is "One Flink
> >>>> SQL"
> >>>>>>>>>> where
> >>>>>>>>>>>> commands influence each other also with respect to keywords.
> It
> >>>>>>>>> should
> >>>>>>>>>>>> be fine to include the SQL Client commands in the Flink
> parser.
> >>>> Of
> >>>>>>>>>>>> course the table environment would not be able to handle the
> >>>>>>>>>> `Operation`
> >>>>>>>>>>>> instance that would be the result but we can introduce hooks
> to
> >>>>>>>>> handle
> >>>>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Can we skip `table.job.async` in the first version? We should
> >>>>>>>> further
> >>>>>>>>>>>> discuss whether we introduce a special SQL clause for wrapping
> >>>>>>>> async
> >>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming
> >> queries
> >>>>>>>> we
> >>>>>>>>>>>> need to be careful and should force users to either "one
> INSERT
> >>>>>>>> INTO"
> >>>>>>>>>> or
> >>>>>>>>>>>> "one STATEMENT SET".
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
> >>>>>>>>>>>>
> >>>>>>>>>>>> In general, Hive is not a good reference. Aligning the
> commands
> >>>>>>>> more
> >>>>>>>>>>>> with the remaining commands should be our goal. We just had a
> >>>>>>>> MODULE
> >>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is
> >> true
> >>>>>>>>> that
> >>>>>>>>>>>> JARs are not part of the catalog which is why I would not use
> >>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> >>>>>>>>> language.
> >>>>>>>>>>>> Take a look at the Java collection API as another example.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 6) "Most of the commands should belong to the table
> environment"
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to
> >>>>>>>> understand.
> >>>>>>>>> It
> >>>>>>>>>>>> is good to see that most commands will be available in
> >>>>>>>>>> TableEnvironment.
> >>>>>>>>>>>> However, I would also support SET and RESET for consistency.
> >>>>>>>> Again,
> >>>>>>>>>> from
> >>>>>>>>>>>> an architectural point of view, if we would allow some kind of
> >>>>>>>>>>>> `Operation` hook in table environment, we could check for SQL
> >>>>>>>> Client
> >>>>>>>>>>>> specific options and forward to regular
> >>>>>>>>> `TableConfig.getConfiguration`
> >>>>>>>>>>>> otherwise. What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
> >>>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I will respond some of the questions:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on
> where
> >>>>>>>> the
> >>>>>>>>>>>>> configuration takes effect.
> >>>>>>>>>>>>> If it is a table configuration, we should make clear what's
> the
> >>>>>>>>>>> behavior
> >>>>>>>>>>>>> when users change
> >>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
> >>>>>>>>>>>> `sql-client.execution.mode`
> >>>>>>>>>>>>> are something special
> >>>>>>>>>>>>> that can't be changed after TableEnvironment has been
> >>>>>>>> initialized.
> >>>>>>>>>> You
> >>>>>>>>>>>> can
> >>>>>>>>>>>>> see
> >>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method
> to
> >>>>>>>>>> override
> >>>>>>>>>>>>> configuration after
> >>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Therefore, I think it would be better to still use
> >>>>>>>>>>> `sql-client.planner`
> >>>>>>>>>>>>> and `sql-client.execution.mode`.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) Execution file
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> From my point of view, there is a big difference between
> >>>>>>>>>>>>> `sql-client.job.detach` and
> >>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
> >>>>>>>> `sql-client.job.detach`
> >>>>>>>>>> will
> >>>>>>>>>>>>> affect every single DML statement
> >>>>>>>>>>>>> in the terminal, not only the statements in SQL files. I
> think
> >>>>>>>> the
> >>>>>>>>>>> single
> >>>>>>>>>>>>> DML statement in the interactive
> >>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
> >>>>>>>>>>>>> tEnv#executeMultiSql.
> >>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
> >>>>>>>>>>> `table.multi-sql-async`.
> >>>>>>>>>>>>> I just find that runtime provides a configuration called
> >>>>>>>>>>>>> "execution.attached" [1] which is false by default
> >>>>>>>>>>>>> which specifies if the pipeline is submitted in attached or
> >>>>>>>>> detached
> >>>>>>>>>>>> mode.
> >>>>>>>>>>>>> It provides exactly the same
> >>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you think
> >>>>>>>> about
> >>>>>>>>>> using
> >>>>>>>>>>>>> this option?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we also want to support this config in TableEnvironment, I
> >>>>>>>> think
> >>>>>>>>>> it
> >>>>>>>>>>>>> should also affect the DML execution
> >>>>>>>>>>>>>      of `tEnv#executeSql()`, not only DMLs in
> >>>>>>>>> `tEnv#executeMultiSql()`.
> >>>>>>>>>>>>> Therefore, the behavior may look like this:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==>
> async
> >>>>>>>> by
> >>>>>>>>>>>> default
> >>>>>>>>>>>>> tableResult.await()   ==> manually block until finish
> >>>>>>>>>>>>>
> >>>>>>>>
> tEnv.getConfig().getConfiguration().setString("execution.attached",
> >>>>>>>>>>>> "true")
> >>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==>
> >> sync,
> >>>>>>>>>> don't
> >>>>>>>>>>>> need
> >>>>>>>>>>>>> to wait on the TableResult
> >>>>>>>>>>>>> tEnv.executeMultiSql(
> >>>>>>>>>>>>> """
> >>>>>>>>>>>>> CREATE TABLE ....  ==> always sync
> >>>>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration above
> >>>>>>>>>>>>> SET execution.attached = false;
> >>>>>>>>>>>>> INSERT INTO ...  => async
> >>>>>>>>>>>>> """)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
> >>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two
> separate
> >>>>>>>>>> topics,
> >>>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
> >>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
> >>>>>>>> statements.
> >>>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
> >>>>>>>> it to
> >>>>>>>>>>> block
> >>>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <
> fskmine@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi, Timo.
> >>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts
> about
> >>>>>>>> your
> >>>>>>>>>>>>>> feedback.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the
> table
> >>>>>>>>>>>> environment
> >>>>>>>>>>>>>> has the ability to update itself. Let's take a simple
> program
> >>>>>>>> as
> >>>>>>>>> an
> >>>>>>>>>>>>>> example.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>>>> "old");
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> tEnv.executeSql("...");
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If we regard this option as a table option, users don't have
> >> to
> >>>>>>>>>> create
> >>>>>>>>>>>>>> another table environment manually. In that case, tEnv needs
> >> to
> >>>>>>>>>> check
> >>>>>>>>>>>>>> whether the current mode and planner are the same as before
> >>>>>>>> when
> >>>>>>>>>>>> executeSql
> >>>>>>>>>>>>>> or explainSql. I don't think it's easy work for the table
> >>>>>>>>>> environment,
> >>>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but
> set
> >>>>>>>> old
> >>>>>>>>>>>> planner
> >>>>>>>>>>>>>> and batch mode. But when we make this option as a sql client
> >>>>>>>>> option,
> >>>>>>>>>>>> users
> >>>>>>>>>>>>>> only use the SET command to change the setting. We can
> rebuild
> >>>>>>>> a
> >>>>>>>>> new
> >>>>>>>>>>>> table
> >>>>>>>>>>>>>> environment when the set succeeds.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the
> implementation
> >>>>>>>>> before
> >>>>>>>>>>>>>> continuing this topic. In the sql client, we will maintain
> two
> >>>>>>>>>>> parsers.
> >>>>>>>>>>>> The
> >>>>>>>>>>>>>> first parser(client parser) will only match the sql client
> >>>>>>>>> commands.
> >>>>>>>>>>> If
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> client parser can't parse the statement, we will leverage
> the
> >>>>>>>>> power
> >>>>>>>>>> of
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> table environment to execute. According to our blueprint,
> >>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
> >>>>>>>>> Therefore,
> >>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this
> >> FLIP.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> But if we need to introduce the
> >>>>>>>> `TableEnvironment.executeMultiSql`
> >>>>>>>>>> in
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> future, I think it's OK to use the option
> >>>>>>>> `table.multi-sql-async`
> >>>>>>>>>>> rather
> >>>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the name
> is
> >>>>>>>> not
> >>>>>>>>>>>> suitable
> >>>>>>>>>>>>>> because the name is confusing for others. When setting the
> >>>>>>>> option
> >>>>>>>>>>>> false, we
> >>>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
> >>>>>>>>> statement,
> >>>>>>>>>>> not
> >>>>>>>>>>>> DDL
> >>>>>>>>>>>>>> or others(other sql statements are always executed
> >>>>>>>> synchronously).
> >>>>>>>>>> So
> >>>>>>>>>>>> how
> >>>>>>>>>>>>>> about `table.job.async`? It only works for the sql-client
> and
> >>>>>>>> the
> >>>>>>>>>>>>>> executeMultiSql. If we set this value false, the table
> >>>>>>>> environment
> >>>>>>>>>>> will
> >>>>>>>>>>>>>> return the result until the job finishes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR
> and
> >>>>>>>>> LIST
> >>>>>>>>>>> JAR
> >>>>>>>>>>>>>> because HIVE also uses these commands to add the jar into
> the
> >>>>>>>>>>> classpath
> >>>>>>>>>>>> or
> >>>>>>>>>>>>>> delete the jar. If we use  such commands, it can reduce our
> >>>>>>>> work
> >>>>>>>>> for
> >>>>>>>>>>>> hive
> >>>>>>>>>>>>>> compatibility.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
> >>>>>>>>>> maintained
> >>>>>>>>>>> by
> >>>>>>>>>>>>>> the Catalog. If we really needs to keep consistent with SQL
> >>>>>>>>> grammar,
> >>>>>>>>>>>> maybe
> >>>>>>>>>>>>>> we should use
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
> >>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
> >>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
> >>>>>>>> consistent.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to
> the
> >>>>>>>>> table
> >>>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
> >>>>>>>>>> identify
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>> commands should belong to the sql client and which commands
> >>>>>>>> should
> >>>>>>>>>>>> belong
> >>>>>>>>>>>>>> to the table environment. I also add a new section about
> >>>>>>>>>>> implementation
> >>>>>>>>>>>>>> details in the FLIP.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
> >>>>>>>> SQL
> >>>>>>>>>>> Client
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>> very good update and make it production ready.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
> >>>>>>>>>>> `sql-client.execution.mode`
> >>>>>>>>>>>>>>> are SQL Client specific. Similar to
> >>>>>>>> `StreamExecutionEnvironment`
> >>>>>>>>>> and
> >>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added recently,
> we
> >>>>>>>>>> should
> >>>>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we
> offer
> >>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
> >>>>>>>>> `table.planner`
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> `table.execution-mode` to
> >>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2) Execution file
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1]
> including
> >>>>>>>> the
> >>>>>>>>>>>> mailing
> >>>>>>>>>>>>>>> list thread at that time? Could you further elaborate how
> the
> >>>>>>>>>>>>>>> multi-statement execution should work for a unified
> >>>>>>>>> batch/streaming
> >>>>>>>>>>>>>>> story? According to our past discussions, each line in an
> >>>>>>>>> execution
> >>>>>>>>>>>> file
> >>>>>>>>>>>>>>> should be executed blocking which means a streaming query
> >>>>>>>> needs a
> >>>>>>>>>>>>>>> statement set to execute multiple INSERT INTO statement,
> >>>>>>>> correct?
> >>>>>>>>>> We
> >>>>>>>>>>>>>>> should also offer this functionality in
> >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
> >>>>>>>>>> `sql-client.job.detach`
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>> SQL Client specific needs to be determined, it could also
> be
> >> a
> >>>>>>>>>>> general
> >>>>>>>>>>>>>>> `table.multi-sql-async` option?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3) DELETE JAR
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE"
> sounds
> >>>>>>>> like
> >>>>>>>>>> one
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 4) LIST JAR
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
> >>>>>>>> such
> >>>>>>>>> as
> >>>>>>>>>>>> `SHOW
> >>>>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> We should keep the details in sync with
> >>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
> >> confusion
> >>>>>>>>>> about
> >>>>>>>>>>>>>>> differently named ExplainDetails. I would vote for
> >>>>>>>>> `ESTIMATED_COST`
> >>>>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a
> reason
> >>>>>>>> why
> >>>>>>>>> to
> >>>>>>>>>>>> call
> >>>>>>>>>>>>>>> it that way.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 6) Implementation details
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It would be nice to understand how we plan to implement the
> >>>>>>>> given
> >>>>>>>>>>>>>>> features. Most of the commands and config options should go
> >>>>>>>> into
> >>>>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
> >>>>>>>> users
> >>>>>>>>>>> have a
> >>>>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
> >>>>>>>> provide a
> >>>>>>>>>>>> similar
> >>>>>>>>>>>>>>> user experience in notebooks or interactive programs than
> the
> >>>>>>>> SQL
> >>>>>>>>>>>> Client.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
> >>>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better rather
> >> than
> >>>>>>>>>>> `UNSET`.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi, Jingsong.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
> >>>>>>>> `RESET`
> >>>>>>>>> is
> >>>>>>>>>>>>>>>>> supported in the current sql client now. Our proposal
> just
> >>>>>>>>>> extends
> >>>>>>>>>>>> its
> >>>>>>>>>>>>>>>>> grammar and allow users to reset the specified keys.
> >>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
> >>>>>>>> default
> >>>>>>>>>>>>>>> value[1].
> >>>>>>>>>>>>>>>>> I think it is more friendly for batch users.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二
> >> 下午1:56写道:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too
> outdated.
> >>>>>>>> +1
> >>>>>>>>> for
> >>>>>>>>>>>>>>>>>> improving it.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> >>>>>>>> lirui.fudan@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes
> look
> >>>>>>>>> good
> >>>>>>>>>> to
> >>>>>>>>>>>>>> me.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> >>>>>>>>>> fskmine@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi, Rui.
> >>>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The main changes:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> # -f parameter has no restriction about the statement
> >>>>>>>> type.
> >>>>>>>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result
> of
> >>>>>>>>>> queries
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> debug
> >>>>>>>>>>>>>>>>>>>> when submitting a job via the -f parameter. It's much
> >> more convenient
> >>>>>>>>>>>> compared
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> writing INSERT INTO statements.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach`
> .
> >>>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in the batch
> >>>>>>>> mode.
> >>>>>>>>>> Users
> >>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> set
> >>>>>>>>>>>>>>>>>>>> this option false and the client will process the next
> >>>>>>>> job
> >>>>>>>>>> until
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> current job finishes. The default value of this option
> >> is
> >>>>>>>>>> false,
> >>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>> means the client will execute the next job when the
> >>>>>>>> current
> >>>>>>>>>> job
> >>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> submitted.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五
> 下午4:52写道:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
> >>>>>>>> have
> >>>>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
> >>>>>>>>>> example,
> >>>>>>>>>>> if
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> client just submits the job and exits, what happens
> if
> >>>>>>>> the
> >>>>>>>>>> file
> >>>>>>>>>>>>>>>>>>> contains
> >>>>>>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
> >>>>>>>> them
> >>>>>>>>> as
> >>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> statement
> >>>>>>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
> >>>>>>>> STATEMENT
> >>>>>>>>>> SET
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit
> >> the
> >>>>>>>>> two
> >>>>>>>>>>>> jobs,
> >>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> >>>>>>>>>>> fskmine@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Rui,
> >>>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
> >>>>>>>> suggestions.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen
> >>>>>>>> the
> >>>>>>>>> set
> >>>>>>>>>>>>>>>>>>> command. In
> >>>>>>>>>>>>>>>>>>>>>> the implementation, it will just put the key-value
> >> into
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
> >>>>>>>> table
> >>>>>>>>>>>> config.
> >>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>>> hive
> >>>>>>>>>>>>>>>>>>>>>> supports reading the setting from the table config,
> >>>>>>>> users
> >>>>>>>>>> are
> >>>>>>>>>>>>>> able
> >>>>>>>>>>>>>>>>>>> to set
> >>>>>>>>>>>>>>>>>>>>>> the hive-related settings.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit
> the
> >>>>>>>> job
> >>>>>>>>>> and
> >>>>>>>>>>>>>>> exit.
> >>>>>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job
> by
> >>>>>>>>>>>>>> themselves,
> >>>>>>>>>>>>>>>>>>> which is
> >>>>>>>>>>>>>>>>>>> not reliable (people may forget their jobs). In most
> >>>>>>>> case,
> >>>>>>>>>>>> queries
> >>>>>>>>>>>>>>>>>>> are used
> >>>>>>>>>>>>>>>>>>>>>> to analyze the data. Users should use queries in the
> >>>>>>>>>>> interactive
> >>>>>>>>>>>>>>>>>>> mode.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五
> >> 下午3:18写道:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
> >>>>>>>> think
> >>>>>>>>> it
> >>>>>>>>>>>>>>> covers a
> >>>>>>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically
> >> improve
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> usability of our
> >>>>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the
> FLIP.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
> >>>>>>>>>> configurations
> >>>>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
> >>>>>>>> configurations
> >>>>>>>>>> and
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>> don't have
> >>>>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in
> >> SQL
> >>>>>>>>>>> Client.
> >>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>> example,
> >>>>>>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
> >>>>>>>> using
> >>>>>>>>>> hive
> >>>>>>>>>>>>>>>>>>> connector [1].
> >>>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
> >>>>>>>> files
> >>>>>>>>>>>>>> specified
> >>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option
> but
> >>>>>>>>> allows
> >>>>>>>>>>>>>>> queries
> >>>>>>>>>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query
> and
> >>>>>>>>>> redirect
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like
> to
> >>>>>>>> do
> >>>>>>>>>> the
> >>>>>>>>>>>>>> same,
> >>>>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> [1]
> >> https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>>>>>>>>>>>>>>>>> liuyang0704@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
> >>>>>>>> additional
> >>>>>>>>>>>>>>>>>>> suggestions:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext
> >> to
> >>>>>>>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and
> batch
> >>>>>>>> sql.
> >>>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way of results retrieval: sql
> client
> >>>>>>>>>> collect
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>>>>> locally all at once using accumulators at present,
> >>>>>>>>>>>>>>>>>>>>>>>>            which may have memory issues in JM or
> >> Local
> >>>>>>>> for
> >>>>>>>>>> the
> >>>>>>>>>>>> big
> >>>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>>>>>>> result.
> >>>>>>>>>>>>>>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> >>>>>>>>>>>>>>>>>>>>>>>>            We may change to use SelectTableSink,
> >> which
> >>>>>>>> is
> >>>>>>>>>> based
> >>>>>>>>>>>>>>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which
> >>>>>>>> is in
> >>>>>>>>>>>>>> FLIP-91.
> >>>>>>>>>>>>>>>>>>> Seems
> >>>>>>>>>>>>>>>>>>>>>>>> that this FLIP has not moved forward for a long
> >> time.
> >>>>>>>>>>>>>>>>>>>>>>>>            Provide a long running service out of
> the
> >>>>>>>> box to
> >>>>>>>>>>>>>>> facilitate
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> sql
> >>>>>>>>>>>>>>>>>>>>>>>> submission is necessary.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
> >>>>>>>>> 下午8:54写道:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
> >>>>>>>>> FLIP-163:SQL
> >>>>>>>>>>>>>> Client
> >>>>>>>>>>>>>>>>>>>>>>>>> Improvements.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of
> >> the
> >>>>>>>>> sql
> >>>>>>>>>>>>>> client.
> >>>>>>>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>>>>>>> example, users can not register the table
> proposed
> >>>>>>>> by
> >>>>>>>>>>>> FLIP-95.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
> >>>>>>>>> initialize
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> >>>>>>>>>> parameter;
> >>>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
> >>>>>>>> FLIP-163[1].
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> *With kind regards
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>> ------------------------------------------------------------
> >>>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy
> >> of
> >>>>>>>>>>> Science
> >>>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
> >> liuyang0704@gmail.com
> >>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi Jark,

you are right. Nesting STATEMENT SET and ASYNC might be too verbose.

So let's stick to the config option approach.

However, I strongly believe that we should not use the batch/streaming 
mode for deriving semantics. This discussion is similar to the time function
discussion. We should not derive sync/async submission behavior from a 
flag that should only influence runtime operators and the incremental 
computation. Statements for bounded streams should have the same 
semantics in batch mode.

I think your proposed option 2) is a good tradeoff, for the following
reasons:

pros:
- by default, batch and streaming behave exactly the same
- SQL Client CLI behavior does not change compared to 1.12 and remains 
async for batch and streaming
- consistent with the async Table API behavior

con:
- batch files are not 100% SQL compliant by default

The last item might not be an issue since we can expect that users have 
long-running jobs and prefer async execution in most cases.
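To make the tradeoff concrete, a batch SQL file under option 2) could look like this (the option key `table.dml-sync` is only a placeholder for illustration; no concrete name had been agreed on in this thread):

```sql
-- Async submission by default, matching the 1.12 CLI behavior:
INSERT INTO daily_report SELECT id, amount FROM orders;

-- Hypothetical switch to SQL-standard blocking semantics:
SET table.dml-sync = true;

-- Now blocks until the job finishes before the next statement runs:
INSERT INTO daily_summary SELECT id, SUM(amount) FROM orders GROUP BY id;
```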

Regards,
Timo


On 08.02.21 14:15, Jark Wu wrote:
> Hi Timo,
> 
> Actually, I'm not in favor of the explicit syntax `BEGIN ASYNC; ... END;`.
> Because it makes submitting streaming jobs very verbose, every INSERT INTO
> and STATEMENT SET must be wrapped in the ASYNC clause which is
> not user-friendly and not backward-compatible.
> 
> I agree we will have unified behavior but this is at the cost of hurting
> our main users.
> I'm worried that end users can't understand the technical decision, and
> they would feel streaming is harder to use.
> 
> If we want to have a unified behavior, and let users decide what's the
> desirable behavior, I prefer to have a config option. A Flink cluster can
> be set to async; then users don't need to wrap every DML in an ASYNC
> clause. This is the least intrusive way for users.
> 
> 
> Personally, I'm fine with following options in priority:
> 
> 1) sync for batch DML and async for streaming DML
> ==> only breaks batch behavior, but makes both happy
> 
> 2) async for both batch and streaming DML, and can be set to sync via a
> configuration.
> ==> compatible, and provides flexible configurable behavior
> 
> 3) sync for both batch and streaming DML, and can be
>      set to async via a configuration.
> ==> +0 for this, because it breaks all the compatibility, esp. our main
> users.
> 
> Best,
> Jark
> 
> On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org> wrote:
> 
>> Hi Jark, Hi Rui,
>>
>> 1) How should we execute statements in CLI and in file? Should there be
>> a difference?
>> So it seems we have consensus here on unified behavior, even though
>> this means we are breaking existing batch INSERT INTOs that were
>> asynchronous before.
>>
>> 2) Should we have different behavior for batch and streaming?
>> I think batch users also prefer async behavior, because usually even
>> those pipelines take some time to execute. But we should stick to
>> standard SQL blocking semantics.
>>
>> What are your opinions on making async explicit in SQL via `BEGIN ASYNC;
>> ... END;`? This would allow us to really have unified semantics because
>> batch and streaming would behave the same?
>>
>> Regards,
>> Timo
>>
>>
>> On 07.02.21 04:46, Rui Li wrote:
>>> Hi Timo,
>>>
>>> I agree with Jark that we should provide consistent experience regarding
>>> SQL CLI and files. Some systems even allow users to execute SQL files in
>>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that in
>>> the future, it's a little tricky to decide whether that should be treated
>>> as CLI or file.
>>>
>>> I actually prefer a config option and let users decide what's the
>>> desirable behavior. But if we have agreed not to use options, I'm also fine
>>> with Alternative #1.
>>>
>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
>>>
>>>> Hi Timo,
>>>>
>>>> 1) How should we execute statements in CLI and in file? Should there be a
>>>> difference?
>>>> I do think we should unify the behavior of CLI and SQL files. SQL files can
>>>> be thought of as a shortcut of
>>>> "start CLI" => "copy content of SQL file" => "paste content into CLI".
>>>> Actually, we already did this in kafka_e2e.sql [1].
>>>> I think it's hard for users to understand why SQL files behave differently
>>>> from the CLI; all the other systems don't have such a difference.
>>>>
>>>> If we distinguish SQL files and CLI, should there be a difference in the
>>>> JDBC driver and UI platforms?
>>>> Personally, they all should have consistent behavior.
>>>>
>>>> 2) Should we have different behavior for batch and streaming?
>>>> I think we all agree streaming users prefer async execution; otherwise it's
>>>> weird and difficult to use if the submit script or CLI never exits. On the
>>>> other hand, batch SQL users are used to SQL statements being executed in a
>>>> blocking fashion.
>>>>
>>>> Either unified async execution or unified sync execution will hurt one side
>>>> of the streaming/batch users. In order to make both sides happy, I think we
>>>> can have different behavior for batch and streaming.
>>>> There are many essential differences between batch and stream systems, so I
>>>> think it's normal to have some different behaviors, and this doesn't break
>>>> the unified batch-stream semantics.
>>>>
>>>>
>>>> Thus, I'm +1 to Alternative 1:
>>>> We consider batch/streaming mode and block for batch INSERT INTO and async
>>>> for streaming INSERT INTO/STATEMENT SET.
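For reference, the statement set syntax referred to above bundles multiple INSERT INTO statements into a single job; based on the BEGIN STATEMENT SET clause discussed earlier in this thread, it takes roughly this shape (table and column names are illustrative):

```sql
BEGIN STATEMENT SET;

INSERT INTO pageview_pv_sink
SELECT page_id, COUNT(1) FROM clicks GROUP BY page_id;

INSERT INTO pageview_uv_sink
SELECT page_id, COUNT(DISTINCT user_id) FROM clicks GROUP BY page_id;

END;
```

Both INSERTs are optimized and submitted together as one job, which is why asynchronous submission of the whole set is the natural behavior for streaming.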
>>>> And this behavior is consistent across CLI and files.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> [1]:
>>>>
>>>>
>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
>>>>
>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
>>>>
>>>>> Hi Jark,
>>>>>
>>>>> thanks for the summary. I hope we can also find a good long-term
>>>>> solution on the async/sync execution behavior topic.
>>>>>
>>>>> It should be discussed in a bigger round because it is (similar to the
>>>>> time function discussion) related to batch-streaming unification where
>>>>> we should stick to the SQL standard to some degree but also need to come
>>>>> up with good streaming semantics.
>>>>>
>>>>> Let me summarize the problem again to hear opinions:
>>>>>
>>>>> - Batch SQL users are used to execute SQL files sequentially (from top
>>>>> to bottom).
>>>>> - Batch SQL users are used to SQL statements being executed blocking.
>>>>> One after the other. Esp. when moving around data with INSERT INTO.
>>>>> - Streaming users prefer async execution because unbounded streams are
>>>>> more frequent than bounded streams.
>>>>> - We decided to make the Flink Table API async because in a programming
>>>>> language it is easy to call `.await()` on the result to make it blocking.
>>>>> - INSERT INTO statements in the current SQL Client implementation are
>>>>> always submitted asynchronously.
>>>>> - Other clients, such as the Ververica platform, allow only one INSERT INTO
>>>>> or one STATEMENT SET at the end of a file, which will run asynchronously.
>>>>>
>>>>> Questions:
>>>>>
>>>>> - How should we execute statements in CLI and in file? Should there be a
>>>>> difference?
>>>>> - Should we have different behavior for batch and streaming?
>>>>> - Shall we solve parts with a config option or is it better to make it
>>>>> explicit in the SQL job definition because it influences the semantics
>>>>> of multiple INSERT INTOs?
>>>>>
>>>>> Let me summarize my opinion at the moment:
>>>>>
>>>>> - SQL files should always be executed in a blocking fashion by default,
>>>>> because they could potentially contain a long list of INSERT INTO
>>>>> statements. This would be SQL-standard compliant.
>>>>> - If we allow async execution, we should make this explicit in the SQL
>>>>> file via `BEGIN ASYNC; ... END;`.
>>>>> - In the CLI, we always execute async to maintain the old behavior. We
>>>>> can also assume that people are only using the CLI to fire statements
>>>>> and close the CLI afterwards.
>>>>>
>>>>> Alternative 1:
>>>>> - We consider batch/streaming mode and block for batch INSERT INTO and
>>>>> async for streaming INSERT INTO/STATEMENT SET
>>>>>
>>>>> What do others think?
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05.02.21 04:03, Jark Wu wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> After an offline discussion with Timo and Kurt, we have reached some
>>>>>> consensus.
>>>>>> Please correct me if I am wrong or missed anything.
>>>>>>
>>>>>> 1) We will introduce "table.planner" and "table.execution-mode" instead
>>>>>> of the "sql-client" prefix, and add a
>>>>>> `TableEnvironment.create(Configuration)` interface. These 2 options can
>>>>>> only be used for tableEnv initialization. If used after initialization,
>>>>>> Flink should throw an exception. We may support dynamically switching
>>>>>> the planner in the future.
>>>>>>
>>>>>> 2) We will have only one parser,
>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a string
>>>>>> statement and returns a list of Operation. It will first use regex to
>>>>>> match special statements, e.g. SET and ADD JAR; others will be delegated
>>>>>> to the underlying Calcite parser. The Parser can have different
>>>>>> implementations, e.g. HiveParser.
>>>>>>
>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for the Flink dialect.
>>>>>> But we can allow DELETE JAR, LIST JAR in the Hive dialect through
>>>>>> HiveParser.
>>>>>>
>>>>>> 4) We don't have a conclusion for async/sync execution behavior yet.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Ingo,
>>>>>>>
>>>>>>> Since we have supported the WITH syntax and SET command since v1.9
>>>>>>> [1][2], and we have never received such complaints, I think it's fine
>>>>>>> for such differences.
>>>>>>>
>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
>>>>>>> string literal keys [3], and SET <key>=<value> doesn't allow quoted
>>>>>>> keys [4].
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> [1]:
>>>>>>>
>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>>>>>>> [2]:
>>>>>>>
>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>>>>>>> [3]:
>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>>>>>>> [4]:
>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>>>>>>> (search "set mapred.reduce.tasks=32")
>>>>>>>
>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> regarding the (un-)quoted question, compatibility is of course an
>>>>>>>> important argument, but in terms of consistency I'd find it a bit
>>>>>>>> surprising that WITH handles it differently than SET, and I wonder if
>>>>>>>> that could cause friction for developers when writing their SQL.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Ingo
>>>>>>>>
>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Regarding "One Parser", I think it's not possible for now, because the
>>>>>>>>> Calcite parser can't parse special characters (e.g. "-") unless they
>>>>>>>>> are quoted as string literals. That's why the WITH option keys are
>>>>>>>>> string literals, not identifiers.
>>>>>>>>>
>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>>>>>>>>> /local/my-home/test.jar have the same problem. That's why we propose
>>>>>>>>> two parsers: one splits the lines into multiple statements and matches
>>>>>>>>> special commands through regex, which is lightweight, and delegates
>>>>>>>>> the other statements to the other parser, which is the Calcite parser.
>>>>>>>>>
>>>>>>>>> Note: we should stick to the unquoted SET
>>>>>>>>> table.exec.mini-batch.enabled = true syntax, both for backward
>>>>>>>>> compatibility and ease of use; all the other systems don't have quotes
>>>>>>>>> on the key.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regarding "table.planner" vs "sql-client.planner": if we want to use
>>>>>>>>> "table.planner", I think we should explain clearly in the
>>>>>>>>> documentation the scope in which it can be used. Otherwise, there will
>>>>>>>>> be users complaining why the planner doesn't change when setting the
>>>>>>>>> configuration on TableEnv. It would be better to throw an exception to
>>>>>>>>> indicate to users that it's not allowed to change the planner after
>>>>>>>>> TableEnv is initialized. However, it seems not easy to implement.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> Regarding "table.planner" and "table.execution-mode":
>>>>>>>>>> If we define that those two options are just used to initialize the
>>>>>>>>>> TableEnvironment, +1 for introducing table options instead of
>>>>>>>>>> sql-client options.
>>>>>>>>>>
>>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I want to
>>>>>>>>>> give more input:
>>>>>>>>>> We want to introduce a sql-gateway into the Flink project (see
>>>>>>>>>> FLIP-24 & FLIP-91 for more info [1] [2]). In "gateway" mode, the CLI
>>>>>>>>>> client and the gateway service will communicate through a REST API.
>>>>>>>>>> The "ADD JAR /local/path/jar" will be executed on the CLI client
>>>>>>>>>> machine. So when we submit a sql file which contains multiple
>>>>>>>>>> statements, the CLI client needs to pick out the "ADD JAR" lines, and
>>>>>>>>>> statements also need to be submitted or executed one by one to make
>>>>>>>>>> sure the result is correct. The sql file may look like:
>>>>>>>>>>
>>>>>>>>>> SET xxx=yyy;
>>>>>>>>>> create table my_table ...;
>>>>>>>>>> create table my_sink ...;
>>>>>>>>>> ADD JAR /local/path/jar1;
>>>>>>>>>> create function my_udf as com....MyUdf;
>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>>>>>> REMOVE JAR /local/path/jar1;
>>>>>>>>>> drop function my_udf;
>>>>>>>>>> ADD JAR /local/path/jar2;
>>>>>>>>>> create function my_udf as com....MyUdf2;
>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>>>>>>
>>>>>>>>>> The lines first need to be split into multiple statements in the CLI
>>>>>>>>>> client; there are two approaches:
>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser splits
>>>>>>>>>> the lines and tells which lines are "ADD JAR".
>>>>>>>>>> pro: there is only one parser
>>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
>>>>>>>>>> sql-parser, because the CLI client is just a simple tool which
>>>>>>>>>> receives user commands and displays the results. The non-"ADD JAR"
>>>>>>>>>> commands will be parsed twice.
>>>>>>>>>>
>>>>>>>>>> 2. The CLI client splits the lines into multiple statements and finds
>>>>>>>>>> the ADD JAR commands through regex matching.
>>>>>>>>>> pro: The CLI client is very lightweight.
>>>>>>>>>> cons: there are two parsers.
>>>>>>>>>>
>>>>>>>>>> (personally, I prefer the second option)
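A minimal sketch of approach 2, in plain Java (all class and method names here are illustrative, not part of Flink's actual code base; a real splitter must also respect `';'` inside string literals and comments, which this sketch deliberately ignores):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the CLI splits a script into statements and recognizes
// client-side commands such as ADD JAR via lightweight regex matching,
// delegating everything else to the backend SQL parser.
class ClientStatementSplitter {

    // Case-insensitive match for "ADD JAR <path>" with an optionally
    // quoted path; group 1 captures the jar path.
    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+'?([^';]+)'?$");

    /** Splits a multi-statement script on ';' and drops empty fragments. */
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String fragment : script.split(";")) {
            String trimmed = fragment.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    /** Returns the jar path if the statement is an ADD JAR command, else null. */
    static String matchAddJar(String statement) {
        Matcher m = ADD_JAR.matcher(statement.trim());
        return m.matches() ? m.group(1).trim() : null;
    }
}
```

Every statement that does not match a client-side command pattern would simply be forwarded to the gateway or embedded parser unchanged.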
>>>>>>>>>>
>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
>>>>>>>>>> For the default dialect, we support SHOW JARS, but if we switch to
>>>>>>>>>> the hive dialect, LIST JARS is also supported.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>>>>>>>> [2]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Godfrey
>>>>>>>>>>
>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Thu, Feb 4, 2021 at 10:40 AM:
>>>>>>>>>>
>>>>>>>>>>> Hi guys,
>>>>>>>>>>>
>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with other
>>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion about REMOVE
>>>>>>>>>>> vs DELETE though.
>>>>>>>>>>>
>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I know,
>>>>>>>>>>> most users who are requesting these features were previously hive
>>>>>>>>>>> users. So I wonder whether we can support both LIST/SHOW JARS and
>>>>>>>>>>> REMOVE/DELETE JARS as synonyms? It's just like how lots of systems
>>>>>>>>>>> accept both EXIT and QUIT as the command to terminate the program.
>>>>>>>>>>> So if that's not hard to achieve, and will make users happier, I
>>>>>>>>>>> don't see a reason why we must choose one over the other.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <twalthr@apache.org
>>>
>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can discuss
>>>>>>>>>>>> the `TableEnvironment.executeMultiSql` story offline to determine
>>>>>>>>>>>> how we proceed with this in the near future.
>>>>>>>>>>>>
>>>>>>>>>>>> 1) "whether the table environment has the ability to update
>>>>>>>> itself"
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that we should
>>>>>>>>>>>> support `tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>>>>>>>> "old")`. Instead I'm proposing to support
>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
>>>>>>>>>>>> execution mode are read immediately and any subsequent changes to
>>>>>>>>>>>> these options will have no effect. We are doing it similarly in
>>>>>>>>>>>> `new StreamExecutionEnvironment(Configuration)`. These two
>>>>>>>>>>>> ConfigOptions must not be SQL Client specific but can be part of
>>>>>>>>>>>> the core table code base. Many users would like to get a 100%
>>>>>>>>>>>> preconfigured environment from just a Configuration, and this is
>>>>>>>>>>>> not possible right now. We can solve both use cases in one change.
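The pattern described here, where creation-only options are read once at construction time and later mutations are rejected, can be sketched generically like this (plain-Java illustration; the class, method, and option names only mirror the discussion and are not Flink's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an environment that snapshots "creation-only" options
// (planner, execution mode) at construction time, so later changes
// to the configuration have no effect on them.
class EnvSketch {
    private final String planner;        // fixed at creation
    private final String executionMode;  // fixed at creation
    private final Map<String, String> mutableConf = new HashMap<>();

    private EnvSketch(Map<String, String> conf) {
        this.planner = conf.getOrDefault("table.planner", "blink");
        this.executionMode = conf.getOrDefault("table.execution-mode", "streaming");
        this.mutableConf.putAll(conf);
    }

    /** A 100% preconfigured environment built from just a configuration. */
    static EnvSketch create(Map<String, String> conf) {
        return new EnvSketch(conf);
    }

    /** Regular options stay mutable; creation-only options throw. */
    void set(String key, String value) {
        if (key.equals("table.planner") || key.equals("table.execution-mode")) {
            throw new IllegalArgumentException(
                    "Option '" + key + "' can only be set before creation");
        }
        mutableConf.put(key, value);
    }

    String getPlanner() { return planner; }
    String getExecutionMode() { return executionMode; }
}
```

Throwing on a late mutation makes the "has no effect" semantics visible to the user instead of silently ignoring the change.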
>>>>>>>>>>>>
>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
>>>>>>>>>>>>
>>>>>>>>>>>> I remember we had some discussion about this and decided that we
>>>>>>>>>>>> would like to maintain only one parser. In the end it is "One Flink
>>>>>>>>>>>> SQL", where commands influence each other also with respect to
>>>>>>>>>>>> keywords. It should be fine to include the SQL Client commands in
>>>>>>>>>>>> the Flink parser. Of course the table environment would not be able
>>>>>>>>>>>> to handle the resulting `Operation` instance, but we can introduce
>>>>>>>>>>>> hooks to handle those `Operation`s. Or we introduce parser
>>>>>>>>>>>> extensions.
>>>>>>>>>>>>
>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We should
>>>>>>>>>>>> further discuss whether we introduce a special SQL clause for
>>>>>>>>>>>> wrapping async
>>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming
>> queries
>>>>>>>> we
>>>>>>>>>>>> need to be careful and should force users to either "one INSERT
>>>>>>>> INTO"
>>>>>>>>>> or
>>>>>>>>>>>> "one STATEMENT SET".
>>>>>>>>>>>>
>>>>>>>>>>>> 3) 4) "HIVE also uses these commands"
>>>>>>>>>>>>
>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the commands
>>>>>>>> more
>>>>>>>>>>>> with the remaining commands should be our goal. We just had a
>>>>>>>> MODULE
>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is
>> true
>>>>>>>>> that
>>>>>>>>>>>> JARs are not part of the catalog which is why I would not use
>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
>>>>>>>>> language.
>>>>>>>>>>>> Take a look at the Java collection API as another example.
>>>>>>>>>>>>
>>>>>>>>>>>> 6) "Most of the commands should belong to the table environment"
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to
>>>>>>>> understand.
>>>>>>>>> It
>>>>>>>>>>>> is good to see that most commands will be available in
>>>>>>>>>> TableEnvironment.
>>>>>>>>>>>> However, I would also support SET and RESET for consistency.
>>>>>>>> Again,
>>>>>>>>>> from
>>>>>>>>>>>> an architectural point of view, if we would allow some kind of
>>>>>>>>>>>> `Operation` hook in table environment, we could check for SQL
>>>>>>>> Client
>>>>>>>>>>>> specific options and forward to regular
>>>>>>>>> `TableConfig.getConfiguration`
>>>>>>>>>>>> otherwise. What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will respond some of the questions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>>>>
>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on where
>>>>>>>> the
>>>>>>>>>>>>> configuration takes effect.
>>>>>>>>>>>>> If it is a table configuration, we should make clear what's the
>>>>>>>>>>> behavior
>>>>>>>>>>>>> when users change
>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
>>>>>>>>>>>> `sql-client.execution.mode`
>>>>>>>>>>>>> are something special
>>>>>>>>>>>>> that can't be changed after TableEnvironment has been
>>>>>>>> initialized.
>>>>>>>>>> You
>>>>>>>>>>>> can
>>>>>>>>>>>>> see
>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method to
>>>>>>>>>> override
>>>>>>>>>>>>> configuration after
>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, I think it would be better to still use
>>>>>>>>>>> `sql-client.planner`
>>>>>>>>>>>>> and `sql-client.execution.mode`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Execution file
>>>>>>>>>>>>>
>>>>>>>>>>>>> From my point of view, there is a big difference between
>>>>>>>>>>>>> `sql-client.job.detach` and
>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
>>>>>>>> `sql-client.job.detach`
>>>>>>>>>> will
>>>>>>>>>>>>> affect every single DML statement
>>>>>>>>>>>>> in the terminal, not only the statements in SQL files. I think
>>>>>>>> the
>>>>>>>>>>> single
>>>>>>>>>>>>> DML statement in the interactive
>>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
>>>>>>>>>>>>> tEnv#executeMultiSql.
>>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
>>>>>>>>>>> `table.multi-sql-async`.
>>>>>>>>>>>>> I just find that runtime provides a configuration called
>>>>>>>>>>>>> "execution.attached" [1] which is false by default
>>>>>>>>>>>>> which specifies if the pipeline is submitted in attached or
>>>>>>>>> detached
>>>>>>>>>>>> mode.
>>>>>>>>>>>>> It provides exactly the same
>>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you think
>>>>>>>> about
>>>>>>>>>> using
>>>>>>>>>>>>> this option?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we also want to support this config in TableEnvironment, I
>>>>>>>> think
>>>>>>>>>> it
>>>>>>>>>>>>> should also affect the DML execution
>>>>>>>>>>>>>      of `tEnv#executeSql()`, not only DMLs in
>>>>>>>>> `tEnv#executeMultiSql()`.
>>>>>>>>>>>>> Therefore, the behavior may look like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
>>>>>>>>>>>>> tableResult.await()   ==> manually block until finished
>>>>>>>>>>>>> tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, no need
>>>>>>>>>>>>> to wait on the TableResult
>>>>>>>>>>>>> tEnv.executeMultiSql(
>>>>>>>>>>>>> """
>>>>>>>>>>>>> CREATE TABLE ...   ==> always sync
>>>>>>>>>>>>> INSERT INTO ...    ==> sync, because we set the configuration above
>>>>>>>>>>>>> SET execution.attached = false;
>>>>>>>>>>>>> INSERT INTO ...    ==> async
>>>>>>>>>>>>> """)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
>>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two separate
>>>>>>>>>> topics,
>>>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
>>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
>>>>>>>> statements.
>>>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
>>>>>>>> it to
>>>>>>>>>>> block
>>>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, Timo.
>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts about
>>>>>>>> your
>>>>>>>>>>>>>> feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the table
>>>>>>>>>>>> environment
>>>>>>>>>>>>>> has the ability to update itself. Let's take a simple program
>>>>>>>> as
>>>>>>>>> an
>>>>>>>>>>>>>> example.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>>>> "old");
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> tEnv.executeSql("...");
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we regard this option as a table option, users don't have
>> to
>>>>>>>>>> create
>>>>>>>>>>>>>> another table environment manually. In that case, tEnv needs
>> to
>>>>>>>>>> check
>>>>>>>>>>>>>> whether the current mode and planner are the same as before when
>>>>>>>>>>>>>> calling executeSql or explainSql. I don't think that's easy work for the table
>>>>>>>>>> environment,
>>>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but set
>>>>>>>> old
>>>>>>>>>>>> planner
>>>>>>>>>>>>>> and batch mode. But if we make this option a sql client option, users
>>>>>>>>>>>>>> only use the SET command to change the setting. We can rebuild a new
>>>>>>>>>>>>>> table environment when the SET succeeds.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the implementation
>>>>>>>>> before
>>>>>>>>>>>>>> continuing this topic. In the sql client, we will maintain two
>>>>>>>>>>> parsers.
>>>>>>>>>>>> The
>>>>>>>>>>>>>> first parser(client parser) will only match the sql client
>>>>>>>>> commands.
>>>>>>>>>>> If
>>>>>>>>>>>> the
>>>>>>>>>>>>>> client parser can't parse the statement, we will leverage the
>>>>>>>>> power
>>>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>>>> table environment to execute. According to our blueprint,
>>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
>>>>>>>>> Therefore,
>>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this
>> FLIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But if we need to introduce the
>>>>>>>> `TableEnvironment.executeMultiSql`
>>>>>>>>>> in
>>>>>>>>>>>> the
>>>>>>>>>>>>>> future, I think it's OK to use the option
>>>>>>>> `table.multi-sql-async`
>>>>>>>>>>> rather
>>>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the name is
>>>>>>>> not
>>>>>>>>>>>> suitable
>>>>>>>>>>>>>> because the name is confusing for others. When setting the
>>>>>>>> option
>>>>>>>>>>>> false, we
>>>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
>>>>>>>>> statement,
>>>>>>>>>>> not
>>>>>>>>>>>> DDL
>>>>>>>>>>>>>> or others(other sql statements are always executed
>>>>>>>> synchronously).
>>>>>>>>>> So
>>>>>>>>>>>> how
>>>>>>>>>>>>>> about `table.job.async`? It only works for the sql-client and
>>>>>>>> the
>>>>>>>>>>>>>> executeMultiSql. If we set this value false, the table
>>>>>>>> environment
>>>>>>>>>>> will
>>>>>>>>>>>>>> return the result until the job finishes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR and
>>>>>>>>> LIST
>>>>>>>>>>> JAR
>>>>>>>>>>>>>> because HIVE also uses these commands to add the jar into the
>>>>>>>>>>> classpath
>>>>>>>>>>>> or
>>>>>>>>>>>>>> delete the jar. If we use  such commands, it can reduce our
>>>>>>>> work
>>>>>>>>> for
>>>>>>>>>>>> hive
>>>>>>>>>>>>>> compatibility.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
>>>>>>>>>> maintained
>>>>>>>>>>> by
>>>>>>>>>>>>>> the Catalog. If we really needs to keep consistent with SQL
>>>>>>>>> grammar,
>>>>>>>>>>>> maybe
>>>>>>>>>>>>>> we should use
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>>>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>>>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>>>>>>>> consistent.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to the
>>>>>>>>> table
>>>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
>>>>>>>>>> identify
>>>>>>>>>>>> which
>>>>>>>>>>>>>> commands should belong to the sql client and which commands
>>>>>>>> should
>>>>>>>>>>>> belong
>>>>>>>>>>>>>> to the table environment. I also add a new section about
>>>>>>>>>>> implementation
>>>>>>>>>>>>>> details in the FLIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
>>>>>>>> SQL
>>>>>>>>>>> Client
>>>>>>>>>>>> a
>>>>>>>>>>>>>>> very good update and make it production ready.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is some feedback from my side:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't think that `sql-client.planner` and
>>>>>>>>>>> `sql-client.execution.mode`
>>>>>>>>>>>>>>> are SQL Client specific. Similar to
>>>>>>>> `StreamExecutionEnvironment`
>>>>>>>>>> and
>>>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added recently, we
>>>>>>>>>> should
>>>>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we offer
>>>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
>>>>>>>>> `table.planner`
>>>>>>>>>>> and
>>>>>>>>>>>>>>> `table.execution-mode` to
>>>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Execution file
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1] including
>>>>>>>> the
>>>>>>>>>>>> mailing
>>>>>>>>>>>>>>> list thread at that time? Could you further elaborate how the
>>>>>>>>>>>>>>> multi-statement execution should work for a unified
>>>>>>>>> batch/streaming
>>>>>>>>>>>>>>> story? According to our past discussions, each line in an
>>>>>>>>> execution
>>>>>>>>>>>> file
>>>>>>>>>>>>>>> should be executed blocking which means a streaming query
>>>>>>>> needs a
>>>>>>>>>>>>>>> statement set to execute multiple INSERT INTO statement,
>>>>>>>> correct?
>>>>>>>>>> We
>>>>>>>>>>>>>>> should also offer this functionality in
>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>>>>>>>>>> `sql-client.job.detach`
>>>>>>>>>>>> is
>>>>>>>>>>>>>>> SQL Client specific needs to be determined, it could also be
>> a
>>>>>>>>>>> general
>>>>>>>>>>>>>>> `table.multi-sql-async` option?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) DELETE JAR
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
>>>>>>>> like
>>>>>>>>>> one
>>>>>>>>>>>> is
>>>>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4) LIST JAR
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
>>>>>>>> such
>>>>>>>>> as
>>>>>>>>>>>> `SHOW
>>>>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We should keep the details in sync with
>>>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
>> confusion
>>>>>>>>>> about
>>>>>>>>>>>>>>> differently named ExplainDetails. I would vote for
>>>>>>>>> `ESTIMATED_COST`
>>>>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a reason
>>>>>>>> why
>>>>>>>>> to
>>>>>>>>>>>> call
>>>>>>>>>>>>>>> it that way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 6) Implementation details
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It would be nice to understand how we plan to implement the
>>>>>>>> given
>>>>>>>>>>>>>>> features. Most of the commands and config options should go
>>>>>>>> into
>>>>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
>>>>>>>> users
>>>>>>>>>>> have a
>>>>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
>>>>>>>> provide a
>>>>>>>>>>>> similar
>>>>>>>>>>>>>>> user experience in notebooks or interactive programs than the
>>>>>>>> SQL
>>>>>>>>>>>> Client.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better rather
>> than
>>>>>>>>>>> `UNSET`.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi, Jingsong.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
>>>>>>>> `RESET`
>>>>>>>>> is
>>>>>>>>>>>>>>>>> supported in the current sql client now. Our proposal just
>>>>>>>>>> extends
>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>> grammar and allow users to reset the specified keys.
>>>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
>>>>>>>> default
>>>>>>>>>>>>>>> value[1].
>>>>>>>>>>>>>>>>> I think it is more friendly for batch users.
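The extended RESET grammar discussed here — a bare `RESET;` restoring all defaults, and `RESET <key>;` restoring a single key — can be sketched with a plain session-options holder. The class and method names below are illustrative, not the actual SQL Client implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of session options with SET / RESET <key> / RESET
// semantics as discussed in the thread. Not the actual SQL Client code.
public class SessionOptions {
    private final Map<String, String> defaults;
    private final Map<String, String> current;

    public SessionOptions(Map<String, String> defaults) {
        this.defaults = new HashMap<>(defaults);
        this.current = new HashMap<>(defaults);
    }

    public void set(String key, String value) { // SET <key>=<value>
        current.put(key, value);
    }

    public String get(String key) {
        return current.get(key);
    }

    public void reset(String key) { // RESET <key>: restore one key's default
        if (defaults.containsKey(key)) {
            current.put(key, defaults.get(key));
        } else {
            current.remove(key); // no default: the key is simply unset
        }
    }

    public void resetAll() { // RESET: restore all defaults
        current.clear();
        current.putAll(defaults);
    }
}
```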
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二
>> 下午1:56写道:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too outdated.
>>>>>>>> +1
>>>>>>>>> for
>>>>>>>>>>>>>>>>>> improving it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
>>>>>>>> lirui.fudan@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes look
>>>>>>>>> good
>>>>>>>>>> to
>>>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
>>>>>>>>>> fskmine@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi, Rui.
>>>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The main changes:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> # -f parameter has no restriction about the statement
>>>>>>>> type.
>>>>>>>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result of
>>>>>>>>>> queries
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> debug
>>>>>>>>>>>>>>>>>>>> when submitting jobs with the -f parameter. It's much more
>>>>>>>>>>>>>>>>>>>> convenient than writing INSERT INTO statements.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach` .
>>>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in batch mode. Users can set
>>>>>>>>>>>>>>>>>>>> this option to false and the client will not process the next job until
>>>>>>>>>>>>>>>>>>>> the current job finishes. The default value of this option is false,
>>>>>>>>>>>>>>>>>>>> which means the client will execute the next job when the current job
>>>>>>>>>>>>>>>>>>>> is submitted.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
>>>>>>>> have
>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
>>>>>>>>>> example,
>>>>>>>>>>> if
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> client just submits the job and exits, what happens if
>>>>>>>> the
>>>>>>>>>> file
>>>>>>>>>>>>>>>>>>> contains
>>>>>>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
>>>>>>>> them
>>>>>>>>> as
>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> statement
>>>>>>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
>>>>>>>> STATEMENT
>>>>>>>>>> SET
>>>>>>>>>>> in
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit
>> the
>>>>>>>>> two
>>>>>>>>>>>> jobs,
>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
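A naive sketch of how a client could split a `-f` file into executable units while keeping a `BEGIN STATEMENT SET; ... END;` block together as a single job. This is illustrative only; a real implementation would need a proper parser to handle semicolons inside string literals and comments:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a SQL script on ';' into executable units, but
// keep everything between BEGIN STATEMENT SET and END as one unit so the
// client can submit it as a single job. Not the actual FLIP implementation.
public class SqlFileSplitter {

    public static List<String> split(String script) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inStatementSet = false;
        for (String raw : script.split(";")) {
            String stmt = raw.trim();
            if (stmt.isEmpty()) {
                continue;
            }
            if (stmt.equalsIgnoreCase("BEGIN STATEMENT SET")) {
                inStatementSet = true;
                current.append(stmt).append(";");
            } else if (inStatementSet) {
                current.append(stmt).append(";");
                if (stmt.equalsIgnoreCase("END")) {
                    units.add(current.toString()); // whole statement set = one unit
                    current.setLength(0);
                    inStatementSet = false;
                }
            } else {
                units.add(stmt + ";"); // ordinary statement = its own unit
            }
        }
        return units;
    }
}
```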
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
>>>>>>>>>>> fskmine@gmail.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Rui,
>>>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
>>>>>>>> suggestions.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET command. In
>>>>>>>>>>>>>>>>>>>>>> the implementation, it will just put the key-value
>> into
>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
>>>>>>>> table
>>>>>>>>>>>> config.
>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>>>>>> hive
>>>>>>>>>>>>>>>>>>>>>> supports to read the setting from the table config,
>>>>>>>> users
>>>>>>>>>> are
>>>>>>>>>>>>>> able
>>>>>>>>>>>>>>>>>>> to set
>>>>>>>>>>>>>>>>>>>>>> the hive-related settings.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit the
>>>>>>>> job
>>>>>>>>>> and
>>>>>>>>>>>>>>> exit.
>>>>>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job by
>>>>>>>>>>>>>> themselves,
>>>>>>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>>>>>>> not reliable (people may forget their jobs). In most cases, queries are used
>>>>>>>>>>>>>>>>>>>>>> to analyze the data. Users should use queries in the
>>>>>>>>>>> interactive
>>>>>>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五
>> 下午3:18写道:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
>>>>>>>> think
>>>>>>>>> it
>>>>>>>>>>>>>>> covers a
>>>>>>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically
>> improve
>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> usability of our
>>>>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
>>>>>>>>>> configurations
>>>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
>>>>>>>> configurations
>>>>>>>>>> and
>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in
>> SQL
>>>>>>>>>>> Client.
>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
>>>>>>>> using
>>>>>>>>>> hive
>>>>>>>>>>>>>>>>>>> connector [1].
>>>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
>>>>>>>> files
>>>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option but
>>>>>>>>> allows
>>>>>>>>>>>>>>> queries
>>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query and
>>>>>>>>>> redirect
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> results
>>>>>>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like to
>>>>>>>> do
>>>>>>>>>> the
>>>>>>>>>>>>>> same,
>>>>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1]
>> https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>>>>>>>>>>>>>>>>> liuyang0704@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
>>>>>>>> additional
>>>>>>>>>>>>>>>>>>> suggestions:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext
>> to
>>>>>>>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch
>>>>>>>> sql.
>>>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way of result retrieval: the sql client collects the
>>>>>>>>>>>>>>>>>>>>>>>> results locally all at once using accumulators at present, which may
>>>>>>>>>>>>>>>>>>>>>>>> have memory issues in the JM or locally for big query results.
>>>>>>>>>>>>>>>>>>>>>>>> Accumulators are only suitable for testing purposes. We may change to
>>>>>>>>>>>>>>>>>>>>>>>> use SelectTableSink, which is based on CollectSinkOperatorCoordinator.
>>>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway, which is in FLIP-91?
>>>>>>>>>>>>>>>>>>>>>>>> It seems that this FLIP has not moved forward for a long time.
>>>>>>>>>>>>>>>>>>>>>>>> Providing a long-running service out of the box to facilitate sql
>>>>>>>>>>>>>>>>>>>>>>>> submission is necessary.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
>>>>>>>>> 下午8:54写道:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
>>>>>>>>> FLIP-163:SQL
>>>>>>>>>>>>>> Client
>>>>>>>>>>>>>>>>>>>>>>>>> Improvements.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of
>> the
>>>>>>>>> sql
>>>>>>>>>>>>>> client.
>>>>>>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>>>>>>>> example, users can not register the table proposed
>>>>>>>> by
>>>>>>>>>>>> FLIP-95.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
>>>>>>>>> initialize
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> table
>>>>>>>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
>>>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
>>>>>>>>>> parameter;
>>>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
>>>>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
>>>>>>>> FLIP-163[1].
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> *With kind regards
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy
>> of
>>>>>>>>>>> Science
>>>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
>> liuyang0704@gmail.com
>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best, Jingsong Lee
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards!
>>>>>>>>>>> Rui Li
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi Timo,

Actually, I'm not in favor of the explicit syntax `BEGIN ASYNC; ... END;`,
because it makes submitting streaming jobs very verbose: every INSERT INTO
and STATEMENT SET must be wrapped in the ASYNC clause, which is
neither user-friendly nor backward-compatible.

I agree we would have unified behavior, but this comes at the cost of hurting
our main users. I'm worried that end users can't understand the technical
decision, and they would feel streaming is harder to use.

If we want a unified behavior, and let users decide what the desirable
behavior is, I prefer to have a config option. A Flink cluster can be set to
async, and then users don't need to wrap every DML in an ASYNC clause. This
is the least intrusive way for users.


Personally, I'm fine with following options in priority:

1) sync for batch DML and async for streaming DML
==> only breaks batch behavior, but makes both happy

2) async for both batch and streaming DML, and can be set to sync via a
configuration.
==> compatible, and provides flexible configurable behavior

3) sync for both batch and streaming DML, and can be
    set to async via a configuration.
==> +0 for this, because it breaks all the compatibility, esp. our main
users.

Best,
Jark

On Mon, 8 Feb 2021 at 17:34, Timo Walther <tw...@apache.org> wrote:

> Hi Jark, Hi Rui,
>
> 1) How should we execute statements in CLI and in file? Should there be
> a difference?
> So it seems we have consensus here with unified behavior. Even though
> this means we are breaking existing batch INSERT INTOs that were
> asynchronous before.
>
> 2) Should we have different behavior for batch and streaming?
> I think also batch users prefer async behavior because usually even
> those pipelines take some time to execute. But we should stick to
> standard SQL blocking semantics.
>
> What are your opinions on making async explicit in SQL via `BEGIN ASYNC;
> ... END;`? This would allow us to really have unified semantics because
> batch and streaming would behave the same?
>
> Regards,
> Timo
>
>
> On 07.02.21 04:46, Rui Li wrote:
> > Hi Timo,
> >
> > I agree with Jark that we should provide a consistent experience regarding
> > SQL CLI and files. Some systems even allow users to execute SQL files in
> > the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that
> in
> > the future, it's a little tricky to decide whether that should be treated
> > as CLI or file.
> >
> > I actually prefer a config option and let users decide what's the
> > desirable behavior. But if we have agreed not to use options, I'm also
> fine
> > with Alternative #1.
> >
> > On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
> >
> >> Hi Timo,
> >>
> >> 1) How should we execute statements in CLI and in file? Should there be
> a
> >> difference?
> >> I do think we should unify the behavior of CLI and SQL files. SQL files
> can
> >> be thought of as a shortcut of
> >> "start CLI" => "copy content of SQL files" => "paste content in CLI".
> >> Actually, we already did this in kafka_e2e.sql [1].
> >> I think it's hard for users to understand why SQL files behave
> differently
> >> from CLI, all the other systems don't have such a difference.
> >>
> >> If we distinguish SQL files and CLI, should there be a difference in
> JDBC
> >> driver and UI platform?
> >> Personally, they all should have consistent behavior.
> >>
> >> 2) Should we have different behavior for batch and streaming?
> >> I think we all agree streaming users prefer async execution, otherwise
> it's
> >> weird and difficult to use if the
> >> submit script or CLI never exists. On the other hand, batch SQL users
> are
> >> used to SQL statements being
> >> executed blockly.
> >>
> >> Either unified async execution or unified sync execution, will hurt one
> >> side of the streaming
> >> batch users. In order to make both sides happy, I think we can have
> >> different behavior for batch and streaming.
> >> There are many essential differences between batch and stream systems, I
> >> think it's normal to have some
> >> different behaviors, and the behavior doesn't break the unified batch
> >> stream semantics.
> >>
> >>
> >> Thus, I'm +1 to Alternative 1:
> >> We consider batch/streaming mode and block for batch INSERT INTO and
> async
> >> for streaming INSERT INTO/STATEMENT SET.
> >> And this behavior is consistent across CLI and files.
> >>
> >> Best,
> >> Jark
> >>
> >> [1]:
> >>
> >>
> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
> >>
> >> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
> >>
> >>> Hi Jark,
> >>>
> >>> thanks for the summary. I hope we can also find a good long-term
> >>> solution on the async/sync execution behavior topic.
> >>>
> >>> It should be discussed in a bigger round because it is (similar to the
> >>> time function discussion) related to batch-streaming unification where
> >>> we should stick to the SQL standard to some degree but also need to
> come
> >>> up with good streaming semantics.
> >>>
> >>> Let me summarize the problem again to hear opinions:
> >>>
> >>> - Batch SQL users are used to executing SQL files sequentially (from top
> >>> to bottom).
> >>> - Batch SQL users are used to SQL statements being executed in a blocking
> >>> fashion, one after the other, esp. when moving around data with INSERT INTO.
> >>> - Streaming users prefer async execution because unbounded streams are
> >>> more frequent than bounded streams.
> >>> - We decided to make the Flink Table API async because in a programming
> >>> language it is easy to call `.await()` on the result to make it
> blocking.
> >>> - INSERT INTO statements in the current SQL Client implementation are
> >>> always submitted asynchronously.
> >>> - Other clients, such as the Ververica platform, allow only one INSERT INTO
> >>> or a STATEMENT SET at the end of a file, which will run asynchronously.
> >>>
> >>> Questions:
> >>>
> >>> - How should we execute statements in CLI and in file? Should there be
> a
> >>> difference?
> >>> - Should we have different behavior for batch and streaming?
> >>> - Shall we solve parts with a config option or is it better to make it
> >>> explicit in the SQL job definition because it influences the semantics
> >>> of multiple INSERT INTOs?
> >>>
> >>> Let me summarize my opinion at the moment:
> >>>
> >>> - SQL files should always be executed in a blocking fashion by default,
> >>> because they could potentially contain a long list of INSERT INTO
> >>> statements. This would be SQL standard compliant.
> >>> - If we allow async execution, we should make this explicit in the SQL
> >>> file via `BEGIN ASYNC; ... END;`.
> >>> - In the CLI, we always execute async to maintain the old behavior. We
> >>> can also assume that people are only using the CLI to fire statements
> >>> and close the CLI afterwards.
> >>>
> >>> Alternative 1:
> >>> - We consider batch/streaming mode and block for batch INSERT INTO and
> >>> async for streaming INSERT INTO/STATEMENT SET
> >>>
> >>> What do others think?
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>>
> >>>
> >>> On 05.02.21 04:03, Jark Wu wrote:
> >>>> Hi all,
> >>>>
> >>>> After an offline discussion with Timo and Kurt, we have reached some
> >>>> consensus.
> >>>> Please correct me if I am wrong or missed anything.
> >>>>
> >>>> 1) We will introduce "table.planner" and "table.execution-mode"
> instead
> >>> of
> >>>> "sql-client" prefix,
> >>>> and add `TableEnvironment.create(Configuration)` interface. These 2
> >>> options
> >>>> can only be used
> >>>> for tableEnv initialization. If used after initialization, Flink
> should
> >>>> throw an exception. We may
> >>>> support dynamically switching the planner in the future.
> >>>>
> >>>> 2) We will have only one parser,
> >>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a string
> >>>> statement, and returns a list of Operation. It will first use regex to
> >>>> match some special statements,
> >>>>    e.g. SET, ADD JAR; others will be delegated to the underlying
> Calcite
> >>>> parser. The Parser can
> >>>> have different implementations, e.g. HiveParser.
> >>>>
> >>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect.
> But
> >>> we
> >>>> can allow
> >>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
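The dialect split for jar commands summarized above can be sketched as a simple lookup. The names below are illustrative, not the actual parser implementation:

```java
import java.util.Map;

// Hypothetical sketch of the dialect-dependent jar commands as summarized in
// the thread: the Flink dialect accepts ADD/REMOVE/SHOW JAR, while the Hive
// dialect (through HiveParser) accepts ADD/DELETE/LIST JAR instead.
public class JarCommands {
    private static final Map<String, String> FLINK = Map.of(
            "ADD JAR", "add",
            "REMOVE JAR", "remove",
            "SHOW JAR", "list");
    private static final Map<String, String> HIVE = Map.of(
            "ADD JAR", "add",
            "DELETE JAR", "remove",
            "LIST JAR", "list");

    public static boolean isSupported(String dialect, String command) {
        Map<String, String> commands =
                dialect.equalsIgnoreCase("hive") ? HIVE : FLINK;
        return commands.containsKey(command.toUpperCase());
    }
}
```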
> >>>>
> >>>> 4) We don't have a conclusion for async/sync execution behavior yet.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>>
> >>>>
> >>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
> >>>>
> >>>>> Hi Ingo,
> >>>>>
> >>>>> Since we have supported the WITH syntax and SET command since v1.9
> >>> [1][2],
> >>>>> and
> >>>>> we have never received such complaints, I think it's fine for such
> >>>>> differences.
> >>>>>
> >>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
> >> requires
> >>>>> string literal keys[3],
> >>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>> [1]:
> >>>>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> >>>>> [2]:
> >>>>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> >>>>> [3]:
> >>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> >>>>> [4]:
> >>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> >>>>> (search "set mapred.reduce.tasks=32")
> >>>>>
> >>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> regarding the (un-)quoted question, compatibility is of course an
> >>>>>> important
> >>>>>> argument, but in terms of consistency I'd find it a bit surprising
> >> that
> >>>>>> WITH handles it differently than SET, and I wonder if that could
> >> cause
> >>>>>> friction for developers when writing their SQL.
> >>>>>>
> >>>>>>
> >>>>>> Regards
> >>>>>> Ingo
> >>>>>>
> >>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Regarding "One Parser", I think it's not possible for now because
> >>>>>> Calcite
> >>>>>>> parser can't parse
> >>>>>>> special characters (e.g. "-") unless quoting them as string
> >> literals.
> >>>>>>> That's why the WITH option
> >>>>>>> key are string literals not identifiers.
> >>>>>>>
> >>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
> >>>>>>> /local/my-home/test.jar
> >>>>>>> have the same
> >>>>>>> problems. That's why we propose two parsers: one splits lines into
> >>>>>>> multiple statements and matches special
> >>>>>>> commands through regex, which is light-weight, and delegates other
> >>>>>>> statements to the other parser, which is the Calcite parser.
> >>>>>>>
> >>>>>>> Note: we should stick on the unquoted SET
> >>> table.exec.mini-batch.enabled
> >>>>>> =
> >>>>>>> true syntax,
> >>>>>>> both for backward-compatibility and easy-to-use, and all the other
> >>>>>> systems
> >>>>>>> don't have quotes on the key.
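
The regex-based matching of special statements described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Flink code; the patterns and the `classify` helper are hypothetical names, and a real implementation would delegate the "DELEGATE" case to the Calcite parser:

```python
import re

# Light-weight layer: recognize client commands (SET, ADD JAR) via regex,
# since keys like "table.exec.mini-batch.enabled" contain '-' and cannot be
# parsed as unquoted identifiers by Calcite. Everything else is delegated.
SET_PATTERN = re.compile(r"^\s*SET\s+(\S+)\s*=\s*([^\s;]+)\s*;?\s*$", re.IGNORECASE)
ADD_JAR_PATTERN = re.compile(r"^\s*ADD\s+JAR\s+([^\s;]+)\s*;?\s*$", re.IGNORECASE)

def classify(statement):
    m = SET_PATTERN.match(statement)
    if m:
        return ("SET", m.group(1), m.group(2))
    m = ADD_JAR_PATTERN.match(statement)
    if m:
        return ("ADD_JAR", m.group(1))
    # Not a client command: hand the statement to the full SQL parser.
    return ("DELEGATE", statement.strip())

print(classify("SET table.exec.mini-batch.enabled = true"))
print(classify("ADD JAR /local/my-home/test.jar;"))
print(classify("SELECT 1"))
```

Note how the unquoted `SET` key works here precisely because the regex layer never asks Calcite to parse it.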
> >>>>>>>
> >>>>>>>
> >>>>>>> Regarding "table.planner" vs "sql-client.planner",
> >>>>>>> if we want to use "table.planner", I think we should explain
> clearly
> >>>>>> what's
> >>>>>>> the scope it can be used in documentation.
> >>>>>>> Otherwise, there will be users complaining why the planner doesn't
> >>>>>> change
> >>>>>>> when setting the configuration on TableEnv.
> >>>>>>> It would be better to throw an exception to indicate to users that
> >>>>>>> it's not allowed to
> >>>>>>> change the planner after TableEnv is initialized.
> >>>>>>> However, it seems not easy to implement.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
> >> wrote:
> >>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> Regarding "table.planner" and "table.execution-mode"
> >>>>>>>> If we define that those two options are just used to initialize
> the
> >>>>>>>> TableEnvironment, +1 for introducing table options instead of
> >>>>>> sql-client
> >>>>>>>> options.
> >>>>>>>>
> >>>>>>>> Regarding "the sql client, we will maintain two parsers", I want
> to
> >>>>>> give
> >>>>>>>> more inputs:
> >>>>>>>> We want to introduce sql-gateway into the Flink project (see
> >> FLIP-24
> >>> &
> >>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI
> >> client
> >>>>>> and
> >>>>>>>> the gateway service will communicate through Rest API. The " ADD
> >> JAR
> >>>>>>>> /local/path/jar " will be executed in the CLI client machine. So
> >> when
> >>>>>> we
> >>>>>>>> submit a sql file which contains multiple statements, the CLI
> >> client
> >>>>>>> needs
> >>>>>>>> to pick out the "ADD JAR" line, and also statements need to be
> >>>>>> submitted
> >>>>>>> or
> >>>>>>>> executed one by one to make sure the result is correct. The sql
> >> file
> >>>>>> may
> >>>>>>> be
> >>>>>>>> look like:
> >>>>>>>>
> >>>>>>>> SET xxx=yyy;
> >>>>>>>> create table my_table ...;
> >>>>>>>> create table my_sink ...;
> >>>>>>>> ADD JAR /local/path/jar1;
> >>>>>>>> create function my_udf as com....MyUdf;
> >>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>> REMOVE JAR /local/path/jar1;
> >>>>>>>> drop function my_udf;
> >>>>>>>> ADD JAR /local/path/jar2;
> >>>>>>>> create function my_udf as com....MyUdf2;
> >>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>
> >>>>>>>> The lines need to be split into multiple statements first in
> the
> >>>>>> CLI
> >>>>>>>> client, there are two approaches:
> >>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser splits
> >>> the
> >>>>>>>> lines and tells which lines are "ADD JAR".
> >>>>>>>> pro: there is only one parser
> >>>>>>>> cons: It's a little heavy that the CLI client depends on the
> >>>>>> sql-parser,
> >>>>>>>> because the CLI client is just a simple tool which receives the
> >> user
> >>>>>>>> commands and displays the result. The non "ADD JAR" command will
> be
> >>>>>>> parsed
> >>>>>>>> twice.
> >>>>>>>>
> >>>>>>>> 2. The CLI client splits the lines into multiple statements and
> >> finds
> >>>>>> the
> >>>>>>>> ADD JAR command through regex matching.
> >>>>>>>> pro: The CLI client is very light-weight.
> >>>>>>>> cons: there are two parsers.
> >>>>>>>>
> >>>>>>>> (personally, I prefer the second option)
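
Approach 2 above (a light-weight client-side splitter that only recognizes ADD JAR itself and forwards everything else, in order) might look like this minimal sketch. It is illustrative Python, not Flink code; `split_statements` and `plan` are hypothetical names, and a real splitter must also handle semicolons inside string literals and comments:

```python
import re

ADD_JAR = re.compile(r"^ADD\s+JAR\s+(\S+)$", re.IGNORECASE)

def split_statements(script):
    # Naive ';'-terminated splitter; good enough as a sketch only.
    return [s.strip() for s in script.split(";") if s.strip()]

def plan(script):
    # The client executes ADD JAR locally and forwards everything else,
    # one by one, so that later statements can depend on earlier ones.
    actions = []
    for stmt in split_statements(script):
        m = ADD_JAR.match(stmt)
        actions.append(("CLIENT_ADD_JAR", m.group(1)) if m else ("FORWARD", stmt))
    return actions

demo = "SET xxx=yyy;\nADD JAR /local/path/jar1;\ninsert into my_sink select 1"
for action in plan(demo):
    print(action)
```

The point of the sketch is the ordering guarantee: statements are handled strictly in sequence, which matters when an INSERT depends on a previously added JAR.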
> >>>>>>>>
> >>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
> >>>>>>>> For default dialect, we support SHOW JARS, but if we switch to
> hive
> >>>>>>>> dialect, LIST JARS is also supported.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >>>>>>>> [2]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Godfrey
> >>>>>>>>
> >>>>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> >>>>>>>>
> >>>>>>>>> Hi guys,
> >>>>>>>>>
> >>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with
> >> other
> >>>>>>>>> commands than LIST JARS. I don't have a strong opinion about
> >> REMOVE
> >>>>>> vs
> >>>>>>>>> DELETE though.
> >>>>>>>>>
> >>>>>>>>> While flink doesn't need to follow hive syntax, as far as I know,
> >>>>>> most
> >>>>>>>>> users who are requesting these features are previously hive
> users.
> >>>>>> So I
> >>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
> >> REMOVE/DELETE
> >>>>>>> JARS
> >>>>>>>>> as synonyms? It's just like lots of systems accept both EXIT and
> >>>>>> QUIT
> >>>>>>> as
> >>>>>>>>> the command to terminate the program. So if that's not hard to
> >>>>>> achieve,
> >>>>>>>> and
> >>>>>>>>> will make users happier, I don't see a reason why we must choose
> >> one
> >>>>>>> over
> >>>>>>>>> the other.
> >>>>>>>>>
> >>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <twalthr@apache.org
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> some feedback regarding the open questions. Maybe we can discuss
> >>>>>> the
> >>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to determine
> how
> >>>>>> we
> >>>>>>>>>> proceed with this in the near future.
> >>>>>>>>>>
> >>>>>>>>>> 1) "whether the table environment has the ability to update
> >>>>>> itself"
> >>>>>>>>>>
> >>>>>>>>>> Maybe there was some misunderstanding. I don't think that we
> >>>>>> should
> >>>>>>>>>> support
> >>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>>>>>> "old")`. Instead I'm proposing to support
> >>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
> >>>>>> execution
> >>>>>>>>>> mode are read immediately and a subsequent changes to these
> >>>>>> options
> >>>>>>>> will
> >>>>>>>>>> have no effect. We are doing it similar in `new
> >>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
> >>>>>> ConfigOption's
> >>>>>>>>>> must not be SQL Client specific but can be part of the core
> table
> >>>>>>> code
> >>>>>>>>>> base. Many users would like to get a 100% preconfigured
> >>>>>> environment
> >>>>>>>> from
> >>>>>>>>>> just Configuration. And this is not possible right now. We can
> >>>>>> solve
> >>>>>>>>>> both use cases in one change.
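
The `TableEnvironment.create(Configuration)` behavior proposed here, where planner and execution mode are read once at creation and later changes are rejected, can be modeled with a toy class. This is an illustrative Python sketch under the stated assumption; `FakeTableEnvironment` and `FROZEN_KEYS` are hypothetical, not the Flink API:

```python
# Options that are only read when the environment is created.
FROZEN_KEYS = {"table.planner", "table.execution-mode"}

class FakeTableEnvironment:
    def __init__(self, config):
        # Capture the full configuration at creation time.
        self._config = dict(config)

    @classmethod
    def create(cls, config):
        return cls(config)

    def set_option(self, key, value):
        # Changing planner/execution mode after initialization is rejected
        # instead of being silently ignored.
        if key in FROZEN_KEYS:
            raise ValueError(f"'{key}' cannot be changed after initialization")
        self._config[key] = value

env = FakeTableEnvironment.create({"table.planner": "blink",
                                   "table.execution-mode": "batch"})
env.set_option("table.exec.mini-batch.enabled", "true")  # allowed
try:
    env.set_option("table.planner", "old")
except ValueError as e:
    print(e)
```

This mirrors the idea that a user gets a fully preconfigured environment from just a Configuration, while subsequent changes to the frozen options fail loudly.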
> >>>>>>>>>>
> >>>>>>>>>> 2) "the sql client, we will maintain two parsers"
> >>>>>>>>>>
> >>>>>>>>>> I remember we had some discussion about this and decided that we
> >>>>>>> would
> >>>>>>>>>> like to maintain only one parser. In the end it is "One Flink
> >> SQL"
> >>>>>>>> where
> >>>>>>>>>> commands influence each other also with respect to keywords. It
> >>>>>>> should
> >>>>>>>>>> be fine to include the SQL Client commands in the Flink parser.
> >> Of
> >>>>>>>>>> cource the table environment would not be able to handle the
> >>>>>>>> `Operation`
> >>>>>>>>>> instance that would be the result but we can introduce hooks to
> >>>>>>> handle
> >>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
> >>>>>>>>>>
> >>>>>>>>>> Can we skip `table.job.async` in the first version? We should
> >>>>>> further
> >>>>>>>>>> discuss whether we introduce a special SQL clause for wrapping
> >>>>>> async
> >>>>>>>>>> behavior or if we use a config option? Esp. for streaming
> queries
> >>>>>> we
> >>>>>>>>>> need to be careful and should force users to either "one INSERT
> >>>>>> INTO"
> >>>>>>>> or
> >>>>>>>>>> "one STATEMENT SET".
> >>>>>>>>>>
> >>>>>>>>>> 3) 4) "HIVE also uses these commands"
> >>>>>>>>>>
> >>>>>>>>>> In general, Hive is not a good reference. Aligning the commands
> >>>>>> more
> >>>>>>>>>> with the remaining commands should be our goal. We just had a
> >>>>>> MODULE
> >>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is
> true
> >>>>>>> that
> >>>>>>>>>> JARs are not part of the catalog which is why I would not use
> >>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> >>>>>>> language.
> >>>>>>>>>> Take a look at the Java collection API as another example.
> >>>>>>>>>>
> >>>>>>>>>> 6) "Most of the commands should belong to the table environment"
> >>>>>>>>>>
> >>>>>>>>>> Thanks for updating the FLIP this makes things easier to
> >>>>>> understand.
> >>>>>>> It
> >>>>>>>>>> is good to see that most commends will be available in
> >>>>>>>> TableEnvironment.
> >>>>>>>>>> However, I would also support SET and RESET for consistency.
> >>>>>> Again,
> >>>>>>>> from
> >>>>>>>>>> an architectural point of view, if we would allow some kind of
> >>>>>>>>>> `Operation` hook in table environment, we could check for SQL
> >>>>>> Client
> >>>>>>>>>> specific options and forward to regular
> >>>>>>> `TableConfig.getConfiguration`
> >>>>>>>>>> otherwise. What do you think?
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Timo
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
> >>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>
> >>>>>>>>>>> I will respond some of the questions:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>>
> >>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on where
> >>>>>> the
> >>>>>>>>>>> configuration takes effect.
> >>>>>>>>>>> If it is a table configuration, we should make clear what's the
> >>>>>>>>> behavior
> >>>>>>>>>>> when users change
> >>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
> >>>>>>>>>> `sql-client.execution.mode`
> >>>>>>>>>>> are something special
> >>>>>>>>>>> that can't be changed after TableEnvironment has been
> >>>>>> initialized.
> >>>>>>>> You
> >>>>>>>>>> can
> >>>>>>>>>>> see
> >>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method to
> >>>>>>>> override
> >>>>>>>>>>> configuration after
> >>>>>>>>>>> StreamExecutionEnvironment has been initialized.
> >>>>>>>>>>>
> >>>>>>>>>>> Therefore, I think it would be better to still use
> >>>>>>>>> `sql-client.planner`
> >>>>>>>>>>> and `sql-client.execution.mode`.
> >>>>>>>>>>>
> >>>>>>>>>>> 2) Execution file
> >>>>>>>>>>>
> >>>>>>>>>>> >From my point of view, there is a big difference between
> >>>>>>>>>>> `sql-client.job.detach` and
> >>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
> >>>>>> `sql-client.job.detach`
> >>>>>>>> will
> >>>>>>>>>>> affect every single DML statement
> >>>>>>>>>>> in the terminal, not only the statements in SQL files. I think
> >>>>>> the
> >>>>>>>>> single
> >>>>>>>>>>> DML statement in the interactive
> >>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
> >>>>>>>>>>> tEnv#executeMultiSql.
> >>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
> >>>>>>>>> `table.multi-sql-async`.
> >>>>>>>>>>> I just find that runtime provides a configuration called
> >>>>>>>>>>> "execution.attached" [1] which is false by default
> >>>>>>>>>>> which specifies if the pipeline is submitted in attached or
> >>>>>>> detached
> >>>>>>>>>> mode.
> >>>>>>>>>>> It provides exactly the same
> >>>>>>>>>>> functionality of `sql-client.job.detach`. What do you think
> >>>>>> about
> >>>>>>>> using
> >>>>>>>>>>> this option?
> >>>>>>>>>>>
> >>>>>>>>>>> If we also want to support this config in TableEnvironment, I
> >>>>>> think
> >>>>>>>> it
> >>>>>>>>>>> should also affect the DML execution
> >>>>>>>>>>>     of `tEnv#executeSql()`, not only DMLs in
> >>>>>>> `tEnv#executeMultiSql()`.
> >>>>>>>>>>> Therefore, the behavior may look like this:
> >>>>>>>>>>>
> >>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
> >>>>>> by
> >>>>>>>>>> default
> >>>>>>>>>>> tableResult.await()   ==> manually block until finish
> >>>>>>>>>>>
> >>>>>> tEnv.getConfig().getConfiguration().setString("execution.attached",
> >>>>>>>>>> "true")
> >>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==>
> sync,
> >>>>>>>> don't
> >>>>>>>>>> need
> >>>>>>>>>>> to wait on the TableResult
> >>>>>>>>>>> tEnv.executeMultiSql(
> >>>>>>>>>>> """
> >>>>>>>>>>> CREATE TABLE ....  ==> always sync
> >>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration above
> >>>>>>>>>>> SET execution.attached = false;
> >>>>>>>>>>> INSERT INTO ...  => async
> >>>>>>>>>>> """)
> >>>>>>>>>>>
> >>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
> >>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two separate
> >>>>>>>> topics,
> >>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
> >>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
> >>>>>> statements.
> >>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
> >>>>>> it to
> >>>>>>>>> block
> >>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Jark
> >>>>>>>>>>>
> >>>>>>>>>>> [1]:
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi, Timo.
> >>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts about
> >>>>>> your
> >>>>>>>>>>>> feedback.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the table
> >>>>>>>>>> environment
> >>>>>>>>>>>> has the ability to update itself. Let's take a simple program
> >>>>>> as
> >>>>>>> an
> >>>>>>>>>>>> example.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> ```
> >>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
> >>>>>>>>>>>>
> >>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>> "old");
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> tEnv.executeSql("...");
> >>>>>>>>>>>>
> >>>>>>>>>>>> ```
> >>>>>>>>>>>>
> >>>>>>>>>>>> If we regard this option as a table option, users don't have
> to
> >>>>>>>> create
> >>>>>>>>>>>> another table environment manually. In that case, tEnv needs
> to
> >>>>>>>> check
> >>>>>>>>>>>> whether the current mode and planner are the same as before
> >>>>>> when
> >>>>>>>>>> executeSql
> >>>>>>>>>>>> or explainSql. I don't think it's easy work for the table
> >>>>>>>> environment,
> >>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but set
> >>>>>> old
> >>>>>>>>>> planner
> >>>>>>>>>>>> and batch mode. But when we make this option as a sql client
> >>>>>>> option,
> >>>>>>>>>> users
> >>>>>>>>>>>> only use the SET command to change the setting. We can rebuild
> >>>>>> a
> >>>>>>> new
> >>>>>>>>>> table
> >>>>>>>>>>>> environment when the SET succeeds.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Regarding #2*: I think we need to discuss the implementation
> >>>>>>> before
> >>>>>>>>>>>> continuing this topic. In the sql client, we will maintain two
> >>>>>>>>> parsers.
> >>>>>>>>>> The
> >>>>>>>>>>>> first parser(client parser) will only match the sql client
> >>>>>>> commands.
> >>>>>>>>> If
> >>>>>>>>>> the
> >>>>>>>>>>>> client parser can't parse the statement, we will leverage the
> >>>>>>> power
> >>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>>>> table environment to execute. According to our blueprint,
> >>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
> >>>>>>> Therefore,
> >>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this
> FLIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But if we need to introduce the
> >>>>>> `TableEnvironment.executeMultiSql`
> >>>>>>>> in
> >>>>>>>>>> the
> >>>>>>>>>>>> future, I think it's OK to use the option
> >>>>>> `table.multi-sql-async`
> >>>>>>>>> rather
> >>>>>>>>>>>> than option `sql-client.job.detach`. But we think the name is
> >>>>>> not
> >>>>>>>>>> suitable
> >>>>>>>>>>>> because the name is confusing for others. When setting the
> >>>>>> option
> >>>>>>>>>> false, we
> >>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
> >>>>>>> statement,
> >>>>>>>>> not
> >>>>>>>>>> DDL
> >>>>>>>>>>>> or others(other sql statements are always executed
> >>>>>> synchronously).
> >>>>>>>> So
> >>>>>>>>>> how
> >>>>>>>>>>>> about `table.job.async`? It only works for the sql-client and
> >>>>>> the
> >>>>>>>>>>>> executeMultiSql. If we set this value false, the table
> >>>>>> environment
> >>>>>>>>> will
> >>>>>>>>>>>> return the result until the job finishes.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR and
> >>>>>>> LIST
> >>>>>>>>> JAR
> >>>>>>>>>>>> because HIVE also uses these commands to add the jar into the
> >>>>>>>>> classpath
> >>>>>>>>>> or
> >>>>>>>>>>>> delete the jar. If we use  such commands, it can reduce our
> >>>>>> work
> >>>>>>> for
> >>>>>>>>>> hive
> >>>>>>>>>>>> compatibility.
> >>>>>>>>>>>>
> >>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
> >>>>>>>> maintained
> >>>>>>>>> by
> >>>>>>>>>>>> the Catalog. If we really needs to keep consistent with SQL
> >>>>>>> grammar,
> >>>>>>>>>> maybe
> >>>>>>>>>>>> we should use
> >>>>>>>>>>>>
> >>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
> >>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
> >>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
> >>>>>> consistent.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to the
> >>>>>>> table
> >>>>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
> >>>>>>>> identify
> >>>>>>>>>> which
> >>>>>>>>>>>> commands should belong to the sql client and which commands
> >>>>>> should
> >>>>>>>>>> belong
> >>>>>>>>>>>> to the table environment. I also add a new section about
> >>>>>>>>> implementation
> >>>>>>>>>>>> details in the FLIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>
> >>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
> >>>>>> SQL
> >>>>>>>>> Client
> >>>>>>>>>> a
> >>>>>>>>>>>>> very good update and make it production ready.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I don't think that `sql-client.planner` and
> >>>>>>>>> `sql-client.execution.mode`
> >>>>>>>>>>>>> are SQL Client specific. Similar to
> >>>>>> `StreamExecutionEnvironment`
> >>>>>>>> and
> >>>>>>>>>>>>> `ExecutionConfig#configure` that have been added recently, we
> >>>>>>>> should
> >>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we offer
> >>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
> >>>>>>> `table.planner`
> >>>>>>>>> and
> >>>>>>>>>>>>> `table.execution-mode` to
> >>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) Execution file
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1] including
> >>>>>> the
> >>>>>>>>>> mailing
> >>>>>>>>>>>>> list thread at that time? Could you further elaborate how the
> >>>>>>>>>>>>> multi-statement execution should work for a unified
> >>>>>>> batch/streaming
> >>>>>>>>>>>>> story? According to our past discussions, each line in an
> >>>>>>> execution
> >>>>>>>>>> file
> >>>>>>>>>>>>> should be executed blocking which means a streaming query
> >>>>>> needs a
> >>>>>>>>>>>>> statement set to execute multiple INSERT INTO statement,
> >>>>>> correct?
> >>>>>>>> We
> >>>>>>>>>>>>> should also offer this functionality in
> >>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
> >>>>>>>> `sql-client.job.detach`
> >>>>>>>>>> is
> >>>>>>>>>>>>> SQL Client specific needs to be determined, it could also be
> a
> >>>>>>>>> general
> >>>>>>>>>>>>> `table.multi-sql-async` option?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 3) DELETE JAR
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
> >>>>>> like
> >>>>>>>> one
> >>>>>>>>>> is
> >>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 4) LIST JAR
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
> >>>>>> such
> >>>>>>> as
> >>>>>>>>>> `SHOW
> >>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We should keep the details in sync with
> >>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid
> confusion
> >>>>>>>> about
> >>>>>>>>>>>>> differently named ExplainDetails. I would vote for
> >>>>>>> `ESTIMATED_COST`
> >>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a reason
> >>>>>> why
> >>>>>>> to
> >>>>>>>>>> call
> >>>>>>>>>>>>> it that way.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 6) Implementation details
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It would be nice to understand how we plan to implement the
> >>>>>> given
> >>>>>>>>>>>>> features. Most of the commands and config options should go
> >>>>>> into
> >>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
> >>>>>> users
> >>>>>>>>> have a
> >>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
> >>>>>> provide a
> >>>>>>>>>> similar
> >>>>>>>>>>>>> user experience in notebooks or interactive programs than the
> >>>>>> SQL
> >>>>>>>>>> Client.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
> >>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better than
> >>>>>>>>>>>>>> `UNSET`.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi, Jingsong.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
> >>>>>> `RESET`
> >>>>>>> is
> >>>>>>>>>>>>>>> supported in the current sql client now. Our proposal just
> >>>>>>>> extends
> >>>>>>>>>> its
> >>>>>>>>>>>>>>> grammar and allow users to reset the specified keys.
> >>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
> >>>>>> default
> >>>>>>>>>>>>> value[1].
> >>>>>>>>>>>>>>> I think it is more friendly for batch users.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>>>>>>>
> >> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二
> 下午1:56写道:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too outdated.
> >>>>>> +1
> >>>>>>> for
> >>>>>>>>>>>>>>>> improving it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> >>>>>> lirui.fudan@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes look
> >>>>>>> good
> >>>>>>>> to
> >>>>>>>>>>>> me.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> >>>>>>>> fskmine@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi, Rui.
> >>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The main changes:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> # -f parameter has no restriction about the statement
> >>>>>> type.
> >>>>>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result of
> >>>>>>>> queries
> >>>>>>>>> to
> >>>>>>>>>>>>>>>>> debug
> >>>>>>>>>>>>>>>>>> when submitting a job with the -f parameter. It's much more
> >>>>>>>>>>>>>>>>>> convenient compared to
> >>>>>>>>>>>>>>>>>> writing INSERT INTO statements.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> >>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in the batch mode.
> >>>>>>>>>>>>>>>>>> Users can set
> >>>>>>>>>>>>>>>>>> this option to false and the client will not process the
> >>>>>>>>>>>>>>>>>> next job until the
> >>>>>>>>>>>>>>>>>> current job finishes. The default value of this option is
> >>>>>>>>>>>>>>>>>> true, which
> >>>>>>>>>>>>>>>>>> means the client will execute the next job as soon as the
> >>>>>>>>>>>>>>>>>> current job is
> >>>>>>>>>>>>>>>>>> submitted.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
> >>>>>> have
> >>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
> >>>>>>>> example,
> >>>>>>>>> if
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> client just submits the job and exits, what happens if
> >>>>>> the
> >>>>>>>> file
> >>>>>>>>>>>>>>>>> contains
> >>>>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
> >>>>>> them
> >>>>>>> as
> >>>>>>>> a
> >>>>>>>>>>>>>>>>> statement
> >>>>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
> >>>>>> STATEMENT
> >>>>>>>> SET
> >>>>>>>>> in
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit
> the
> >>>>>>> two
> >>>>>>>>>> jobs,
> >>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> >>>>>>>>> fskmine@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Rui,
> >>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
> >>>>>> suggestions.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> For the suggestion 1: Yes. We plan to strengthen
> >>>>>> the
> >>>>>>> set
> >>>>>>>>>>>>>>>>> command. In
> >>>>>>>>>>>>>>>>>>>> the implementation, it will just put the key-value
> into
> >>>>>>> the
> >>>>>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
> >>>>>> table
> >>>>>>>>>> config.
> >>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>> hive
> >>>>>>>>>>>>>>>>>>>> supports to read the setting from the table config,
> >>>>>> users
> >>>>>>>> are
> >>>>>>>>>>>> able
> >>>>>>>>>>>>>>>>> to set
> >>>>>>>>>>>>>>>>>>>> the hive-related settings.
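
The SET/RESET behavior described here, where SET writes a key-value pair into the session configuration and RESET restores the default, can be sketched as follows. This is illustrative Python; `SessionConfig` and the default map are hypothetical names, not the actual SQL Client implementation:

```python
# Built-in defaults that RESET falls back to.
DEFAULTS = {"table.exec.mini-batch.enabled": "false"}

class SessionConfig:
    def __init__(self):
        self._overrides = {}

    def set(self, key, value):
        # SET <key>=<value>: store the override in the session configuration.
        self._overrides[key] = value

    def reset(self, key):
        # RESET <key>: drop the override so the default applies again.
        self._overrides.pop(key, None)

    def get(self, key):
        return self._overrides.get(key, DEFAULTS.get(key))

conf = SessionConfig()
conf.set("table.exec.mini-batch.enabled", "true")
print(conf.get("table.exec.mini-batch.enabled"))  # true
conf.reset("table.exec.mini-batch.enabled")
print(conf.get("table.exec.mini-batch.enabled"))  # false
```

Under this model a connector-specific key is just another entry in the map, which is why arbitrary keys (e.g. Hive conf) could be passed through if the connector reads the resulting table config.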
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit the
> >>>>>> job
> >>>>>>>> and
> >>>>>>>>>>>>> exit.
> >>>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job by
> >>>>>>>>>>>> themselves,
> >>>>>>>>>>>>>>>>> which is
> >>>>>>>>>>>>>>>>>>>> not reliable (people may forget their jobs). In most
> >>>>>>>>>>>>>>>>>>>> cases,
> >>>>>>>>>> queries
> >>>>>>>>>>>>>>>>> are used
> >>>>>>>>>>>>>>>>>>>> to analyze the data. Users should use queries in the
> >>>>>>>>> interactive
> >>>>>>>>>>>>>>>>> mode.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五
> 下午3:18写道:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
> >>>>>> think
> >>>>>>> it
> >>>>>>>>>>>>> covers a
> >>>>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically
> improve
> >>>>>>> the
> >>>>>>>>>>>>>>>>> usability of our
> >>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
> >>>>>>>> configurations
> >>>>>>>>>>>> via
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
> >>>>>> configurations
> >>>>>>>> and
> >>>>>>>>> we
> >>>>>>>>>>>>>>>>> don't have
> >>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in
> SQL
> >>>>>>>>> Client.
> >>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>> example,
> >>>>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
> >>>>>> using
> >>>>>>>> hive
> >>>>>>>>>>>>>>>>> connector [1].
> >>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
> >>>>>> files
> >>>>>>>>>>>> specified
> >>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option but
> >>>>>>> allows
> >>>>>>>>>>>>> queries
> >>>>>>>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query and
> >>>>>>>> redirect
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like to
> >>>>>> do
> >>>>>>>> the
> >>>>>>>>>>>> same,
> >>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [1]
> https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>>>>>>>>>>>>>>> liuyang0704@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
> >>>>>> additional
> >>>>>>>>>>>>>>>>> suggestions:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext
> to
> >>>>>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch
> >>>>>> sql.
> >>>>>>>>>>>>>>>>>>>>>> #2. Improve the way of results retrieval: sql client
> >>>>>>>> collect
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>>> locally all at once using accumulators at present,
> >>>>>>>>>>>>>>>>>>>>>>           which may have memory issues in JM or
> Local
> >>>>>> for
> >>>>>>>> the
> >>>>>>>>>> big
> >>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>>>>> result.
> >>>>>>>>>>>>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> >>>>>>>>>>>>>>>>>>>>>>           We may change to use SelectTableSink,
> which
> >>>>>> is
> >>>>>>>> based
> >>>>>>>>>>>>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which
> >>>>>> is in
> >>>>>>>>>>>> FLIP-91.
> >>>>>>>>>>>>>>>>> Seems
> >>>>>>>>>>>>>>>>>>>>>> that this FLIP has not moved forward for a long
> time.
> >>>>>>>>>>>>>>>>>>>>>>           Provide a long running service out of the
> >>>>>> box to
> >>>>>>>>>>>>> facilitate
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> sql
> >>>>>>>>>>>>>>>>>>>>>> submission is necessary.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> What do you think of these?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
> >>>>>>> 下午8:54写道:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
> >>>>>>> FLIP-163:SQL
> >>>>>>>>>>>> Client
> >>>>>>>>>>>>>>>>>>>>>>> Improvements.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of
> the
> >>>>>>> sql
> >>>>>>>>>>>> client.
> >>>>>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>>>>> example, users can not register the table proposed
> >>>>>> by
> >>>>>>>>>> FLIP-95.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
> >>>>>>> initialize
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> >>>>>>>> parameter;
> >>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
> >>>>>> FLIP-163[1].
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> *With kind regards
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>> ------------------------------------------------------------
> >>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy
> of
> >>>>>>>>> Science
> >>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <
> liuyang0704@gmail.com
> >>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> Best, Jingsong Lee
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best regards!
> >>>>>>>>> Rui Li
> >>>>>>>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi Jark, Hi Rui,

1) How should we execute statements in CLI and in file? Should there be 
a difference?
So it seems we have consensus here on unified behavior, even though
this means we are breaking existing batch INSERT INTOs that were
asynchronous before.

2) Should we have different behavior for batch and streaming?
I think batch users also prefer async behavior because usually even
those pipelines take some time to execute. But we should stick to
standard SQL blocking semantics.

What are your opinions on making async execution explicit in SQL via
`BEGIN ASYNC; ... END;`? This would allow us to have truly unified
semantics, because batch and streaming would behave the same.
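To make the idea concrete, a SQL file under such explicit-async syntax could look as follows. This is only a sketch of the `BEGIN ASYNC; ... END;` proposal in this thread, not implemented Flink SQL, and the table definitions are invented for illustration:

```sql
-- Statements outside the block execute blocking, top to bottom.
CREATE TABLE orders (id BIGINT, amount DOUBLE) WITH ('connector' = 'datagen');
CREATE TABLE sink1 (id BIGINT) WITH ('connector' = 'blackhole');
CREATE TABLE sink2 (total DOUBLE) WITH ('connector' = 'blackhole');

-- Proposed: every INSERT INTO inside the block is submitted asynchronously,
-- with identical semantics in batch and streaming mode.
BEGIN ASYNC;
INSERT INTO sink1 SELECT id FROM orders;
INSERT INTO sink2 SELECT SUM(amount) FROM orders;
END;
```

Outside such a block, every statement would keep the SQL-standard blocking semantics.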

Regards,
Timo


On 07.02.21 04:46, Rui Li wrote:
> Hi Timo,
> 
> I agree with Jark that we should provide a consistent experience regarding
> SQL CLI and files. Some systems even allow users to execute SQL files in
> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that in
> the future, it's a little tricky to decide whether that should be treated
> as CLI or file.
> 
> I actually prefer a config option and let users decide what's the
> desirable behavior. But if we have agreed not to use options, I'm also fine
> with Alternative #1.
> 
> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:
> 
>> Hi Timo,
>>
>> 1) How should we execute statements in CLI and in file? Should there be a
>> difference?
>> I do think we should unify the behavior of CLI and SQL files. SQL files can
>> be thought of as a shortcut of
>> "start CLI" => "copy content of SQL files" => "paste content into CLI".
>> Actually, we already did this in kafka_e2e.sql [1].
>> I think it's hard for users to understand why SQL files behave differently
>> from the CLI; all the other systems don't have such a difference.
>>
>> If we distinguish SQL files and CLI, should there be a difference in JDBC
>> driver and UI platform?
>> Personally, they all should have consistent behavior.
>>
>> 2) Should we have different behavior for batch and streaming?
>> I think we all agree streaming users prefer async execution, otherwise it's
>> weird and difficult to use if the
>> submit script or CLI never exits. On the other hand, batch SQL users are
>> used to SQL statements being
>> executed in a blocking fashion.
>>
>> Either unified async execution or unified sync execution will hurt one
>> side of the streaming or
>> batch users. In order to make both sides happy, I think we can have
>> different behavior for batch and streaming.
>> There are many essential differences between batch and stream systems, I
>> think it's normal to have some
>> different behaviors, and the behavior doesn't break the unified batch
>> stream semantics.
>>
>>
>> Thus, I'm +1 to Alternative 1:
>> We consider batch/streaming mode and block for batch INSERT INTO and async
>> for streaming INSERT INTO/STATEMENT SET.
>> And this behavior is consistent across CLI and files.
>>
>> Best,
>> Jark
>>
>> [1]:
>>
>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
>>
>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
>>
>>> Hi Jark,
>>>
>>> thanks for the summary. I hope we can also find a good long-term
>>> solution on the async/sync execution behavior topic.
>>>
>>> It should be discussed in a bigger round because it is (similar to the
>>> time function discussion) related to batch-streaming unification where
>>> we should stick to the SQL standard to some degree but also need to come
>>> up with good streaming semantics.
>>>
>>> Let me summarize the problem again to hear opinions:
>>>
>>> - Batch SQL users are used to execute SQL files sequentially (from top
>>> to bottom).
>>> - Batch SQL users are used to SQL statements being executed blocking.
>>> One after the other. Esp. when moving around data with INSERT INTO.
>>> - Streaming users prefer async execution because unbounded streams are
>>> more frequent than bounded streams.
>>> - We decided to make the Flink Table API async because in a programming
>>> language it is easy to call `.await()` on the result to make it blocking.
>>> - INSERT INTO statements in the current SQL Client implementation are
>>> always submitted asynchronously.
>>> - Other clients, such as the Ververica platform, allow only one INSERT INTO
>>> or a STATEMENT SET at the end of a file, which will run asynchronously.
>>>
>>> Questions:
>>>
>>> - How should we execute statements in CLI and in file? Should there be a
>>> difference?
>>> - Should we have different behavior for batch and streaming?
>>> - Shall we solve parts with a config option or is it better to make it
>>> explicit in the SQL job definition because it influences the semantics
>>> of multiple INSERT INTOs?
>>>
>>> Let me summarize my opinion at the moment:
>>>
>>> - SQL files should always be executed in a blocking fashion by default,
>>> because they could potentially contain a long list of INSERT INTO
>>> statements. This would be SQL standard compliant.
>>> - If we allow async execution, we should make this explicit in the SQL
>>> file via `BEGIN ASYNC; ... END;`.
>>> - In the CLI, we always execute async to maintain the old behavior. We
>>> can also assume that people are only using the CLI to fire statements
>>> and close the CLI afterwards.
>>>
>>> Alternative 1:
>>> - We consider batch/streaming mode and block for batch INSERT INTO and
>>> async for streaming INSERT INTO/STATEMENT SET
>>>
>>> What do others think?
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>>
>>>
>>> On 05.02.21 04:03, Jark Wu wrote:
>>>> Hi all,
>>>>
>>>> After an offline discussion with Timo and Kurt, we have reached some
>>>> consensus.
>>>> Please correct me if I am wrong or missed anything.
>>>>
>>>> 1) We will introduce "table.planner" and "table.execution-mode" instead
>>> of
>>>> "sql-client" prefix,
>>>> and add `TableEnvironment.create(Configuration)` interface. These 2
>>> options
>>>> can only be used
>>>> for tableEnv initialization. If used after initialization, Flink should
>>>> throw an exception. We may
>>>> support dynamically switching the planner in the future.
>>>>
>>>> 2) We will have only one parser,
>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a string
>>>> statement, and returns a list of Operation. It will first use regex to
>>>> match some special statements,
>>>>    e.g. SET, ADD JAR; others will be delegated to the underlying Calcite
>>>> parser. The Parser can
>>>> have different implementations, e.g. HiveParser.
>>>>
>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect. But
>>> we
>>>> can allow
>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
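For illustration, the jar management commands under this consensus would look roughly as follows. This is a sketch only; the exact grammar is defined by the FLIP, and the Hive-dialect forms would be accepted through HiveParser:

```sql
-- Flink dialect
ADD JAR '/local/path/udf.jar';
SHOW JARS;
REMOVE JAR '/local/path/udf.jar';

-- Hive dialect (via HiveParser), matching Hive's own commands
ADD JAR /local/path/udf.jar;
LIST JAR;
DELETE JAR /local/path/udf.jar;
```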
>>>>
>>>> 4) We don't have a conclusion for async/sync execution behavior yet.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>>
>>>>
>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
>>>>
>>>>> Hi Ingo,
>>>>>
>>>>> Since we have supported the WITH syntax and SET command since v1.9
>>> [1][2],
>>>>> and
>>>>> we have never received such complaints, I think it's fine for such
>>>>> differences.
>>>>>
>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
>> requires
>>>>> string literal keys[3],
>>>>> and the SET <key>=<value> doesn't allow quoted keys [4].
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> [1]:
>>>>>
>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>>>>> [2]:
>>>>>
>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>>>>> [3]:
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>>>>> [4]:
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>>>>> (search "set mapred.reduce.tasks=32")
>>>>>
>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> regarding the (un-)quoted question, compatibility is of course an
>>>>>> important
>>>>>> argument, but in terms of consistency I'd find it a bit surprising
>> that
>>>>>> WITH handles it differently than SET, and I wonder if that could
>> cause
>>>>>> friction for developers when writing their SQL.
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Ingo
>>>>>>
>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Regarding "One Parser", I think it's not possible for now because
>>>>>> Calcite
>>>>>>> parser can't parse
>>>>>>> special characters (e.g. "-") unless quoting them as string
>> literals.
>>>>>>> That's why the WITH option
>>>>>>> key are string literals not identifiers.
>>>>>>>
>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>>>>>>> /local/my-home/test.jar
>>>>>>> have the same
>>>>>>> problems. That's why we propose two parsers: one splits lines into
>>>>>> multiple
>>>>>>> statements and matches special
>>>>>>> commands through regex, which is light-weight, and delegates other
>>>>>> statements
>>>>>>> to the other parser, which is the Calcite parser.
>>>>>>>
>>>>>>> Note: we should stick to the unquoted SET
>>> table.exec.mini-batch.enabled
>>>>>> =
>>>>>>> true syntax,
>>>>>>> both for backward-compatibility and easy-to-use, and all the other
>>>>>> systems
>>>>>>> don't have quotes on the key.
>>>>>>>
>>>>>>>
>>>>>>> Regarding "table.planner" vs "sql-client.planner",
>>>>>>> if we want to use "table.planner", I think we should explain clearly
>>>>>> in the documentation
>>>>>>> the scope in which it can be used.
>>>>>>> Otherwise, there will be users complaining why the planner doesn't
>>>>>> change
>>>>>>> when setting the configuration on TableEnv.
>>>>>>> It would be better to throw an exception to indicate to users that it's not
>>>>>> allowed to
>>>>>>> change the planner after TableEnv is initialized.
>>>>>>> However, it seems not easy to implement.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
>> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Regarding "table.planner" and "table.execution-mode"
>>>>>>>> If we define that those two options are just used to initialize the
>>>>>>>> TableEnvironment, +1 for introducing table options instead of
>>>>>> sql-client
>>>>>>>> options.
>>>>>>>>
>>>>>>>> Regarding "the sql client, we will maintain two parsers", I want to
>>>>>> give
>>>>>>>> more inputs:
>>>>>>>> We want to introduce sql-gateway into the Flink project (see
>> FLIP-24
>>> &
>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI
>> client
>>>>>> and
>>>>>>>> the gateway service will communicate through Rest API. The " ADD
>> JAR
>>>>>>>> /local/path/jar " will be executed in the CLI client machine. So
>> when
>>>>>> we
>>>>>>>> submit a sql file which contains multiple statements, the CLI
>> client
>>>>>>> needs
>>>>>>>> to pick out the "ADD JAR" line, and also statements need to be
>>>>>> submitted
>>>>>>> or
>>>>>>>> executed one by one to make sure the result is correct. The sql
>> file
>>>>>> may
>>>>>>> be
>>>>>>>> look like:
>>>>>>>>
>>>>>>>> SET xxx=yyy;
>>>>>>>> create table my_table ...;
>>>>>>>> create table my_sink ...;
>>>>>>>> ADD JAR /local/path/jar1;
>>>>>>>> create function my_udf as com....MyUdf;
>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>>>> REMOVE JAR /local/path/jar1;
>>>>>>>> drop function my_udf;
>>>>>>>> ADD JAR /local/path/jar2;
>>>>>>>> create function my_udf as com....MyUdf2;
>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>>>>
>>>>>>>> The lines need to be split into multiple statements first in the
>>>>>> CLI
>>>>>>>> client, there are two approaches:
>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser splits
>>> the
>>>>>>>> lines and tells which lines are "ADD JAR".
>>>>>>>> pro: there is only one parser
>>>>>>>> cons: It's a little heavy that the CLI client depends on the
>>>>>> sql-parser,
>>>>>>>> because the CLI client is just a simple tool which receives the
>> user
>>>>>>>> commands and displays the result. The non-"ADD JAR" commands will be
>>>>>>> parsed
>>>>>>>> twice.
>>>>>>>>
>>>>>>>> 2. The CLI client splits the lines into multiple statements and
>> finds
>>>>>> the
>>>>>>>> ADD JAR command through regex matching.
>>>>>>>> pro: The CLI client is very light-weight.
>>>>>>>> cons: there are two parsers.
>>>>>>>>
>>>>>>>> (personally, I prefer the second option)
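A minimal sketch of the second option in plain Java. This is illustrative only: the class name and regex below are assumptions for this example, not Flink's actual implementation, and the splitter naively ignores semicolons inside string literals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of option 2: the CLI client splits a script into statements and
// recognizes client-side commands (ADD/REMOVE JAR, SET) via lightweight
// regex matching; everything else would be handed to the table environment.
class ClientStatementSplitter {

    // Matches statements the client itself should handle.
    private static final Pattern CLIENT_COMMAND =
            Pattern.compile("(?i)^\\s*(ADD\\s+JAR|REMOVE\\s+JAR|SET)\\b.*");

    /** Splits a script on ';' into trimmed, non-empty statements. */
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String stmt : script.split(";")) {
            String trimmed = stmt.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    /** True if the statement is a client command, not SQL for the parser. */
    static boolean isClientCommand(String statement) {
        return CLIENT_COMMAND.matcher(statement).matches();
    }

    public static void main(String[] args) {
        String script =
                "SET table.planner=blink;\n"
                + "ADD JAR /local/path/jar1;\n"
                + "CREATE TABLE t (id INT) WITH ('connector' = 'datagen');\n"
                + "INSERT INTO sink SELECT id FROM t;";
        for (String stmt : split(script)) {
            System.out.println((isClientCommand(stmt) ? "client: " : "env:    ") + stmt);
        }
    }
}
```

The regex pass is cheap and keeps the client free of the full SQL parser; only the few client commands are parsed twice at worst.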
>>>>>>>>
>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
>>>>>>>> For default dialect, we support SHOW JARS, but if we switch to hive
>>>>>>>> dialect, LIST JARS is also supported.
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]
>>>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>>>>>> [2]
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Godfrey
>>>>>>>>
>>>>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with
>> other
>>>>>>>>> commands than LIST JARS. I don't have a strong opinion about
>> REMOVE
>>>>>> vs
>>>>>>>>> DELETE though.
>>>>>>>>>
>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I know,
>>>>>> most
>>>>>>>>> users who are requesting these features are previously hive users.
>>>>>> So I
>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and
>> REMOVE/DELETE
>>>>>>> JARS
>>>>>>>>> as synonyms? It's just like lots of systems accept both EXIT and
>>>>>> QUIT
>>>>>>> as
>>>>>>>>> the command to terminate the program. So if that's not hard to
>>>>>> achieve,
>>>>>>>> and
>>>>>>>>> will make users happier, I don't see a reason why we must choose
>> one
>>>>>>> over
>>>>>>>>> the other.
>>>>>>>>>
>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> some feedback regarding the open questions. Maybe we can discuss
>>>>>> the
>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to determine how
>>>>>> we
>>>>>>>>>> proceed with this in the near future.
>>>>>>>>>>
>>>>>>>>>> 1) "whether the table environment has the ability to update
>>>>>> itself"
>>>>>>>>>>
>>>>>>>>>> Maybe there was some misunderstanding. I don't think that we
>>>>>> should
>>>>>>>>>> support
>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>>>>>> "old")`. Instead I'm proposing to support
>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and
>>>>>> execution
>>>>>>>>>> mode are read immediately and a subsequent changes to these
>>>>>> options
>>>>>>>> will
>>>>>>>>>> have no effect. We are doing it similar in `new
>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
>>>>>> ConfigOption's
>>>>>>>>>> must not be SQL Client specific but can be part of the core table
>>>>>>> code
>>>>>>>>>> base. Many users would like to get a 100% preconfigured
>>>>>> environment
>>>>>>>> from
>>>>>>>>>> just Configuration. And this is not possible right now. We can
>>>>>> solve
>>>>>>>>>> both use cases in one change.
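The read-once semantics described here can be illustrated with a small plain-Java mock. This is an assumption-laden sketch: `MockTableEnvironment` is invented for this example and uses no Flink classes; Flink's real `TableEnvironment.create(...)` would take a Flink `Configuration` rather than a plain map:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed semantics: planner and execution mode are read
// exactly once when the environment is created; later mutations of the
// configuration have no effect on the environment.
class MockTableEnvironment {
    private final String planner;
    private final String executionMode;

    private MockTableEnvironment(String planner, String executionMode) {
        this.planner = planner;
        this.executionMode = executionMode;
    }

    static MockTableEnvironment create(Map<String, String> config) {
        // Options are snapshotted at creation time.
        return new MockTableEnvironment(
                config.getOrDefault("table.planner", "blink"),
                config.getOrDefault("table.execution-mode", "streaming"));
    }

    String getPlanner() { return planner; }
    String getExecutionMode() { return executionMode; }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("table.planner", "blink");
        config.put("table.execution-mode", "batch");
        MockTableEnvironment env = MockTableEnvironment.create(config);

        // Changing the configuration afterwards must not affect the environment.
        config.put("table.planner", "old");
        System.out.println(env.getPlanner());        // still "blink"
        System.out.println(env.getExecutionMode());  // "batch"
    }
}
```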
>>>>>>>>>>
>>>>>>>>>> 2) "the sql client, we will maintain two parsers"
>>>>>>>>>>
>>>>>>>>>> I remember we had some discussion about this and decided that we
>>>>>>> would
>>>>>>>>>> like to maintain only one parser. In the end it is "One Flink
>> SQL"
>>>>>>>> where
>>>>>>>>>> commands influence each other also with respect to keywords. It
>>>>>>> should
>>>>>>>>>> be fine to include the SQL Client commands in the Flink parser.
>> Of
>>>>>>>>>> course the table environment would not be able to handle the
>>>>>>>> `Operation`
>>>>>>>>>> instance that would be the result but we can introduce hooks to
>>>>>>> handle
>>>>>>>>>> those `Operation`s. Or we introduce parser extensions.
>>>>>>>>>>
>>>>>>>>>> Can we skip `table.job.async` in the first version? We should
>>>>>> further
>>>>>>>>>> discuss whether we introduce a special SQL clause for wrapping
>>>>>> async
>>>>>>>>>> behavior or if we use a config option? Esp. for streaming queries
>>>>>> we
>>>>>>>>>> need to be careful and should force users to either "one INSERT
>>>>>> INTO"
>>>>>>>> or
>>>>>>>>>> "one STATEMENT SET".
>>>>>>>>>>
>>>>>>>>>> 3) 4) "HIVE also uses these commands"
>>>>>>>>>>
>>>>>>>>>> In general, Hive is not a good reference. Aligning the commands
>>>>>> more
>>>>>>>>>> with the remaining commands should be our goal. We just had a
>>>>>> MODULE
>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is true
>>>>>>> that
>>>>>>>>>> JARs are not part of the catalog which is why I would not use
>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
>>>>>>> language.
>>>>>>>>>> Take a look at the Java collection API as another example.
>>>>>>>>>>
>>>>>>>>>> 6) "Most of the commands should belong to the table environment"
>>>>>>>>>>
>>>>>>>>>> Thanks for updating the FLIP, this makes things easier to
>>>>>> understand.
>>>>>>> It
>>>>>>>>>> is good to see that most commands will be available in
>>>>>>>> TableEnvironment.
>>>>>>>>>> However, I would also support SET and RESET for consistency.
>>>>>> Again,
>>>>>>>> from
>>>>>>>>>> an architectural point of view, if we would allow some kind of
>>>>>>>>>> `Operation` hook in table environment, we could check for SQL
>>>>>> Client
>>>>>>>>>> specific options and forward to regular
>>>>>>> `TableConfig.getConfiguration`
>>>>>>>>>> otherwise. What do you think?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>
>>>>>>>>>>> I will respond some of the questions:
>>>>>>>>>>>
>>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>>
>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on where
>>>>>> the
>>>>>>>>>>> configuration takes effect.
>>>>>>>>>>> If it is a table configuration, we should make clear what's the
>>>>>>>>> behavior
>>>>>>>>>>> when users change
>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>>>>>>>>>>>
>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and
>>>>>>>>>> `sql-client.execution.mode`
>>>>>>>>>>> are something special
>>>>>>>>>>> that can't be changed after TableEnvironment has been
>>>>>> initialized.
>>>>>>>> You
>>>>>>>>>> can
>>>>>>>>>>> see
>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method to
>>>>>>>> override
>>>>>>>>>>> configuration after
>>>>>>>>>>> StreamExecutionEnvironment has been initialized.
>>>>>>>>>>>
>>>>>>>>>>> Therefore, I think it would be better to still use
>>>>>>>>> `sql-client.planner`
>>>>>>>>>>> and `sql-client.execution.mode`.
>>>>>>>>>>>
>>>>>>>>>>> 2) Execution file
>>>>>>>>>>>
>>>>>>>>>> From my point of view, there is a big difference between
>>>>>>>>>>> `sql-client.job.detach` and
>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that
>>>>>> `sql-client.job.detach`
>>>>>>>> will
>>>>>>>>>>> affect every single DML statement
>>>>>>>>>>> in the terminal, not only the statements in SQL files. I think
>>>>>> the
>>>>>>>>> single
>>>>>>>>>>> DML statement in the interactive
>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of
>>>>>>>>>>> tEnv#executeMultiSql.
>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in
>>>>>>>>> `table.multi-sql-async`.
>>>>>>>>>>> I just find that runtime provides a configuration called
>>>>>>>>>>> "execution.attached" [1] which is false by default
>>>>>>>>>>> which specifies if the pipeline is submitted in attached or
>>>>>>> detached
>>>>>>>>>> mode.
>>>>>>>>>>> It provides exactly the same
>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you think
>>>>>> about
>>>>>>>> using
>>>>>>>>>>> this option?
>>>>>>>>>>>
>>>>>>>>>>> If we also want to support this config in TableEnvironment, I
>>>>>> think
>>>>>>>> it
>>>>>>>>>>> should also affect the DML execution
>>>>>>>>>>>     of `tEnv#executeSql()`, not only DMLs in
>>>>>>> `tEnv#executeMultiSql()`.
>>>>>>>>>>> Therefore, the behavior may look like this:
>>>>>>>>>>>
>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
>>>>>> by
>>>>>>>>>> default
>>>>>>>>>>> tableResult.await()   ==> manually block until finish
>>>>>>>>>>>
>>>>>> tEnv.getConfig().getConfiguration().setString("execution.attached",
>>>>>>>>>> "true")
>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
>>>>>>>> don't
>>>>>>>>>> need
>>>>>>>>>>> to wait on the TableResult
>>>>>>>>>>> tEnv.executeMultiSql(
>>>>>>>>>>> """
>>>>>>>>>>> CREATE TABLE ....  ==> always sync
>>>>>>>>>>> INSERT INTO ...  => sync, because we set configuration above
>>>>>>>>>>> SET execution.attached = false;
>>>>>>>>>>> INSERT INTO ...  => async
>>>>>>>>>>> """)
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, I think `sql-client.job.detach`
>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two separate
>>>>>>>> topics,
>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
>>>>>> statements.
>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
>>>>>> it to
>>>>>>>>> block
>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, Timo.
>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts about
>>>>>> your
>>>>>>>>>>>> feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the table
>>>>>>>>>> environment
>>>>>>>>>>>> has the ability to update itself. Let's take a simple program
>>>>>> as
>>>>>>> an
>>>>>>>>>>>> example.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>>>>>>>>>>>>
>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>> "old");
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> tEnv.executeSql("...");
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> If we regard this option as a table option, users don't have to
>>>>>>>> create
>>>>>>>>>>>> another table environment manually. In that case, tEnv needs to
>>>>>>>> check
>>>>>>>>>>>> whether the current mode and planner are the same as before
>>>>>> when
>>>>>>>>>> executeSql
>>>>>>>>>>>> or explainSql. I don't think it's easy work for the table
>>>>>>>> environment,
>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but set
>>>>>> old
>>>>>>>>>> planner
>>>>>>>>>>>> and batch mode. But when we make this option as a sql client
>>>>>>> option,
>>>>>>>>>> users
>>>>>>>>>>>> only use the SET command to change the setting. We can rebuild
>>>>>> a
>>>>>>> new
>>>>>>>>>> table
>>>>>>>>>>>> environment when the SET succeeds.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the implementation
>>>>>>> before
>>>>>>>>>>>> continuing this topic. In the sql client, we will maintain two
>>>>>>>>> parsers.
>>>>>>>>>> The
>>>>>>>>>>>> first parser(client parser) will only match the sql client
>>>>>>> commands.
>>>>>>>>> If
>>>>>>>>>> the
>>>>>>>>>>>> client parser can't parse the statement, we will leverage the
>>>>>>> power
>>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>>>> table environment to execute. According to our blueprint,
>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
>>>>>>> Therefore,
>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
>>>>>>>>>>>>
>>>>>>>>>>>> But if we need to introduce the
>>>>>> `TableEnvironment.executeMultiSql`
>>>>>>>> in
>>>>>>>>>> the
>>>>>>>>>>>> future, I think it's OK to use the option
>>>>>> `table.multi-sql-async`
>>>>>>>>> rather
>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the name is
>>>>>> not
>>>>>>>>>> suitable
>>>>>>>>>>>> because the name is confusing for others. When setting the
>>>>>> option
>>>>>>>>>> false, we
>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO
>>>>>>> statement,
>>>>>>>>> not
>>>>>>>>>> DDL
>>>>>>>>>>>> or others(other sql statements are always executed
>>>>>> synchronously).
>>>>>>>> So
>>>>>>>>>> how
>>>>>>>>>>>> about `table.job.async`? It only works for the sql-client and
>>>>>> the
>>>>>>>>>>>> executeMultiSql. If we set this value false, the table
>>>>>> environment
>>>>>>>>> will
>>>>>>>>>>>> return the result until the job finishes.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR and
>>>>>>> LIST
>>>>>>>>> JAR
>>>>>>>>>>>> because HIVE also uses these commands to add the jar into the
>>>>>>>>> classpath
>>>>>>>>>> or
>>>>>>>>>>>> delete the jar. If we use  such commands, it can reduce our
>>>>>> work
>>>>>>> for
>>>>>>>>>> hive
>>>>>>>>>>>> compatibility.
>>>>>>>>>>>>
>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
>>>>>>>> maintained
>>>>>>>>> by
>>>>>>>>>>>> the Catalog. If we really needs to keep consistent with SQL
>>>>>>> grammar,
>>>>>>>>>> maybe
>>>>>>>>>>>> we should use
>>>>>>>>>>>>
>>>>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>>>>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>>>>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
>>>>>>>>>>>>
>>>>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>>>>>> consistent.
>>>>>>>>>>>>
>>>>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to the
>>>>>>> table
>>>>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
>>>>>>>> identify
>>>>>>>>>> which
>>>>>>>>>>>> commands should belong to the sql client and which commands
>>>>>> should
>>>>>>>>>> belong
>>>>>>>>>>>> to the table environment. I also add a new section about
>>>>>>>>> implementation
>>>>>>>>>>>> details in the FLIP.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>
>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
>>>>>> SQL
>>>>>>>>> Client
>>>>>>>>>> a
>>>>>>>>>>>>> very good update and make it production ready.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is some feedback from my side:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think that `sql-client.planner` and
>>>>>>>>> `sql-client.execution.mode`
>>>>>>>>>>>>> are SQL Client specific. Similar to
>>>>>> `StreamExecutionEnvironment`
>>>>>>>> and
>>>>>>>>>>>>> `ExecutionConfig#configure` that have been added recently, we
>>>>>>>> should
>>>>>>>>>>>>> offer a possibility for TableEnvironment. How about we offer
>>>>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
>>>>>>> `table.planner`
>>>>>>>>> and
>>>>>>>>>>>>> `table.execution-mode` to
>>>>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Execution file
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1] including
>>>>>> the
>>>>>>>>>> mailing
>>>>>>>>>>>>> list thread at that time? Could you further elaborate how the
>>>>>>>>>>>>> multi-statement execution should work for a unified
>>>>>>> batch/streaming
>>>>>>>>>>>>> story? According to our past discussions, each line in an
>>>>>>> execution
>>>>>>>>>> file
>>>>>>>>>>>>> should be executed blocking which means a streaming query
>>>>>> needs a
>>>>>>>>>>>>> statement set to execute multiple INSERT INTO statements,
>>>>>> correct?
>>>>>>>> We
>>>>>>>>>>>>> should also offer this functionality in
>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>>>>>>>> `sql-client.job.detach`
>>>>>>>>>> is
>>>>>>>>>>>>> SQL Client specific needs to be determined, it could also be a
>>>>>>>>> general
>>>>>>>>>>>>> `table.multi-sql-async` option?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3) DELETE JAR
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
>>>>>> like
>>>>>>>> one
>>>>>>>>>> is
>>>>>>>>>>>>> actively deleting the JAR in the corresponding path.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4) LIST JAR
>>>>>>>>>>>>>
>>>>>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
>>>>>> such
>>>>>>> as
>>>>>>>>>> `SHOW
>>>>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should keep the details in sync with
>>>>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
>>>>>>>> about
>>>>>>>>>>>>> differently named ExplainDetails. I would vote for
>>>>>>> `ESTIMATED_COST`
>>>>>>>>>>>>> instead of `COST`. I'm sure the original author had a reason
>>>>>> why
>>>>>>> to
>>>>>>>>>> call
>>>>>>>>>>>>> it that way.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 6) Implementation details
>>>>>>>>>>>>>
>>>>>>>>>>>>> It would be nice to understand how we plan to implement the
>>>>>> given
>>>>>>>>>>>>> features. Most of the commands and config options should go
>>>>>> into
>>>>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
>>>>>> users
>>>>>>>>> have a
>>>>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
>>>>>> provide a
>>>>>>>>>> similar
>>>>>>>>>>>>> user experience in notebooks or interactive programs than the
>>>>>> SQL
>>>>>>>>>> Client.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>>>>>>>>>>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better than `UNSET`.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, Jingsong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
>>>>>>>>>>>>>>> supported in the current SQL client now. Our proposal just extends
>>>>>>>>>>>>>>> its grammar and allows users to reset the specified keys.
>>>>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
>>>>>> default
>>>>>>>>>>>>> value[1].
>>>>>>>>>>>>>>> I think it is more friendly for batch users.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>
>>>>>>>>
>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too outdated.
>>>>>> +1
>>>>>>> for
>>>>>>>>>>>>>>>> improving it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
>>>>>> lirui.fudan@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes look
>>>>>>> good
>>>>>>>> to
>>>>>>>>>>>> me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
>>>>>>>> fskmine@gmail.com
>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi, Rui.
>>>>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The main changes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # The -f parameter has no restriction on the statement type.
>>>>>>>>>>>>>>>>>> Sometimes, users use a pipe to redirect the results of queries for
>>>>>>>>>>>>>>>>>> debugging when submitting a job with the -f parameter. It's much
>>>>>>>>>>>>>>>>>> more convenient than writing INSERT INTO statements.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Add a new SQL client option `sql-client.job.detach`.
>>>>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in batch mode. Users can
>>>>>>>>>>>>>>>>>> set this option to false and the client will not process the next
>>>>>>>>>>>>>>>>>> job until the current job finishes. The default value of this
>>>>>>>>>>>>>>>>>> option is true, which means the client will execute the next job
>>>>>>>>>>>>>>>>>> as soon as the current job is submitted.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
>>>>>> have
>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
>>>>>>>> example,
>>>>>>>>> if
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> client just submits the job and exits, what happens if
>>>>>> the
>>>>>>>> file
>>>>>>>>>>>>>>>>> contains
>>>>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
>>>>>> them
>>>>>>> as
>>>>>>>> a
>>>>>>>>>>>>>>>>> statement
>>>>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
>>>>>> STATEMENT
>>>>>>>> SET
>>>>>>>>> in
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit the
>>>>>>> two
>>>>>>>>>> jobs,
>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
>>>>>>>>> fskmine@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Rui,
>>>>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
>>>>>> suggestions.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the SET command. In
>>>>>>>>>>>>>>>>>>>> the implementation, it will just put the key-value pair into the
>>>>>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the table config.
>>>>>>>>>>>>>>>>>>>> If Hive supports reading the settings from the table config,
>>>>>>>>>>>>>>>>>>>> users are able to set the Hive-related settings.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For suggestion 2: The -f parameter will submit the job and exit.
>>>>>>>>>>>>>>>>>>>> If the queries never end, users have to cancel the job by
>>>>>>>>>>>>>>>>>>>> themselves, which is not reliable (people may forget their jobs).
>>>>>>>>>>>>>>>>>>>> In most cases, queries are used to analyze the data. Users should
>>>>>>>>>>>>>>>>>>>> use queries in interactive mode.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
>>>>>> think
>>>>>>> it
>>>>>>>>>>>>> covers a
>>>>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically improve
>>>>>>> the
>>>>>>>>>>>>>>>>> usability of our
>>>>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
>>>>>>>> configurations
>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
>>>>>> configurations
>>>>>>>> and
>>>>>>>>> we
>>>>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in SQL
>>>>>>>>> Client.
>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
>>>>>> using
>>>>>>>> hive
>>>>>>>>>>>>>>>>> connector [1].
>>>>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
>>>>>> files
>>>>>>>>>>>> specified
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option but
>>>>>>> allows
>>>>>>>>>>>>> queries
>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query and
>>>>>>>> redirect
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> results
>>>>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like to
>>>>>> do
>>>>>>>> the
>>>>>>>>>>>> same,
>>>>>>>>>>>>>>>>>>>>> especially in batch scenarios.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>>>>>>>>>>>>>>> liuyang0704@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
>>>>>> additional
>>>>>>>>>>>>>>>>> suggestions:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
>>>>>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch SQL.
>>>>>>>>>>>>>>>>>>>>>> #2. Improve the way of result retrieval: the SQL client collects
>>>>>>>>>>>>>>>>>>>>>> the results locally all at once using accumulators at present,
>>>>>>>>>>>>>>>>>>>>>> which may cause memory issues in the JM or locally for big query
>>>>>>>>>>>>>>>>>>>>>> results. Accumulators are only suitable for testing purposes.
>>>>>>>>>>>>>>>>>>>>>> We may change to use SelectTableSink, which is based
>>>>>>>>>>>>>>>>>>>>>> on CollectSinkOperatorCoordinator.
>>>>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway, which is in
>>>>>>>>>>>>>>>>>>>>>> FLIP-91? It seems that this FLIP has not moved forward for a
>>>>>>>>>>>>>>>>>>>>>> long time. Providing a long-running service out of the box to
>>>>>>>>>>>>>>>>>>>>>> facilitate SQL submission is necessary.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> What do you think of these?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
>>>>>>> 下午8:54写道:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
>>>>>>> FLIP-163:SQL
>>>>>>>>>>>> Client
>>>>>>>>>>>>>>>>>>>>>>> Improvements.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of the
>>>>>>> sql
>>>>>>>>>>>> client.
>>>>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>>>>>> example, users can not register the table proposed
>>>>>> by
>>>>>>>>>> FLIP-95.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
>>>>>>> initialize
>>>>>>>>> the
>>>>>>>>>>>>>>>>> table
>>>>>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
>>>>>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
>>>>>>>> parameter;
>>>>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
>>>>>>>>>>>>>>>>>>>>>>> - support statement set syntax;
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
>>>>>> FLIP-163[1].
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> *With kind regards
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
>>>>>>>>> Science
>>>>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <liuyang0704@gmail.com
>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> QQ: 3239559*
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best, Jingsong Lee
>>>>>>>>> --
>>>>>>>>> Best regards!
>>>>>>>>> Rui Li


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Rui Li <li...@gmail.com>.
Hi Timo,

I agree with Jark that we should provide consistent experience regarding
SQL CLI and files. Some systems even allow users to execute SQL files in
the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that in
the future, it's a little tricky to decide whether that should be treated
as CLI or file.

I actually prefer a config option and let users decide what's the
desirable behavior. But if we have agreed not to use options, I'm also fine
with Alternative #1.

On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <im...@gmail.com> wrote:

> Hi Timo,
>
> 1) How should we execute statements in CLI and in file? Should there be a
> difference?
> I do think we should unify the behavior of CLI and SQL files. SQL files can
> be thought of as a shortcut of
> "start CLI" => "copy content of SQL files" => "paste content in CLI".
> Actually, we already did this in kafka_e2e.sql [1].
> I think it's hard for users to understand why SQL files behave differently
> from CLI, all the other systems don't have such a difference.
>
> If we distinguish SQL files and CLI, should there be a difference in JDBC
> driver and UI platform?
> Personally, they all should have consistent behavior.
>
> 2) Should we have different behavior for batch and streaming?
> I think we all agree streaming users prefer async execution; otherwise it's
> weird and difficult to use if the submit script or CLI never exits. On the
> other hand, batch SQL users are used to SQL statements being executed in a
> blocking fashion.
>
> Either unified async execution or unified sync execution, will hurt one
> side of the streaming
> batch users. In order to make both sides happy, I think we can have
> different behavior for batch and streaming.
> There are many essential differences between batch and stream systems, I
> think it's normal to have some
> different behaviors, and the behavior doesn't break the unified batch
> stream semantics.
>
>
> Thus, I'm +1 to Alternative 1:
> We consider batch/streaming mode and block for batch INSERT INTO and async
> for streaming INSERT INTO/STATEMENT SET.
> And this behavior is consistent across CLI and files.
>
> Best,
> Jark
>
> [1]:
>
> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
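[Editor's note: to make the "SQL file = pasted CLI content" idea concrete, here is a small illustrative file in the spirit of kafka_e2e.sql. Table names and connector options are made up for illustration; the BEGIN STATEMENT SET syntax is the one proposed in this FLIP.]

```sql
-- Illustrative sketch only; tables and connectors are hypothetical.
CREATE TABLE src (id BIGINT, name STRING) WITH ('connector' = 'datagen');
CREATE TABLE snk (id BIGINT, name STRING) WITH ('connector' = 'blackhole');

-- Multiple INSERT INTOs must be wrapped explicitly to form one job:
BEGIN STATEMENT SET;
INSERT INTO snk SELECT id, name FROM src;
INSERT INTO snk SELECT id, UPPER(name) FROM src;
END;
```

Under the unified behavior described above, executing this file with -f and typing the same statements into the CLI would produce the same result.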
>
> On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:
>
> > Hi Jark,
> >
> > thanks for the summary. I hope we can also find a good long-term
> > solution on the async/sync execution behavior topic.
> >
> > It should be discussed in a bigger round because it is (similar to the
> > time function discussion) related to batch-streaming unification where
> > we should stick to the SQL standard to some degree but also need to come
> > up with good streaming semantics.
> >
> > Let me summarize the problem again to hear opinions:
> >
> > - Batch SQL users are used to execute SQL files sequentially (from top
> > to bottom).
> > - Batch SQL users are used to SQL statements being executed blocking.
> > One after the other. Esp. when moving around data with INSERT INTO.
> > - Streaming users prefer async execution because unbounded streams are
> > more frequent than bounded streams.
> > - We decided to make the Flink Table API async because in a programming
> > language it is easy to call `.await()` on the result to make it blocking.
> > - INSERT INTO statements in the current SQL Client implementation are
> > always submitted asynchronously.
> > - Other clients such as the Ververica Platform allow only one INSERT INTO
> > or a STATEMENT SET at the end of a file that will run asynchronously.
> >
> > Questions:
> >
> > - How should we execute statements in CLI and in file? Should there be a
> > difference?
> > - Should we have different behavior for batch and streaming?
> > - Shall we solve parts with a config option or is it better to make it
> > explicit in the SQL job definition because it influences the semantics
> > of multiple INSERT INTOs?
> >
> > Let me summarize my opinion at the moment:
> >
> > - SQL files should always be executed blocking by default. Because they
> > could potentially contain a long list of INSERT INTO statements. This
> > would be SQL standard compliant.
> > - If we allow async execution, we should make this explicit in the SQL
> > file via `BEGIN ASYNC; ... END;`.
> > - In the CLI, we always execute async to maintain the old behavior. We
> > can also assume that people are only using the CLI to fire statements
> > and close the CLI afterwards.
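[Editor's note: a sketch of the explicit-async idea mentioned above. The `BEGIN ASYNC; ... END;` block is only a proposal in this thread, not implemented syntax, and the table names are hypothetical.]

```sql
-- Hypothetical syntax: statements inside the block would be submitted
-- without waiting for the jobs to finish; everything outside the block
-- would keep the default blocking behavior.
BEGIN ASYNC;
INSERT INTO user_counts SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id;
END;
```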
> >
> > Alternative 1:
> > - We consider batch/streaming mode and block for batch INSERT INTO and
> > async for streaming INSERT INTO/STATEMENT SET
> >
> > What do others think?
> >
> > Regards,
> > Timo
> >
> >
> >
> >
> > On 05.02.21 04:03, Jark Wu wrote:
> > > Hi all,
> > >
> > > After an offline discussion with Timo and Kurt, we have reached some
> > > consensus.
> > > Please correct me if I am wrong or missed anything.
> > >
> > > 1) We will introduce "table.planner" and "table.execution-mode" instead
> > of
> > > "sql-client" prefix,
> > > and add `TableEnvironment.create(Configuration)` interface. These 2
> > options
> > > can only be used
> > > for tableEnv initialization. If used after initialization, Flink should
> > > throw an exception. We may support dynamically switching the planner in
> > > the future.
> > >
> > > 2) We will have only one parser,
> > > i.e. org.apache.flink.table.delegation.Parser. It accepts a string
> > > statement, and returns a list of Operation. It will first use regex to
> > > match some special statement,
> > >   e.g. SET, ADD JAR, others will be delegated to the underlying Calcite
> > > parser. The Parser can
> > > have different implementations, e.g. HiveParser.
> > >
> > > 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect. But
> > we
> > > can allow
> > > DELETE JAR, LIST JAR in Hive dialect through HiveParser.
> > >
> > > 4) We don't have a conclusion for async/sync execution behavior yet.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
> > >
> > >> Hi Ingo,
> > >>
> > >> Since we have supported the WITH syntax and SET command since v1.9
> > [1][2],
> > >> and
> > >> we have never received such complaints, I think it's fine for such
> > >> differences.
> > >>
> > >> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also
> requires
> > >> string literal keys[3],
> > >> and the SET <key>=<value> doesn't allow quoted keys [4].
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> [1]:
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> > >> [2]:
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> > >> [3]:
> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> > >> [4]:
> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> > >> (search "set mapred.reduce.tasks=32")
> > >>
> > >> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> regarding the (un-)quoted question, compatibility is of course an
> > >>> important
> > >>> argument, but in terms of consistency I'd find it a bit surprising
> that
> > >>> WITH handles it differently than SET, and I wonder if that could
> cause
> > >>> friction for developers when writing their SQL.
> > >>>
> > >>>
> > >>> Regards
> > >>> Ingo
> > >>>
> > >>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> Regarding "One Parser", I think it's not possible for now because
> > >>> Calcite
> > >>>> parser can't parse
> > >>>> special characters (e.g. "-") unless quoting them as string
> literals.
> > >>>> That's why the WITH option
> > >>>> key are string literals not identifiers.
> > >>>>
> > >>>> SET table.exec.mini-batch.enabled = true and ADD JAR
> > >>>> /local/my-home/test.jar
> > >>>> have the same
> > >>>> problems. That's why we propose two parser, one splits lines into
> > >>> multiple
> > >>>> statements and match special
> > >>>> command through regex which is light-weight, and delegate other
> > >>> statements
> > >>>> to the other parser which is Calcite parser.
> > >>>>
> > >>>> Note: we should stick on the unquoted SET
> > table.exec.mini-batch.enabled
> > >>> =
> > >>>> true syntax,
> > >>>> both for backward-compatibility and easy-to-use, and all the other
> > >>> systems
> > >>>> don't have quotes on the key.
> > >>>>
> > >>>>
> > >>>> Regarding "table.planner" vs "sql-client.planner",
> > >>>> if we want to use "table.planner", I think we should explain clearly
> > >>> what's
> > >>>> the scope it can be used in documentation.
> > >>>> Otherwise, there will be users complaining why the planner doesn't
> > >>> change
> > >>>> when setting the configuration on TableEnv.
> > >>>> It would be better to throw an exception to indicate to users that it's
> > >>>> not allowed to change the planner after TableEnv is initialized.
> > >>>> However, it seems not easy to implement.
> > >>>> However, it seems not easy to implement.
> > >>>>
> > >>>> Best,
> > >>>> Jark
> > >>>>
> > >>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com>
> wrote:
> > >>>>
> > >>>>> Hi everyone,
> > >>>>>
> > >>>>> Regarding "table.planner" and "table.execution-mode"
> > >>>>> If we define that those two options are just used to initialize the
> > >>>>> TableEnvironment, +1 for introducing table options instead of
> > >>> sql-client
> > >>>>> options.
> > >>>>>
> > >>>>> Regarding "the sql client, we will maintain two parsers", I want to
> > >>> give
> > >>>>> more inputs:
> > >>>>> We want to introduce sql-gateway into the Flink project (see
> FLIP-24
> > &
> > >>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI
> client
> > >>> and
> > >>>>> the gateway service will communicate through Rest API. The " ADD
> JAR
> > >>>>> /local/path/jar " will be executed in the CLI client machine. So
> when
> > >>> we
> > >>>>> submit a sql file which contains multiple statements, the CLI
> client
> > >>>> needs
> > >>>>> to pick out the "ADD JAR" line, and also statements need to be
> > >>> submitted
> > >>>> or
> > >>>>> executed one by one to make sure the result is correct. The sql
> file
> > >>> may
> > >>>> be
> > >>>>> look like:
> > >>>>>
> > >>>>> SET xxx=yyy;
> > >>>>> create table my_table ...;
> > >>>>> create table my_sink ...;
> > >>>>> ADD JAR /local/path/jar1;
> > >>>>> create function my_udf as com....MyUdf;
> > >>>>> insert into my_sink select ..., my_udf(xx) from ...;
> > >>>>> REMOVE JAR /local/path/jar1;
> > >>>>> drop function my_udf;
> > >>>>> ADD JAR /local/path/jar2;
> > >>>>> create function my_udf as com....MyUdf2;
> > >>>>> insert into my_sink select ..., my_udf(xx) from ...;
> > >>>>>
> > >>>>> The lines need to be split into multiple statements first in the
> > >>> CLI
> > >>>>> client, there are two approaches:
> > >>>>> 1. The CLI client depends on the sql-parser: the sql-parser splits
> > the
> > >>>>> lines and tells which lines are "ADD JAR".
> > >>>>> pro: there is only one parser
> > >>>>> cons: It's a little heavy that the CLI client depends on the
> > >>> sql-parser,
> > >>>>> because the CLI client is just a simple tool which receives the
> user
> > >>>>> commands and displays the result. The non "ADD JAR" command will be
> > >>>> parsed
> > >>>>> twice.
> > >>>>>
> > >>>>> 2. The CLI client splits the lines into multiple statements and
> finds
> > >>> the
> > >>>>> ADD JAR command through regex matching.
> > >>>>> pro: The CLI client is very light-weight.
> > >>>>> cons: there are two parsers.
> > >>>>>
> > >>>>> (personally, I prefer the second option)
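[Editor's note: a minimal, self-contained sketch of option 2 as described above. The CLI splits input into statements and matches client-only commands such as ADD JAR via regex, delegating everything else to the full SQL parser. The regex and class names are illustrative, not the actual Flink implementation.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClientStatementSplitter {

    // Case-insensitive match for a client-only command, e.g. "ADD JAR /path/x.jar".
    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+(\\S+)$");

    /** Naive split on ';' — a real implementation must respect quotes and comments. */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String raw : script.split(";")) {
            String stmt = raw.trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }

    /** True if the statement should be handled by the CLI, not the SQL parser. */
    public static boolean isAddJar(String statement) {
        return ADD_JAR.matcher(statement.trim()).matches();
    }

    public static void main(String[] args) {
        String script = "SET xxx=yyy;\nADD JAR /local/path/jar1;\n"
                + "insert into my_sink select * from src;";
        for (String stmt : split(script)) {
            // Client-only commands are executed locally; the rest is delegated.
            System.out.println((isAddJar(stmt) ? "[client] " : "[parser] ") + stmt);
        }
    }
}
```

The trade-off noted above still holds: this keeps the CLI light-weight, at the cost of maintaining a second (regex-based) mini-parser next to the Calcite one.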
> > >>>>>
> > >>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
> > >>>>> For default dialect, we support SHOW JARS, but if we switch to hive
> > >>>>> dialect, LIST JARS is also supported.
> > >>>>>
> > >>>>>
> > >>>>> [1]
> > >>>>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > >>>>> [2]
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >>>>>
> > >>>>> Best,
> > >>>>> Godfrey
> > >>>>>
> > >>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> > >>>>>
> > >>>>>> Hi guys,
> > >>>>>>
> > >>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with
> other
> > >>>>>> commands than LIST JARS. I don't have a strong opinion about
> REMOVE
> > >>> vs
> > >>>>>> DELETE though.
> > >>>>>>
> > >>>>>> While flink doesn't need to follow hive syntax, as far as I know,
> > >>> most
> > >>>>>> users who are requesting these features are previously hive users.
> > >>> So I
> > >>>>>> wonder whether we can support both LIST/SHOW JARS and
> REMOVE/DELETE
> > >>>> JARS
> > >>>>>> as synonyms? It's just like lots of systems accept both EXIT and
> > >>> QUIT
> > >>>> as
> > >>>>>> the command to terminate the program. So if that's not hard to
> > >>> achieve,
> > >>>>> and
> > >>>>>> will make users happier, I don't see a reason why we must choose
> one
> > >>>> over
> > >>>>>> the other.
> > >>>>>>
> > >>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> some feedback regarding the open questions. Maybe we can discuss
> > >>> the
> > >>>>>>> `TableEnvironment.executeMultiSql` story offline to determine how
> > >>> we
> > >>>>>>> proceed with this in the near future.
> > >>>>>>>
> > >>>>>>> 1) "whether the table environment has the ability to update
> > >>> itself"
> > >>>>>>>
> > >>>>>>> Maybe there was some misunderstanding. I don't think that we
> > >>> should
> > >>>>>>> support
> > >>> `tEnv.getConfig.getConfiguration.setString("table.planner",
> > >>>>>>> "old")`. Instead I'm proposing to support
> > >>>>>>> `TableEnvironment.create(Configuration)` where planner and
> > >>> execution
> > >>>>>>> mode are read immediately and a subsequent changes to these
> > >>> options
> > >>>>> will
> > >>>>>>> have no effect. We are doing it similar in `new
> > >>>>>>> StreamExecutionEnvironment(Configuration)`. These two
> > >>> ConfigOption's
> > >>>>>>> must not be SQL Client specific but can be part of the core table
> > >>>> code
> > >>>>>>> base. Many users would like to get a 100% preconfigured
> > >>> environment
> > >>>>> from
> > >>>>>>> just Configuration. And this is not possible right now. We can
> > >>> solve
> > >>>>>>> both use cases in one change.
> > >>>>>>>
> > >>>>>>> 2) "the sql client, we will maintain two parsers"
> > >>>>>>>
> > >>>>>>> I remember we had some discussion about this and decided that we
> > >>>> would
> > >>>>>>> like to maintain only one parser. In the end it is "One Flink
> SQL"
> > >>>>> where
> > >>>>>>> commands influence each other also with respect to keywords. It
> > >>>> should
> > >>>>>>> be fine to include the SQL Client commands in the Flink parser.
> Of
> > >>>>>>> cource the table environment would not be able to handle the
> > >>>>> `Operation`
> > >>>>>>> instance that would be the result but we can introduce hooks to
> > >>>> handle
> > >>>>>>> those `Operation`s. Or we introduce parser extensions.
> > >>>>>>>
> > >>>>>>> Can we skip `table.job.async` in the first version? We should
> > >>> further
> > >>>>>>> discuss whether we introduce a special SQL clause for wrapping
> > >>> async
> > >>>>>>> behavior or if we use a config option? Esp. for streaming queries
> > >>> we
> > >>>>>>> need to be careful and should force users to either "one INSERT
> > >>> INTO"
> > >>>>> or
> > >>>>>>> "one STATEMENT SET".
> > >>>>>>>
> > >>>>>>> 3) 4) "HIVE also uses these commands"
> > >>>>>>>
> > >>>>>>> In general, Hive is not a good reference. Aligning the commands
> > >>> more
> > >>>>>>> with the remaining commands should be our goal. We just had a
> > >>> MODULE
> > >>>>>>> discussion where we selected SHOW instead of LIST. But it is true
> > >>>> that
> > >>>>>>> JARs are not part of the catalog which is why I would not use
> > >>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> > >>>> language.
> > >>>>>>> Take a look at the Java collection API as another example.
> > >>>>>>>
> > >>>>>>> 6) "Most of the commands should belong to the table environment"
> > >>>>>>>
> > >>>>>>> Thanks for updating the FLIP this makes things easier to
> > >>> understand.
> > >>>> It
> > >>>>>>> is good to see that most commends will be available in
> > >>>>> TableEnvironment.
> > >>>>>>> However, I would also support SET and RESET for consistency.
> > >>> Again,
> > >>>>> from
> > >>>>>>> an architectural point of view, if we would allow some kind of
> > >>>>>>> `Operation` hook in table environment, we could check for SQL
> > >>> Client
> > >>>>>>> specific options and forward to regular
> > >>>> `TableConfig.getConfiguration`
> > >>>>>>> otherwise. What do you think?
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Timo
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 03.02.21 08:58, Jark Wu wrote:
> > >>>>>>>> Hi Timo,
> > >>>>>>>>
> > >>>>>>>> I will respond some of the questions:
> > >>>>>>>>
> > >>>>>>>> 1) SQL client specific options
> > >>>>>>>>
> > >>>>>>>> Whether it starts with "table" or "sql-client" depends on where
> > >>> the
> > >>>>>>>> configuration takes effect.
> > >>>>>>>> If it is a table configuration, we should make clear what's the
> > >>>>>> behavior
> > >>>>>>>> when users change
> > >>>>>>>> the configuration in the lifecycle of TableEnvironment.
> > >>>>>>>>
> > >>>>>>>> I agree with Shengkai `sql-client.planner` and
> > >>>>>>> `sql-client.execution.mode`
> > >>>>>>>> are something special
> > >>>>>>>> that can't be changed after TableEnvironment has been
> > >>> initialized.
> > >>>>> You
> > >>>>>>> can
> > >>>>>>>> see
> > >>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method to
> > >>>>> override
> > >>>>>>>> configuration after
> > >>>>>>>> StreamExecutionEnvironment has been initialized.
> > >>>>>>>>
> > >>>>>>>> Therefore, I think it would be better to still use
> > >>>>>> `sql-client.planner`
> > >>>>>>>> and `sql-client.execution.mode`.
> > >>>>>>>>
> > >>>>>>>> 2) Execution file
> > >>>>>>>>
> > >>>>>>>> From my point of view, there is a big difference between
> > >>>>>>>> `sql-client.job.detach` and
> > >>>>>>>> `TableEnvironment.executeMultiSql()` that
> > >>> `sql-client.job.detach`
> > >>>>> will
> > >>>>>>>> affect every single DML statement
> > >>>>>>>> in the terminal, not only the statements in SQL files. I think
> > >>> the
> > >>>>>> single
> > >>>>>>>> DML statement in the interactive
> > >>>>>>>> terminal is something like tEnv#executeSql() instead of
> > >>>>>>>> tEnv#executeMultiSql.
> > >>>>>>>> So I don't like the "multi" and "sql" keyword in
> > >>>>>> `table.multi-sql-async`.
> > >>>>>>>> I just find that runtime provides a configuration called
> > >>>>>>>> "execution.attached" [1] which is false by default
> > >>>>>>>> which specifies if the pipeline is submitted in attached or
> > >>>> detached
> > >>>>>>> mode.
> > >>>>>>>> It provides exactly the same
> > >>>>>>>> functionality of `sql-client.job.detach`. What do you think
> > >>> about
> > >>>>> using
> > >>>>>>>> this option?
> > >>>>>>>>
> > >>>>>>>> If we also want to support this config in TableEnvironment, I
> > >>> think
> > >>>>> it
> > >>>>>>>> should also affect the DML execution
> > >>>>>>>>    of `tEnv#executeSql()`, not only DMLs in
> > >>>> `tEnv#executeMultiSql()`.
> > >>>>>>>> Therefore, the behavior may look like this:
> > >>>>>>>>
> > >>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
> > >>> by
> > >>>>>>> default
> > >>>>>>>> tableResult.await()   ==> manually block until finish
> > >>>>>>>>
> > >>> tEnv.getConfig().getConfiguration().setString("execution.attached",
> > >>>>>>> "true")
> > >>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> > >>>>> don't
> > >>>>>>> need
> > >>>>>>>> to wait on the TableResult
> > >>>>>>>> tEnv.executeMultiSql(
> > >>>>>>>> """
> > >>>>>>>> CREATE TABLE ....  ==> always sync
> > >>>>>>>> INSERT INTO ...  => sync, because we set configuration above
> > >>>>>>>> SET execution.attached = false;
> > >>>>>>>> INSERT INTO ...  => async
> > >>>>>>>> """)
> > >>>>>>>>
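[Editor's note] The attached/detached semantics sketched in the snippet above can be modeled without Flink at all. Below is a toy model: `submit` stands in for job submission and `execute` mimics the proposed `execution.attached` switch. All names here are illustrative assumptions, none of this is Flink API.

```java
import java.util.concurrent.CompletableFuture;

/** Toy model of attached (sync) vs. detached (async) statement execution. */
public class ExecutionModeDemo {

    /** Pretend job submission: the returned future completes when the "job" finishes. */
    static CompletableFuture<String> submit(String statement) {
        return CompletableFuture.supplyAsync(() -> "FINISHED: " + statement);
    }

    /**
     * Executes a statement. With attached=true the call blocks until the job
     * finishes; otherwise it returns immediately and the caller may block
     * later, mirroring tableResult.await().
     */
    public static CompletableFuture<String> execute(String statement, boolean attached) {
        CompletableFuture<String> result = submit(statement);
        if (attached) {
            result.join(); // block, like a sync INSERT INTO
        }
        return result;
    }
}
```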
> > >>>>>>>> On the other hand, I think `sql-client.job.detach`
> > >>>>>>>> and `TableEnvironment.executeMultiSql()` should be two separate
> > >>>>> topics,
> > >>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
> > >>>>>>>> `TableEnvironment#executeSql()` to support multi-line
> > >>> statements.
> > >>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
> > >>> it to
> > >>>>>> block
> > >>>>>>>> this FLIP, maybe we can discuss this in another thread.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Jark
> > >>>>>>>>
> > >>>>>>>> [1]:
> > >>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > >>>>>>>>
> > >>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi, Timo.
> > >>>>>>>>> Thanks for your detailed feedback. I have some thoughts about
> > >>> your
> > >>>>>>>>> feedback.
> > >>>>>>>>>
> > >>>>>>>>> *Regarding #1*: I think the main problem is whether the table
> > >>>>>>> environment
> > >>>>>>>>> has the ability to update itself. Let's take a simple program
> > >>> as
> > >>>> an
> > >>>>>>>>> example.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> ```
> > >>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
> > >>>>>>>>>
> > >>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
> > >>> "old");
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> tEnv.executeSql("...");
> > >>>>>>>>>
> > >>>>>>>>> ```
> > >>>>>>>>>
> > >>>>>>>>> If we regard this option as a table option, users don't have to
> > >>>>> create
> > >>>>>>>>> another table environment manually. In that case, tEnv needs to
> > >>>>> check
> > >>>>>>>>> whether the current mode and planner are the same as before
> > >>> when
> > >>>>>>> executeSql
> > >>>>>>>>> or explainSql. I don't think it's easy work for the table
> > >>>>> environment,
> > >>>>>>>>> especially if users have a StreamExecutionEnvironment but set
> > >>> old
> > >>>>>>> planner
> > >>>>>>>>> and batch mode. But when we make this option as a sql client
> > >>>> option,
> > >>>>>>> users
> > >>>>>>>>> only use the SET command to change the setting. We can rebuild
> > >>>>>>>>> a new table environment when the SET succeeds.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> *Regarding #2*: I think we need to discuss the implementation
> > >>>> before
> > >>>>>>>>> continuing this topic. In the sql client, we will maintain two
> > >>>>>> parsers.
> > >>>>>>> The
> > >>>>>>>>> first parser(client parser) will only match the sql client
> > >>>> commands.
> > >>>>>> If
> > >>>>>>> the
> > >>>>>>>>> client parser can't parse the statement, we will leverage the
> > >>>> power
> > >>>>> of
> > >>>>>>> the
> > >>>>>>>>> table environment to execute. According to our blueprint,
> > >>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
> > >>>> Therefore,
> > >>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > >>>>>>>>>
> > >>>>>>>>> But if we need to introduce the
> > >>> `TableEnvironment.executeMultiSql`
> > >>>>> in
> > >>>>>>> the
> > >>>>>>>>> future, I think it's OK to use the option
> > >>> `table.multi-sql-async`
> > >>>>>> rather
> > >>>>>>>>> than the option `sql-client.job.detach`. But we think the name is
> > >>>>>>>>> not suitable because it is confusing for others. When setting the
> > >>> option
> > >>>>>>> false, we
> > >>>>>>>>> just mean it will block the execution of the INSERT INTO
> > >>>> statement,
> > >>>>>> not
> > >>>>>>> DDL
> > >>>>>>>>> or others(other sql statements are always executed
> > >>> synchronously).
> > >>>>> So
> > >>>>>>> how
> > >>>>>>>>> about `table.job.async`? It only works for the sql-client and
> > >>> the
> > >>>>>>>>> executeMultiSql. If we set this value to false, the table
> > >>>>>>>>> environment will not return the result until the job finishes.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR and
> > >>>> LIST
> > >>>>>> JAR
> > >>>>>>>>> because HIVE also uses these commands to add the jar into the
> > >>>>>> classpath
> > >>>>>>> or
> > >>>>>>>>> delete the jar. If we use  such commands, it can reduce our
> > >>> work
> > >>>> for
> > >>>>>>> hive
> > >>>>>>>>> compatibility.
> > >>>>>>>>>
> > >>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
> > >>>>> maintained
> > >>>>>> by
> > >>>>>>>>> the Catalog. If we really needs to keep consistent with SQL
> > >>>> grammar,
> > >>>>>>> maybe
> > >>>>>>>>> we should use
> > >>>>>>>>>
> > >>>>>>>>> `ADD JAR` -> `CREATE JAR`,
> > >>>>>>>>> `DELETE JAR` -> `DROP JAR`,
> > >>>>>>>>> `LIST JAR` -> `SHOW JAR`.
> > >>>>>>>>>
> > >>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
> > >>> consistent.
> > >>>>>>>>>
> > >>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to the
> > >>>> table
> > >>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
> > >>>>> identify
> > >>>>>>> which
> > >>>>>>>>> commands should belong to the sql client and which commands
> > >>> should
> > >>>>>>> belong
> > >>>>>>>>> to the table environment. I also add a new section about
> > >>>>>> implementation
> > >>>>>>>>> details in the FLIP.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Shengkai
> > >>>>>>>>>
> > >>>>>>>>> Timo Walther <tw...@apache.org> wrote on Tue, Feb 2, 2021 at 6:43 PM:
> > >>>>>>>>>
> > >>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
> > >>> SQL
> > >>>>>> Client
> > >>>>>>> a
> > >>>>>>>>>> very good update and make it production ready.
> > >>>>>>>>>>
> > >>>>>>>>>> Here is some feedback from my side:
> > >>>>>>>>>>
> > >>>>>>>>>> 1) SQL client specific options
> > >>>>>>>>>>
> > >>>>>>>>>> I don't think that `sql-client.planner` and
> > >>>>>> `sql-client.execution.mode`
> > >>>>>>>>>> are SQL Client specific. Similar to
> > >>> `StreamExecutionEnvironment`
> > >>>>> and
> > >>>>>>>>>> `ExecutionConfig#configure` that have been added recently, we
> > >>>>> should
> > >>>>>>>>>> offer a possibility for TableEnvironment. How about we offer
> > >>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
> > >>>> `table.planner`
> > >>>>>> and
> > >>>>>>>>>> `table.execution-mode` to
> > >>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > >>>>>>>>>>
> > >>>>>>>>>> 2) Execution file
> > >>>>>>>>>>
> > >>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1] including
> > >>> the
> > >>>>>>> mailing
> > >>>>>>>>>> list thread at that time? Could you further elaborate how the
> > >>>>>>>>>> multi-statement execution should work for a unified
> > >>>> batch/streaming
> > >>>>>>>>>> story? According to our past discussions, each line in an
> > >>>> execution
> > >>>>>>> file
> > >>>>>>>>>> should be executed blocking which means a streaming query
> > >>> needs a
> > >>>>>>>>>> statement set to execute multiple INSERT INTO statement,
> > >>> correct?
> > >>>>> We
> > >>>>>>>>>> should also offer this functionality in
> > >>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
> > >>>>> `sql-client.job.detach`
> > >>>>>>> is
> > >>>>>>>>>> SQL Client specific needs to be determined, it could also be a
> > >>>>>> general
> > >>>>>>>>>> `table.multi-sql-async` option?
> > >>>>>>>>>>
> > >>>>>>>>>> 3) DELETE JAR
> > >>>>>>>>>>
> > >>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
> > >>> like
> > >>>>> one
> > >>>>>>> is
> > >>>>>>>>>> actively deleting the JAR in the corresponding path.
> > >>>>>>>>>>
> > >>>>>>>>>> 4) LIST JAR
> > >>>>>>>>>>
> > >>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
> > >>> such
> > >>>> as
> > >>>>>>> `SHOW
> > >>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > >>>>>>>>>>
> > >>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > >>>>>>>>>>
> > >>>>>>>>>> We should keep the details in sync with
> > >>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> > >>>>> about
> > >>>>>>>>>> differently named ExplainDetails. I would vote for
> > >>>> `ESTIMATED_COST`
> > >>>>>>>>>> instead of `COST`. I'm sure the original author had a reason
> > >>> why
> > >>>> to
> > >>>>>>> call
> > >>>>>>>>>> it that way.
> > >>>>>>>>>>
> > >>>>>>>>>> 6) Implementation details
> > >>>>>>>>>>
> > >>>>>>>>>> It would be nice to understand how we plan to implement the
> > >>> given
> > >>>>>>>>>> features. Most of the commands and config options should go
> > >>> into
> > >>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
> > >>> users
> > >>>>>> have a
> > >>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
> > >>> provide a
> > >>>>>>> similar
> > >>>>>>>>>> user experience in notebooks or interactive programs than the
> > >>> SQL
> > >>>>>>> Client.
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>>>>>>>>> [2]
> > >>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > >>>>>>>>>>
> > >>>>>>>>>> Regards,
> > >>>>>>>>>> Timo
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
> > >>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better rather than
> > >>>>>> `UNSET`.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Tue, Feb 2, 2021 at 4:44 PM:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi, Jingsong.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
> > >>> `RESET`
> > >>>> is
> > >>>>>>>>>>>> supported in the current sql client now. Our proposal just
> > >>>>> extends
> > >>>>>>> its
> > >>>>>>>>>>>> grammar and allow users to reset the specified keys.
> > >>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
> > >>> default
> > >>>>>>>>>> value[1].
> > >>>>>>>>>>>> I think it is more friendly for batch users.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]
> > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Jingsong Li <ji...@gmail.com> wrote on Tue, Feb 2, 2021 at 1:56 PM:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too outdated.
> > >>> +1
> > >>>> for
> > >>>>>>>>>>>>> improving it.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Jingsong
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> > >>> lirui.fudan@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes look
> > >>>> good
> > >>>>> to
> > >>>>>>>>> me.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> > >>>>> fskmine@gmail.com
> > >>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi, Rui.
> > >>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The main changes:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> # -f parameter has no restriction about the statement
> > >>> type.
> > >>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result of
> > >>>>> queries
> > >>>>>> to
> > >>>>>>>>>>>>>> debug
> > >>>>>>>>>>>>>>> when submitting a job via the -f parameter. It's much more
> > >>>>>>>>>>>>>>> convenient compared to writing INSERT INTO statements.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> > >>>>>>>>>>>>>>> Users prefer to execute jobs one by one in batch mode. Users
> > >>>>>>>>>>>>>>> can set this option to false and the client will not process
> > >>>>>>>>>>>>>>> the next job until the current job finishes. The default value
> > >>>>>>>>>>>>>>> of this option is true, which means the client will execute
> > >>>>>>>>>>>>>>> the next job as soon as the current job is submitted.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Shengkai,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
> > >>> have
> > >>>>>>>>> different
> > >>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
> > >>>>> example,
> > >>>>>> if
> > >>>>>>>>>> the
> > >>>>>>>>>>>>>>>> client just submits the job and exits, what happens if
> > >>> the
> > >>>>> file
> > >>>>>>>>>>>>>> contains
> > >>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
> > >>> them
> > >>>> as
> > >>>>> a
> > >>>>>>>>>>>>>> statement
> > >>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
> > >>> STATEMENT
> > >>>>> SET
> > >>>>>> in
> > >>>>>>>>>> that
> > >>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit the
> > >>>> two
> > >>>>>>> jobs,
> > >>>>>>>>>>>>>> because
> > >>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > >>>>>> fskmine@gmail.com
> > >>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi Rui,
> > >>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
> > >>> suggestions.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET
> > >>>>>>>>>>>>>>>>> command. In
> > >>>>>>>>>>>>>>>>> the implementation, it will just put the key-value into
> > >>>> the
> > >>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
> > >>> table
> > >>>>>>> config.
> > >>>>>>>>>> If
> > >>>>>>>>>>>>>> hive
> > >>>>>>>>>>>>>>>>> supports to read the setting from the table config,
> > >>> users
> > >>>>> are
> > >>>>>>>>> able
> > >>>>>>>>>>>>>> to set
> > >>>>>>>>>>>>>>>>> the hive-related settings.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit the
> > >>> job
> > >>>>> and
> > >>>>>>>>>> exit.
> > >>>>>>>>>>>>>> If
> > >>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job by
> > >>>>>>>>> themselves,
> > >>>>>>>>>>>>>> which is
> > >>>>>>>>>>>>>>>>> not reliable (people may forget their jobs). In most cases,
> > >>>>>>>>>>>>>>>>> queries are used to analyze the data, so users should run
> > >>>>>>>>>>>>>>>>> queries in the interactive mode.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 3:18 PM:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
> > >>> think
> > >>>> it
> > >>>>>>>>>> covers a
> > >>>>>>>>>>>>>>>>>> lot of useful features which will dramatically improve
> > >>>> the
> > >>>>>>>>>>>>>> usability of our
> > >>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
> > >>>>> configurations
> > >>>>>>>>> via
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> SET command? A connector may have its own
> > >>> configurations
> > >>>>> and
> > >>>>>> we
> > >>>>>>>>>>>>>> don't have
> > >>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in SQL
> > >>>>>> Client.
> > >>>>>>>>> For
> > >>>>>>>>>>>>>> example,
> > >>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
> > >>> using
> > >>>>> hive
> > >>>>>>>>>>>>>> connector [1].
> > >>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
> > >>> files
> > >>>>>>>>> specified
> > >>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option but
> > >>>> allows
> > >>>>>>>>>> queries
> > >>>>>>>>>>>>>> in the
> > >>>>>>>>>>>>>>>>>> file. And a common use case is to run some query and
> > >>>>> redirect
> > >>>>>>>>> the
> > >>>>>>>>>>>>>> results
> > >>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like to
> > >>> do
> > >>>>> the
> > >>>>>>>>> same,
> > >>>>>>>>>>>>>>>>>> especially in batch scenarios.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > >>>>>>>>>>>>>> liuyang0704@gmail.com>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Hi Shengkai,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
> > >>> additional
> > >>>>>>>>>>>>>> suggestions:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > >>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch
> > >>> sql.
> > >>>>>>>>>>>>>>>>>>> #2. Improve the way of result retrieval: at present the
> > >>>>>>>>>>>>>>>>>>> sql client collects the results locally all at once using
> > >>>>>>>>>>>>>>>>>>> accumulators, which may cause memory issues in the JM or
> > >>>>>>>>>>>>>>>>>>> locally for big query results. Accumulators are only
> > >>>>>>>>>>>>>>>>>>> suitable for testing purposes. We may change to use
> > >>>>>>>>>>>>>>>>>>> SelectTableSink, which is based on
> > >>>>>>>>>>>>>>>>>>> CollectSinkOperatorCoordinator.
> > >>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway, which
> > >>>>>>>>>>>>>>>>>>> is in FLIP-91? It seems that this FLIP has not moved
> > >>>>>>>>>>>>>>>>>>> forward for a long time.
> > >>>>>>>>>>>>>>>>>>>          Providing a long-running service out of the box
> > >>>>>>>>>>>>>>>>>>> to facilitate sql submission is necessary.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> What do you think of these?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi devs,
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
> > >>>> FLIP-163:SQL
> > >>>>>>>>> Client
> > >>>>>>>>>>>>>>>>>>>> Improvements.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of the
> > >>>> sql
> > >>>>>>>>> client.
> > >>>>>>>>>>>>>> For
> > >>>>>>>>>>>>>>>>>>>> example, users can not register the table proposed
> > >>> by
> > >>>>>>> FLIP-95.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
> > >>>> initialize
> > >>>>>> the
> > >>>>>>>>>>>>>> table
> > >>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
> > >>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> > >>>>> parameter;
> > >>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> > >>>>>>>>>>>>>>>>>>>> - support statement set syntax;
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
> > >>> FLIP-163[1].
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> *With kind regards
> > >>>>>>>>>>>>>>>>>>>
> > >>>>> ------------------------------------------------------------
> > >>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
> > >>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > >>>>>> Science
> > >>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > >>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <liuyang0704@gmail.com
> > >>>>
> > >>>>>>>>>>>>>>>>>>> QQ: 3239559*
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>> Best regards!
> > >>>>>>>>>>>>>>>>>> Rui Li
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>> Best regards!
> > >>>>>>>>>>>>>>>> Rui Li
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>> Best regards!
> > >>>>>>>>>>>>>> Rui Li
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> --
> > >>>>>>>>>>>>> Best, Jingsong Lee
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Best regards!
> > >>>>>> Rui Li
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>


-- 
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi Timo,

1) How should we execute statements in CLI and in file? Should there be a
difference?
I do think we should unify the behavior of CLI and SQL files. SQL files can
be thought of as a shortcut for
"start CLI" => "copy content of SQL files" => "paste content in CLI".
Actually, we already did this in kafka_e2e.sql [1].
I think it's hard for users to understand why SQL files behave differently
from the CLI; the other systems don't have such a difference.

If we distinguish SQL files and CLI, should there be a difference in JDBC
driver and UI platform?
Personally, they all should have consistent behavior.

2) Should we have different behavior for batch and streaming?
I think we all agree streaming users prefer async execution; otherwise it's
weird and difficult to use if the
submit script or CLI never exits. On the other hand, batch SQL users are
used to SQL statements being
executed in a blocking way.

Either unified async execution or unified sync execution will hurt one
side of the streaming and
batch users. In order to make both sides happy, I think we can have
different behavior for batch and streaming.
There are many essential differences between batch and stream systems, so I
think it's normal to have some
different behaviors, and this doesn't break the unified batch and
stream semantics.


Thus, I'm +1 to Alternative 1:
We consider batch/streaming mode and block for batch INSERT INTO and async
for streaming INSERT INTO/STATEMENT SET.
And this behavior is consistent across CLI and files.
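[Editor's note] Alternative 1 boils down to a small per-statement decision rule. A minimal sketch of that rule (the enum and method names below are mine, not from the FLIP):

```java
/** Sketch of Alternative 1: when should the client block on a statement? */
public class BlockingRule {
    public enum Kind { DDL, BATCH_INSERT, STREAMING_INSERT, STREAMING_STATEMENT_SET }

    /** DDL and batch INSERT INTO block; streaming INSERT INTO / STATEMENT SET run async. */
    public static boolean blocks(Kind kind) {
        switch (kind) {
            case DDL:
            case BATCH_INSERT:
                return true;
            default:
                return false;
        }
    }
}
```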

Best,
Jark

[1]:
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql

On Fri, 5 Feb 2021 at 21:49, Timo Walther <tw...@apache.org> wrote:

> Hi Jark,
>
> thanks for the summary. I hope we can also find a good long-term
> solution on the async/sync execution behavior topic.
>
> It should be discussed in a bigger round because it is (similar to the
> time function discussion) related to batch-streaming unification where
> we should stick to the SQL standard to some degree but also need to come
> up with good streaming semantics.
>
> Let me summarize the problem again to hear opinions:
>
> - Batch SQL users are used to execute SQL files sequentially (from top
> to bottom).
> - Batch SQL users are used to SQL statements being executed blocking.
> One after the other. Esp. when moving around data with INSERT INTO.
> - Streaming users prefer async execution because unbounded streams are
> more frequent than bounded streams.
> - We decided to make the Flink Table API async because in a programming
> language it is easy to call `.await()` on the result to make it blocking.
> - INSERT INTO statements in the current SQL Client implementation are
> always submitted asynchronously.
> - Other clients such as the Ververica Platform allow only one INSERT INTO
> or a STATEMENT SET at the end of a file, which will run asynchronously.
>
> Questions:
>
> - How should we execute statements in CLI and in file? Should there be a
> difference?
> - Should we have different behavior for batch and streaming?
> - Shall we solve parts with a config option or is it better to make it
> explicit in the SQL job definition because it influences the semantics
> of multiple INSERT INTOs?
>
> Let me summarize my opinion at the moment:
>
> - SQL files should always be executed in a blocking fashion by default,
> because they could potentially contain a long list of INSERT INTO
> statements. This would be SQL standard compliant.
> - If we allow async execution, we should make this explicit in the SQL
> file via `BEGIN ASYNC; ... END;`.
> - In the CLI, we always execute async to maintain the old behavior. We
> can also assume that people are only using the CLI to fire statements
> and close the CLI afterwards.
>
> Alternative 1:
> - We consider batch/streaming mode and block for batch INSERT INTO and
> async for streaming INSERT INTO/STATEMENT SET
>
> What do others think?
>
> Regards,
> Timo
>
>
>
>
> On 05.02.21 04:03, Jark Wu wrote:
> > Hi all,
> >
> > After an offline discussion with Timo and Kurt, we have reached some
> > consensus.
> > Please correct me if I am wrong or missed anything.
> >
> > 1) We will introduce "table.planner" and "table.execution-mode" instead
> of
> > "sql-client" prefix,
> > and add `TableEnvironment.create(Configuration)` interface. These 2
> options
> > can only be used
> > for tableEnv initialization. If used after initialization, Flink should
> > throw an exception. We may
> > support dynamically switching the planner in the future.
> >
> > 2) We will have only one parser,
> > i.e. org.apache.flink.table.delegation.Parser. It accepts a string
> > statement, and returns a list of Operation. It will first use regex to
> > match some special statement,
> >   e.g. SET, ADD JAR; others will be delegated to the underlying Calcite
> > parser. The Parser can
> > have different implementations, e.g. HiveParser.
> >
> > 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect. But
> we
> > can allow
> > DELETE JAR, LIST JAR in Hive dialect through HiveParser.
> >
> > 4) We don't have a conclusion for async/sync execution behavior yet.
> >
> > Best,
> > Jark
> >
> >
> >
> > On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
> >
> >> Hi Ingo,
> >>
> >> Since we have supported the WITH syntax and SET command since v1.9
> [1][2],
> >> and
> >> we have never received such complaints, I think it's fine for such
> >> differences.
> >>
> >> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
> >> string literal keys[3],
> >> and the SET <key>=<value> doesn't allow quoted keys [4].
> >>
> >> Best,
> >> Jark
> >>
> >> [1]:
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> >> [2]:
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> >> [3]:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> >> [4]:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> >> (search "set mapred.reduce.tasks=32")
> >>
> >> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> regarding the (un-)quoted question, compatibility is of course an
> >>> important
> >>> argument, but in terms of consistency I'd find it a bit surprising that
> >>> WITH handles it differently than SET, and I wonder if that could cause
> >>> friction for developers when writing their SQL.
> >>>
> >>>
> >>> Regards
> >>> Ingo
> >>>
> >>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> Regarding "One Parser", I think it's not possible for now because
> >>> Calcite
> >>>> parser can't parse
> >>>> special characters (e.g. "-") unless quoting them as string literals.
> >>>> That's why the WITH option
> >>>> keys are string literals, not identifiers.
> >>>>
> >>>> SET table.exec.mini-batch.enabled = true and ADD JAR
> >>>> /local/my-home/test.jar
> >>>> have the same
> >>>> problems. That's why we propose two parsers: one splits lines into
> >>>> multiple
> >>>> statements and matches special
> >>>> commands through regex, which is light-weight, and delegates other
> >>>> statements
> >>>> to the other parser, which is the Calcite parser.
> >>>>
> >>>> Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
> >>>> true syntax,
> >>>> both for backward compatibility and ease of use, and all the other
> >>>> systems
> >>>> don't have quotes on the key.
> >>>>
> >>>>
> >>>> Regarding "table.planner" vs "sql-client.planner",
> >>>> if we want to use "table.planner", I think we should explain clearly in the
> >>>> documentation what scope it can be used in.
> >>>> Otherwise, there will be users complaining why the planner doesn't
> >>>> change
> >>>> when setting the configuration on TableEnv.
> >>>> It would be better to throw an exception to indicate to users that it's not
> >>>> allowed to
> >>>> change the planner after TableEnv is initialized.
> >>>> However, it seems not easy to implement.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> Regarding "table.planner" and "table.execution-mode"
> >>>>> If we define that those two options are just used to initialize the
> >>>>> TableEnvironment, +1 for introducing table options instead of
> >>> sql-client
> >>>>> options.
> >>>>>
> >>>>> Regarding "the sql client, we will maintain two parsers", I want to
> >>> give
> >>>>> more inputs:
> >>>>> We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> >>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
> >>>>> the gateway service will communicate through a REST API. The "ADD JAR
> >>>>> /local/path/jar" will be executed on the CLI client machine. So when we
> >>>>> submit a sql file which contains multiple statements, the CLI client needs
> >>>>> to pick out the "ADD JAR" lines, and statements also need to be submitted
> >>>>> or executed one by one to make sure the result is correct. The sql file may
> >>>>> look like:
> >>>>>
> >>>>> SET xxx=yyy;
> >>>>> create table my_table ...;
> >>>>> create table my_sink ...;
> >>>>> ADD JAR /local/path/jar1;
> >>>>> create function my_udf as com....MyUdf;
> >>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>> REMOVE JAR /local/path/jar1;
> >>>>> drop function my_udf;
> >>>>> ADD JAR /local/path/jar2;
> >>>>> create function my_udf as com....MyUdf2;
> >>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>
> >>>>> The lines need to be split into multiple statements first in the CLI
> >>>>> client; there are two approaches:
> >>>>> 1. The CLI client depends on the sql-parser: the sql-parser splits the
> >>>>> lines and tells which lines are "ADD JAR".
> >>>>> pro: there is only one parser
> >>>>> cons: It's a little heavy for the CLI client to depend on the sql-parser,
> >>>>> because the CLI client is just a simple tool which receives the user
> >>>>> commands and displays the result. The non-"ADD JAR" commands will be parsed
> >>>>> twice.
> >>>>>
> >>>>> 2. The CLI client splits the lines into multiple statements and finds
> >>> the
> >>>>> ADD JAR command through regex matching.
> >>>>> pro: The CLI client is very light-weight.
> >>>>> cons: there are two parsers.
> >>>>>
> >>>>> (personally, I prefer the second option)
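The second option above (a light-weight client-side splitter plus regex matching) can be sketched in plain Java. Everything here is illustrative: the class name, the naive ';' splitting rule, and the ADD JAR/SET patterns are assumptions for the sketch, not the actual Flink SQL Client implementation. A real splitter would also have to skip ';' inside string literals and comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClientStatementSplitter {

    // Illustrative patterns only -- the real SQL Client grammar is richer.
    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+\\S+$");
    private static final Pattern SET_CMD =
            Pattern.compile("(?i)^SET\\s+\\S+\\s*=.*$");

    /**
     * Splits a script into trimmed, non-empty statements on ';'.
     * (Simplification: does not handle ';' in string literals or comments.)
     */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String stmt = part.trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }

    /** True if the statement is handled by the client itself, not the planner. */
    public static boolean isClientCommand(String stmt) {
        return ADD_JAR.matcher(stmt).matches() || SET_CMD.matcher(stmt).matches();
    }

    public static void main(String[] args) {
        String script =
                "SET table.exec.mini-batch.enabled=true;\n"
                        + "ADD JAR /local/path/jar1;\n"
                        + "INSERT INTO my_sink SELECT * FROM my_table;";
        for (String stmt : split(script)) {
            // Client commands are matched by regex; everything else is
            // delegated to the full SQL parser.
            System.out.println((isClientCommand(stmt) ? "client:   " : "delegate: ") + stmt);
        }
    }
}
```

This shows why the approach is light-weight: the client only needs a splitter and a couple of regexes, and never has to understand full SQL.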
> >>>>>
> >>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
> >>>>> For the default dialect, we support SHOW JARS, but if we switch to the
> >>>>> hive dialect, LIST JARS is also supported.
> >>>>>
> >>>>>
> >>>>> [1]
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> >>>>> [2]
> >>>>>
> >>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>
> >>>>> Best,
> >>>>> Godfrey
> >>>>>
> >>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> >>>>>
> >>>>>> Hi guys,
> >>>>>>
> >>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> >>>>>> commands than LIST JARS. I don't have a strong opinion about REMOVE
> >>> vs
> >>>>>> DELETE though.
> >>>>>>
> >>>>>> While flink doesn't need to follow hive syntax, as far as I know,
> >>> most
> >>>>>> users who are requesting these features are previously hive users.
> >>> So I
> >>>>>> wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE
> >>>> JARS
> >>>>>> as synonyms? It's just like lots of systems accept both EXIT and
> >>> QUIT
> >>>> as
> >>>>>> the command to terminate the program. So if that's not hard to
> >>> achieve,
> >>>>> and
> >>>>>> will make users happier, I don't see a reason why we must choose one
> >>>> over
> >>>>>> the other.
> >>>>>>
> >>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> some feedback regarding the open questions. Maybe we can discuss
> >>> the
> >>>>>>> `TableEnvironment.executeMultiSql` story offline to determine how
> >>> we
> >>>>>>> proceed with this in the near future.
> >>>>>>>
> >>>>>>> 1) "whether the table environment has the ability to update
> >>> itself"
> >>>>>>>
> >>>>>>> Maybe there was some misunderstanding. I don't think that we
> >>> should
> >>>>>>> support
> >>> `tEnv.getConfig.getConfiguration.setString("table.planner",
> >>>>>>> "old")`. Instead I'm proposing to support
> >>>>>>> `TableEnvironment.create(Configuration)` where planner and
> >>> execution
> >>>>>>> mode are read immediately and subsequent changes to these options will
> >>>>>>> have no effect. We are doing it similarly in `new
> >>>>>>> StreamExecutionEnvironment(Configuration)`. These two ConfigOptions
> >>>>>>> must not be SQL Client specific but can be part of the core table
> >>>> code
> >>>>>>> base. Many users would like to get a 100% preconfigured
> >>> environment
> >>>>> from
> >>>>>>> just Configuration. And this is not possible right now. We can
> >>> solve
> >>>>>>> both use cases in one change.
> >>>>>>>
> >>>>>>> 2) "the sql client, we will maintain two parsers"
> >>>>>>>
> >>>>>>> I remember we had some discussion about this and decided that we
> >>>> would
> >>>>>>> like to maintain only one parser. In the end it is "One Flink SQL"
> >>>>> where
> >>>>>>> commands influence each other also with respect to keywords. It
> >>>> should
> >>>>>>> be fine to include the SQL Client commands in the Flink parser. Of
> >>>>>>> course the table environment would not be able to handle the
> >>>>> `Operation`
> >>>>>>> instance that would be the result but we can introduce hooks to
> >>>> handle
> >>>>>>> those `Operation`s. Or we introduce parser extensions.
> >>>>>>>
> >>>>>>> Can we skip `table.job.async` in the first version? We should
> >>> further
> >>>>>>> discuss whether we introduce a special SQL clause for wrapping
> >>> async
> >>>>>>> behavior or if we use a config option? Esp. for streaming queries
> >>> we
> >>>>>>> need to be careful and should force users to either "one INSERT
> >>> INTO"
> >>>>> or
> >>>>>>> "one STATEMENT SET".
> >>>>>>>
> >>>>>>> 3) 4) "HIVE also uses these commands"
> >>>>>>>
> >>>>>>> In general, Hive is not a good reference. Aligning the commands
> >>> more
> >>>>>>> with the remaining commands should be our goal. We just had a
> >>> MODULE
> >>>>>>> discussion where we selected SHOW instead of LIST. But it is true
> >>>> that
> >>>>>>> JARs are not part of the catalog which is why I would not use
> >>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> >>>> language.
> >>>>>>> Take a look at the Java collection API as another example.
> >>>>>>>
> >>>>>>> 6) "Most of the commands should belong to the table environment"
> >>>>>>>
> >>>>>>> Thanks for updating the FLIP, this makes things easier to
> >>> understand.
> >>>> It
> >>>>>>> is good to see that most commands will be available in
> >>>>> TableEnvironment.
> >>>>>>> However, I would also support SET and RESET for consistency.
> >>> Again,
> >>>>> from
> >>>>>>> an architectural point of view, if we would allow some kind of
> >>>>>>> `Operation` hook in table environment, we could check for SQL
> >>> Client
> >>>>>>> specific options and forward to regular
> >>>> `TableConfig.getConfiguration`
> >>>>>>> otherwise. What do you think?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>> On 03.02.21 08:58, Jark Wu wrote:
> >>>>>>>> Hi Timo,
> >>>>>>>>
> >>>>>>>> I will respond some of the questions:
> >>>>>>>>
> >>>>>>>> 1) SQL client specific options
> >>>>>>>>
> >>>>>>>> Whether it starts with "table" or "sql-client" depends on where
> >>> the
> >>>>>>>> configuration takes effect.
> >>>>>>>> If it is a table configuration, we should make clear what's the
> >>>>>> behavior
> >>>>>>>> when users change
> >>>>>>>> the configuration in the lifecycle of TableEnvironment.
> >>>>>>>>
> >>>>>>>> I agree with Shengkai `sql-client.planner` and
> >>>>>>> `sql-client.execution.mode`
> >>>>>>>> are something special
> >>>>>>>> that can't be changed after TableEnvironment has been
> >>> initialized.
> >>>>> You
> >>>>>>> can
> >>>>>>>> see
> >>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method to
> >>>>> override
> >>>>>>>> configuration after
> >>>>>>>> StreamExecutionEnvironment has been initialized.
> >>>>>>>>
> >>>>>>>> Therefore, I think it would be better to still use
> >>>>>> `sql-client.planner`
> >>>>>>>> and `sql-client.execution.mode`.
> >>>>>>>>
> >>>>>>>> 2) Execution file
> >>>>>>>>
> >>>>>>>> From my point of view, there is a big difference between
> >>>>>>>> `sql-client.job.detach` and
> >>>>>>>> `TableEnvironment.executeMultiSql()` that
> >>> `sql-client.job.detach`
> >>>>> will
> >>>>>>>> affect every single DML statement
> >>>>>>>> in the terminal, not only the statements in SQL files. I think
> >>> the
> >>>>>> single
> >>>>>>>> DML statement in the interactive
> >>>>>>>> terminal is something like tEnv#executeSql() instead of
> >>>>>>>> tEnv#executeMultiSql.
> >>>>>>>> So I don't like the "multi" and "sql" keyword in
> >>>>>> `table.multi-sql-async`.
> >>>>>>>> I just find that runtime provides a configuration called
> >>>>>>>> "execution.attached" [1] which is false by default
> >>>>>>>> which specifies if the pipeline is submitted in attached or
> >>>> detached
> >>>>>>> mode.
> >>>>>>>> It provides exactly the same
> >>>>>>>> functionality of `sql-client.job.detach`. What do you think
> >>> about
> >>>>> using
> >>>>>>>> this option?
> >>>>>>>>
> >>>>>>>> If we also want to support this config in TableEnvironment, I
> >>> think
> >>>>> it
> >>>>>>>> should also affect the DML execution
> >>>>>>>>    of `tEnv#executeSql()`, not only DMLs in
> >>>> `tEnv#executeMultiSql()`.
> >>>>>>>> Therefore, the behavior may look like this:
> >>>>>>>>
> >>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
> >>> by
> >>>>>>> default
> >>>>>>>> tableResult.await()   ==> manually block until finish
> >>>>>>>>
> >>> tEnv.getConfig().getConfiguration().setString("execution.attached",
> >>>>>>> "true")
> >>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> >>>>> don't
> >>>>>>> need
> >>>>>>>> to wait on the TableResult
> >>>>>>>> tEnv.executeMultiSql(
> >>>>>>>> """
> >>>>>>>> CREATE TABLE ....  ==> always sync
> >>>>>>>> INSERT INTO ...  => sync, because we set configuration above
> >>>>>>>> SET execution.attached = false;
> >>>>>>>> INSERT INTO ...  => async
> >>>>>>>> """)
> >>>>>>>>
> >>>>>>>> On the other hand, I think `sql-client.job.detach`
> >>>>>>>> and `TableEnvironment.executeMultiSql()` should be two separate
> >>>>> topics,
> >>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
> >>>>>>>> `TableEnvironment#executeSql()` to support multi-line
> >>> statements.
> >>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
> >>> it to
> >>>>>> block
> >>>>>>>> this FLIP, maybe we can discuss this in another thread.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>> [1]:
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> >>>>>>>>
> >>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi, Timo.
> >>>>>>>>> Thanks for your detailed feedback. I have some thoughts about
> >>> your
> >>>>>>>>> feedback.
> >>>>>>>>>
> >>>>>>>>> *Regarding #1*: I think the main problem is whether the table
> >>>>>>> environment
> >>>>>>>>> has the ability to update itself. Let's take a simple program
> >>> as
> >>>> an
> >>>>>>>>> example.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
> >>>>>>>>>
> >>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
> >>> "old");
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> tEnv.executeSql("...");
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> If we regard this option as a table option, users don't have to
> >>>>> create
> >>>>>>>>> another table environment manually. In that case, tEnv needs to
> >>>>> check
> >>>>>>>>> whether the current mode and planner are the same as before
> >>> when
> >>>>>>> executeSql
> >>>>>>>>> or explainSql. I don't think it's easy work for the table
> >>>>> environment,
> >>>>>>>>> especially if users have a StreamExecutionEnvironment but set
> >>> old
> >>>>>>> planner
> >>>>>>>>> and batch mode. But when we make this option as a sql client
> >>>> option,
> >>>>>>> users
> >>>>>>>>> only use the SET command to change the setting. We can rebuild a new
> >>>>>>>>> table environment when the SET succeeds.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> *Regarding #2*: I think we need to discuss the implementation
> >>>> before
> >>>>>>>>> continuing this topic. In the sql client, we will maintain two
> >>>>>> parsers.
> >>>>>>> The
> >>>>>>>>> first parser(client parser) will only match the sql client
> >>>> commands.
> >>>>>> If
> >>>>>>> the
> >>>>>>>>> client parser can't parse the statement, we will leverage the
> >>>> power
> >>>>> of
> >>>>>>> the
> >>>>>>>>> table environment to execute. According to our blueprint,
> >>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
> >>>> Therefore,
> >>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> >>>>>>>>>
> >>>>>>>>> But if we need to introduce the
> >>> `TableEnvironment.executeMultiSql`
> >>>>> in
> >>>>>>> the
> >>>>>>>>> future, I think it's OK to use the option
> >>> `table.multi-sql-async`
> >>>>>> rather
> >>>>>>>>> than option `sql-client.job.detach`. But we think the name is
> >>> not
> >>>>>>> suitable
> >>>>>>>>> because the name is confusing for others. When setting the
> >>> option
> >>>>>>> false, we
> >>>>>>>>> just mean it will block the execution of the INSERT INTO
> >>>> statement,
> >>>>>> not
> >>>>>>> DDL
> >>>>>>>>> or others(other sql statements are always executed
> >>> synchronously).
> >>>>> So
> >>>>>>> how
> >>>>>>>>> about `table.job.async`? It only works for the sql-client and
> >>> the
> >>>>>>>>> executeMultiSql. If we set this value false, the table
> >>> environment
> >>>>>> will
> >>>>>>>>> return the result until the job finishes.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR and
> >>>> LIST
> >>>>>> JAR
> >>>>>>>>> because HIVE also uses these commands to add the jar into the
> >>>>>> classpath
> >>>>>>> or
> >>>>>>>>> delete the jar. If we use  such commands, it can reduce our
> >>> work
> >>>> for
> >>>>>>> hive
> >>>>>>>>> compatibility.
> >>>>>>>>>
> >>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
> >>>>> maintained
> >>>>>> by
> >>>>>>>>> the Catalog. If we really need to keep consistent with SQL
> >>>> grammar,
> >>>>>>> maybe
> >>>>>>>>> we should use
> >>>>>>>>>
> >>>>>>>>> `ADD JAR` -> `CREATE JAR`,
> >>>>>>>>> `DELETE JAR` -> `DROP JAR`,
> >>>>>>>>> `LIST JAR` -> `SHOW JAR`.
> >>>>>>>>>
> >>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
> >>> consistent.
> >>>>>>>>>
> >>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to the
> >>>> table
> >>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
> >>>>> identify
> >>>>>>> which
> >>>>>>>>> commands should belong to the sql client and which commands
> >>> should
> >>>>>>> belong
> >>>>>>>>> to the table environment. I also add a new section about
> >>>>>> implementation
> >>>>>>>>> details in the FLIP.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Shengkai
> >>>>>>>>>
> >>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> >>>>>>>>>
> >>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
> >>> SQL
> >>>>>> Client
> >>>>>>> a
> >>>>>>>>>> very good update and make it production ready.
> >>>>>>>>>>
> >>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>
> >>>>>>>>>> 1) SQL client specific options
> >>>>>>>>>>
> >>>>>>>>>> I don't think that `sql-client.planner` and
> >>>>>> `sql-client.execution.mode`
> >>>>>>>>>> are SQL Client specific. Similar to
> >>> `StreamExecutionEnvironment`
> >>>>> and
> >>>>>>>>>> `ExecutionConfig#configure` that have been added recently, we
> >>>>> should
> >>>>>>>>>> offer a possibility for TableEnvironment. How about we offer
> >>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
> >>>> `table.planner`
> >>>>>> and
> >>>>>>>>>> `table.execution-mode` to
> >>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
> >>>>>>>>>>
> >>>>>>>>>> 2) Execution file
> >>>>>>>>>>
> >>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1] including
> >>> the
> >>>>>>> mailing
> >>>>>>>>>> list thread at that time? Could you further elaborate how the
> >>>>>>>>>> multi-statement execution should work for a unified
> >>>> batch/streaming
> >>>>>>>>>> story? According to our past discussions, each line in an
> >>>> execution
> >>>>>>> file
> >>>>>>>>>> should be executed blocking which means a streaming query
> >>> needs a
> >>>>>>>>>> statement set to execute multiple INSERT INTO statement,
> >>> correct?
> >>>>> We
> >>>>>>>>>> should also offer this functionality in
> >>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
> >>>>> `sql-client.job.detach`
> >>>>>>> is
> >>>>>>>>>> SQL Client specific needs to be determined, it could also be a
> >>>>>> general
> >>>>>>>>>> `table.multi-sql-async` option?
> >>>>>>>>>>
> >>>>>>>>>> 3) DELETE JAR
> >>>>>>>>>>
> >>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
> >>> like
> >>>>> one
> >>>>>>> is
> >>>>>>>>>> actively deleting the JAR in the corresponding path.
> >>>>>>>>>>
> >>>>>>>>>> 4) LIST JAR
> >>>>>>>>>>
> >>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
> >>> such
> >>>> as
> >>>>>>> `SHOW
> >>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
> >>>>>>>>>>
> >>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >>>>>>>>>>
> >>>>>>>>>> We should keep the details in sync with
> >>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> >>>>> about
> >>>>>>>>>> differently named ExplainDetails. I would vote for
> >>>> `ESTIMATED_COST`
> >>>>>>>>>> instead of `COST`. I'm sure the original author had a reason
> >>> why
> >>>> to
> >>>>>>> call
> >>>>>>>>>> it that way.
> >>>>>>>>>>
> >>>>>>>>>> 6) Implementation details
> >>>>>>>>>>
> >>>>>>>>>> It would be nice to understand how we plan to implement the
> >>> given
> >>>>>>>>>> features. Most of the commands and config options should go
> >>> into
> >>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
> >>> users
> >>>>>> have a
> >>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
> >>> provide a
> >>>>>>> similar
> >>>>>>>>>> user experience in notebooks or interactive programs than the
> >>> SQL
> >>>>>>> Client.
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>>>>>>> [2]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Timo
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
> >>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better rather than
> >>>>>> `UNSET`.
> >>>>>>>>>>>
> >>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi, Jingsong.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
> >>> `RESET`
> >>>> is
> >>>>>>>>>>>> supported in the current sql client now. Our proposal just
> >>>>> extends
> >>>>>>> its
> >>>>>>>>>>>> grammar and allows users to reset the specified keys.
> >>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
> >>> default
> >>>>>>>>>> value[1].
> >>>>>>>>>>>> I think it is more friendly for batch users.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too outdated.
> >>> +1
> >>>> for
> >>>>>>>>>>>>> improving it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> >>> lirui.fudan@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes look
> >>>> good
> >>>>> to
> >>>>>>>>> me.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> >>>>> fskmine@gmail.com
> >>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi, Rui.
> >>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The main changes:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> # -f parameter has no restriction about the statement
> >>> type.
> >>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result of
> >>>>> queries
> >>>>>> to
> >>>>>>>>>>>>>> debug
> >>>>>>>>>>>>>>> when submitting a job by the -f parameter. It's much more
> >>>>>>>>>>>>>>> convenient compared to writing INSERT INTO statements.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> >>>>>>>>>>>>>>> Users prefer to execute jobs one by one in batch mode. Users can
> >>>>>>>>>>>>>>> set this option to false so that the client will not process the
> >>>>>>>>>>>>>>> next job until the current job finishes. The default value of
> >>>>>>>>>>>>>>> this option is true, which means the client will execute the
> >>>>>>>>>>>>>>> next job as soon as the current job is submitted.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
> >>> have
> >>>>>>>>> different
> >>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
> >>>>> example,
> >>>>>> if
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>> client just submits the job and exits, what happens if
> >>> the
> >>>>> file
> >>>>>>>>>>>>>> contains
> >>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
> >>> them
> >>>> as
> >>>>> a
> >>>>>>>>>>>>>> statement
> >>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
> >>> STATEMENT
> >>>>> SET
> >>>>>> in
> >>>>>>>>>> that
> >>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit the
> >>>> two
> >>>>>>> jobs,
> >>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> >>>>>> fskmine@gmail.com
> >>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Rui,
> >>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
> >>> suggestions.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For the suggestion 1: Yes. we are plan to strengthen
> >>> the
> >>>> set
> >>>>>>>>>>>>>> command. In
> >>>>>>>>>>>>>>>>> the implementation, it will just put the key-value into
> >>>> the
> >>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
> >>> table
> >>>>>>> config.
> >>>>>>>>>> If
> >>>>>>>>>>>>>> hive
> >>>>>>>>>>>>>>>>> supports to read the setting from the table config,
> >>> users
> >>>>> are
> >>>>>>>>> able
> >>>>>>>>>>>>>> to set
> >>>>>>>>>>>>>>>>> the hive-related settings.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit the
> >>> job
> >>>>> and
> >>>>>>>>>> exit.
> >>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job by
> >>>>>>>>> themselves,
> >>>>>>>>>>>>>> which is
> >>>>>>>>>>>>>>>>> not reliable (people may forget their jobs). In most cases,
> >>>>>>> queries
> >>>>>>>>>>>>>> are used
> >>>>>>>>>>>>>>>>> to analyze the data. Users should use queries in the
> >>>>>> interactive
> >>>>>>>>>>>>>> mode.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
> >>> think
> >>>> it
> >>>>>>>>>> covers a
> >>>>>>>>>>>>>>>>>> lot of useful features which will dramatically improve
> >>>> the
> >>>>>>>>>>>>>> usability of our
> >>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
> >>>>> configurations
> >>>>>>>>> via
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> SET command? A connector may have its own
> >>> configurations
> >>>>> and
> >>>>>> we
> >>>>>>>>>>>>>> don't have
> >>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in SQL
> >>>>>> Client.
> >>>>>>>>> For
> >>>>>>>>>>>>>> example,
> >>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
> >>> using
> >>>>> hive
> >>>>>>>>>>>>>> connector [1].
> >>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
> >>> files
> >>>>>>>>> specified
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option but
> >>>> allows
> >>>>>>>>>> queries
> >>>>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>> file. And a common use case is to run some query and
> >>>>> redirect
> >>>>>>>>> the
> >>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like to
> >>> do
> >>>>> the
> >>>>>>>>> same,
> >>>>>>>>>>>>>>>>>> especially in batch scenarios.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>>>>>>>>>>>> liuyang0704@gmail.com>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some
> >>> additional
> >>>>>>>>>>>>>> suggestions:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> >>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch
> >>> sql.
> >>>>>>>>>>>>>>>>>>> #2. Improve the way of result retrieval: the sql client
> >>>>>>>>>>>>>>>>>>> collects the results
> >>>>>>>>>>>>>>>>>>> locally all at once using accumulators at present,
> >>>>>>>>>>>>>>>>>>>          which may have memory issues in JM or Local
> >>> for
> >>>>> the
> >>>>>>> big
> >>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>> result.
> >>>>>>>>>>>>>>>>>>> Accumulators are only suitable for testing purposes.
> >>>>>>>>>>>>>>>>>>>          We may change to use SelectTableSink, which
> >>> is
> >>>>> based
> >>>>>>>>>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which
> >>> is in
> >>>>>>>>> FLIP-91.
> >>>>>>>>>>>>>> Seems
> >>>>>>>>>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> >>>>>>>>>>>>>>>>>>>          Providing a long-running service out of the box to
> >>>>>>>>>>>>>>>>>>> facilitate sql submission is necessary.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> What do you think of these?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
> >>>> 下午8:54写道:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about
> >>>> FLIP-163:SQL
> >>>>>>>>> Client
> >>>>>>>>>>>>>>>>>>>> Improvements.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of the
> >>>> sql
> >>>>>>>>> client.
> >>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>> example, users can not register the table proposed
> >>> by
> >>>>>>> FLIP-95.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> - use -i parameter to specify the sql file to
> >>>> initialize
> >>>>>> the
> >>>>>>>>>>>>>> table
> >>>>>>>>>>>>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> >>>>> parameter;
> >>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>>>>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to
> >>> FLIP-163[1].
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> *With kind regards
> >>>>>>>>>>>>>>>>>>>
> >>>>> ------------------------------------------------------------
> >>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> >>>>>> Science
> >>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <liuyang0704@gmail.com>
> >>>>>>>>>>>>>>>>>>> QQ: 3239559*
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Best, Jingsong Lee
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best regards!
> >>>>>> Rui Li
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi Jark,

thanks for the summary. I hope we can also find a good long-term 
solution on the async/sync execution behavior topic.

It should be discussed in a bigger round because it is (similar to the 
time function discussion) related to batch-streaming unification where 
we should stick to the SQL standard to some degree but also need to come 
up with good streaming semantics.

Let me summarize the problem again to hear opinions:

- Batch SQL users are used to executing SQL files sequentially (from top
to bottom).
- Batch SQL users are used to SQL statements being executed in a blocking
fashion, one after the other, especially when moving data around with
INSERT INTO.
- Streaming users prefer async execution because unbounded streams are
more frequent than bounded streams.
- We decided to make the Flink Table API async because in a programming
language it is easy to call `.await()` on the result to make it blocking.
- INSERT INTO statements in the current SQL Client implementation are
always submitted asynchronously.
- Other clients, such as the Ververica platform, allow only one INSERT
INTO or a STATEMENT SET at the end of a file, which will run
asynchronously.
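To make the two styles concrete, here is a small sketch of such a SQL 
file (table names and connector options are made up for illustration):

```sql
-- Hypothetical tables, just for illustration.
CREATE TABLE src (id INT) WITH ('connector' = 'datagen');
CREATE TABLE snk (id INT) WITH ('connector' = 'blackhole');

-- Two standalone INSERT INTOs: batch users expect them to run
-- sequentially, each one blocking until its job finishes.
INSERT INTO snk SELECT id FROM src;
INSERT INTO snk SELECT id + 1 FROM src;

-- The explicit alternative: bundle both statements into a single job
-- via a statement set, submitted as one (asynchronous) job.
BEGIN STATEMENT SET;
INSERT INTO snk SELECT id FROM src;
INSERT INTO snk SELECT id + 1 FROM src;
END;
```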

Questions:

- How should we execute statements in the CLI and in a file? Should there
be a difference?
- Should we have different behavior for batch and streaming?
- Shall we solve this with a config option, or is it better to make it
explicit in the SQL job definition, since it influences the semantics
of multiple INSERT INTOs?
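If we went the config-option route, a session could look like the 
following sketch. I am reusing the `table.job.async` name proposed 
earlier in this thread; the option name and its exact scope are still 
undecided, and `src`/`snk` are placeholder tables:

```sql
-- Hypothetical option; semantics: false = block on INSERT INTO jobs.
SET table.job.async = false;
INSERT INTO snk SELECT id FROM src;  -- would block until the job finishes
SET table.job.async = true;
INSERT INTO snk SELECT id FROM src;  -- would return right after submission
```

The downside is that a mutable option silently changes the semantics of 
the INSERT INTOs that follow it.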

Let me summarize my opinion at the moment:

- SQL files should always be executed in a blocking fashion by default,
because they could potentially contain a long list of INSERT INTO
statements. This would be SQL standard compliant.
- If we allow async execution, we should make this explicit in the SQL
file via `BEGIN ASYNC; ... END;`.
- In the CLI, we always execute asynchronously to maintain the old
behavior. We can also assume that people only use the CLI to fire
statements and close the CLI afterwards.
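The `BEGIN ASYNC; ... END;` idea would make the execution mode part of 
the SQL file itself. A sketch of this proposed (not yet existing) 
syntax, with placeholder tables:

```sql
-- Everything inside the block is submitted without waiting for the jobs.
BEGIN ASYNC;
INSERT INTO snk SELECT id FROM src;
INSERT INTO snk SELECT id + 1 FROM src;
END;

-- A statement outside such a block keeps the blocking default.
INSERT INTO snk SELECT id FROM src;
```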

Alternative 1:
- We consider the batch/streaming mode and block for batch INSERT INTO,
while executing streaming INSERT INTO/STATEMENT SET asynchronously

What do others think?

Regards,
Timo




On 05.02.21 04:03, Jark Wu wrote:
> Hi all,
> 
> After an offline discussion with Timo and Kurt, we have reached some
> consensus.
> Please correct me if I am wrong or missed anything.
> 
> 1) We will introduce "table.planner" and "table.execution-mode" instead of
> "sql-client" prefix,
> and add `TableEnvironment.create(Configuration)` interface. These 2 options
> can only be used
> for tableEnv initialization. If used after initialization, Flink should
> throw an exception. We may
> support dynamically switching the planner in the future.
> 
> 2) We will have only one parser,
> i.e. org.apache.flink.table.delegation.Parser. It accepts a string
> statement, and returns a list of Operation. It will first use regex to
> match some special statements,
>   e.g. SET, ADD JAR; others will be delegated to the underlying Calcite
> parser. The Parser can
> have different implementations, e.g. HiveParser.
> 
> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect. But we
> can allow
> DELETE JAR, LIST JAR in Hive dialect through HiveParser.
> 
> 4) We don't have a conclusion for async/sync execution behavior yet.
> 
> Best,
> Jark
> 
> 
> 
> On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:
> 
>> Hi Ingo,
>>
>> Since we have supported the WITH syntax and SET command since v1.9 [1][2],
>> and
>> we have never received such complaints, I think it's fine for such
>> differences.
>>
>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
>> string literal keys[3],
>> and the SET <key>=<value> doesn't allow quoted keys [4].
>>
>> Best,
>> Jark
>>
>> [1]:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
>> [2]:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
>> [3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>> [4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
>> (search "set mapred.reduce.tasks=32")
>>
>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
>>
>>> Hi,
>>>
>>> regarding the (un-)quoted question, compatibility is of course an
>>> important
>>> argument, but in terms of consistency I'd find it a bit surprising that
>>> WITH handles it differently than SET, and I wonder if that could cause
>>> friction for developers when writing their SQL.
>>>
>>>
>>> Regards
>>> Ingo
>>>
>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Regarding "One Parser", I think it's not possible for now because
>>> Calcite
>>>> parser can't parse
>>>> special characters (e.g. "-") unless quoting them as string literals.
>>>> That's why the WITH option
>>>> key are string literals not identifiers.
>>>>
>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
>>>> /local/my-home/test.jar
>>>> have the same
> >>>> problems. That's why we propose two parsers: one splits lines into
> >>> multiple
> >>>> statements and matches special
> >>>> commands through regex, which is light-weight, and delegates other
> >>> statements
> >>>> to the other parser, which is the Calcite parser.
>>>>
> >>>> Note: we should stick to the unquoted SET table.exec.mini-batch.enabled
>>> =
>>>> true syntax,
>>>> both for backward-compatibility and easy-to-use, and all the other
>>> systems
>>>> don't have quotes on the key.
>>>>
>>>>
>>>> Regarding "table.planner" vs "sql-client.planner",
>>>> if we want to use "table.planner", I think we should explain clearly
>>> what's
>>>> the scope it can be used in documentation.
>>>> Otherwise, there will be users complaining why the planner doesn't
>>> change
>>>> when setting the configuration on TableEnv.
> >>>> It would be better to throw an exception to indicate to users it's not
>>> allowed to
>>>> change planner after TableEnv is initialized.
>>>> However, it seems not easy to implement.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Regarding "table.planner" and "table.execution-mode"
>>>>> If we define that those two options are just used to initialize the
>>>>> TableEnvironment, +1 for introducing table options instead of
>>> sql-client
>>>>> options.
>>>>>
>>>>> Regarding "the sql client, we will maintain two parsers", I want to
>>> give
>>>>> more inputs:
>>>>> We want to introduce sql-gateway into the Flink project (see FLIP-24 &
>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client
>>> and
>>>>> the gateway service will communicate through Rest API. The " ADD JAR
>>>>> /local/path/jar " will be executed in the CLI client machine. So when
>>> we
>>>>> submit a sql file which contains multiple statements, the CLI client
>>>> needs
>>>>> to pick out the "ADD JAR" line, and also statements need to be
>>> submitted
>>>> or
>>>>> executed one by one to make sure the result is correct. The sql file
>>> may
>>>> be
>>>>> look like:
>>>>>
>>>>> SET xxx=yyy;
>>>>> create table my_table ...;
>>>>> create table my_sink ...;
>>>>> ADD JAR /local/path/jar1;
>>>>> create function my_udf as com....MyUdf;
>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>> REMOVE JAR /local/path/jar1;
>>>>> drop function my_udf;
>>>>> ADD JAR /local/path/jar2;
>>>>> create function my_udf as com....MyUdf2;
>>>>> insert into my_sink select ..., my_udf(xx) from ...;
>>>>>
> >>>>> The lines need to be split into multiple statements first in the
>>> CLI
>>>>> client, there are two approaches:
>>>>> 1. The CLI client depends on the sql-parser: the sql-parser splits the
>>>>> lines and tells which lines are "ADD JAR".
>>>>> pro: there is only one parser
>>>>> cons: It's a little heavy that the CLI client depends on the
>>> sql-parser,
>>>>> because the CLI client is just a simple tool which receives the user
>>>>> commands and displays the result. The non "ADD JAR" command will be
>>>> parsed
>>>>> twice.
>>>>>
>>>>> 2. The CLI client splits the lines into multiple statements and finds
>>> the
>>>>> ADD JAR command through regex matching.
>>>>> pro: The CLI client is very light-weight.
>>>>> cons: there are two parsers.
>>>>>
>>>>> (personally, I prefer the second option)
>>>>>
>>>>> Regarding "SHOW or LIST JARS", I think we can support them both.
>>>>> For default dialect, we support SHOW JARS, but if we switch to hive
>>>>> dialect, LIST JARS is also supported.
>>>>>
>>>>>
>>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>>>>> [2]
>>>>>
>>>>>
>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>
>>>>> Best,
>>>>> Godfrey
>>>>>
>>>>> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent with other
>>>>>> commands than LIST JARS. I don't have a strong opinion about REMOVE
>>> vs
>>>>>> DELETE though.
>>>>>>
>>>>>> While flink doesn't need to follow hive syntax, as far as I know,
>>> most
>>>>>> users who are requesting these features are previously hive users.
>>> So I
>>>>>> wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE
>>>> JARS
>>>>>> as synonyms? It's just like lots of systems accept both EXIT and
>>> QUIT
>>>> as
>>>>>> the command to terminate the program. So if that's not hard to
>>> achieve,
>>>>> and
>>>>>> will make users happier, I don't see a reason why we must choose one
>>>> over
>>>>>> the other.
>>>>>>
>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org>
>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> some feedback regarding the open questions. Maybe we can discuss
>>> the
>>>>>>> `TableEnvironment.executeMultiSql` story offline to determine how
>>> we
>>>>>>> proceed with this in the near future.
>>>>>>>
>>>>>>> 1) "whether the table environment has the ability to update
>>> itself"
>>>>>>>
>>>>>>> Maybe there was some misunderstanding. I don't think that we
>>> should
>>>>>>> support
>>> `tEnv.getConfig.getConfiguration.setString("table.planner",
>>>>>>> "old")`. Instead I'm proposing to support
>>>>>>> `TableEnvironment.create(Configuration)` where planner and
>>> execution
>>>>>>> mode are read immediately and a subsequent changes to these
>>> options
>>>>> will
>>>>>>> have no effect. We are doing it similar in `new
>>>>>>> StreamExecutionEnvironment(Configuration)`. These two
>>> ConfigOption's
>>>>>>> must not be SQL Client specific but can be part of the core table
>>>> code
>>>>>>> base. Many users would like to get a 100% preconfigured
>>> environment
>>>>> from
>>>>>>> just Configuration. And this is not possible right now. We can
>>> solve
>>>>>>> both use cases in one change.
>>>>>>>
>>>>>>> 2) "the sql client, we will maintain two parsers"
>>>>>>>
>>>>>>> I remember we had some discussion about this and decided that we
>>>> would
>>>>>>> like to maintain only one parser. In the end it is "One Flink SQL"
>>>>> where
>>>>>>> commands influence each other also with respect to keywords. It
>>>> should
>>>>>>> be fine to include the SQL Client commands in the Flink parser. Of
> >>>>>>> course the table environment would not be able to handle the
>>>>> `Operation`
>>>>>>> instance that would be the result but we can introduce hooks to
>>>> handle
>>>>>>> those `Operation`s. Or we introduce parser extensions.
>>>>>>>
>>>>>>> Can we skip `table.job.async` in the first version? We should
>>> further
>>>>>>> discuss whether we introduce a special SQL clause for wrapping
>>> async
>>>>>>> behavior or if we use a config option? Esp. for streaming queries
>>> we
>>>>>>> need to be careful and should force users to either "one INSERT
>>> INTO"
>>>>> or
>>>>>>> "one STATEMENT SET".
>>>>>>>
>>>>>>> 3) 4) "HIVE also uses these commands"
>>>>>>>
>>>>>>> In general, Hive is not a good reference. Aligning the commands
>>> more
>>>>>>> with the remaining commands should be our goal. We just had a
>>> MODULE
>>>>>>> discussion where we selected SHOW instead of LIST. But it is true
>>>> that
>>>>>>> JARs are not part of the catalog which is why I would not use
>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English
>>>> language.
>>>>>>> Take a look at the Java collection API as another example.
>>>>>>>
>>>>>>> 6) "Most of the commands should belong to the table environment"
>>>>>>>
>>>>>>> Thanks for updating the FLIP this makes things easier to
>>> understand.
>>>> It
> >>>>>>> is good to see that most commands will be available in
>>>>> TableEnvironment.
>>>>>>> However, I would also support SET and RESET for consistency.
>>> Again,
>>>>> from
>>>>>>> an architectural point of view, if we would allow some kind of
>>>>>>> `Operation` hook in table environment, we could check for SQL
>>> Client
>>>>>>> specific options and forward to regular
>>>> `TableConfig.getConfiguration`
>>>>>>> otherwise. What do you think?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>>
>>>>>>>
>>>>>>> On 03.02.21 08:58, Jark Wu wrote:
>>>>>>>> Hi Timo,
>>>>>>>>
>>>>>>>> I will respond some of the questions:
>>>>>>>>
>>>>>>>> 1) SQL client specific options
>>>>>>>>
>>>>>>>> Whether it starts with "table" or "sql-client" depends on where
>>> the
>>>>>>>> configuration takes effect.
>>>>>>>> If it is a table configuration, we should make clear what's the
>>>>>> behavior
>>>>>>>> when users change
>>>>>>>> the configuration in the lifecycle of TableEnvironment.
>>>>>>>>
>>>>>>>> I agree with Shengkai `sql-client.planner` and
>>>>>>> `sql-client.execution.mode`
>>>>>>>> are something special
>>>>>>>> that can't be changed after TableEnvironment has been
>>> initialized.
>>>>> You
>>>>>>> can
>>>>>>>> see
>>>>>>>> `StreamExecutionEnvironment` provides `configure()`  method to
>>>>> override
>>>>>>>> configuration after
>>>>>>>> StreamExecutionEnvironment has been initialized.
>>>>>>>>
>>>>>>>> Therefore, I think it would be better to still use
>>>>>> `sql-client.planner`
>>>>>>>> and `sql-client.execution.mode`.
>>>>>>>>
>>>>>>>> 2) Execution file
>>>>>>>>
> >>>>>>>> From my point of view, there is a big difference between
>>>>>>>> `sql-client.job.detach` and
>>>>>>>> `TableEnvironment.executeMultiSql()` that
>>> `sql-client.job.detach`
>>>>> will
>>>>>>>> affect every single DML statement
>>>>>>>> in the terminal, not only the statements in SQL files. I think
>>> the
>>>>>> single
>>>>>>>> DML statement in the interactive
>>>>>>>> terminal is something like tEnv#executeSql() instead of
>>>>>>>> tEnv#executeMultiSql.
>>>>>>>> So I don't like the "multi" and "sql" keyword in
>>>>>> `table.multi-sql-async`.
>>>>>>>> I just find that runtime provides a configuration called
>>>>>>>> "execution.attached" [1] which is false by default
>>>>>>>> which specifies if the pipeline is submitted in attached or
>>>> detached
>>>>>>> mode.
>>>>>>>> It provides exactly the same
>>>>>>>> functionality of `sql-client.job.detach`. What do you think
>>> about
>>>>> using
>>>>>>>> this option?
>>>>>>>>
>>>>>>>> If we also want to support this config in TableEnvironment, I
>>> think
>>>>> it
>>>>>>>> should also affect the DML execution
>>>>>>>>    of `tEnv#executeSql()`, not only DMLs in
>>>> `tEnv#executeMultiSql()`.
>>>>>>>> Therefore, the behavior may look like this:
>>>>>>>>
>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
>>> by
>>>>>>> default
>>>>>>>> tableResult.await()   ==> manually block until finish
>>>>>>>>
>>> tEnv.getConfig().getConfiguration().setString("execution.attached",
>>>>>>> "true")
>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
>>>>> don't
>>>>>>> need
>>>>>>>> to wait on the TableResult
>>>>>>>> tEnv.executeMultiSql(
>>>>>>>> """
>>>>>>>> CREATE TABLE ....  ==> always sync
>>>>>>>> INSERT INTO ...  => sync, because we set configuration above
>>>>>>>> SET execution.attached = false;
>>>>>>>> INSERT INTO ...  => async
>>>>>>>> """)
>>>>>>>>
>>>>>>>> On the other hand, I think `sql-client.job.detach`
>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two separate
>>>>> topics,
>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on
>>>>>>>> `TableEnvironment#executeSql()` to support multi-line
>>> statements.
>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't want
>>> it to
>>>>>> block
>>>>>>>> this FLIP, maybe we can discuss this in another thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>>
>>>>>>>> [1]:
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>>>>>>>>
>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi, Timo.
>>>>>>>>> Thanks for your detailed feedback. I have some thoughts about
>>> your
>>>>>>>>> feedback.
>>>>>>>>>
>>>>>>>>> *Regarding #1*: I think the main problem is whether the table
>>>>>>> environment
>>>>>>>>> has the ability to update itself. Let's take a simple program
>>> as
>>>> an
>>>>>>>>> example.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...);
>>>>>>>>>
>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner",
>>> "old");
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> tEnv.executeSql("...");
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> If we regard this option as a table option, users don't have to
>>>>> create
>>>>>>>>> another table environment manually. In that case, tEnv needs to
>>>>> check
>>>>>>>>> whether the current mode and planner are the same as before
>>> when
>>>>>>> executeSql
>>>>>>>>> or explainSql. I don't think it's easy work for the table
>>>>> environment,
>>>>>>>>> especially if users have a StreamExecutionEnvironment but set
>>> old
>>>>>>> planner
>>>>>>>>> and batch mode. But when we make this option as a sql client
>>>> option,
>>>>>>> users
>>>>>>>>> only use the SET command to change the setting. We can rebuild
>>> a
>>>> new
>>>>>>> table
>>>>>>>>> environment when set successes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Regarding #2*: I think we need to discuss the implementation
>>>> before
>>>>>>>>> continuing this topic. In the sql client, we will maintain two
>>>>>> parsers.
>>>>>>> The
>>>>>>>>> first parser(client parser) will only match the sql client
>>>> commands.
>>>>>> If
>>>>>>> the
>>>>>>>>> client parser can't parse the statement, we will leverage the
>>>> power
>>>>> of
>>>>>>> the
>>>>>>>>> table environment to execute. According to our blueprint,
>>>>>>>>> TableEnvironment#executeSql is enough for the sql client.
>>>> Therefore,
>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
>>>>>>>>>
>>>>>>>>> But if we need to introduce the
>>> `TableEnvironment.executeMultiSql`
>>>>> in
>>>>>>> the
>>>>>>>>> future, I think it's OK to use the option
>>> `table.multi-sql-async`
>>>>>> rather
>>>>>>>>> than option `sql-client.job.detach`. But we think the name is
>>> not
>>>>>>> suitable
>>>>>>>>> because the name is confusing for others. When setting the
>>> option
>>>>>>> false, we
>>>>>>>>> just mean it will block the execution of the INSERT INTO
>>>> statement,
>>>>>> not
>>>>>>> DDL
>>>>>>>>> or others(other sql statements are always executed
>>> synchronously).
>>>>> So
>>>>>>> how
>>>>>>>>> about `table.job.async`? It only works for the sql-client and
>>> the
>>>>>>>>> executeMultiSql. If we set this value false, the table
>>> environment
>>>>>> will
>>>>>>>>> return the result until the job finishes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR and
>>>> LIST
>>>>>> JAR
>>>>>>>>> because HIVE also uses these commands to add the jar into the
>>>>>> classpath
>>>>>>> or
>>>>>>>>> delete the jar. If we use  such commands, it can reduce our
>>> work
>>>> for
>>>>>>> hive
>>>>>>>>> compatibility.
>>>>>>>>>
>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not
>>>>> maintained
>>>>>> by
>>>>>>>>> the Catalog. If we really needs to keep consistent with SQL
>>>> grammar,
>>>>>>> maybe
>>>>>>>>> we should use
>>>>>>>>>
>>>>>>>>> `ADD JAR` -> `CREATE JAR`,
>>>>>>>>> `DELETE JAR` -> `DROP JAR`,
>>>>>>>>> `LIST JAR` -> `SHOW JAR`.
>>>>>>>>>
>>>>>>>>> *Regarding #5*: I agree with you that we'd better keep
>>> consistent.
>>>>>>>>>
>>>>>>>>> *Regarding #6*: Yes. Most of the commands should belong to the
>>>> table
>>>>>>>>> environment. In the Summary section, I use the <NOTE> tag to
>>>>> identify
>>>>>>> which
>>>>>>>>> commands should belong to the sql client and which commands
>>> should
>>>>>>> belong
>>>>>>>>> to the table environment. I also add a new section about
>>>>>> implementation
>>>>>>>>> details in the FLIP.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Shengkai
>>>>>>>>>
>>>>>>>>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
>>>>>>>>>
>>>>>>>>>> Thanks for this great proposal Shengkai. This will give the
>>> SQL
>>>>>> Client
>>>>>>> a
>>>>>>>>>> very good update and make it production ready.
>>>>>>>>>>
>>>>>>>>>> Here is some feedback from my side:
>>>>>>>>>>
>>>>>>>>>> 1) SQL client specific options
>>>>>>>>>>
>>>>>>>>>> I don't think that `sql-client.planner` and
>>>>>> `sql-client.execution.mode`
>>>>>>>>>> are SQL Client specific. Similar to
>>> `StreamExecutionEnvironment`
>>>>> and
>>>>>>>>>> `ExecutionConfig#configure` that have been added recently, we
>>>>> should
>>>>>>>>>> offer a possibility for TableEnvironment. How about we offer
>>>>>>>>>> `TableEnvironment.create(ReadableConfig)` and add a
>>>> `table.planner`
>>>>>> and
>>>>>>>>>> `table.execution-mode` to
>>>>>>>>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>>>>>>>>
>>>>>>>>>> 2) Execution file
>>>>>>>>>>
>>>>>>>>>> Did you have a look at the Appendix of FLIP-84 [1] including
>>> the
>>>>>>> mailing
>>>>>>>>>> list thread at that time? Could you further elaborate how the
>>>>>>>>>> multi-statement execution should work for a unified
>>>> batch/streaming
>>>>>>>>>> story? According to our past discussions, each line in an
>>>> execution
>>>>>>> file
>>>>>>>>>> should be executed blocking which means a streaming query
>>> needs a
>>>>>>>>>> statement set to execute multiple INSERT INTO statement,
>>> correct?
>>>>> We
>>>>>>>>>> should also offer this functionality in
>>>>>>>>>> `TableEnvironment.executeMultiSql()`. Whether
>>>>> `sql-client.job.detach`
>>>>>>> is
>>>>>>>>>> SQL Client specific needs to be determined, it could also be a
>>>>>> general
>>>>>>>>>> `table.multi-sql-async` option?
>>>>>>>>>>
>>>>>>>>>> 3) DELETE JAR
>>>>>>>>>>
>>>>>>>>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
>>> like
>>>>> one
>>>>>>> is
>>>>>>>>>> actively deleting the JAR in the corresponding path.
>>>>>>>>>>
>>>>>>>>>> 4) LIST JAR
>>>>>>>>>>
>>>>>>>>>> This should be `SHOW JARS` according to other SQL commands
>>> such
>>>> as
>>>>>>> `SHOW
>>>>>>>>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>>>>>>>>>>
>>>>>>>>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>>>>>>>>
>>>>>>>>>> We should keep the details in sync with
>>>>>>>>>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
>>>>> about
>>>>>>>>>> differently named ExplainDetails. I would vote for
>>>> `ESTIMATED_COST`
>>>>>>>>>> instead of `COST`. I'm sure the original author had a reason
>>> why
>>>> to
>>>>>>> call
>>>>>>>>>> it that way.
>>>>>>>>>>
>>>>>>>>>> 6) Implementation details
>>>>>>>>>>
>>>>>>>>>> It would be nice to understand how we plan to implement the
>>> given
>>>>>>>>>> features. Most of the commands and config options should go
>>> into
>>>>>>>>>> TableEnvironment and SqlParser directly, correct? This way
>>> users
>>>>>> have a
>>>>>>>>>> unified way of using Flink SQL. TableEnvironment would
>>> provide a
>>>>>>> similar
>>>>>>>>>> user experience in notebooks or interactive programs than the
>>> SQL
>>>>>>> Client.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>>>>>>>>> [2]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>>>>>>>>> Sorry for the typo. I mean `RESET` is much better rather than
>>>>>> `UNSET`.
>>>>>>>>>>>
>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, Jingsong.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. We don't need to introduce another command `UNSET`.
>>> `RESET`
>>>> is
>>>>>>>>>>>> supported in the current sql client now. Our proposal just
>>>>> extends
>>>>>>> its
>>>>>>>>>>>> grammar and allow users to reset the specified keys.
>>>>>>>>>>>> 2. Hive beeline also uses `RESET` to set the key to the
>>> default
>>>>>>>>>> value[1].
>>>>>>>>>>>> I think it is more friendly for batch users.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>>>>>>>>>>>
>>>>>>>>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the proposal, yes, sql-client is too outdated.
>>> +1
>>>> for
>>>>>>>>>>>>> improving it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
>>> lirui.fudan@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Shengkai for the update! The proposed changes look
>>>> good
>>>>> to
>>>>>>>>> me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
>>>>> fskmine@gmail.com
>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, Rui.
>>>>>>>>>>>>>>> You are right. I have already modified the FLIP.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The main changes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # -f parameter has no restriction about the statement
>>> type.
>>>>>>>>>>>>>>> Sometimes, users use the pipe to redirect the result of
>>>>> queries
>>>>>> to
>>>>>>>>>>>>>> debug
>>>>>>>>>>>>>>> when submitting job by -f parameter. It's much convenient
>>>>>>> comparing
>>>>>>>>>> to
>>>>>>>>>>>>>>> writing INSERT INTO statements.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Add a new sql client option `sql-client.job.detach` .
>>>>>>>>>>>>>>> Users prefer to execute jobs one by one in the batch
>>> mode.
>>>>> Users
>>>>>>>>> can
>>>>>>>>>>>>>> set
>>>>>>>>>>>>>>> this option false and the client will process the next
>>> job
>>>>> until
>>>>>>>>> the
>>>>>>>>>>>>>>> current job finishes. The default value of this option is
>>>>> false,
>>>>>>>>>> which
>>>>>>>>>>>>>>> means the client will execute the next job when the
>>> current
>>>>> job
>>>>>> is
>>>>>>>>>>>>>>> submitted.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regarding #2, maybe the -f options in flink and hive
>>> have
>>>>>>>>> different
>>>>>>>>>>>>>>>> implications, and we should clarify the behavior. For
>>>>> example,
>>>>>> if
>>>>>>>>>> the
>>>>>>>>>>>>>>>> client just submits the job and exits, what happens if
>>> the
>>>>> file
>>>>>>>>>>>>>> contains
>>>>>>>>>>>>>>>> two INSERT statements? I don't think we should treat
>>> them
>>>> as
>>>>> a
>>>>>>>>>>>>>> statement
>>>>>>>>>>>>>>>> set, because users should explicitly write BEGIN
>>> STATEMENT
>>>>> SET
>>>>>> in
>>>>>>>>>> that
>>>>>>>>>>>>>>>> case. And the client shouldn't asynchronously submit the
>>>> two
>>>>>>> jobs,
>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>> the 2nd may depend on the 1st, right?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
>>>>>> fskmine@gmail.com
>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Rui,
>>>>>>>>>>>>>>>>> Thanks for your feedback. I agree with your
>>> suggestions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the suggestion 1: Yes. we are plan to strengthen
>>> the
>>>> set
>>>>>>>>>>>>>> command. In
>>>>>>>>>>>>>>>>> the implementation, it will just put the key-value into
>>>> the
>>>>>>>>>>>>>>>>> `Configuration`, which will be used to generate the
>>> table
>>>>>>> config.
>>>>>>>>>> If
>>>>>>>>>>>>>> hive
>>>>>>>>>>>>>>>>> supports to read the setting from the table config,
>>> users
>>>>> are
>>>>>>>>> able
>>>>>>>>>>>>>> to set
>>>>>>>>>>>>>>>>> the hive-related settings.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the suggestion 2: The -f parameter will submit the
>>> job
>>>>> and
>>>>>>>>>> exit.
>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>>>> the queries never end, users have to cancel the job by
>>>>>>>>> themselves,
>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>> not reliable(people may forget their jobs). In most
>>> case,
>>>>>>> queries
>>>>>>>>>>>>>> are used
>>>>>>>>>>>>>>>>> to analyze the data. Users should use queries in the
>>>>>> interactive
>>>>>>>>>>>>>> mode.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
>>> think
>>>> it
>>>>>>>>>> covers a
>>>>>>>>>>>>>>>>>> lot of useful features which will dramatically improve
>>>> the
>>>>>>>>>>>>>> usability of our
>>>>>>>>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Do you think we can let users set arbitrary
>>>>> configurations
>>>>>>>>> via
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> SET command? A connector may have its own
>>> configurations
>>>>> and
>>>>>> we
>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>>>>>> a way to dynamically change such configurations in SQL
>>>>>> Client.
>>>>>>>>> For
>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>> users may want to be able to change hive conf when
>>> using
>>>>> hive
>>>>>>>>>>>>>> connector [1].
>>>>>>>>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
>>> files
>>>>>>>>> specified
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> the -f option? Hive supports a similar -f option but
>>>> allows
>>>>>>>>>> queries
>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>> file. And a common use case is to run some query and
>>>>> redirect
>>>>>>>>> the
>>>>>>>>>>>>>> results
>>>>>>>>>>>>>>>>>> to a file. So I think maybe flink users would like to
>>> do
>>>>> the
>>>>>>>>> same,
>>>>>>>>>>>>>>>>>> especially in batch scenarios.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>>>>>>>>>>>> liuyang0704@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Glad to see this improvement. And I have some additional suggestions:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
>>>>>>>>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
>>>>>>>>>>>>>>>>>>> #2. Improve the way of results retrieval: the sql client collects the
>>>>>>>>>>>>>>>>>>> results locally all at once using accumulators at present, which may
>>>>>>>>>>>>>>>>>>> have memory issues in the JM or locally for big query results.
>>>>>>>>>>>>>>>>>>> Accumulators are only suitable for testing purposes. We may change to
>>>>>>>>>>>>>>>>>>> use SelectTableSink, which is based on CollectSinkOperatorCoordinator.
>>>>>>>>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway, which is in FLIP-91?
>>>>>>>>>>>>>>>>>>> It seems that this FLIP has not moved forward for a long time.
>>>>>>>>>>>>>>>>>>> Providing a long running service out of the box to facilitate sql
>>>>>>>>>>>>>>>>>>> submission is necessary.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think of these?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163: SQL Client
>>>>>>>>>>>>>>>>>>>> Improvements.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Many users have complained about the problems of the sql client. For
>>>>>>>>>>>>>>>>>>>> example, users can not register the tables proposed by FLIP-95.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The main changes in this FLIP:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - use the -i parameter to specify the sql file to initialize the
>>>>>>>>>>>>>>>>>>>> table environment, and deprecate the YAML file;
>>>>>>>>>>>>>>>>>>>> - add -f to submit a sql file, and deprecate the '-u' parameter;
>>>>>>>>>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
>>>>>>>>>>>>>>>>>>>> - support the statement set syntax;
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163 [1].
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> *With kind regards
>>>>>>>>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
>>>>>>>>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <liuyang0704@gmail.com>
>>>>>>>>>>>>>>>>>>> QQ: 3239559*
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best, Jingsong Lee
>>>>>>>>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi all,

After an offline discussion with Timo and Kurt, we have reached some
consensus.
Please correct me if I am wrong or missed anything.

1) We will introduce "table.planner" and "table.execution-mode" instead of
the "sql-client" prefix, and add a `TableEnvironment.create(Configuration)`
interface. These 2 options can only be used for tableEnv initialization. If
used after initialization, Flink should throw an exception. We may support
dynamically switching the planner in the future.

2) We will have only one parser,
i.e. org.apache.flink.table.delegation.Parser. It accepts a string
statement and returns a list of Operations. It will first use regex to
match some special statements, e.g. SET, ADD JAR; the others will be
delegated to the underlying Calcite parser. The Parser can have different
implementations, e.g. HiveParser.
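As an illustration of this two-stage design, here is a minimal, self-contained Java sketch of the regex pre-matching step. The `ClientCommandMatcher` class and its patterns are hypothetical simplifications for illustration only, not Flink's actual Parser interface or the final FLIP-163 grammar.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: match the special client statements with regex and fall
// through to the full SQL parser (Calcite in Flink) for everything else.
class ClientCommandMatcher {

    private static final Pattern SET_CMD =
            Pattern.compile("SET\\s+(\\S+)\\s*=\\s*(\\S+?)\\s*;?", Pattern.CASE_INSENSITIVE);
    private static final Pattern ADD_JAR_CMD =
            Pattern.compile("ADD\\s+JAR\\s+(\\S+?)\\s*;?", Pattern.CASE_INSENSITIVE);

    /** Returns a normalized client command, or empty to delegate to the SQL parser. */
    static Optional<String> match(String statement) {
        String s = statement.trim();
        Matcher m = SET_CMD.matcher(s);
        if (m.matches()) {
            return Optional.of("SET " + m.group(1) + "=" + m.group(2));
        }
        m = ADD_JAR_CMD.matcher(s);
        if (m.matches()) {
            return Optional.of("ADD JAR " + m.group(1));
        }
        return Optional.empty(); // not a client command
    }
}
```

Note that the unquoted key `table.exec.mini-batch.enabled` poses no problem for the regex, which is the point of handling these commands before Calcite.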

3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink dialect. But we
can allow
DELETE JAR, LIST JAR in Hive dialect through HiveParser.

4) We don't have a conclusion for async/sync execution behavior yet.

Best,
Jark



On Thu, 4 Feb 2021 at 17:50, Jark Wu <im...@gmail.com> wrote:

> Hi Ingo,
>
> Since we have supported the WITH syntax and SET command since v1.9 [1][2],
> and we have never received such complaints, I think such differences are
> fine.
>
> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
> string literal keys [3], and SET <key>=<value> doesn't allow quoted keys
> [4].
>
> Best,
> Jark
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> [2]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> [3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> [4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> (search "set mapred.reduce.tasks=32")
>
> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:
>
>> Hi,
>>
>> regarding the (un-)quoted question, compatibility is of course an
>> important argument, but in terms of consistency I'd find it a bit
>> surprising that WITH handles it differently than SET, and I wonder if
>> that could cause friction for developers when writing their SQL.
>>
>>
>> Regards
>> Ingo
>>
>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > Regarding "One Parser", I think it's not possible for now because the
>> > Calcite parser can't parse special characters (e.g. "-") unless they are
>> > quoted as string literals. That's why the WITH option keys are string
>> > literals, not identifiers.
>> >
>> > SET table.exec.mini-batch.enabled = true and ADD JAR
>> > /local/my-home/test.jar have the same problem. That's why we propose two
>> > parsers: one splits the lines into multiple statements and matches the
>> > special commands through regex, which is light-weight, and delegates the
>> > other statements to the other parser, which is the Calcite parser.
>> >
>> > Note: we should stick with the unquoted SET
>> > table.exec.mini-batch.enabled = true syntax, both for backward
>> > compatibility and ease of use; all the other systems don't put quotes on
>> > the key.
>> >
>> >
>> > Regarding "table.planner" vs "sql-client.planner":
>> > if we want to use "table.planner", I think we should clearly explain in
>> > the documentation the scope in which it can be used.
>> > Otherwise, there will be users complaining why the planner doesn't
>> > change when setting the configuration on TableEnv.
>> > It would be better to throw an exception to indicate to users that it's
>> > not allowed to change the planner after the TableEnv is initialized.
>> > However, it seems not easy to implement.
>> >
>> > Best,
>> > Jark
>> >
>> > On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > Regarding "table.planner" and "table.execution-mode":
>> > > if we define that those two options are just used to initialize the
>> > > TableEnvironment, +1 for introducing table options instead of
>> > > sql-client options.
>> > >
>> > > Regarding "the sql client, we will maintain two parsers", I want to
>> > > give more input:
>> > > We want to introduce a sql-gateway into the Flink project (see FLIP-24
>> > > & FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client
>> > > and the gateway service will communicate through a REST API. The
>> > > "ADD JAR /local/path/jar" will be executed on the CLI client machine.
>> > > So when we submit a sql file which contains multiple statements, the
>> > > CLI client needs to pick out the "ADD JAR" lines, and the statements
>> > > also need to be submitted or executed one by one to make sure the
>> > > result is correct. The sql file may look like:
>> > >
>> > > SET xxx=yyy;
>> > > create table my_table ...;
>> > > create table my_sink ...;
>> > > ADD JAR /local/path/jar1;
>> > > create function my_udf as com....MyUdf;
>> > > insert into my_sink select ..., my_udf(xx) from ...;
>> > > REMOVE JAR /local/path/jar1;
>> > > drop function my_udf;
>> > > ADD JAR /local/path/jar2;
>> > > create function my_udf as com....MyUdf2;
>> > > insert into my_sink select ..., my_udf(xx) from ...;
>> > >
>> > > The lines need to be split into multiple statements first in the CLI
>> > > client. There are two approaches:
>> > > 1. The CLI client depends on the sql-parser: the sql-parser splits the
>> > > lines and tells which lines are "ADD JAR".
>> > > pro: there is only one parser
>> > > cons: it's a little heavy that the CLI client depends on the
>> > > sql-parser, because the CLI client is just a simple tool which receives
>> > > the user commands and displays the result. The non "ADD JAR" commands
>> > > will be parsed twice.
>> > >
>> > > 2. The CLI client splits the lines into multiple statements and finds
>> > > the ADD JAR command through regex matching.
>> > > pro: the CLI client is very light-weight.
>> > > cons: there are two parsers.
>> > >
>> > > (personally, I prefer the second option)
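A rough Java sketch of option 2, under the assumption that a light-weight splitter plus a regex is enough. The `ScriptSplitter` class is illustrative only: it handles ';' inside single-quoted strings but ignores comments, backticks, escaped quotes, etc., which a real CLI client would also need to cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of approach 2: split the script on ';' (except inside single
// quotes) and detect the jar commands by regex; all other statements
// would be forwarded to the gateway/TableEnvironment untouched.
class ScriptSplitter {

    private static final Pattern JAR_COMMAND =
            Pattern.compile("(?i)\\s*(ADD|REMOVE)\\s+JAR\\s+(\\S+)\\s*");

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : script.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote;
            }
            if (c == ';' && !inQuote) {
                String stmt = current.toString().trim();
                if (!stmt.isEmpty()) {
                    statements.add(stmt);
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) {
            statements.add(tail);
        }
        return statements;
    }

    static boolean isJarCommand(String statement) {
        return JAR_COMMAND.matcher(statement).matches();
    }
}
```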
>> > >
>> > > Regarding "SHOW or LIST JARS", I think we can support them both.
>> > > For the default dialect, we support SHOW JARS, but if we switch to the
>> > > hive dialect, LIST JARS is also supported.
>> > >
>> > >
>> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
>> > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>> > >
>> > > Best,
>> > > Godfrey
>> > >
>> > > Rui Li <li...@gmail.com> wrote on Thu, Feb 4, 2021 at 10:40 AM:
>> > >
>> > > > Hi guys,
>> > > >
>> > > > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
>> > > > commands than LIST JARS. I don't have a strong opinion about REMOVE
>> > > > vs DELETE though.
>> > > >
>> > > > While flink doesn't need to follow hive syntax, as far as I know,
>> > > > most users who are requesting these features were previously hive
>> > > > users. So I wonder whether we can support both LIST/SHOW JARS and
>> > > > REMOVE/DELETE JARS as synonyms? It's just like lots of systems accept
>> > > > both EXIT and QUIT as the command to terminate the program. So if
>> > > > that's not hard to achieve, and will make users happier, I don't see
>> > > > a reason why we must choose one over the other.
>> > > >
>> > > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org> wrote:
>> > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > some feedback regarding the open questions. Maybe we can discuss
>> > > > > the `TableEnvironment.executeMultiSql` story offline to determine
>> > > > > how we proceed with this in the near future.
>> > > > >
>> > > > > 1) "whether the table environment has the ability to update itself"
>> > > > >
>> > > > > Maybe there was some misunderstanding. I don't think that we should
>> > > > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
>> > > > > "old")`. Instead I'm proposing to support
>> > > > > `TableEnvironment.create(Configuration)` where planner and execution
>> > > > > mode are read immediately and subsequent changes to these options
>> > > > > have no effect. We are doing it similarly in `new
>> > > > > StreamExecutionEnvironment(Configuration)`. These two ConfigOptions
>> > > > > must not be SQL Client specific but can be part of the core table
>> > > > > code base. Many users would like to get a 100% preconfigured
>> > > > > environment from just a Configuration. And this is not possible
>> > > > > right now. We can solve both use cases in one change.
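The "read immediately, then ignore later changes" semantics described above can be modeled with a toy class. This is not Flink's API; `ImmutableEnvDemo` and the default values here are purely illustrative, using plain maps instead of Flink's Configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed behavior: planner and execution mode are read
// once at create() time, so later mutations of the configuration have no
// effect on the environment.
class ImmutableEnvDemo {
    final String planner;
    final String executionMode;

    private ImmutableEnvDemo(Map<String, String> conf) {
        this.planner = conf.getOrDefault("table.planner", "blink");
        this.executionMode = conf.getOrDefault("table.execution-mode", "streaming");
    }

    static ImmutableEnvDemo create(Map<String, String> conf) {
        // Defensive copy: the environment keeps its own snapshot of the options.
        return new ImmutableEnvDemo(new HashMap<>(conf));
    }
}
```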
>> > > > >
>> > > > > 2) "the sql client, we will maintain two parsers"
>> > > > >
>> > > > > I remember we had some discussion about this and decided that we
>> > > > > would like to maintain only one parser. In the end it is "One Flink
>> > > > > SQL" where commands influence each other also with respect to
>> > > > > keywords. It should be fine to include the SQL Client commands in
>> > > > > the Flink parser. Of course the table environment would not be able
>> > > > > to handle the `Operation` instance that would be the result, but we
>> > > > > can introduce hooks to handle those `Operation`s. Or we introduce
>> > > > > parser extensions.
>> > > > >
>> > > > > Can we skip `table.job.async` in the first version? We should
>> > > > > further discuss whether we introduce a special SQL clause for
>> > > > > wrapping async behavior or if we use a config option. Esp. for
>> > > > > streaming queries we need to be careful and should force users to
>> > > > > use either "one INSERT INTO" or "one STATEMENT SET".
>> > > > >
>> > > > > 3) 4) "HIVE also uses these commands"
>> > > > >
>> > > > > In general, Hive is not a good reference. Aligning these commands
>> > > > > with the remaining commands should be our goal. We just had a MODULE
>> > > > > discussion where we selected SHOW instead of LIST. But it is true
>> > > > > that JARs are not part of the catalog, which is why I would not use
>> > > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English
>> > > > > language. Take a look at the Java collection API as another example.
>> > > > >
>> > > > > 6) "Most of the commands should belong to the table environment"
>> > > > >
>> > > > > Thanks for updating the FLIP, this makes things easier to
>> > > > > understand. It is good to see that most commands will be available
>> > > > > in TableEnvironment. However, I would also support SET and RESET for
>> > > > > consistency. Again, from an architectural point of view, if we would
>> > > > > allow some kind of `Operation` hook in the table environment, we
>> > > > > could check for SQL Client specific options and forward to the
>> > > > > regular `TableConfig.getConfiguration` otherwise. What do you think?
>> > > > >
>> > > > > Regards,
>> > > > > Timo
>> > > > >
>> > > > >
>> > > > > On 03.02.21 08:58, Jark Wu wrote:
>> > > > > > Hi Timo,
>> > > > > >
>> > > > > > I will respond to some of the questions:
>> > > > > >
>> > > > > > 1) SQL client specific options
>> > > > > >
>> > > > > > Whether it starts with "table" or "sql-client" depends on where
>> > > > > > the configuration takes effect. If it is a table configuration, we
>> > > > > > should make clear what the behavior is when users change the
>> > > > > > configuration in the lifecycle of TableEnvironment.
>> > > > > >
>> > > > > > I agree with Shengkai that `sql-client.planner` and
>> > > > > > `sql-client.execution.mode` are something special that can't be
>> > > > > > changed after TableEnvironment has been initialized. You can see
>> > > > > > that `StreamExecutionEnvironment` provides a `configure()` method
>> > > > > > to override configuration after StreamExecutionEnvironment has
>> > > > > > been initialized.
>> > > > > >
>> > > > > > Therefore, I think it would be better to still use
>> > > > > > `sql-client.planner` and `sql-client.execution.mode`.
>> > > > > >
>> > > > > > 2) Execution file
>> > > > > >
>> > > > > > From my point of view, there is a big difference between
>> > > > > > `sql-client.job.detach` and `TableEnvironment.executeMultiSql()`:
>> > > > > > `sql-client.job.detach` will affect every single DML statement in
>> > > > > > the terminal, not only the statements in SQL files. I think the
>> > > > > > single DML statement in the interactive terminal is something like
>> > > > > > tEnv#executeSql() instead of tEnv#executeMultiSql().
>> > > > > > So I don't like the "multi" and "sql" keywords in
>> > > > > > `table.multi-sql-async`.
>> > > > > > I just found that the runtime provides a configuration called
>> > > > > > "execution.attached" [1], which is false by default and specifies
>> > > > > > if the pipeline is submitted in attached or detached mode. It
>> > > > > > provides exactly the same functionality as `sql-client.job.detach`.
>> > > > > > What do you think about using this option?
>> > > > > >
>> > > > > > If we also want to support this config in TableEnvironment, I
>> > > > > > think it should also affect the DML execution of
>> > > > > > `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
>> > > > > > Therefore, the behavior may look like this:
>> > > > > >
>> > > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
>> > > > > > tableResult.await()   ==> manually block until finish
>> > > > > > tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
>> > > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, no need
>> > > > > > to wait on the TableResult
>> > > > > > tEnv.executeMultiSql(
>> > > > > > """
>> > > > > > CREATE TABLE ....  ==> always sync
>> > > > > > INSERT INTO ...  ==> sync, because we set the configuration above
>> > > > > > SET execution.attached = false;
>> > > > > > INSERT INTO ...  ==> async
>> > > > > > """)
>> > > > > >
>> > > > > > On the other hand, I think `sql-client.job.detach` and
>> > > > > > `TableEnvironment.executeMultiSql()` should be two separate
>> > > > > > topics. As Shengkai mentioned above, SQL CLI only depends on
>> > > > > > `TableEnvironment#executeSql()` to support multi-line statements.
>> > > > > > I'm fine with making `executeMultiSql()` clear but don't want it
>> > > > > > to block this FLIP; maybe we can discuss this in another thread.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > Jark
>> > > > > >
>> > > > > > [1]:
>> > > > > > https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>> > > > > >
>> > > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com> wrote:
>> > > > > >
>> > > > > >> Hi, Timo.
>> > > > > >> Thanks for your detailed feedback. I have some thoughts about
>> > > > > >> your feedback.
>> > > > > >>
>> > > > > >> *Regarding #1*: I think the main problem is whether the table
>> > > > > >> environment has the ability to update itself. Let's take a simple
>> > > > > >> program as an example.
>> > > > > >>
>> > > > > >>
>> > > > > >> ```
>> > > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
>> > > > > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>> > > > > >> tEnv.executeSql("...");
>> > > > > >> ```
>> > > > > >>
>> > > > > >> If we regard this option as a table option, users don't have to
>> > > > > >> create another table environment manually. In that case, tEnv
>> > > > > >> needs to check whether the current mode and planner are the same
>> > > > > >> as before when executeSql or explainSql is called. I don't think
>> > > > > >> that's easy work for the table environment, especially if users
>> > > > > >> have a StreamExecutionEnvironment but set the old planner and
>> > > > > >> batch mode. But when we make this option a sql client option,
>> > > > > >> users only use the SET command to change the setting. We can
>> > > > > >> rebuild a new table environment when the SET succeeds.
>> > > > > >>
>> > > > > >>
>> > > > > >> *Regarding #2*: I think we need to discuss the implementation
>> > > > > >> before continuing this topic. In the sql client, we will maintain
>> > > > > >> two parsers. The first parser (client parser) will only match the
>> > > > > >> sql client commands. If the client parser can't parse the
>> > > > > >> statement, we will leverage the power of the table environment to
>> > > > > >> execute it. According to our blueprint,
>> > > > > >> TableEnvironment#executeSql is enough for the sql client.
>> > > > > >> Therefore, TableEnvironment#executeMultiSql is out of scope for
>> > > > > >> this FLIP.
>> > > > > >>
>> > > > > >> But if we need to introduce `TableEnvironment.executeMultiSql` in
>> > > > > >> the future, I think it's OK to use the option
>> > > > > >> `table.multi-sql-async` rather than the option
>> > > > > >> `sql-client.job.detach`. But we think that name is not suitable
>> > > > > >> because it is confusing for others. When setting the option to
>> > > > > >> false, we just mean it will block the execution of the INSERT
>> > > > > >> INTO statements, not DDL or others (other sql statements are
>> > > > > >> always executed synchronously). So how about `table.job.async`?
>> > > > > >> It only works for the sql-client and the executeMultiSql. If we
>> > > > > >> set this value to false, the table environment will not return
>> > > > > >> the result until the job finishes.
>> > > > > >>
>> > > > > >>
>> > > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and
>> > > > > >> LIST JAR because HIVE also uses these commands to add the jar
>> > > > > >> into the classpath or delete the jar. If we use such commands, it
>> > > > > >> can reduce our work for hive compatibility.
>> > > > > >>
>> > > > > >> For SHOW JAR, I think the main concern is that the jars are not
>> > > > > >> maintained by the Catalog. If we really need to keep consistent
>> > > > > >> with SQL grammar, maybe we should use
>> > > > > >>
>> > > > > >> `ADD JAR` -> `CREATE JAR`,
>> > > > > >> `DELETE JAR` -> `DROP JAR`,
>> > > > > >> `LIST JAR` -> `SHOW JAR`.
>> > > > > >>
>> > > > > >> *Regarding #5*: I agree with you that we'd better keep consistent.
>> > > > > >>
>> > > > > >> *Regarding #6*: Yes. Most of the commands should belong to the
>> > > > > >> table environment. In the Summary section, I use the <NOTE> tag to
>> > > > > >> identify which commands should belong to the sql client and which
>> > > > > >> commands should belong to the table environment. I also added a
>> > > > > >> new section about implementation details in the FLIP.
>> > > > > >>
>> > > > > >> Best,
>> > > > > >> Shengkai
>> > > > > >>
>> > > > > >> Timo Walther <tw...@apache.org> wrote on Tue, Feb 2, 2021 at 6:43 PM:
>> > > > > >>
>> > > > > >>> Thanks for this great proposal Shengkai. This will give the SQL
>> > > > > >>> Client a very good update and make it production ready.
>> > > > > >>>
>> > > > > >>> Here is some feedback from my side:
>> > > > > >>>
>> > > > > >>> 1) SQL client specific options
>> > > > > >>>
>> > > > > >>> I don't think that `sql-client.planner` and
>> > > > > >>> `sql-client.execution.mode` are SQL Client specific. Similar to
>> > > > > >>> `StreamExecutionEnvironment` and `ExecutionConfig#configure` that
>> > > > > >>> have been added recently, we should offer a possibility for
>> > > > > >>> TableEnvironment. How about we offer
>> > > > > >>> `TableEnvironment.create(ReadableConfig)` and add a
>> > > > > >>> `table.planner` and `table.execution-mode` to
>> > > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
>> > > > > >>>
>> > > > > >>> 2) Execution file
>> > > > > >>>
>> > > > > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
>> > > > > >>> mailing list thread at that time? Could you further elaborate how
>> > > > > >>> the multi-statement execution should work for a unified
>> > > > > >>> batch/streaming story? According to our past discussions, each
>> > > > > >>> line in an execution file should be executed blocking, which
>> > > > > >>> means a streaming query needs a statement set to execute multiple
>> > > > > >>> INSERT INTO statements, correct? We should also offer this
>> > > > > >>> functionality in `TableEnvironment.executeMultiSql()`. Whether
>> > > > > >>> `sql-client.job.detach` is SQL Client specific needs to be
>> > > > > >>> determined; it could also be a general `table.multi-sql-async`
>> > > > > >>> option?
>> > > > > >>>
>> > > > > >>> 3) DELETE JAR
>> > > > > >>>
>> > > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
>> > > > > >>> like one is actively deleting the JAR in the corresponding path.
>> > > > > >>>
>> > > > > >>> 4) LIST JAR
>> > > > > >>>
>> > > > > >>> This should be `SHOW JARS` according to other SQL commands such
>> > > > > >>> as `SHOW CATALOGS`, `SHOW TABLES`, etc. [2].
>> > > > > >>>
>> > > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>> > > > > >>>
>> > > > > >>> We should keep the details in sync with
>> > > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
>> > > > > >>> about differently named ExplainDetails. I would vote for
>> > > > > >>> `ESTIMATED_COST` instead of `COST`. I'm sure the original author
>> > > > > >>> had a reason to call it that way.
>> > > > > >>>
>> > > > > >>> 6) Implementation details
>> > > > > >>>
>> > > > > >>> It would be nice to understand how we plan to implement the
>> > > > > >>> given features. Most of the commands and config options should go
>> > > > > >>> into TableEnvironment and SqlParser directly, correct? This way
>> > > > > >>> users have a unified way of using Flink SQL. TableEnvironment
>> > > > > >>> would provide a similar user experience in notebooks or
>> > > > > >>> interactive programs as the SQL Client.
>> > > > > >>>
>> > > > > >>> [1]
>> > > > > >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> > > > > >>> [2]
>> > > > > >>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>> > > > > >>>
>> > > > > >>> Regards,
>> > > > > >>> Timo
>> > > > > >>>
>> > > > > >>>
>> > > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
>> > > > > >>>> Sorry for the typo. I mean `RESET` is much better than `UNSET`.
>> > > > > >>>>
>> > > > > >>>> Shengkai Fang <fs...@gmail.com> wrote on Tue, Feb 2, 2021 at 4:44 PM:
>> > > > > >>>>
>> > > > > >>>>> Hi, Jingsong.
>> > > > > >>>>>
>> > > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
>> > > > > >>>>>
>> > > > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET`
>> > > > > >>>>> is supported in the current sql client now. Our proposal just
>> > > > > >>>>> extends its grammar and allows users to reset the specified
>> > > > > >>>>> keys.
>> > > > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
>> > > > > >>>>> value [1]. I think it is more friendly for batch users.
>> > > > > >>>>>
>> > > > > >>>>> Best,
>> > > > > >>>>> Shengkai
>> > > > > >>>>>
>> > > > > >>>>> [1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>> > > > > >>>>>
>> > > > > >>>>> Jingsong Li <ji...@gmail.com> wrote on Tue, Feb 2, 2021 at 1:56 PM:
>> > > > > >>>>>
>> > > > > >>>>>> Thanks for the proposal, yes, the sql-client is too outdated.
>> > > > > >>>>>> +1 for improving it.
>> > > > > >>>>>>
>> > > > > >>>>>> About "SET" and "RESET", why not "SET" and "UNSET"?
>> > > > > >>>>>>
>> > > > > >>>>>> Best,
>> > > > > >>>>>> Jingsong
>> > > > > >>>>>>
>> > > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fudan@gmail.com> wrote:
>> > > > > >>>>>>
>> > > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look
>> > > > > >>>>>>> good to me.
>> > > > > >>>>>>>
>> > > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskmine@gmail.com> wrote:
>> > > > > >>>>>>>
>> > > > > >>>>>>>> Hi, Rui.
>> > > > > >>>>>>>> You are right. I have already modified the FLIP.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> The main changes:
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> # -f parameter has no restriction about the statement type.
>> > > > > >>>>>>>> Sometimes, users use the pipe to redirect the result of
>> > > > > >>>>>>>> queries to debug when submitting a job by the -f parameter.
>> > > > > >>>>>>>> It's much more convenient compared to writing INSERT INTO
>> > > > > >>>>>>>> statements.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach`.
>> > > > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode.
>> > > > > >>>>>>>> Users can set this option to false and the client will only
>> > > > > >>>>>>>> process the next job after the current job finishes. The
>> > > > > >>>>>>>> default value of this option is false, which means the
>> > > > > >>>>>>>> client will execute the next job when the current job is
>> > > > > >>>>>>>> submitted.
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> Best,
>> > > > > >>>>>>>> Shengkai
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
>> > > > > >>>>>>>>
>> > > > > >>>>>>>>> Hi Shengkai,
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
>> > > > > >>>>>>>>> different implications, and we should clarify the behavior.
>> > > > > >>>>>>>>> For example, if the client just submits the job and exits,
>> > > > > >>>>>>>>> what happens if the file contains two INSERT statements? I
>> > > > > >>>>>>>>> don't think we should treat them as a statement set,
>> > > > > >>>>>>>>> because users should explicitly write BEGIN STATEMENT SET
>> > > > > >>>>>>>>> in that case. And the client shouldn't asynchronously
>> > > > > >>>>>>>>> submit the two jobs, because the 2nd may depend on the 1st,
>> > > > > >>>>>>>>> right?
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskmine@gmail.com> wrote:
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>>> Hi Rui,
>> > > > > >>>>>>>>>> Thanks for your feedback. I agree with your
>> suggestions.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> For the suggestion 1: Yes. we are plan to strengthen
>> the
>> > set
>> > > > > >>>>>>> command. In
>> > > > > >>>>>>>>>> the implementation, it will just put the key-value into
>> > the
>> > > > > >>>>>>>>>> `Configuration`, which will be used to generate the
>> table
>> > > > > config.
>> > > > > >>> If
>> > > > > >>>>>>> hive
>> > > > > >>>>>>>>>> supports to read the setting from the table config,
>> users
>> > > are
>> > > > > >> able
>> > > > > >>>>>>> to set
>> > > > > >>>>>>>>>> the hive-related settings.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the
>> job
>> > > and
>> > > > > >>> exit.
>> > > > > >>>>>>> If
>> > > > > >>>>>>>>>> the queries never end, users have to cancel the job by
>> > > > > >> themselves,
>> > > > > >>>>>>> which is
>> > > > > >>>>>>>>>> not reliable (people may forget their jobs). In most
>> > > > > >>>>>>>>>> cases, queries are used
>> > > > > >>>>>>>>>> to analyze the data. Users should use queries in the
>> > > > interactive
>> > > > > >>>>>>> mode.
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> Best,
>> > > > > >>>>>>>>>> Shengkai
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
>> think
>> > it
>> > > > > >>> covers a
>> > > > > >>>>>>>>>>> lot of useful features which will dramatically improve
>> > the
>> > > > > >>>>>>> usability of our
>> > > > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
>> > > configurations
>> > > > > >> via
>> > > > > >>>>>>> the
>> > > > > >>>>>>>>>>> SET command? A connector may have its own
>> configurations
>> > > and
>> > > > we
>> > > > > >>>>>>> don't have
>> > > > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
>> > > > Client.
>> > > > > >> For
>> > > > > >>>>>>> example,
>> > > > > >>>>>>>>>>> users may want to be able to change hive conf when
>> using
>> > > hive
>> > > > > >>>>>>> connector [1].
>> > > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
>> files
>> > > > > >> specified
>> > > > > >>>>>>> with
>> > > > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but
>> > allows
>> > > > > >>> queries
>> > > > > >>>>>>> in the
>> > > > > >>>>>>>>>>> file. And a common use case is to run some query and
>> > > redirect
>> > > > > >> the
>> > > > > >>>>>>> results
>> > > > > >>>>>>>>>>> to a file. So I think maybe flink users would like to
>> do
>> > > the
>> > > > > >> same,
>> > > > > >>>>>>>>>>> especially in batch scenarios.
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>> > > > > >>>>>>> liuyang0704@gmail.com>
>> > > > > >>>>>>>>>>> wrote:
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>>> Hi Shengkai,
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> Glad to see this improvement. And I have some
>> additional
>> > > > > >>>>>>> suggestions:
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
>> > > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch
>> sql.
>> > > > > >>>>>>>>>>>> #2. Improve the way of result retrieval: at present the
>> > > > > >>>>>>>>>>>> sql client collects the results
>> > > > > >>>>>>>>>>>> locally all at once using accumulators,
>> > > > > >>>>>>>>>>>>         which may cause memory issues in the JM or
>> > > > > >>>>>>>>>>>> locally for big query results.
>> > > > > >>>>>>>>>>>> Accumulators are only suitable for testing purposes.
>> > > > > >>>>>>>>>>>>         We may change to use SelectTableSink, which
>> is
>> > > based
>> > > > > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
>> > > > > >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which
>> is in
>> > > > > >> FLIP-91.
>> > > > > >>>>>>> Seems
>> > > > > >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
>> > > > > >>>>>>>>>>>>         Providing a long-running service out of the
>> > > > > >>>>>>>>>>>> box to facilitate sql
>> > > > > >>>>>>>>>>>> submission is necessary.
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> What do you think of these?
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> [1]
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
>> > 下午8:54写道:
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> Hi devs,
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> Jark and I want to start a discussion about
>> > FLIP-163:SQL
>> > > > > >> Client
>> > > > > >>>>>>>>>>>>> Improvements.
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> Many users have complained about the problems of the
>> > sql
>> > > > > >> client.
>> > > > > >>>>>>> For
>> > > > > >>>>>>>>>>>>> example, users cannot register the tables proposed
>> > > > > >>>>>>>>>>>>> by FLIP-95.
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> The main changes in this FLIP:
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> - use -i parameter to specify the sql file to
>> > initialize
>> > > > the
>> > > > > >>>>>>> table
>> > > > > >>>>>>>>>>>>> environment and deprecated YAML file;
>> > > > > >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
>> > > parameter;
>> > > > > >>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
>> > > > > >>>>>>>>>>>>> - support statement set syntax;
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> For more detailed changes, please refer to
>> FLIP-163[1].
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> Look forward to your feedback.
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> Best,
>> > > > > >>>>>>>>>>>>> Shengkai
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>> [1]
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>> > > > > >>>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> --
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>> *With kind regards
>> > > > > >>>>>>>>>>>>
>> > > ------------------------------------------------------------
>> > > > > >>>>>>>>>>>> Sebastian Liu 刘洋
>> > > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
>> > > > Science
>> > > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>> > > > > >>>>>>>>>>>> E-mail: liuyang0704@gmail.com <liuyang0704@gmail.com
>> >
>> > > > > >>>>>>>>>>>> QQ: 3239559*
>> > > > > >>>>>>>>>>>>
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>> --
>> > > > > >>>>>>>>>>> Best regards!
>> > > > > >>>>>>>>>>> Rui Li
>> > > > > >>>>>>>>>>>
>> > > > > >>>>>>>>>>
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>> --
>> > > > > >>>>>>>>> Best regards!
>> > > > > >>>>>>>>> Rui Li
>> > > > > >>>>>>>>>
>> > > > > >>>>>>>>
>> > > > > >>>>>>>
>> > > > > >>>>>>> --
>> > > > > >>>>>>> Best regards!
>> > > > > >>>>>>> Rui Li
>> > > > > >>>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>>
>> > > > > >>>>>> --
>> > > > > >>>>>> Best, Jingsong Lee
>> > > > > >>>>>>
>> > > > > >>>>>
>> > > > > >>>>
>> > > > > >>>
>> > > > > >>>
>> > > > > >>
>> > > > > >
>> > > > >
>> > > > >
>> > > >
>> > > > --
>> > > > Best regards!
>> > > > Rui Li
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi Ingo,

We have supported the WITH syntax and the SET command since v1.9 [1][2]
and have never received such complaints, so I think these differences
are fine.

Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
string literal keys[3],
and the SET <key>=<value> doesn't allow quoted keys [4].

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
[2]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
[3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
(search "set mapred.reduce.tasks=32")

On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <in...@ververica.com> wrote:

> Hi,
>
> regarding the (un-)quoted question, compatibility is of course an important
> argument, but in terms of consistency I'd find it a bit surprising that
> WITH handles it differently than SET, and I wonder if that could cause
> friction for developers when writing their SQL.
>
>
> Regards
> Ingo
>
> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:
>
> > Hi all,
> >
> > Regarding "One Parser", I think it's not possible for now because Calcite
> > parser can't parse
> > special characters (e.g. "-") unless quoting them as string literals.
> > That's why the WITH option
> > keys are string literals, not identifiers.
> >
> > SET table.exec.mini-batch.enabled = true and ADD JAR
> > /local/my-home/test.jar
> > have the same problem. That's why we propose two parsers: one splits
> > lines into multiple statements and matches special commands through
> > regexes, which is light-weight, and delegates the other statements
> > to the other parser, which is the Calcite parser.
> >
> > Note: we should stick on the unquoted SET table.exec.mini-batch.enabled =
> > true syntax,
> > both for backward-compatibility and easy-to-use, and all the other
> systems
> > don't have quotes on the key.
> >
> >
> > Regarding "table.planner" vs "sql-client.planner",
> > if we want to use "table.planner", I think we should explain clearly
> > in the documentation what scope it can be used in.
> > Otherwise, users will complain that the planner doesn't change
> > when they set the configuration on the TableEnv.
> > It would be better to throw an exception to indicate to users that
> > it's not allowed to
> > change the planner after the TableEnv is initialized.
> > However, it seems not easy to implement.
> >
> > Best,
> > Jark
> >
> > On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > Regarding "table.planner" and "table.execution-mode"
> > > If we define that those two options are just used to initialize the
> > > TableEnvironment, +1 for introducing table options instead of
> sql-client
> > > options.
> > >
> > > Regarding "the sql client, we will maintain two parsers", I want to
> > > give more input:
> > > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> > > FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client
> and
> > > the gateway service will communicate through Rest API. The " ADD JAR
> > > /local/path/jar " will be executed in the CLI client machine. So when
> we
> > > submit a sql file which contains multiple statements, the CLI client
> > needs
> > > to pick out the "ADD JAR" line, and also statements need to be
> submitted
> > or
> > > executed one by one to make sure the result is correct. The sql file
> > > may look like:
> > >
> > > SET xxx=yyy;
> > > create table my_table ...;
> > > create table my_sink ...;
> > > ADD JAR /local/path/jar1;
> > > create function my_udf as com....MyUdf;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > > REMOVE JAR /local/path/jar1;
> > > drop function my_udf;
> > > ADD JAR /local/path/jar2;
> > > create function my_udf as com....MyUdf2;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > >
> > > The lines need to be split into multiple statements first in the CLI
> > > client; there are two approaches:
> > > 1. The CLI client depends on the sql-parser: the sql-parser splits the
> > > lines and tells which lines are "ADD JAR".
> > > pro: there is only one parser
> > > cons: It's a little heavy for the CLI client to depend on the
> > > sql-parser,
> > > because the CLI client is just a simple tool which receives the user
> > > commands and displays the results. The non "ADD JAR" commands will be
> > > parsed twice.
> > >
> > > 2. The CLI client splits the lines into multiple statements and finds
> the
> > > ADD JAR command through regex matching.
> > > pro: The CLI client is very light-weight.
> > > cons: there are two parsers.
> > >
> > > (personally, I prefer the second option)
> > >
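The second, light-weight approach can be sketched as a client-side splitter that picks out the special commands via regexes and delegates everything else. This is only an illustrative model, not Flink code (the real CLI client is written in Java and must also handle quoting, comments and multi-line statements); the command set and regexes here are assumptions for the sketch:

```python
import re

# Regexes for the client-side commands the CLI handles itself; everything
# else is delegated to the Calcite-based parser on the backend. Note the
# SET key pattern accepts unquoted keys containing '.' and '-'.
ADD_JAR = re.compile(r"^\s*ADD\s+JAR\s+(?P<path>\S+)\s*$", re.IGNORECASE)
REMOVE_JAR = re.compile(r"^\s*REMOVE\s+JAR\s+(?P<path>\S+)\s*$", re.IGNORECASE)
SET_CMD = re.compile(r"^\s*SET\s+(?P<key>[\w.\-]+)\s*=\s*(?P<value>\S+)\s*$",
                     re.IGNORECASE)

def split_statements(script: str):
    """Naively split a SQL script on ';' (ignores quoted semicolons)."""
    return [s.strip() for s in script.split(";") if s.strip()]

def classify(stmt: str):
    """Decide whether a statement is a client command or backend SQL."""
    for name, pattern in (("ADD_JAR", ADD_JAR),
                          ("REMOVE_JAR", REMOVE_JAR),
                          ("SET", SET_CMD)):
        if pattern.match(stmt):
            return name
    return "DELEGATE"

script = """
SET table.exec.mini-batch.enabled=true;
CREATE TABLE my_table (id INT);
ADD JAR /local/path/jar1;
INSERT INTO my_sink SELECT * FROM my_table;
REMOVE JAR /local/path/jar1
"""
kinds = [classify(s) for s in split_statements(script)]
print(kinds)  # ['SET', 'DELEGATE', 'ADD_JAR', 'DELEGATE', 'REMOVE_JAR']
```

The splitter stays cheap and never needs the full SQL grammar; the trade-off, as noted above, is that two parsers must be kept in sync.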
> > > Regarding "SHOW or LIST JARS", I think we can support them both.
> > > For default dialect, we support SHOW JARS, but if we switch to hive
> > > dialect, LIST JARS is also supported.
> > >
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >
> > > Best,
> > > Godfrey
> > >
> > > Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
> > >
> > > > Hi guys,
> > > >
> > > > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > > > commands than LIST JARS. I don't have a strong opinion about REMOVE
> vs
> > > > DELETE though.
> > > >
> > > > While flink doesn't need to follow hive syntax, as far as I know,
> > > > most users who are requesting these features were previously hive
> > > > users. So I
> > > > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE
> > JARS
> > > > as synonyms? It's just like lots of systems accept both EXIT and QUIT
> > as
> > > > the command to terminate the program. So if that's not hard to
> achieve,
> > > and
> > > > will make users happier, I don't see a reason why we must choose one
> > over
> > > > the other.
> > > >
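Accepting both spellings as synonyms, as suggested above, is cheap if the client already matches these commands with regexes; a small alternation covers both the Flink-style and Hive-style forms. A sketch (the patterns are illustrative, not the actual Flink grammar):

```python
import re

# Alternations accept both spellings of each command as synonyms,
# like EXIT/QUIT in many interactive shells.
SHOW_JARS = re.compile(r"^\s*(SHOW|LIST)\s+JARS?\s*;?\s*$", re.IGNORECASE)
DROP_JAR = re.compile(r"^\s*(REMOVE|DELETE)\s+JAR\s+(?P<path>\S+?)\s*;?\s*$",
                      re.IGNORECASE)

assert SHOW_JARS.match("SHOW JARS")
assert SHOW_JARS.match("list jars;")
assert DROP_JAR.match("REMOVE JAR /tmp/udf.jar").group("path") == "/tmp/udf.jar"
assert DROP_JAR.match("delete jar /tmp/udf.jar;").group("path") == "/tmp/udf.jar"
print("all spellings accepted")
```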
> > > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org>
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > some feedback regarding the open questions. Maybe we can discuss
> the
> > > > > `TableEnvironment.executeMultiSql` story offline to determine how
> we
> > > > > proceed with this in the near future.
> > > > >
> > > > > 1) "whether the table environment has the ability to update itself"
> > > > >
> > > > > Maybe there was some misunderstanding. I don't think that we should
> > > > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > > > "old")`. Instead I'm proposing to support
> > > > > `TableEnvironment.create(Configuration)` where planner and
> execution
> > > > > mode are read immediately and a subsequent changes to these options
> > > will
> > > > > have no effect. We are doing it similar in `new
> > > > > StreamExecutionEnvironment(Configuration)`. These two
> ConfigOption's
> > > > > must not be SQL Client specific but can be part of the core table
> > code
> > > > > base. Many users would like to get a 100% preconfigured environment
> > > from
> > > > > just Configuration. And this is not possible right now. We can
> solve
> > > > > both use cases in one change.
> > > > >
> > > > > 2) "the sql client, we will maintain two parsers"
> > > > >
> > > > > I remember we had some discussion about this and decided that we
> > would
> > > > > like to maintain only one parser. In the end it is "One Flink SQL"
> > > where
> > > > > commands influence each other also with respect to keywords. It
> > should
> > > > > be fine to include the SQL Client commands in the Flink parser. Of
> > > > > course the table environment would not be able to handle the
> > > `Operation`
> > > > > instance that would be the result but we can introduce hooks to
> > handle
> > > > > those `Operation`s. Or we introduce parser extensions.
> > > > >
> > > > > Can we skip `table.job.async` in the first version? We should
> further
> > > > > discuss whether we introduce a special SQL clause for wrapping
> async
> > > > > behavior or if we use a config option? Esp. for streaming queries
> we
> > > > > need to be careful and should force users to either "one INSERT
> INTO"
> > > or
> > > > > "one STATEMENT SET".
> > > > >
> > > > > 3) 4) "HIVE also uses these commands"
> > > > >
> > > > > In general, Hive is not a good reference. Aligning the commands
> more
> > > > > with the remaining commands should be our goal. We just had a
> MODULE
> > > > > discussion where we selected SHOW instead of LIST. But it is true
> > that
> > > > > JARs are not part of the catalog which is why I would not use
> > > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> > language.
> > > > > Take a look at the Java collection API as another example.
> > > > >
> > > > > 6) "Most of the commands should belong to the table environment"
> > > > >
> > > > > Thanks for updating the FLIP this makes things easier to
> understand.
> > It
> > > > > is good to see that most commands will be available in
> > > TableEnvironment.
> > > > > However, I would also support SET and RESET for consistency. Again,
> > > from
> > > > > an architectural point of view, if we would allow some kind of
> > > > > `Operation` hook in table environment, we could check for SQL
> Client
> > > > > specific options and forward to regular
> > `TableConfig.getConfiguration`
> > > > > otherwise. What do you think?
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 03.02.21 08:58, Jark Wu wrote:
> > > > > > Hi Timo,
> > > > > >
> > > > > > I will respond some of the questions:
> > > > > >
> > > > > > 1) SQL client specific options
> > > > > >
> > > > > > Whether it starts with "table" or "sql-client" depends on where
> the
> > > > > > configuration takes effect.
> > > > > > If it is a table configuration, we should make clear what's the
> > > > behavior
> > > > > > when users change
> > > > > > the configuration in the lifecycle of TableEnvironment.
> > > > > >
> > > > > > I agree with Shengkai `sql-client.planner` and
> > > > > `sql-client.execution.mode`
> > > > > > are something special
> > > > > > that can't be changed after TableEnvironment has been
> initialized.
> > > You
> > > > > can
> > > > > > see
> > > > > > `StreamExecutionEnvironment` provides `configure()`  method to
> > > override
> > > > > > configuration after
> > > > > > StreamExecutionEnvironment has been initialized.
> > > > > >
> > > > > > Therefore, I think it would be better to still use
> > > > `sql-client.planner`
> > > > > > and `sql-client.execution.mode`.
> > > > > >
> > > > > > 2) Execution file
> > > > > >
> > > > > > From my point of view, there is a big difference between
> > > > > > `sql-client.job.detach` and
> > > > > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach`
> > > will
> > > > > > affect every single DML statement
> > > > > > in the terminal, not only the statements in SQL files. I think
> the
> > > > single
> > > > > > DML statement in the interactive
> > > > > > terminal is something like tEnv#executeSql() instead of
> > > > > > tEnv#executeMultiSql.
> > > > > > So I don't like the "multi" and "sql" keywords in
> > > > > > `table.multi-sql-async`.
> > > > > > I just found that the runtime provides a configuration called
> > > > > > "execution.attached" [1], which is false by default and
> > > > > > specifies whether the pipeline is submitted in attached or
> > > > > > detached mode.
> > > > > > It provides exactly the same
> > > > > > functionality of `sql-client.job.detach`. What do you think about
> > > using
> > > > > > this option?
> > > > > >
> > > > > > If we also want to support this config in TableEnvironment, I
> think
> > > it
> > > > > > should also affect the DML execution
> > > > > >   of `tEnv#executeSql()`, not only DMLs in
> > `tEnv#executeMultiSql()`.
> > > > > > Therefore, the behavior may look like this:
> > > > > >
> > > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
> by
> > > > > default
> > > > > > tableResult.await()   ==> manually block until finish
> > > > > >
> tEnv.getConfig().getConfiguration().setString("execution.attached",
> > > > > "true")
> > > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> > > don't
> > > > > need
> > > > > > to wait on the TableResult
> > > > > > tEnv.executeMultiSql(
> > > > > > """
> > > > > > CREATE TABLE ....  ==> always sync
> > > > > > INSERT INTO ...  => sync, because we set configuration above
> > > > > > SET execution.attached = false;
> > > > > > INSERT INTO ...  => async
> > > > > > """)
> > > > > >
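The attached/detached semantics sketched in the snippet above can be modeled with a few lines of code. This is a toy model of the proposed behavior, not Flink API: `ToyExecutor`, `TableResult.wait()` and the statement strings are all made up for illustration (the real `await()` and `execution.attached` live in Flink's Java API):

```python
import threading
import time

class TableResult:
    """Toy handle for a submitted job; wait() blocks until the job is done."""
    def __init__(self, thread):
        self._thread = thread
    def wait(self):
        self._thread.join()

class ToyExecutor:
    """Models the proposed semantics of 'execution.attached'."""
    def __init__(self):
        self.attached = False      # detached by default, like execution.attached
        self.finished = []
    def execute_sql(self, stmt):
        thread = threading.Thread(target=self._run, args=(stmt,))
        thread.start()
        result = TableResult(thread)
        if self.attached:
            result.wait()          # attached: block until the job finishes
        return result              # detached: return immediately
    def _run(self, stmt):
        time.sleep(0.2)            # stand-in for the actual job running
        self.finished.append(stmt)

executor = ToyExecutor()

r = executor.execute_sql("INSERT INTO a ...")          # async by default
submitted_async = "INSERT INTO a ..." not in executor.finished
r.wait()                                               # like tableResult.await()

executor.attached = True                               # SET execution.attached = true
executor.execute_sql("INSERT INTO b ...")              # now blocks until finished
print(executor.finished)   # ['INSERT INTO a ...', 'INSERT INTO b ...']
```

The key point of the model: flipping one session option switches every subsequent DML between returning a handle immediately and blocking until completion, without a separate `executeMultiSql` code path.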
> > > > > > On the other hand, I think `sql-client.job.detach`
> > > > > > and `TableEnvironment.executeMultiSql()` should be two separate
> > > topics,
> > > > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > > > I'm fine with making `executeMultiSql()` clear but don't want it
> to
> > > > block
> > > > > > this FLIP, maybe we can discuss this in another thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > > > >
> > > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> Hi, Timo.
> > > > > >> Thanks for your detailed feedback. I have some thoughts about
> your
> > > > > >> feedback.
> > > > > >>
> > > > > >> *Regarding #1*: I think the main problem is whether the table
> > > > > environment
> > > > > >> has the ability to update itself. Let's take a simple program as
> > an
> > > > > >> example.
> > > > > >>
> > > > > >>
> > > > > >> ```
> > > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > > > >>
> > > > > >> tEnv.getConfig.getConfiguration.setString("table.planner",
> "old");
> > > > > >>
> > > > > >>
> > > > > >> tEnv.executeSql("...");
> > > > > >>
> > > > > >> ```
> > > > > >>
> > > > > >> If we regard this option as a table option, users don't have to
> > > create
> > > > > >> another table environment manually. In that case, tEnv needs to
> > > check
> > > > > >> whether the current mode and planner are the same as before when
> > > > > executeSql
> > > > > >> or explainSql. I don't think it's easy work for the table
> > > environment,
> > > > > >> especially if users have a StreamExecutionEnvironment but set
> old
> > > > > planner
> > > > > >> and batch mode. But when we make this option a sql client
> > > > > >> option, users
> > > > > >> only use the SET command to change the setting. We can rebuild a
> > > > > >> new table
> > > > > >> environment when the SET succeeds.
> > > > > >>
> > > > > >>
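The rebuild-on-SET behavior described above can be sketched as follows. Everything here is an illustrative model under assumed names (`Environment`, `Client`, the option keys mirror the proposal but are not real Flink classes); ordinary options would really be forwarded to the live environment's table config, which the model glosses over:

```python
# Options like 'sql-client.planner' cannot be changed on a live environment,
# so the client rebuilds the environment whenever such an option is SET.
REBUILD_OPTIONS = {"sql-client.planner", "sql-client.execution.mode"}

class Environment:
    """Stand-in for a TableEnvironment: options are frozen at creation."""
    def __init__(self, options):
        self.options = dict(options)

class Client:
    def __init__(self):
        self.session_options = {"sql-client.planner": "blink",
                                "sql-client.execution.mode": "streaming"}
        self.env = Environment(self.session_options)
        self.rebuilds = 0
    def set_option(self, key, value):
        self.session_options[key] = value
        if key in REBUILD_OPTIONS:
            # the environment cannot switch planner/mode in place, so
            # create a fresh one from the updated session options
            self.env = Environment(self.session_options)
            self.rebuilds += 1

client = Client()
client.set_option("table.exec.mini-batch.enabled", "true")  # no rebuild
client.set_option("sql-client.execution.mode", "batch")     # triggers rebuild
print(client.rebuilds)                                   # 1
print(client.env.options["sql-client.execution.mode"])   # batch
```

This keeps the environment itself simple: it never has to validate planner or mode changes after initialization, because the client recreates it instead.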
> > > > > >> *Regarding #2*: I think we need to discuss the implementation
> > before
> > > > > >> continuing this topic. In the sql client, we will maintain two
> > > > parsers.
> > > > > The
> > > > > >> first parser(client parser) will only match the sql client
> > commands.
> > > > If
> > > > > the
> > > > > >> client parser can't parse the statement, we will leverage the
> > power
> > > of
> > > > > the
> > > > > >> table environment to execute. According to our blueprint,
> > > > > >> TableEnvironment#executeSql is enough for the sql client.
> > Therefore,
> > > > > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > > > > >>
> > > > > >> But if we need to introduce the
> `TableEnvironment.executeMultiSql`
> > > in
> > > > > the
> > > > > >> future, I think it's OK to use the option
> `table.multi-sql-async`
> > > > rather
> > > > > >> than option `sql-client.job.detach`. But we think the name is
> not
> > > > > suitable
> > > > > >> because the name is confusing for others. When setting the
> option
> > > > > false, we
> > > > > >> just mean it will block the execution of the INSERT INTO
> > statement,
> > > > not
> > > > > DDL
> > > > > >> or others (other sql statements are always executed
> synchronously).
> > > So
> > > > > how
> > > > > >> about `table.job.async`? It only works for the sql-client and
> the
> > > > > >> executeMultiSql. If we set this value false, the table
> environment
> > > > will
> > > > > >> return the result until the job finishes.
> > > > > >>
> > > > > >>
> > > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and
> > LIST
> > > > JAR
> > > > > >> because HIVE also uses these commands to add the jar into the
> > > > classpath
> > > > > or
> > > > > >> delete the jar. If we use such commands, it can reduce our work
> > for
> > > > > hive
> > > > > >> compatibility.
> > > > > >>
> > > > > >> For SHOW JAR, I think the main concern is the jars are not
> > > maintained
> > > > by
> > > > > >> the Catalog. If we really need to keep consistent with SQL
> > grammar,
> > > > > maybe
> > > > > >> we should use
> > > > > >>
> > > > > >> `ADD JAR` -> `CREATE JAR`,
> > > > > >> `DELETE JAR` -> `DROP JAR`,
> > > > > >> `LIST JAR` -> `SHOW JAR`.
> > > > > >>
> > > > > >> *Regarding #5*: I agree with you that we'd better keep
> consistent.
> > > > > >>
> > > > > >> *Regarding #6*: Yes. Most of the commands should belong to the
> > table
> > > > > >> environment. In the Summary section, I use the <NOTE> tag to
> > > identify
> > > > > which
> > > > > >> commands should belong to the sql client and which commands
> should
> > > > > belong
> > > > > >> to the table environment. I also add a new section about
> > > > implementation
> > > > > >> details in the FLIP.
> > > > > >>
> > > > > >> Best,
> > > > > >> Shengkai
> > > > > >>
> > > > > >> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> > > > > >>
> > > > > >>> Thanks for this great proposal Shengkai. This will give the SQL
> > > > Client
> > > > > a
> > > > > >>> very good update and make it production ready.
> > > > > >>>
> > > > > >>> Here is some feedback from my side:
> > > > > >>>
> > > > > >>> 1) SQL client specific options
> > > > > >>>
> > > > > >>> I don't think that `sql-client.planner` and
> > > > `sql-client.execution.mode`
> > > > > >>> are SQL Client specific. Similar to
> `StreamExecutionEnvironment`
> > > and
> > > > > >>> `ExecutionConfig#configure` that have been added recently, we
> > > should
> > > > > >>> offer a possibility for TableEnvironment. How about we offer
> > > > > >>> `TableEnvironment.create(ReadableConfig)` and add a
> > `table.planner`
> > > > and
> > > > > >>> `table.execution-mode` to
> > > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > > > >>>
> > > > > >>> 2) Execution file
> > > > > >>>
> > > > > >>> Did you have a look at the Appendix of FLIP-84 [1] including
> the
> > > > > mailing
> > > > > >>> list thread at that time? Could you further elaborate how the
> > > > > >>> multi-statement execution should work for a unified
> > batch/streaming
> > > > > >>> story? According to our past discussions, each line in an
> > execution
> > > > > file
> > > > > >>> should be executed blocking which means a streaming query
> needs a
> > > > > >>> statement set to execute multiple INSERT INTO statement,
> correct?
> > > We
> > > > > >>> should also offer this functionality in
> > > > > >>> `TableEnvironment.executeMultiSql()`. Whether
> > > `sql-client.job.detach`
> > > > > is
> > > > > >>> SQL Client specific needs to be determined, it could also be a
> > > > general
> > > > > >>> `table.multi-sql-async` option?
> > > > > >>>
> > > > > >>> 3) DELETE JAR
> > > > > >>>
> > > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
> like
> > > one
> > > > > is
> > > > > >>> actively deleting the JAR in the corresponding path.
> > > > > >>>
> > > > > >>> 4) LIST JAR
> > > > > >>>
> > > > > >>> This should be `SHOW JARS` according to other SQL commands such
> > as
> > > > > `SHOW
> > > > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > > > >>>
> > > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > > > >>>
> > > > > >>> We should keep the details in sync with
> > > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> > > about
> > > > > >>> differently named ExplainDetails. I would vote for
> > `ESTIMATED_COST`
> > > > > >>> instead of `COST`. I'm sure the original author had a reason
> why
> > to
> > > > > call
> > > > > >>> it that way.
> > > > > >>>
> > > > > >>> 6) Implementation details
> > > > > >>>
> > > > > >>> It would be nice to understand how we plan to implement the
> given
> > > > > >>> features. Most of the commands and config options should go
> into
> > > > > >>> TableEnvironment and SqlParser directly, correct? This way
> users
> > > > have a
> > > > > >>> unified way of using Flink SQL. TableEnvironment would provide
> a
> > > > > similar
> > > > > >>> user experience in notebooks or interactive programs than the
> SQL
> > > > > Client.
> > > > > >>>
> > > > > >>> [1]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > > > >>> [2]
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Timo
> > > > > >>>
> > > > > >>>
> > > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > > > >>>> Sorry for the typo. I mean `RESET` is much better than
> > > > > >>>> `UNSET`.
> > > > > >>>>
> > > > > >>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> > > > > >>>>
> > > > > >>>>> Hi, Jingsong.
> > > > > >>>>>
> > > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > > > >>>>>
> > > > > >>>>> 1. We don't need to introduce another command `UNSET`.
> `RESET`
> > is
> > > > > >>>>> supported in the current sql client now. Our proposal just
> > > extends
> > > > > its
> > > > > >>>>> grammar and allow users to reset the specified keys.
> > > > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the
> default
> > > > > >>> value[1].
> > > > > >>>>> I think it is more friendly for batch users.
> > > > > >>>>>
> > > > > >>>>> Best,
> > > > > >>>>> Shengkai
> > > > > >>>>>
> > > > > >>>>> [1]
> > > > > >>>
> > > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > > > >>>>>
> > > > > >>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
> > > > > >>>>>
> > > > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1
> > for
> > > > > >>>>>> improving it.
> > > > > >>>>>>
> > > > > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Jingsong
> > > > > >>>>>>
> > > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> lirui.fudan@gmail.com>
> > > > > wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look
> > good
> > > to
> > > > > >> me.
> > > > > >>>>>>>
> > > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> > > fskmine@gmail.com
> > > > >
> > > > > >>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi, Rui.
> > > > > >>>>>>>> You are right. I have already modified the FLIP.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The main changes:
> > > > > >>>>>>>>
> > > > > >>>>>>>> # The -f parameter has no restriction on the statement
> > > > > >>>>>>>> type.
> > > > > >>>>>>>> Sometimes, users use a pipe to redirect the result of
> > > > > >>>>>>>> queries to debug
> > > > > >>>>>>>> when submitting a job by the -f parameter. It's much more
> > > > > >>>>>>>> convenient compared to
> > > > > >>>>>>>> writing INSERT INTO statements.
> > > > > >>>>>>>>
> > > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> > > > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode.
> > > > > >>>>>>>> Users can set
> > > > > >>>>>>>> this option false and the client will not process the next
> > > > > >>>>>>>> job until the
> > > > > >>>>>>>> current job finishes. The default value of this option is
> > > > > >>>>>>>> true, which
> > > > > >>>>>>>> means the client will execute the next job as soon as the
> > > > > >>>>>>>> current job is
> > > > > >>>>>>>> submitted.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Shengkai
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Shengkai,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > > > > >> different
> > > > > >>>>>>>>> implications, and we should clarify the behavior. For
> > > example,
> > > > if
> > > > > >>> the
> > > > > >>>>>>>>> client just submits the job and exits, what happens if
> the
> > > file
> > > > > >>>>>>> contains
> > > > > >>>>>>>>> two INSERT statements? I don't think we should treat them
> > as
> > > a
> > > > > >>>>>>> statement
> > > > > >>>>>>>>> set, because users should explicitly write BEGIN
> STATEMENT
> > > SET
> > > > in
> > > > > >>> that
> > > > > >>>>>>>>> case. And the client shouldn't asynchronously submit the
> > two
> > > > > jobs,
> > > > > >>>>>>> because
> > > > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > > > fskmine@gmail.com
> > > > > >
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> Hi Rui,
> > > > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the SET command.
> > > > > >>>>>>>>>> In the implementation, it will just put the key-value pair into
> > > > > >>>>>>>>>> the `Configuration`, which will be used to generate the table
> > > > > >>>>>>>>>> config. If Hive supports reading settings from the table
> > > > > >>>>>>>>>> config, users will be able to set the Hive-related settings.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For suggestion 2: The -f parameter will submit the job and
> > > > > >>>>>>>>>> exit. If the queries never end, users have to cancel the job by
> > > > > >>>>>>>>>> themselves, which is not reliable (people may forget their
> > > > > >>>>>>>>>> jobs). In most cases, queries are used to analyze the data.
> > > > > >>>>>>>>>> Users should use queries in the interactive mode.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Shengkai
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 3:18 PM:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
> think
> > it
> > > > > >>> covers a
> > > > > >>>>>>>>>>> lot of useful features which will dramatically improve
> > the
> > > > > >>>>>>> usability of our
> > > > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
> > > configurations
> > > > > >> via
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>> SET command? A connector may have its own
> configurations
> > > and
> > > > we
> > > > > >>>>>>> don't have
> > > > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> > > > Client.
> > > > > >> For
> > > > > >>>>>>> example,
> > > > > >>>>>>>>>>> users may want to be able to change hive conf when
> using
> > > hive
> > > > > >>>>>>> connector [1].
> > > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
> files
> > > > > >> specified
> > > > > >>>>>>> with
> > > > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but
> > allows
> > > > > >>> queries
> > > > > >>>>>>> in the
> > > > > >>>>>>>>>>> file. And a common use case is to run some query and
> > > redirect
> > > > > >> the
> > > > > >>>>>>> results
> > > > > >>>>>>>>>>> to a file. So I think maybe flink users would like to
> do
> > > the
> > > > > >> same,
> > > > > >>>>>>>>>>> especially in batch scenarios.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > > > > >>>>>>> liuyang0704@gmail.com>
> > > > > >>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> Hi Shengkai,
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Glad to see this improvement. And I have some
> additional
> > > > > >>>>>>> suggestions:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > > > > >>>>>>>>>>>> #2. Improve the way results are retrieved: at present the sql
> > > > > >>>>>>>>>>>> client collects the results locally all at once using
> > > > > >>>>>>>>>>>> accumulators, which may cause memory issues in the JM or
> > > > > >>>>>>>>>>>> locally for big query results. Accumulators are only suitable
> > > > > >>>>>>>>>>>> for testing purposes. We may change to use SelectTableSink,
> > > > > >>>>>>>>>>>> which is based on CollectSinkOperatorCoordinator.
> > > > > >>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway from
> > > > > >>>>>>>>>>>> FLIP-91? It seems that this FLIP has not moved forward for a
> > > > > >>>>>>>>>>>> long time. Providing a long-running service out of the box to
> > > > > >>>>>>>>>>>> facilitate sql submission is necessary.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> What do you think of these?
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> [1]
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Hi devs,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Jark and I want to start a discussion about
> > FLIP-163:SQL
> > > > > >> Client
> > > > > >>>>>>>>>>>>> Improvements.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Many users have complained about the problems of the sql
> > > > > >>>>>>>>>>>>> client. For example, users cannot register the tables
> > > > > >>>>>>>>>>>>> proposed by FLIP-95.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> - use the -i parameter to specify a sql file that
> > > > > >>>>>>>>>>>>> initializes the table environment, and deprecate the YAML
> > > > > >>>>>>>>>>>>> file;
> > > > > >>>>>>>>>>>>> - add -f to submit a sql file, and deprecate the '-u'
> > > > > >>>>>>>>>>>>> parameter;
> > > > > >>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
> > > > > >>>>>>>>>>>>> - support statement set syntax;
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> For more detailed changes, please refer to
> FLIP-163[1].
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Look forward to your feedback.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>> Shengkai
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> [1]
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> *With kind regards
> > > > > >>>>>>>>>>>>
> > > ------------------------------------------------------------
> > > > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > > > Science
> > > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > > > >>>>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> > > > > >>>>>>>>>>>> QQ: 3239559*
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>> Best regards!
> > > > > >>>>>>>>>>> Rui Li
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> --
> > > > > >>>>>>>>> Best regards!
> > > > > >>>>>>>>> Rui Li
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Best regards!
> > > > > >>>>>>> Rui Li
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> --
> > > > > >>>>>> Best, Jingsong Lee
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Ingo Bürk <in...@ververica.com>.
Hi,

regarding the (un-)quoted question, compatibility is of course an important
argument, but in terms of consistency I'd find it a bit surprising that
WITH handles it differently from SET, and I wonder if that could cause
friction for developers when writing their SQL.


Regards
Ingo

On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <im...@gmail.com> wrote:

> Hi all,
>
> Regarding "One Parser", I think it's not possible for now because the
> Calcite parser can't parse special characters (e.g. "-") unless they are
> quoted as string literals. That's why the WITH option keys are string
> literals, not identifiers.
>
> SET table.exec.mini-batch.enabled = true and ADD JAR
> /local/my-home/test.jar
> have the same problems. That's why we propose two parsers: one splits the
> lines into multiple statements and matches special commands through regex,
> which is light-weight, and delegates the other statements to the second
> parser, which is the Calcite parser.
>
> Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
> true syntax, both for backward compatibility and ease of use; all the
> other systems don't quote the key.
>
>
> Regarding "table.planner" vs "sql-client.planner",
> if we want to use "table.planner", I think we should clearly explain in
> the documentation the scope in which it can be used.
> Otherwise, there will be users complaining why the planner doesn't change
> when setting the configuration on TableEnv.
> It would be better to throw an exception to indicate to users that it's
> not allowed to change the planner after TableEnv is initialized.
> However, it seems not easy to implement.
>
> Best,
> Jark
>
> On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > Regarding "table.planner" and "table.execution-mode"
> > If we define that those two options are just used to initialize the
> > TableEnvironment, +1 for introducing table options instead of sql-client
> > options.
> >
> > Regarding "the sql client, we will maintain two parsers", I want to give
> > more inputs:
> > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> > FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
> > the gateway service will communicate through Rest API. The " ADD JAR
> > /local/path/jar " will be executed in the CLI client machine. So when we
> > submit a sql file which contains multiple statements, the CLI client
> needs
> > to pick out the "ADD JAR" line, and also statements need to be submitted
> or
> > executed one by one to make sure the result is correct. The sql file may
> > look like:
> >
> > SET xxx=yyy;
> > create table my_table ...;
> > create table my_sink ...;
> > ADD JAR /local/path/jar1;
> > create function my_udf as com....MyUdf;
> > insert into my_sink select ..., my_udf(xx) from ...;
> > REMOVE JAR /local/path/jar1;
> > drop function my_udf;
> > ADD JAR /local/path/jar2;
> > create function my_udf as com....MyUdf2;
> > insert into my_sink select ..., my_udf(xx) from ...;
> >
> > The lines need to be split into multiple statements first in the CLI
> > client, there are two approaches:
> > 1. The CLI client depends on the sql-parser: the sql-parser splits the
> > lines and tells which lines are "ADD JAR".
> > pro: there is only one parser
> > cons: It's a little heavy that the CLI client depends on the sql-parser,
> > because the CLI client is just a simple tool which receives the user
> > commands and displays the result. Non-"ADD JAR" commands will be parsed
> > twice.
> >
> > 2. The CLI client splits the lines into multiple statements and finds the
> > ADD JAR command through regex matching.
> > pro: The CLI client is very light-weight.
> > cons: there are two parsers.
> >
> > (personally, I prefer the second option)
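[Editor's note] Option 2 can be sketched in a few lines of plain Java. The class name, regex, and splitting logic below are illustrative assumptions, not Flink's actual implementation; a real splitter must also handle ';' inside string literals and comments:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of a light-weight client-side splitter: it separates
// the script into statements and recognizes ADD JAR via regex; everything
// else would be delegated to the full SQL parser (Calcite in Flink's case).
public class ClientStatementSplitter {

    private static final Pattern ADD_JAR =
            Pattern.compile("^\\s*ADD\\s+JAR\\s+(\\S+)\\s*$", Pattern.CASE_INSENSITIVE);

    /** Naive split on ';' — good enough to illustrate the idea. */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String stmt = part.trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }

    /** Returns the jar path if the statement is an ADD JAR command, otherwise null. */
    public static String matchAddJar(String statement) {
        Matcher m = ADD_JAR.matcher(statement);
        return m.matches() ? m.group(1) : null;
    }
}
```

With this approach the client only understands the handful of commands it must intercept; every other statement passes through verbatim, which keeps the client light-weight at the cost of a second, regex-based "parser".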
> >
> > Regarding "SHOW or LIST JARS", I think we can support them both.
> > For default dialect, we support SHOW JARS, but if we switch to hive
> > dialect, LIST JARS is also supported.
> >
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >
> > Best,
> > Godfrey
> >
> > > Rui Li <li...@gmail.com> wrote on Thu, Feb 4, 2021 at 10:40 AM:
> >
> > > Hi guys,
> > >
> > > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > > commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> > > DELETE though.
> > >
> > > While flink doesn't need to follow hive syntax, as far as I know, most
> > > users who are requesting these features are previously hive users. So I
> > > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE
> JARS
> > > as synonyms? It's just like lots of systems accept both EXIT and QUIT
> as
> > > the command to terminate the program. So if that's not hard to achieve,
> > and
> > > will make users happier, I don't see a reason why we must choose one
> over
> > > the other.
> > >
> > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org>
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > some feedback regarding the open questions. Maybe we can discuss the
> > > > `TableEnvironment.executeMultiSql` story offline to determine how we
> > > > proceed with this in the near future.
> > > >
> > > > 1) "whether the table environment has the ability to update itself"
> > > >
> > > > Maybe there was some misunderstanding. I don't think that we should
> > > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > > "old")`. Instead I'm proposing to support
> > > > `TableEnvironment.create(Configuration)` where planner and execution
> > > > mode are read immediately and subsequent changes to these options will
> > > > have no effect. We do it similarly in `new
> > > > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > > > must not be SQL Client specific but can be part of the core table
> code
> > > > base. Many users would like to get a 100% preconfigured environment
> > from
> > > > just Configuration. And this is not possible right now. We can solve
> > > > both use cases in one change.
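[Editor's note] The "read immediately, later changes have no effect" semantics can be illustrated with a plain-Java sketch; the class and option names here are assumptions for illustration, not Flink's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: the environment copies the options it cares about at
// creation time, so mutating the caller's configuration afterwards cannot
// change the planner or execution mode of an existing environment.
public class FrozenEnvironment {

    private final String planner;
    private final String executionMode;

    private FrozenEnvironment(String planner, String executionMode) {
        this.planner = planner;
        this.executionMode = executionMode;
    }

    public static FrozenEnvironment create(Map<String, String> config) {
        // Read once at creation; the default values are illustrative.
        return new FrozenEnvironment(
                config.getOrDefault("table.planner", "blink"),
                config.getOrDefault("table.execution-mode", "streaming"));
    }

    public String planner() { return planner; }

    public String executionMode() { return executionMode; }
}
```

Because the values are copied into final fields, a later SET on the configuration map cannot silently switch the planner of an already-created environment.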
> > > >
> > > > 2) "the sql client, we will maintain two parsers"
> > > >
> > > > I remember we had some discussion about this and decided that we
> would
> > > > like to maintain only one parser. In the end it is "One Flink SQL"
> > where
> > > > commands influence each other also with respect to keywords. It
> should
> > > > be fine to include the SQL Client commands in the Flink parser. Of
> > > > course the table environment would not be able to handle the
> > `Operation`
> > > > instance that would be the result but we can introduce hooks to
> handle
> > > > those `Operation`s. Or we introduce parser extensions.
> > > >
> > > > Can we skip `table.job.async` in the first version? We should further
> > > > discuss whether we introduce a special SQL clause for wrapping async
> > > > behavior or if we use a config option? Esp. for streaming queries we
> > > > need to be careful and should force users to either "one INSERT INTO"
> > or
> > > > "one STATEMENT SET".
> > > >
> > > > 3) 4) "HIVE also uses these commands"
> > > >
> > > > In general, Hive is not a good reference. Aligning the commands more
> > > > with the remaining commands should be our goal. We just had a MODULE
> > > > discussion where we selected SHOW instead of LIST. But it is true
> that
> > > > JARs are not part of the catalog which is why I would not use
> > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> language.
> > > > Take a look at the Java collection API as another example.
> > > >
> > > > 6) "Most of the commands should belong to the table environment"
> > > >
> > > > Thanks for updating the FLIP; this makes things easier to understand. It
> > > > is good to see that most commands will be available in
> > TableEnvironment.
> > > > However, I would also support SET and RESET for consistency. Again,
> > from
> > > > an architectural point of view, if we would allow some kind of
> > > > `Operation` hook in table environment, we could check for SQL Client
> > > > specific options and forward to regular
> `TableConfig.getConfiguration`
> > > > otherwise. What do you think?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 03.02.21 08:58, Jark Wu wrote:
> > > > > Hi Timo,
> > > > >
> > > > > I will respond some of the questions:
> > > > >
> > > > > 1) SQL client specific options
> > > > >
> > > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > > configuration takes effect.
> > > > > If it is a table configuration, we should make clear what's the
> > > behavior
> > > > > when users change
> > > > > the configuration in the lifecycle of TableEnvironment.
> > > > >
> > > > > I agree with Shengkai `sql-client.planner` and
> > > > `sql-client.execution.mode`
> > > > > are something special
> > > > > that can't be changed after TableEnvironment has been initialized.
> > You
> > > > can
> > > > > see
> > > > > `StreamExecutionEnvironment` provides `configure()`  method to
> > override
> > > > > configuration after
> > > > > StreamExecutionEnvironment has been initialized.
> > > > >
> > > > > Therefore, I think it would be better to still use
> > > `sql-client.planner`
> > > > > and `sql-client.execution.mode`.
> > > > >
> > > > > 2) Execution file
> > > > >
> > > > > From my point of view, there is a big difference between
> > > > > `sql-client.job.detach` and
> > > > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach`
> > will
> > > > > affect every single DML statement
> > > > > in the terminal, not only the statements in SQL files. I think the
> > > single
> > > > > DML statement in the interactive
> > > > > terminal is something like tEnv#executeSql() instead of
> > > > > tEnv#executeMultiSql.
> > > > > So I don't like the "multi" and "sql" keyword in
> > > `table.multi-sql-async`.
> > > > > I just found that the runtime provides a configuration called
> > > > > "execution.attached" [1], false by default, which specifies whether
> > > > > the pipeline is submitted in attached or detached mode.
> > > > > It provides exactly the same
> > > > > functionality of `sql-client.job.detach`. What do you think about
> > using
> > > > > this option?
> > > > >
> > > > > If we also want to support this config in TableEnvironment, I think
> > it
> > > > > should also affect the DML execution
> > > > >   of `tEnv#executeSql()`, not only DMLs in
> `tEnv#executeMultiSql()`.
> > > > > Therefore, the behavior may look like this:
> > > > >
> > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> > > > default
> > > > > tableResult.await()   ==> manually block until finish
> > > > > tEnv.getConfig().getConfiguration().setString("execution.attached",
> > > > "true")
> > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> > don't
> > > > need
> > > > > to wait on the TableResult
> > > > > tEnv.executeMultiSql(
> > > > > """
> > > > > CREATE TABLE ....  ==> always sync
> > > > > INSERT INTO ...  => sync, because we set configuration above
> > > > > SET execution.attached = false;
> > > > > INSERT INTO ...  => async
> > > > > """)
> > > > >
> > > > > On the other hand, I think `sql-client.job.detach`
> > > > > and `TableEnvironment.executeMultiSql()` should be two separate
> > topics,
> > > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > > I'm fine with making `executeMultiSql()` clear but don't want it to
> > > block
> > > > > this FLIP, maybe we can discuss this in another thread.
> > > > >
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > [1]:
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > > >
> > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
> > wrote:
> > > > >
> > > > >> Hi, Timo.
> > > > >> Thanks for your detailed feedback. I have some thoughts about your
> > > > >> feedback.
> > > > >>
> > > > >> *Regarding #1*: I think the main problem is whether the table
> > > > environment
> > > > >> has the ability to update itself. Let's take a simple program as
> an
> > > > >> example.
> > > > >>
> > > > >>
> > > > >> ```
> > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > > >>
> > > > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > > > >>
> > > > >>
> > > > >> tEnv.executeSql("...");
> > > > >>
> > > > >> ```
> > > > >>
> > > > >> If we regard this option as a table option, users don't have to
> > create
> > > > >> another table environment manually. In that case, tEnv needs to
> > check
> > > > >> whether the current mode and planner are the same as before when
> > > > executeSql
> > > > >> or explainSql. I don't think it's easy work for the table
> > environment,
> > > > >> especially if users have a StreamExecutionEnvironment but set old
> > > > planner
> > > > >> and batch mode. But when we make this option as a sql client
> option,
> > > > users
> > > > >> only use the SET command to change the setting. We can rebuild a
> > > > >> new table environment when the SET succeeds.
> > > > >>
> > > > >>
> > > > >> *Regarding #2*: I think we need to discuss the implementation
> before
> > > > >> continuing this topic. In the sql client, we will maintain two
> > > parsers.
> > > > The
> > > > >> first parser(client parser) will only match the sql client
> commands.
> > > If
> > > > the
> > > > >> client parser can't parse the statement, we will leverage the
> power
> > of
> > > > the
> > > > >> table environment to execute. According to our blueprint,
> > > > >> TableEnvironment#executeSql is enough for the sql client.
> Therefore,
> > > > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > > > >>
> > > > >> But if we need to introduce the `TableEnvironment.executeMultiSql`
> > in
> > > > the
> > > > >> future, I think it's OK to use the option `table.multi-sql-async`
> > > rather
> > > > >> than option `sql-client.job.detach`. But we think the name is not
> > > > suitable
> > > > >> because the name is confusing for others. When setting the option
> > > > false, we
> > > > >> just mean it will block the execution of the INSERT INTO
> statement,
> > > not
> > > > DDL
> > > > >> or others (other sql statements are always executed synchronously).
> > So
> > > > how
> > > > >> about `table.job.async`? It only works for the sql-client and the
> > > > >> executeMultiSql. If we set this value false, the table environment
> > > will
> > > > >> return the result until the job finishes.
> > > > >>
> > > > >>
> > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and
> LIST
> > > JAR
> > > > >> because HIVE also uses these commands to add the jar into the
> > > classpath
> > > > or
> > > > >> delete the jar. If we use  such commands, it can reduce our work
> for
> > > > hive
> > > > >> compatibility.
> > > > >>
> > > > >> For SHOW JAR, I think the main concern is the jars are not
> > maintained
> > > by
> > > > >> the Catalog. If we really need to keep consistent with SQL
> grammar,
> > > > maybe
> > > > >> we should use
> > > > >>
> > > > >> `ADD JAR` -> `CREATE JAR`,
> > > > >> `DELETE JAR` -> `DROP JAR`,
> > > > >> `LIST JAR` -> `SHOW JAR`.
> > > > >>
> > > > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > > > >>
> > > > >> *Regarding #6*: Yes. Most of the commands should belong to the
> table
> > > > >> environment. In the Summary section, I use the <NOTE> tag to
> > identify
> > > > which
> > > > >> commands should belong to the sql client and which commands should
> > > > belong
> > > > >> to the table environment. I also add a new section about
> > > implementation
> > > > >> details in the FLIP.
> > > > >>
> > > > >> Best,
> > > > >> Shengkai
> > > > >>
> > > > >> Timo Walther <tw...@apache.org> wrote on Tue, Feb 2, 2021 at 6:43 PM:
> > > > >>
> > > > >>> Thanks for this great proposal Shengkai. This will give the SQL
> > > Client
> > > > a
> > > > >>> very good update and make it production ready.
> > > > >>>
> > > > >>> Here is some feedback from my side:
> > > > >>>
> > > > >>> 1) SQL client specific options
> > > > >>>
> > > > >>> I don't think that `sql-client.planner` and
> > > `sql-client.execution.mode`
> > > > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment`
> > and
> > > > >>> `ExecutionConfig#configure` that have been added recently, we
> > should
> > > > >>> offer a possibility for TableEnvironment. How about we offer
> > > > >>> `TableEnvironment.create(ReadableConfig)` and add a
> `table.planner`
> > > and
> > > > >>> `table.execution-mode` to
> > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > > >>>
> > > > >>> 2) Execution file
> > > > >>>
> > > > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> > > > mailing
> > > > >>> list thread at that time? Could you further elaborate how the
> > > > >>> multi-statement execution should work for a unified
> batch/streaming
> > > > >>> story? According to our past discussions, each line in an
> execution
> > > > file
> > > > >>> should be executed blocking which means a streaming query needs a
> > > > >>> statement set to execute multiple INSERT INTO statement, correct?
> > We
> > > > >>> should also offer this functionality in
> > > > >>> `TableEnvironment.executeMultiSql()`. Whether
> > `sql-client.job.detach`
> > > > is
> > > > >>> SQL Client specific needs to be determined, it could also be a
> > > general
> > > > >>> `table.multi-sql-async` option?
> > > > >>>
> > > > >>> 3) DELETE JAR
> > > > >>>
> > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like
> > one
> > > > is
> > > > >>> actively deleting the JAR in the corresponding path.
> > > > >>>
> > > > >>> 4) LIST JAR
> > > > >>>
> > > > >>> This should be `SHOW JARS` according to other SQL commands such
> as
> > > > `SHOW
> > > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > > >>>
> > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > > >>>
> > > > >>> We should keep the details in sync with
> > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> > about
> > > > >>> differently named ExplainDetails. I would vote for
> `ESTIMATED_COST`
> > > > >>> instead of `COST`. I'm sure the original author had a reason why
> to
> > > > call
> > > > >>> it that way.
> > > > >>>
> > > > >>> 6) Implementation details
> > > > >>>
> > > > >>> It would be nice to understand how we plan to implement the given
> > > > >>> features. Most of the commands and config options should go into
> > > > >>> TableEnvironment and SqlParser directly, correct? This way users
> > > have a
> > > > >>> unified way of using Flink SQL. TableEnvironment would provide a
> > > > similar
> > > > >>> user experience in notebooks or interactive programs than the SQL
> > > > Client.
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > > >>> [2]
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > > >>>
> > > > >>> Regards,
> > > > >>> Timo
> > > > >>>
> > > > >>>
> > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > > >>>> Sorry for the typo. I mean `RESET` is much better than `UNSET`.
> > > > >>>>
> > > > >>>>> Shengkai Fang <fs...@gmail.com> wrote on Tue, Feb 2, 2021 at 4:44 PM:
> > > > >>>>
> > > > >>>>> Hi, Jingsong.
> > > > >>>>>
> > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > > >>>>>
> > > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET`
> is
> > > > >>>>> supported in the current sql client now. Our proposal just
> > extends
> > > > its
> > > > >>>>> grammar and allows users to reset the specified keys.
> > > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> > > > >>> value[1].
> > > > >>>>> I think it is more friendly for batch users.
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Shengkai
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>
> > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > > >>>>>
> > > > >>>>> Jingsong Li <ji...@gmail.com> wrote on Tue, Feb 2, 2021 at 1:56 PM:
> > > > >>>>>
> > > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1
> for
> > > > >>>>>> improving it.
> > > > >>>>>>
> > > > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Jingsong
> > > > >>>>>>
> > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com>
> > > > wrote:
> > > > >>>>>>
> > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look
> good
> > to
> > > > >> me.
> > > > >>>>>>>
> > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> > fskmine@gmail.com
> > > >
> > > > >>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi, Rui.
> > > > >>>>>>>> You are right. I have already modified the FLIP.
> > > > >>>>>>>>
> > > > >>>>>>>> The main changes:
> > > > >>>>>>>>
> > > > >>>>>>>> # The -f parameter has no restriction on the statement type.
> > > > >>>>>>>> Sometimes users pipe the results of queries elsewhere to debug
> > > > >>>>>>>> when submitting a job with the -f parameter. That is much more
> > > > >>>>>>>> convenient than writing INSERT INTO statements.
> > > > >>>>>>>>
> > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach`.
> > > > >>>>>>>> Users prefer to execute jobs one by one in batch mode. Users can
> > > > >>>>>>>> set this option to false and the client will not process the next
> > > > >>>>>>>> job until the current job finishes. The default value of this
> > > > >>>>>>>> option is true, which means the client will move on to the next
> > > > >>>>>>>> job as soon as the current job is submitted.
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Shengkai
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Shengkai,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > > > >> different
> > > > >>>>>>>>> implications, and we should clarify the behavior. For
> > example,
> > > if
> > > > >>> the
> > > > >>>>>>>>> client just submits the job and exits, what happens if the
> > file
> > > > >>>>>>> contains
> > > > >>>>>>>>> two INSERT statements? I don't think we should treat them
> as
> > a
> > > > >>>>>>> statement
> > > > >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT
> > SET
> > > in
> > > > >>> that
> > > > >>>>>>>>> case. And the client shouldn't asynchronously submit the
> two
> > > > jobs,
> > > > >>>>>>> because
> > > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > > fskmine@gmail.com
> > > > >
> > > > >>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hi Rui,
> > > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the SET command.
> > > > >>>>>>>>>> In the implementation, it will just put the key-value pair into
> > > > >>>>>>>>>> the `Configuration`, which will be used to generate the table
> > > > >>>>>>>>>> config. If Hive supports reading settings from the table
> > > > >>>>>>>>>> config, users will be able to set the Hive-related settings.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job
> > and
> > > > >>> exit.
> > > > >>>>>>> If
> > > > >>>>>>>>>> the queries never end, users have to cancel the job by
> > > > >> themselves,
> > > > >>>>>>> which is
> > > > >>>>>>>> not reliable (people may forget their jobs). In most cases,
> > > > queries
> > > > >>>>>>> are used
> > > > >>>>>>>>>> to analyze the data. Users should use queries in the
> > > interactive
> > > > >>>>>>> mode.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Shengkai
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think
> it
> > > > >>> covers a
> > > > >>>>>>>>>>> lot of useful features which will dramatically improve
> the
> > > > >>>>>>> usability of our
> > > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
> > configurations
> > > > >> via
> > > > >>>>>>> the
> > > > >>>>>>>>>>> SET command? A connector may have its own configurations
> > and
> > > we
> > > > >>>>>>> don't have
> > > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> > > Client.
> > > > >> For
> > > > >>>>>>> example,
> > > > >>>>>>>>>>> users may want to be able to change hive conf when using
> > hive
> > > > >>>>>>> connector [1].
> > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> > > > >> specified
> > > > >>>>>>> with
> > > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but
> allows
> > > > >>> queries
> > > > >>>>>>> in the
> > > > >>>>>>>>>>> file. And a common use case is to run some query and
> > redirect
> > > > >> the
> > > > >>>>>>> results
> > > > >>>>>>>>>>> to a file. So I think maybe flink users would like to do
> > the
> > > > >> same,
> > > > >>>>>>>>>>> especially in batch scenarios.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > > > >>>>>>> liuyang0704@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Shengkai,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> > > > >>>>>>> suggestions:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > > > >>>>>>>>>>>> #2. Improve the way of results retrieval: sql client
> > collect
> > > > >> the
> > > > >>>>>>>>>>>> results
> > > > >>>>>>>>>>>> locally all at once using accumulators at present,
> > > > >>>>>>>>>>>>         which may have memory issues in JM or Local for
> > the
> > > > big
> > > > >>> query
> > > > >>>>>>>>>>>> result.
> > > > >>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> > > > >>>>>>>>>>>>         We may change to use SelectTableSink, which is
> > based
> > > > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> > > > >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> > > > >> FLIP-91.
> > > > >>>>>>> Seems
> > > > >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> > > > >>>>>>>>>>>>         Provide a long running service out of the box to
> > > > >>> facilitate
> > > > >>>>>>> the
> > > > >>>>>>>>>>>> sql
> > > > >>>>>>>>>>>> submission is necessary.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> What do you think of these?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> [1]
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四
> 下午8:54写道:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Hi devs,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Jark and I want to start a discussion about
> FLIP-163:SQL
> > > > >> Client
> > > > >>>>>>>>>>>>> Improvements.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Many users have complained about the problems of the
> sql
> > > > >> client.
> > > > >>>>>>> For
> > > > >>>>>>>>>>>>> example, users can not register the table proposed by
> > > > FLIP-95.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> - use -i parameter to specify the sql file to
> initialize
> > > the
> > > > >>>>>>> table
> > > > >>>>>>>>>>>>> environment and deprecated YAML file;
> > > > >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> > parameter;
> > > > >>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> > > > >>>>>>>>>>>>> - support statement set syntax;
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Look forward to your feedback.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>> Shengkai
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> [1]
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> --
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> *With kind regards
> > > > >>>>>>>>>>>>
> > ------------------------------------------------------------
> > > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > > Science
> > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > > >>>>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> > > > >>>>>>>>>>>> QQ: 3239559*
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>> Best regards!
> > > > >>>>>>>>>>> Rui Li
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> --
> > > > >>>>>>>>> Best regards!
> > > > >>>>>>>>> Rui Li
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Best regards!
> > > > >>>>>>> Rui Li
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> --
> > > > >>>>>> Best, Jingsong Lee
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi all,

Regarding "One Parser", I think it's not possible for now because the Calcite
parser can't parse special characters (e.g. "-") unless they are quoted as
string literals. That's why the WITH option keys are string literals, not
identifiers.

SET table.exec.mini-batch.enabled = true and ADD JAR /local/my-home/test.jar
have the same problem. That's why we propose two parsers: one splits the lines
into multiple statements and matches the special commands through regex, which
is light-weight, and delegates the other statements to the second parser,
which is the Calcite parser.
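
As a rough illustration of this split (a sketch only, not Flink's actual
implementation; the command set and regexes here are assumptions), the
light-weight client-side matcher could look like:

```python
import re

# Regexes for the client-only commands. Unquoted keys such as
# table.exec.mini-batch.enabled and bare jar paths are fine here, whereas a
# Calcite-based parser would reject the '-' and '/' characters outside
# string literals.
ADD_JAR = re.compile(r"ADD\s+JAR\s+([^;\s]+)\s*;?\s*$", re.IGNORECASE)
SET_CMD = re.compile(r"SET\s+([^=\s]+)\s*=\s*([^;\s]+)\s*;?\s*$", re.IGNORECASE)

def classify(statement):
    """Match the light-weight client commands; return None to delegate the
    statement to the full SQL parser."""
    s = statement.strip()
    m = ADD_JAR.match(s)
    if m:
        return ("ADD JAR", m.group(1))
    m = SET_CMD.match(s)
    if m:
        return ("SET", m.group(1), m.group(2))
    return None  # hand off to the heavy-weight parser

print(classify("SET table.exec.mini-batch.enabled = true;"))
# ('SET', 'table.exec.mini-batch.enabled', 'true')
```

Anything the regexes don't recognize falls through to the Calcite-based
parser, which keeps the client itself small.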

Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
true syntax, both for backward compatibility and ease of use; all the other
systems also leave the key unquoted.


Regarding "table.planner" vs "sql-client.planner",
if we want to use "table.planner", I think we should clearly explain in the
documentation the scope in which it can be used.
Otherwise, users will complain that the planner doesn't change when they set
the configuration on TableEnv.
It would be better to throw an exception to indicate to users that it's not
allowed to change the planner after the TableEnv is initialized.
However, it seems not easy to implement.

Best,
Jark

On Thu, 4 Feb 2021 at 15:49, godfrey he <go...@gmail.com> wrote:

> Hi everyone,
>
> Regarding "table.planner" and "table.execution-mode"
> If we define that those two options are just used to initialize the
> TableEnvironment, +1 for introducing table options instead of sql-client
> options.
>
> Regarding "the sql client, we will maintain two parsers", I want to give
> more inputs:
> We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
> the gateway service will communicate through Rest API. The " ADD JAR
> /local/path/jar " will be executed in the CLI client machine. So when we
> submit a sql file which contains multiple statements, the CLI client needs
> to pick out the "ADD JAR" line, and also statements need to be submitted or
> executed one by one to make sure the result is correct. The sql file may
> look like:
>
> SET xxx=yyy;
> create table my_table ...;
> create table my_sink ...;
> ADD JAR /local/path/jar1;
> create function my_udf as com....MyUdf;
> insert into my_sink select ..., my_udf(xx) from ...;
> REMOVE JAR /local/path/jar1;
> drop function my_udf;
> ADD JAR /local/path/jar2;
> create function my_udf as com....MyUdf2;
> insert into my_sink select ..., my_udf(xx) from ...;
>
> The lines need to be split into multiple statements first in the CLI
> client, there are two approaches:
> 1. The CLI client depends on the sql-parser: the sql-parser splits the
> lines and tells which lines are "ADD JAR".
> pro: there is only one parser
> cons: It's a little heavy that the CLI client depends on the sql-parser,
> because the CLI client is just a simple tool which receives the user
> commands and displays the result. The non "ADD JAR" command will be parsed
> twice.
>
> 2. The CLI client splits the lines into multiple statements and finds the
> ADD JAR command through regex matching.
> pro: The CLI client is very light-weight.
> cons: there are two parsers.
>
> (personally, I prefer the second option)
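
A minimal sketch of the splitting step in option 2 (illustrative only; a real
client must additionally handle comments, escaped quotes, and other corner
cases):

```python
def split_statements(script):
    """Split a SQL script on ';' while respecting single-quoted string
    literals, so a ';' inside a literal does not end the statement."""
    statements, buf, in_string = [], [], False
    for ch in script:
        if ch == "'":
            in_string = not in_string
            buf.append(ch)
        elif ch == ";" and not in_string:
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements

print(split_statements("SET a=b; ADD JAR /p/jar1; SELECT ';' FROM t;"))
# ['SET a=b', 'ADD JAR /p/jar1', "SELECT ';' FROM t"]
```

After splitting, each statement is either matched as an "ADD JAR"-style
command by regex or forwarded to the gateway.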
>
> Regarding "SHOW or LIST JARS", I think we can support them both.
> For the default dialect, we support SHOW JARS, but if we switch to the hive
> dialect, LIST JARS is also supported.
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>
> Best,
> Godfrey
>
> Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:
>
> > Hi guys,
> >
> > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> > DELETE though.
> >
> > While flink doesn't need to follow hive syntax, as far as I know, most
> > users who are requesting these features were previously hive users. So I
> > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
> > as synonyms? It's just like lots of systems accept both EXIT and QUIT as
> > the command to terminate the program. So if that's not hard to achieve,
> and
> > will make users happier, I don't see a reason why we must choose one over
> > the other.
> >
> > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org> wrote:
> >
> > > Hi everyone,
> > >
> > > some feedback regarding the open questions. Maybe we can discuss the
> > > `TableEnvironment.executeMultiSql` story offline to determine how we
> > > proceed with this in the near future.
> > >
> > > 1) "whether the table environment has the ability to update itself"
> > >
> > > Maybe there was some misunderstanding. I don't think that we should
> > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > "old")`. Instead I'm proposing to support
> > > `TableEnvironment.create(Configuration)` where planner and execution
> > > mode are read immediately and subsequent changes to these options will
> > > have no effect. We are doing it similarly in `new
> > > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > > must not be SQL Client specific but can be part of the core table code
> > > base. Many users would like to get a 100% preconfigured environment
> from
> > > just Configuration. And this is not possible right now. We can solve
> > > both use cases in one change.
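
The create-then-freeze semantics described above can be sketched as follows
(a toy model, not Flink's real API; the class and option names are
illustrative):

```python
class TableEnvironment:
    """Illustrative sketch of the proposed behavior: 'planner' and
    'execution-mode' are read once at creation time, so later mutations of
    the configuration have no effect on the environment."""

    def __init__(self, config):
        # Snapshot the initialization-only options immediately.
        self.planner = config.get("table.planner", "blink")
        self.execution_mode = config.get("table.execution-mode", "streaming")
        # Keep a reference for options that may still change later.
        self.config = config

config = {"table.planner": "old", "table.execution-mode": "batch"}
env = TableEnvironment(config)
config["table.planner"] = "blink"  # subsequent change: no effect
print(env.planner)  # old
```

This is what makes a 100% preconfigured environment from a single
Configuration possible without allowing inconsistent mid-lifecycle changes.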
> > >
> > > 2) "the sql client, we will maintain two parsers"
> > >
> > > I remember we had some discussion about this and decided that we would
> > > like to maintain only one parser. In the end it is "One Flink SQL"
> where
> > > commands influence each other also with respect to keywords. It should
> > > be fine to include the SQL Client commands in the Flink parser. Of
> > > course the table environment would not be able to handle the
> `Operation`
> > > instance that would be the result but we can introduce hooks to handle
> > > those `Operation`s. Or we introduce parser extensions.
> > >
> > > Can we skip `table.job.async` in the first version? We should further
> > > discuss whether we introduce a special SQL clause for wrapping async
> > > behavior or if we use a config option? Esp. for streaming queries we
> > > need to be careful and should force users to either "one INSERT INTO"
> or
> > > "one STATEMENT SET".
> > >
> > > 3) 4) "HIVE also uses these commands"
> > >
> > > In general, Hive is not a good reference. Aligning the commands more
> > > with the remaining commands should be our goal. We just had a MODULE
> > > discussion where we selected SHOW instead of LIST. But it is true that
> > > JARs are not part of the catalog which is why I would not use
> > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> > > Take a look at the Java collection API as another example.
> > >
> > > 6) "Most of the commands should belong to the table environment"
> > >
> > > Thanks for updating the FLIP; this makes things easier to understand. It
> > > is good to see that most commands will be available in TableEnvironment.
> > > However, I would also support SET and RESET for consistency. Again,
> from
> > > an architectural point of view, if we would allow some kind of
> > > `Operation` hook in table environment, we could check for SQL Client
> > > specific options and forward to regular `TableConfig.getConfiguration`
> > > otherwise. What do you think?
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 03.02.21 08:58, Jark Wu wrote:
> > > > Hi Timo,
> > > >
> > > > I will respond some of the questions:
> > > >
> > > > 1) SQL client specific options
> > > >
> > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > configuration takes effect.
> > > > If it is a table configuration, we should make clear what's the
> > behavior
> > > > when users change
> > > > the configuration in the lifecycle of TableEnvironment.
> > > >
> > > > I agree with Shengkai `sql-client.planner` and
> > > `sql-client.execution.mode`
> > > > are something special
> > > > that can't be changed after TableEnvironment has been initialized.
> You
> > > can
> > > > see
> > > > `StreamExecutionEnvironment` provides `configure()`  method to
> override
> > > > configuration after
> > > > StreamExecutionEnvironment has been initialized.
> > > >
> > > > Therefore, I think it would be better to still use
> > `sql-client.planner`
> > > > and `sql-client.execution.mode`.
> > > >
> > > > 2) Execution file
> > > >
> > > > From my point of view, there is a big difference between
> > > > `sql-client.job.detach` and
> > > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach`
> will
> > > > affect every single DML statement
> > > > in the terminal, not only the statements in SQL files. I think the
> > single
> > > > DML statement in the interactive
> > > > terminal is something like tEnv#executeSql() instead of
> > > > tEnv#executeMultiSql.
> > > > So I don't like the "multi" and "sql" keyword in
> > `table.multi-sql-async`.
> > > > I just found that the runtime provides a configuration called
> > > > "execution.attached" [1], which is false by default and specifies
> > > > whether the pipeline is submitted in attached or detached mode.
> > > > It provides exactly the same
> > > > functionality of `sql-client.job.detach`. What do you think about
> using
> > > > this option?
> > > >
> > > > If we also want to support this config in TableEnvironment, I think
> it
> > > > should also affect the DML execution
> > > >   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> > > > Therefore, the behavior may look like this:
> > > >
> > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> > > default
> > > > tableResult.await()   ==> manually block until finish
> > > > tEnv.getConfig().getConfiguration().setString("execution.attached",
> > > "true")
> > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> don't
> > > need
> > > > to wait on the TableResult
> > > > tEnv.executeMultiSql(
> > > > """
> > > > CREATE TABLE ....  ==> always sync
> > > > INSERT INTO ...  => sync, because we set configuration above
> > > > SET execution.attached = false;
> > > > INSERT INTO ...  => async
> > > > """)
> > > >
> > > > On the other hand, I think `sql-client.job.detach`
> > > > and `TableEnvironment.executeMultiSql()` should be two separate
> topics,
> > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > I'm fine with making `executeMultiSql()` clear but don't want it to
> > block
> > > > this FLIP, maybe we can discuss this in another thread.
> > > >
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > >
> > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com>
> wrote:
> > > >
> > > >> Hi, Timo.
> > > >> Thanks for your detailed feedback. I have some thoughts about your
> > > >> feedback.
> > > >>
> > > >> *Regarding #1*: I think the main problem is whether the table
> > > environment
> > > >> has the ability to update itself. Let's take a simple program as an
> > > >> example.
> > > >>
> > > >>
> > > >> ```
> > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > >>
> > > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > > >>
> > > >>
> > > >> tEnv.executeSql("...");
> > > >>
> > > >> ```
> > > >>
> > > >> If we regard this option as a table option, users don't have to
> create
> > > >> another table environment manually. In that case, tEnv needs to
> check
> > > >> whether the current mode and planner are the same as before when
> > > executeSql
> > > >> or explainSql. I don't think it's easy work for the table
> environment,
> > > >> especially if users have a StreamExecutionEnvironment but set old
> > > planner
> > > >> and batch mode. But when we make this option as a sql client option,
> > > users
> > > >> only use the SET command to change the setting. We can rebuild a new
> > > >> table environment when the SET succeeds.
> > > >>
> > > >>
> > > >> *Regarding #2*: I think we need to discuss the implementation before
> > > >> continuing this topic. In the sql client, we will maintain two
> > parsers.
> > > The
> > > >> first parser(client parser) will only match the sql client commands.
> > If
> > > the
> > > >> client parser can't parse the statement, we will leverage the power
> of
> > > the
> > > >> table environment to execute. According to our blueprint,
> > > >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> > > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > > >>
> > > >> But if we need to introduce the `TableEnvironment.executeMultiSql`
> in
> > > the
> > > >> future, I think it's OK to use the option `table.multi-sql-async`
> > rather
> > > >> than option `sql-client.job.detach`. But we think the name is not
> > > suitable
> > > >> because the name is confusing for others. When setting the option
> > > false, we
> > > >> just mean it will block the execution of the INSERT INTO statement, not
> > > >> DDL or others (other sql statements are always executed synchronously).
> > > >> So how
> > > >> about `table.job.async`? It only works for the sql-client and the
> > > >> executeMultiSql. If we set this value false, the table environment
> > will
> > > >> return the result until the job finishes.
> > > >>
> > > >>
> > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST
> > JAR
> > > >> because HIVE also uses these commands to add the jar into the
> > classpath
> > > or
> > > >> delete the jar. If we use  such commands, it can reduce our work for
> > > hive
> > > >> compatibility.
> > > >>
> > > >> For SHOW JAR, I think the main concern is the jars are not maintained
> > > >> by the Catalog. If we really need to keep consistent with SQL grammar,
> > > maybe
> > > >> we should use
> > > >>
> > > >> `ADD JAR` -> `CREATE JAR`,
> > > >> `DELETE JAR` -> `DROP JAR`,
> > > >> `LIST JAR` -> `SHOW JAR`.
> > > >>
> > > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > > >>
> > > >> *Regarding #6*: Yes. Most of the commands should belong to the table
> > > >> environment. In the Summary section, I use the <NOTE> tag to
> identify
> > > which
> > > >> commands should belong to the sql client and which commands should
> > > belong
> > > >> to the table environment. I also add a new section about
> > implementation
> > > >> details in the FLIP.
> > > >>
> > > >> Best,
> > > >> Shengkai
> > > >>
> > > >> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> > > >>
> > > >>> Thanks for this great proposal Shengkai. This will give the SQL
> > Client
> > > a
> > > >>> very good update and make it production ready.
> > > >>>
> > > >>> Here is some feedback from my side:
> > > >>>
> > > >>> 1) SQL client specific options
> > > >>>
> > > >>> I don't think that `sql-client.planner` and
> > `sql-client.execution.mode`
> > > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment`
> and
> > > >>> `ExecutionConfig#configure` that have been added recently, we
> should
> > > >>> offer a possibility for TableEnvironment. How about we offer
> > > >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner`
> > and
> > > >>> `table.execution-mode` to
> > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > >>>
> > > >>> 2) Execution file
> > > >>>
> > > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> > > mailing
> > > >>> list thread at that time? Could you further elaborate how the
> > > >>> multi-statement execution should work for a unified batch/streaming
> > > >>> story? According to our past discussions, each line in an execution
> > > file
> > > >>> should be executed blocking which means a streaming query needs a
> > > >>> statement set to execute multiple INSERT INTO statement, correct?
> We
> > > >>> should also offer this functionality in
> > > >>> `TableEnvironment.executeMultiSql()`. Whether
> `sql-client.job.detach`
> > > is
> > > >>> SQL Client specific needs to be determined, it could also be a
> > general
> > > >>> `table.multi-sql-async` option?
> > > >>>
> > > >>> 3) DELETE JAR
> > > >>>
> > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like
> one
> > > is
> > > >>> actively deleting the JAR in the corresponding path.
> > > >>>
> > > >>> 4) LIST JAR
> > > >>>
> > > >>> This should be `SHOW JARS` according to other SQL commands such as
> > > `SHOW
> > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > >>>
> > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > >>>
> > > >>> We should keep the details in sync with
> > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> about
> > > >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > > >>> instead of `COST`. I'm sure the original author had a reason why to
> > > call
> > > >>> it that way.
> > > >>>
> > > >>> 6) Implementation details
> > > >>>
> > > >>> It would be nice to understand how we plan to implement the given
> > > >>> features. Most of the commands and config options should go into
> > > >>> TableEnvironment and SqlParser directly, correct? This way users
> > have a
> > > >>> unified way of using Flink SQL. TableEnvironment would provide a
> > > similar
> > > >>> user experience in notebooks or interactive programs than the SQL
> > > Client.
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > >>> [2]
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > >>>
> > > >>> Regards,
> > > >>> Timo
> > > >>>
> > > >>>
> > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > >>>> Sorry for the typo. I mean `RESET` is much better rather than
> > `UNSET`.
> > > >>>>
> > > >>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> > > >>>>
> > > >>>>> Hi, Jingsong.
> > > >>>>>
> > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > >>>>>
> > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > > >>>>> supported in the current sql client now. Our proposal just extends
> > > >>>>> its grammar and allows users to reset the specified keys.
> > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> > > >>> value[1].
> > > >>>>> I think it is more friendly for batch users.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Shengkai
> > > >>>>>
> > > >>>>> [1]
> > > >>>
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > >>>>>
> > > >>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
> > > >>>>>
> > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > > >>>>>> improving it.
> > > >>>>>>
> > > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Jingsong
> > > >>>>>>
> > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com>
> > > wrote:
> > > >>>>>>
> > > >>>>>>> Thanks Shengkai for the update! The proposed changes look good
> to
> > > >> me.
> > > >>>>>>>
> > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> fskmine@gmail.com
> > >
> > > >>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi, Rui.
> > > >>>>>>>> You are right. I have already modified the FLIP.
> > > >>>>>>>>
> > > >>>>>>>> The main changes:
> > > >>>>>>>>
> > > >>>>>>>> # -f parameter has no restriction about the statement type.
> > > >>>>>>>> Sometimes, users use the pipe to redirect the result of
> queries
> > to
> > > >>>>>>> debug
> > > >>>>>>>> when submitting a job by the -f parameter. It's much more
> > > >>>>>>>> convenient than writing INSERT INTO statements.
> > > >>>>>>>>
> > > >>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode.
> Users
> > > >> can
> > > >>>>>>> set
> > > >>>>>>>> this option false and the client will process the next job
> until
> > > >> the
> > > >>>>>>>> current job finishes. The default value of this option is
> false,
> > > >>> which
> > > >>>>>>>> means the client will execute the next job when the current
> job
> > is
> > > >>>>>>>> submitted.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Shengkai
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Shengkai,
> > > >>>>>>>>>
> > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > > >> different
> > > >>>>>>>>> implications, and we should clarify the behavior. For
> example,
> > if
> > > >>> the
> > > >>>>>>>>> client just submits the job and exits, what happens if the
> file
> > > >>>>>>> contains
> > > >>>>>>>>> two INSERT statements? I don't think we should treat them as
> a
> > > >>>>>>> statement
> > > >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT
> SET
> > in
> > > >>> that
> > > >>>>>>>>> case. And the client shouldn't asynchronously submit the two
> > > jobs,
> > > >>>>>>> because
> > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > fskmine@gmail.com
> > > >
> > > >>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi Rui,
> > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET
> > > >>>>>>> command. In
> > > >>>>>>>>>> the implementation, it will just put the key-value into the
> > > >>>>>>>>>> `Configuration`, which will be used to generate the table
> > > config.
> > > >>> If
> > > >>>>>>> hive
> > > >>>>>>>>>> supports to read the setting from the table config, users
> are
> > > >> able
> > > >>>>>>> to set
> > > >>>>>>>>>> the hive-related settings.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job
> and
> > > >>> exit.
> > > >>>>>>> If
> > > >>>>>>>>>> the queries never end, users have to cancel the job by
> > > >> themselves,
> > > >>>>>>> which is
> > > >>>>>>>>>> not reliable (people may forget their jobs). In most cases,
> > > queries
> > > >>>>>>> are used
> > > >>>>>>>>>> to analyze the data. Users should use queries in the
> > interactive
> > > >>>>>>> mode.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Shengkai
> > > >>>>>>>>>>
> > > >>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> > > >>> covers a
> > > >>>>>>>>>>> lot of useful features which will dramatically improve the
> > > >>>>>>> usability of our
> > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
> configurations
> > > >> via
> > > >>>>>>> the
> > > >>>>>>>>>>> SET command? A connector may have its own configurations
> and
> > we
> > > >>>>>>> don't have
> > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> > Client.
> > > >> For
> > > >>>>>>> example,
> > > >>>>>>>>>>> users may want to be able to change hive conf when using
> hive
> > > >>>>>>> connector [1].
> > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> > > >> specified
> > > >>>>>>> with
> > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
> > > >>> queries
> > > >>>>>>> in the
> > > >>>>>>>>>>> file. And a common use case is to run some query and
> redirect
> > > >> the
> > > >>>>>>> results
> > > >>>>>>>>>>> to a file. So I think maybe flink users would like to do
> the
> > > >> same,
> > > >>>>>>>>>>> especially in batch scenarios.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > > >>>>>>> liuyang0704@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi Shengkai,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> > > >>>>>>> suggestions:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > > >>>>>>>>>>>> #2. Improve the way of results retrieval: sql client
> collect
> > > >> the
> > > >>>>>>>>>>>> results
> > > >>>>>>>>>>>> locally all at once using accumulators at present,
> > > >>>>>>>>>>>>         which may have memory issues in JM or Local for
> the
> > > big
> > > >>> query
> > > >>>>>>>>>>>> result.
> > > >>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> > > >>>>>>>>>>>>         We may change to use SelectTableSink, which is
> based
> > > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> > > >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> > > >> FLIP-91.
> > > >>>>>>> Seems
> > > >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> > > >>>>>>>>>>>>         Provide a long running service out of the box to
> > > >>> facilitate
> > > >>>>>>> the
> > > >>>>>>>>>>>> sql
> > > >>>>>>>>>>>> submission is necessary.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> What do you think of these?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> [1]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi devs,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
> > > >> Client
> > > >>>>>>>>>>>>> Improvements.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Many users have complained about the problems of the sql
> > > >> client.
> > > >>>>>>> For
> > > >>>>>>>>>>>>> example, users can not register the table proposed by
> > > FLIP-95.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> - use -i parameter to specify the sql file to initialize
> > the
> > > >>>>>>> table
> > > >>>>>>>>>>>>> environment and deprecated YAML file;
> > > >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> parameter;
> > > >>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> > > >>>>>>>>>>>>> - support statement set syntax;
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Look forward to your feedback.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>> Shengkai
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> [1]
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> --
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> *With kind regards
> > > >>>>>>>>>>>>
> ------------------------------------------------------------
> > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > Science
> > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > >>>>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> > > >>>>>>>>>>>> QQ: 3239559*
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> --
> > > >>>>>>>>>>> Best regards!
> > > >>>>>>>>>>> Rui Li
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Best regards!
> > > >>>>>>>>> Rui Li
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Best regards!
> > > >>>>>>> Rui Li
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Best, Jingsong Lee
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
> > --
> > Best regards!
> > Rui Li
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by godfrey he <go...@gmail.com>.
Hi everyone,

Regarding "table.planner" and "table.execution-mode"
If we define that those two options are just used to initialize the
TableEnvironment, +1 for introducing table options instead of sql-client
options.

Regarding "the sql client, we will maintain two parsers", I want to give
more input:
We want to introduce sql-gateway into the Flink project (see FLIP-24 &
FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
the gateway service will communicate through Rest API. The " ADD JAR
/local/path/jar " will be executed in the CLI client machine. So when we
submit a sql file which contains multiple statements, the CLI client needs
to pick out the "ADD JAR" line, and also statements need to be submitted or
executed one by one to make sure the result is correct. The sql file may
look like:

SET xxx=yyy;
create table my_table ...;
create table my_sink ...;
ADD JAR /local/path/jar1;
create function my_udf as com....MyUdf;
insert into my_sink select ..., my_udf(xx) from ...;
REMOVE JAR /local/path/jar1;
drop function my_udf;
ADD JAR /local/path/jar2;
create function my_udf as com....MyUdf2;
insert into my_sink select ..., my_udf(xx) from ...;

The lines need to be split into multiple statements first in the CLI
client, there are two approaches:
1. The CLI client depends on the sql-parser: the sql-parser splits the
lines and tells which lines are "ADD JAR".
pro: there is only one parser
cons: it's a bit heavy for the CLI client to depend on the sql-parser,
because the CLI client is just a simple tool that receives user commands
and displays the result. Every non "ADD JAR" statement will be parsed
twice.

2. The CLI client splits the lines into multiple statements and finds the
ADD JAR command through regex matching.
pro: The CLI client is very light-weight.
cons: there are two parsers.

(personally, I prefer the second option)
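
For what it's worth, approach 2 can be sketched in a few lines. This is
only an illustration under assumed syntax — the class name, the naive ';'
split, and the ADD JAR pattern are hypothetical, not the FLIP's actual
grammar:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of approach 2: the CLI client splits the script into
// statements and picks out client-only commands via regex matching, so it
// never needs the full sql-parser.
public class ClientStatementSplitter {

    // Assumed pattern; the real command grammar may differ.
    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+(\\S+)$");

    /** Naive split on ';' — a real client must also honor quoted strings and comments. */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    public static boolean isAddJar(String statement) {
        return ADD_JAR.matcher(statement.trim()).matches();
    }

    public static void main(String[] args) {
        String script = "SET xxx=yyy;\nADD JAR /local/path/jar1;\nINSERT INTO my_sink SELECT 1;";
        for (String stmt : split(script)) {
            // client-only commands are handled locally, the rest is forwarded
            System.out.println((isAddJar(stmt) ? "client : " : "gateway: ") + stmt);
        }
    }
}
```

The obvious downside, as noted above, is that such a regex and the real
parser can drift apart — which is exactly the "two parsers" cost of this
option.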

Regarding "SHOW or LIST JARS", I think we can support them both.
For the default dialect, we support SHOW JARS, but if we switch to the
hive dialect, LIST JARS is also supported.


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway

Best,
Godfrey

Rui Li <li...@gmail.com> 于2021年2月4日周四 上午10:40写道:

> Hi guys,
>
> Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> DELETE though.
>
> While flink doesn't need to follow hive syntax, as far as I know, most
> users who are requesting these features were previously hive users. So I
> wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
> as synonyms? It's just like lots of systems accept both EXIT and QUIT as
> the command to terminate the program. So if that's not hard to achieve, and
> will make users happier, I don't see a reason why we must choose one over
> the other.
>
> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org> wrote:
>
> > Hi everyone,
> >
> > some feedback regarding the open questions. Maybe we can discuss the
> > `TableEnvironment.executeMultiSql` story offline to determine how we
> > proceed with this in the near future.
> >
> > 1) "whether the table environment has the ability to update itself"
> >
> > Maybe there was some misunderstanding. I don't think that we should
> > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > "old")`. Instead I'm proposing to support
> > `TableEnvironment.create(Configuration)` where planner and execution
> > mode are read immediately and subsequent changes to these options will
> > have no effect. We are doing it similar in `new
> > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > must not be SQL Client specific but can be part of the core table code
> > base. Many users would like to get a 100% preconfigured environment from
> > just Configuration. And this is not possible right now. We can solve
> > both use cases in one change.
> >
> > 2) "the sql client, we will maintain two parsers"
> >
> > I remember we had some discussion about this and decided that we would
> > like to maintain only one parser. In the end it is "One Flink SQL" where
> > commands influence each other also with respect to keywords. It should
> > be fine to include the SQL Client commands in the Flink parser. Of
> > course the table environment would not be able to handle the `Operation`
> > instance that would be the result but we can introduce hooks to handle
> > those `Operation`s. Or we introduce parser extensions.
> >
> > Can we skip `table.job.async` in the first version? We should further
> > discuss whether we introduce a special SQL clause for wrapping async
> > behavior or if we use a config option? Esp. for streaming queries we
> > need to be careful and should force users to either "one INSERT INTO" or
> > "one STATEMENT SET".
> >
> > 3) 4) "HIVE also uses these commands"
> >
> > In general, Hive is not a good reference. Aligning the commands more
> > with the remaining commands should be our goal. We just had a MODULE
> > discussion where we selected SHOW instead of LIST. But it is true that
> > JARs are not part of the catalog which is why I would not use
> > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> > Take a look at the Java collection API as another example.
> >
> > 6) "Most of the commands should belong to the table environment"
> >
> > Thanks for updating the FLIP this makes things easier to understand. It
> > is good to see that most commands will be available in TableEnvironment.
> > However, I would also support SET and RESET for consistency. Again, from
> > an architectural point of view, if we would allow some kind of
> > `Operation` hook in table environment, we could check for SQL Client
> > specific options and forward to regular `TableConfig.getConfiguration`
> > otherwise. What do you think?
> >
> > Regards,
> > Timo
> >
> >
> > On 03.02.21 08:58, Jark Wu wrote:
> > > Hi Timo,
> > >
> > > I will respond some of the questions:
> > >
> > > 1) SQL client specific options
> > >
> > > Whether it starts with "table" or "sql-client" depends on where the
> > > configuration takes effect.
> > > If it is a table configuration, we should make clear what's the
> behavior
> > > when users change
> > > the configuration in the lifecycle of TableEnvironment.
> > >
> > > I agree with Shengkai `sql-client.planner` and
> > `sql-client.execution.mode`
> > > are something special
> > > that can't be changed after TableEnvironment has been initialized. You
> > can
> > > see
> > > `StreamExecutionEnvironment` provides `configure()`  method to override
> > > configuration after
> > > StreamExecutionEnvironment has been initialized.
> > >
> > > Therefore, I think it would be better to still use
> `sql-client.planner`
> > > and `sql-client.execution.mode`.
> > >
> > > 2) Execution file
> > >
> > > From my point of view, there is a big difference between
> > > `sql-client.job.detach` and
> > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
> > > affect every single DML statement
> > > in the terminal, not only the statements in SQL files. I think the
> single
> > > DML statement in the interactive
> > > terminal is something like tEnv#executeSql() instead of
> > > tEnv#executeMultiSql.
> > > So I don't like the "multi" and "sql" keyword in
> `table.multi-sql-async`.
> > > I just find that runtime provides a configuration called
> > > "execution.attached" [1] which is false by default
> > > which specifies if the pipeline is submitted in attached or detached
> > mode.
> > > It provides exactly the same
> > > functionality of `sql-client.job.detach`. What do you think about using
> > > this option?
> > >
> > > If we also want to support this config in TableEnvironment, I think it
> > > should also affect the DML execution
> > >   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> > > Therefore, the behavior may look like this:
> > >
> > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> > default
> > > tableResult.await()   ==> manually block until finish
> > > tEnv.getConfig().getConfiguration().setString("execution.attached",
> > "true")
> > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't
> > need
> > > to wait on the TableResult
> > > tEnv.executeMultiSql(
> > > """
> > > CREATE TABLE ....  ==> always sync
> > > INSERT INTO ...  => sync, because we set configuration above
> > > SET execution.attached = false;
> > > INSERT INTO ...  => async
> > > """)
> > >
> > > On the other hand, I think `sql-client.job.detach`
> > > and `TableEnvironment.executeMultiSql()` should be two separate topics,
> > > as Shengkai mentioned above, SQL CLI only depends on
> > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > I'm fine with making `executeMultiSql()` clear but don't want it to
> block
> > > this FLIP, maybe we can discuss this in another thread.
> > >
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > >
> > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com> wrote:
> > >
> > >> Hi, Timo.
> > >> Thanks for your detailed feedback. I have some thoughts about your
> > >> feedback.
> > >>
> > >> *Regarding #1*: I think the main problem is whether the table
> > environment
> > >> has the ability to update itself. Let's take a simple program as an
> > >> example.
> > >>
> > >>
> > >> ```
> > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > >>
> > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > >>
> > >>
> > >> tEnv.executeSql("...");
> > >>
> > >> ```
> > >>
> > >> If we regard this option as a table option, users don't have to create
> > >> another table environment manually. In that case, tEnv needs to check
> > >> whether the current mode and planner are the same as before when
> > executeSql
> > >> or explainSql. I don't think it's easy work for the table environment,
> > >> especially if users have a StreamExecutionEnvironment but set old
> > planner
> > >> and batch mode. But when we make this option as a sql client option,
> > users
> > >> only use the SET command to change the setting. We can rebuild a new
> > table
> > >> environment when set successes.
> > >>
> > >>
> > >> *Regarding #2*: I think we need to discuss the implementation before
> > >> continuing this topic. In the sql client, we will maintain two
> parsers.
> > The
> > >> first parser(client parser) will only match the sql client commands.
> If
> > the
> > >> client parser can't parse the statement, we will leverage the power of
> > the
> > >> table environment to execute. According to our blueprint,
> > >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > >>
> > >> But if we need to introduce the `TableEnvironment.executeMultiSql` in
> > the
> > >> future, I think it's OK to use the option `table.multi-sql-async`
> rather
> > >> than option `sql-client.job.detach`. But we think the name is not
> > suitable
> > >> because the name is confusing for others. When setting the option
> > false, we
> > >> just mean it will block the execution of the INSERT INTO statement,
> not
> > DDL
> > >> or others(other sql statements are always executed synchronously). So
> > how
> > >> about `table.job.async`? It only works for the sql-client and the
> > >> executeMultiSql. If we set this value false, the table environment
> will
> > >> return the result until the job finishes.
> > >>
> > >>
> > >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST
> JAR
> > >> because HIVE also uses these commands to add the jar into the
> classpath
> > or
> > >> delete the jar. If we use  such commands, it can reduce our work for
> > hive
> > >> compatibility.
> > >>
> > >> For SHOW JAR, I think the main concern is the jars are not maintained
> by
> > >> the Catalog. If we really needs to keep consistent with SQL grammar,
> > maybe
> > >> we should use
> > >>
> > >> `ADD JAR` -> `CREATE JAR`,
> > >> `DELETE JAR` -> `DROP JAR`,
> > >> `LIST JAR` -> `SHOW JAR`.
> > >>
> > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > >>
> > >> *Regarding #6*: Yes. Most of the commands should belong to the table
> > >> environment. In the Summary section, I use the <NOTE> tag to identify
> > which
> > >> commands should belong to the sql client and which commands should
> > belong
> > >> to the table environment. I also add a new section about
> implementation
> > >> details in the FLIP.
> > >>
> > >> Best,
> > >> Shengkai
> > >>
> > >> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> > >>
> > >>> Thanks for this great proposal Shengkai. This will give the SQL
> Client
> > a
> > >>> very good update and make it production ready.
> > >>>
> > >>> Here is some feedback from my side:
> > >>>
> > >>> 1) SQL client specific options
> > >>>
> > >>> I don't think that `sql-client.planner` and
> `sql-client.execution.mode`
> > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> > >>> `ExecutionConfig#configure` that have been added recently, we should
> > >>> offer a possibility for TableEnvironment. How about we offer
> > >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner`
> and
> > >>> `table.execution-mode` to
> > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > >>>
> > >>> 2) Execution file
> > >>>
> > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> > mailing
> > >>> list thread at that time? Could you further elaborate how the
> > >>> multi-statement execution should work for a unified batch/streaming
> > >>> story? According to our past discussions, each line in an execution
> > file
> > >>> should be executed blocking which means a streaming query needs a
> > >>> statement set to execute multiple INSERT INTO statement, correct? We
> > >>> should also offer this functionality in
> > >>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach`
> > is
> > >>> SQL Client specific needs to be determined, it could also be a
> general
> > >>> `table.multi-sql-async` option?
> > >>>
> > >>> 3) DELETE JAR
> > >>>
> > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one
> > is
> > >>> actively deleting the JAR in the corresponding path.
> > >>>
> > >>> 4) LIST JAR
> > >>>
> > >>> This should be `SHOW JARS` according to other SQL commands such as
> > `SHOW
> > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > >>>
> > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > >>>
> > >>> We should keep the details in sync with
> > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> > >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > >>> instead of `COST`. I'm sure the original author had a reason why to
> > call
> > >>> it that way.
> > >>>
> > >>> 6) Implementation details
> > >>>
> > >>> It would be nice to understand how we plan to implement the given
> > >>> features. Most of the commands and config options should go into
> > >>> TableEnvironment and SqlParser directly, correct? This way users
> have a
> > >>> unified way of using Flink SQL. TableEnvironment would provide a
> > similar
> > >>> user experience in notebooks or interactive programs than the SQL
> > Client.
> > >>>
> > >>> [1]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>> [2]
> > >>>
> > >>>
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > >>>
> > >>> Regards,
> > >>> Timo
> > >>>
> > >>>
> > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > >>>> Sorry for the typo. I mean `RESET` is much better rather than
> `UNSET`.
> > >>>>
> > >>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> > >>>>
> > >>>>> Hi, Jingsong.
> > >>>>>
> > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > >>>>>
> > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > >>>>> supported in the current sql client now. Our proposal just extends
> > its
> > >>>>> grammar and allow users to reset the specified keys.
> > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> > >>> value[1].
> > >>>>> I think it is more friendly for batch users.
> > >>>>>
> > >>>>> Best,
> > >>>>> Shengkai
> > >>>>>
> > >>>>> [1]
> > >>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > >>>>>
> > >>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
> > >>>>>
> > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > >>>>>> improving it.
> > >>>>>>
> > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Jingsong
> > >>>>>>
> > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com>
> > wrote:
> > >>>>>>
> > >>>>>>> Thanks Shengkai for the update! The proposed changes look good to
> > >> me.
> > >>>>>>>
> > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskmine@gmail.com
> >
> > >>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi, Rui.
> > >>>>>>>> You are right. I have already modified the FLIP.
> > >>>>>>>>
> > >>>>>>>> The main changes:
> > >>>>>>>>
> > >>>>>>>> # -f parameter has no restriction about the statement type.
> > >>>>>>>> Sometimes, users use the pipe to redirect the result of queries
> to
> > >>>>>>> debug
> > >>>>>>>> when submitting job by -f parameter. It's much convenient
> > comparing
> > >>> to
> > >>>>>>>> writing INSERT INTO statements.
> > >>>>>>>>
> > >>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> > >>>>>>>> Users prefer to execute jobs one by one in the batch mode. Users
> > >> can
> > >>>>>>> set
> > >>>>>>>> this option false and the client will process the next job until
> > >> the
> > >>>>>>>> current job finishes. The default value of this option is false,
> > >>> which
> > >>>>>>>> means the client will execute the next job when the current job
> is
> > >>>>>>>> submitted.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Shengkai
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> > >>>>>>>>
> > >>>>>>>>> Hi Shengkai,
> > >>>>>>>>>
> > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > >> different
> > >>>>>>>>> implications, and we should clarify the behavior. For example,
> if
> > >>> the
> > >>>>>>>>> client just submits the job and exits, what happens if the file
> > >>>>>>> contains
> > >>>>>>>>> two INSERT statements? I don't think we should treat them as a
> > >>>>>>> statement
> > >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT SET
> in
> > >>> that
> > >>>>>>>>> case. And the client shouldn't asynchronously submit the two
> > jobs,
> > >>>>>>> because
> > >>>>>>>>> the 2nd may depend on the 1st, right?
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> fskmine@gmail.com
> > >
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Rui,
> > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > >>>>>>>>>>
> > >>>>>>>>>> For the suggestion 1: Yes. we are plan to strengthen the set
> > >>>>>>> command. In
> > >>>>>>>>>> the implementation, it will just put the key-value into the
> > >>>>>>>>>> `Configuration`, which will be used to generate the table
> > config.
> > >>> If
> > >>>>>>> hive
> > >>>>>>>>>> supports to read the setting from the table config, users are
> > >> able
> > >>>>>>> to set
> > >>>>>>>>>> the hive-related settings.
> > >>>>>>>>>>
> > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job and
> > >>> exit.
> > >>>>>>> If
> > >>>>>>>>>> the queries never end, users have to cancel the job by
> > >> themselves,
> > >>>>>>> which is
> > >>>>>>>>>> not reliable(people may forget their jobs). In most case,
> > queries
> > >>>>>>> are used
> > >>>>>>>>>> to analyze the data. Users should use queries in the
> interactive
> > >>>>>>> mode.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Shengkai
> > >>>>>>>>>>
> > >>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
> > >>>>>>>>>>
> > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> > >>> covers a
> > >>>>>>>>>>> lot of useful features which will dramatically improve the
> > >>>>>>> usability of our
> > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations
> > >> via
> > >>>>>>> the
> > >>>>>>>>>>> SET command? A connector may have its own configurations and
> we
> > >>>>>>> don't have
> > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> Client.
> > >> For
> > >>>>>>> example,
> > >>>>>>>>>>> users may want to be able to change hive conf when using hive
> > >>>>>>> connector [1].
> > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> > >> specified
> > >>>>>>> with
> > >>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
> > >>> queries
> > >>>>>>> in the
> > >>>>>>>>>>> file. And a common use case is to run some query and redirect
> > >> the
> > >>>>>>> results
> > >>>>>>>>>>> to a file. So I think maybe flink users would like to do the
> > >> same,
> > >>>>>>>>>>> especially in batch scenarios.
> > >>>>>>>>>>>
> > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > >>>>>>> liuyang0704@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Shengkai,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> > >>>>>>> suggestions:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > >>>>>>>>>>>> #2. Improve the way of results retrieval: sql client collect
> > >> the
> > >>>>>>>>>>>> results
> > >>>>>>>>>>>> locally all at once using accumulators at present,
> > >>>>>>>>>>>>         which may have memory issues in JM or Local for the
> > big
> > >>> query
> > >>>>>>>>>>>> result.
> > >>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> > >>>>>>>>>>>>         We may change to use SelectTableSink, which is based
> > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> > >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> > >> FLIP-91.
> > >>>>>>> Seems
> > >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> > >>>>>>>>>>>>         Provide a long running service out of the box to
> > >>> facilitate
> > >>>>>>> the
> > >>>>>>>>>>>> sql
> > >>>>>>>>>>>> submission is necessary.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What do you think of these?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1]
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi devs,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
> > >> Client
> > >>>>>>>>>>>>> Improvements.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Many users have complained about the problems of the sql
> > >> client.
> > >>>>>>> For
> > >>>>>>>>>>>>> example, users can not register the table proposed by
> > FLIP-95.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The main changes in this FLIP:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - use -i parameter to specify the sql file to initialize
> the
> > >>>>>>> table
> > >>>>>>>>>>>>> environment and deprecated YAML file;
> > >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> > >>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> > >>>>>>>>>>>>> - support statement set syntax;
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Look forward to your feedback.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Shengkai
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> --
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> *With kind regards
> > >>>>>>>>>>>> ------------------------------------------------------------
> > >>>>>>>>>>>> Sebastian Liu 刘洋
> > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> Science
> > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > >>>>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> > >>>>>>>>>>>> QQ: 3239559*
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>> Best regards!
> > >>>>>>>>>>> Rui Li
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Best regards!
> > >>>>>>>>> Rui Li
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Best regards!
> > >>>>>>> Rui Li
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Best, Jingsong Lee
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
> >
>
> --
> Best regards!
> Rui Li
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Rui Li <li...@gmail.com>.
Hi guys,

Regarding #3 and #4, I agree SHOW JARS is more consistent with other
commands than LIST JARS. I don't have a strong opinion about REMOVE vs
DELETE though.

While flink doesn't need to follow hive syntax, as far as I know, most
users who are requesting these features were previously hive users. So I
wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
as synonyms? It's just like lots of systems accept both EXIT and QUIT as
the command to terminate the program. So if that's not hard to achieve, and
will make users happier, I don't see a reason why we must choose one over
the other.
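
If we go the synonym route, the recognition cost is tiny — e.g. a sketch
like the following, where the class name and patterns are hypothetical and
purely illustrate the idea:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: accept SHOW JARS / LIST JARS and REMOVE JAR /
// DELETE JAR as synonyms, the way many shells accept both EXIT and QUIT.
public class JarCommandSynonyms {

    private static final Pattern SHOW_JARS =
            Pattern.compile("(?i)^(SHOW|LIST)\\s+JARS$");
    private static final Pattern REMOVE_JAR =
            Pattern.compile("(?i)^(REMOVE|DELETE)\\s+JAR\\s+(\\S+)$");

    public static boolean isShowJars(String statement) {
        return SHOW_JARS.matcher(statement.trim()).matches();
    }

    public static boolean isRemoveJar(String statement) {
        return REMOVE_JAR.matcher(statement.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isShowJars("SHOW JARS"));                // true
        System.out.println(isShowJars("list jars"));                // true
        System.out.println(isRemoveJar("DELETE JAR /tmp/udf.jar")); // true
    }
}
```

The same alternation trick would cover EXIT/QUIT as well.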

On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <tw...@apache.org> wrote:

> Hi everyone,
>
> some feedback regarding the open questions. Maybe we can discuss the
> `TableEnvironment.executeMultiSql` story offline to determine how we
> proceed with this in the near future.
>
> 1) "whether the table environment has the ability to update itself"
>
> Maybe there was some misunderstanding. I don't think that we should
> support `tEnv.getConfig.getConfiguration.setString("table.planner",
> "old")`. Instead I'm proposing to support
> `TableEnvironment.create(Configuration)` where planner and execution
> mode are read immediately and subsequent changes to these options will
> have no effect. We are doing it similar in `new
> StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> must not be SQL Client specific but can be part of the core table code
> base. Many users would like to get a 100% preconfigured environment from
> just Configuration. And this is not possible right now. We can solve
> both use cases in one change.
>
> 2) "the sql client, we will maintain two parsers"
>
> I remember we had some discussion about this and decided that we would
> like to maintain only one parser. In the end it is "One Flink SQL" where
> commands influence each other also with respect to keywords. It should
> be fine to include the SQL Client commands in the Flink parser. Of
> course the table environment would not be able to handle the `Operation`
> instance that would be the result but we can introduce hooks to handle
> those `Operation`s. Or we introduce parser extensions.
>
> Can we skip `table.job.async` in the first version? We should further
> discuss whether we introduce a special SQL clause for wrapping async
> behavior or if we use a config option? Esp. for streaming queries we
> need to be careful and should force users to either "one INSERT INTO" or
> "one STATEMENT SET".
>
> 3) 4) "HIVE also uses these commands"
>
> In general, Hive is not a good reference. Aligning the commands more
> with the remaining commands should be our goal. We just had a MODULE
> discussion where we selected SHOW instead of LIST. But it is true that
> JARs are not part of the catalog which is why I would not use
> CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> Take a look at the Java collection API as another example.
>
> 6) "Most of the commands should belong to the table environment"
>
> Thanks for updating the FLIP, this makes things easier to understand. It
> is good to see that most commands will be available in TableEnvironment.
> However, I would also support SET and RESET for consistency. Again, from
> an architectural point of view, if we would allow some kind of
> `Operation` hook in table environment, we could check for SQL Client
> specific options and forward to regular `TableConfig.getConfiguration`
> otherwise. What do you think?
>
> Regards,
> Timo
>
>
> On 03.02.21 08:58, Jark Wu wrote:
> > Hi Timo,
> >
> > I will respond to some of the questions:
> >
> > 1) SQL client specific options
> >
> > Whether it starts with "table" or "sql-client" depends on where the
> > configuration takes effect.
> > If it is a table configuration, we should make clear what's the behavior
> > when users change
> > the configuration in the lifecycle of TableEnvironment.
> >
> > I agree with Shengkai `sql-client.planner` and
> `sql-client.execution.mode`
> > are something special
> > that can't be changed after TableEnvironment has been initialized. You
> can
> > see
> > `StreamExecutionEnvironment` provides `configure()`  method to override
> > configuration after
> > StreamExecutionEnvironment has been initialized.
> >
> > Therefore, I think it would be better to still use  `sql-client.planner`
> > and `sql-client.execution.mode`.
> >
> > 2) Execution file
> >
> > From my point of view, there is a big difference between
> > `sql-client.job.detach` and
> > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
> > affect every single DML statement
> > in the terminal, not only the statements in SQL files. I think the single
> > DML statement in the interactive
> > terminal is something like tEnv#executeSql() instead of
> > tEnv#executeMultiSql.
> > So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
> > I just find that runtime provides a configuration called
> > "execution.attached" [1] which is false by default
> > which specifies if the pipeline is submitted in attached or detached
> mode.
> > It provides exactly the same
> > functionality as `sql-client.job.detach`. What do you think about using
> > this option?
> >
> > If we also want to support this config in TableEnvironment, I think it
> > should also affect the DML execution
> >   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> > Therefore, the behavior may look like this:
> >
> > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> default
> > tableResult.await()   ==> manually block until finish
> > tEnv.getConfig().getConfiguration().setString("execution.attached",
> "true")
> > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't
> need
> > to wait on the TableResult
> > tEnv.executeMultiSql(
> > """
> > CREATE TABLE ....  ==> always sync
> > INSERT INTO ...  => sync, because we set configuration above
> > SET execution.attached = false;
> > INSERT INTO ...  => async
> > """)
> >
> > On the other hand, I think `sql-client.job.detach`
> > and `TableEnvironment.executeMultiSql()` should be two separate topics,
> > as Shengkai mentioned above, SQL CLI only depends on
> > `TableEnvironment#executeSql()` to support multi-line statements.
> > I'm fine with making `executeMultiSql()` clear but don't want it to block
> > this FLIP, maybe we can discuss this in another thread.
> >
> >
> > Best,
> > Jark
> >
> > [1]:
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> >
> > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com> wrote:
> >
> >> Hi, Timo.
> >> Thanks for your detailed feedback. I have some thoughts about your
> >> feedback.
> >>
> >> *Regarding #1*: I think the main problem is whether the table
> environment
> >> has the ability to update itself. Let's take a simple program as an
> >> example.
> >>
> >>
> >> ```
> >> TableEnvironment tEnv = TableEnvironment.create(...);
> >>
> >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> >>
> >>
> >> tEnv.executeSql("...");
> >>
> >> ```
> >>
> >> If we regard this option as a table option, users don't have to create
> >> another table environment manually. In that case, tEnv needs to check
> >> whether the current mode and planner are the same as before when
> executeSql
> >> or explainSql. I don't think it's easy work for the table environment,
> >> especially if users have a StreamExecutionEnvironment but set old
> planner
> >> and batch mode. But when we make this option a sql client option, users
> >> only need the SET command to change the setting. We can rebuild the
> >> table environment when the SET succeeds.
> >>
> >>
> >> *Regarding #2*: I think we need to discuss the implementation before
> >> continuing this topic. In the sql client, we will maintain two parsers.
> The
> >> first parser (the client parser) will only match the sql client commands. If
> the
> >> client parser can't parse the statement, we will leverage the power of
> the
> >> table environment to execute. According to our blueprint,
> >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> >>
> >> But if we need to introduce the `TableEnvironment.executeMultiSql` in
> the
> >> future, I think it's OK to use the option `table.multi-sql-async` rather
> >> than option `sql-client.job.detach`. But we think the name is not
> suitable
> >> because the name is confusing for others. When setting the option
> false, we
> >> just mean it will block the execution of the INSERT INTO statement, not
> DDL
> >> or others(other sql statements are always executed synchronously). So
> how
> >> about `table.job.async`? It only works for the sql-client and the
> >> executeMultiSql. If we set this value false, the table environment will
> >> return the result until the job finishes.
> >>
> >>
> >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
> >> because HIVE also uses these commands to add the jar into the classpath
> or
> >> delete the jar. If we use  such commands, it can reduce our work for
> hive
> >> compatibility.
> >>
> >> For SHOW JAR, I think the main concern is the jars are not maintained by
> >> the Catalog. If we really need to keep consistent with SQL grammar,
> >> maybe we should use
> >>
> >> `ADD JAR` -> `CREATE JAR`,
> >> `DELETE JAR` -> `DROP JAR`,
> >> `LIST JAR` -> `SHOW JAR`.
> >>
> >> *Regarding #5*: I agree with you that we'd better keep consistent.
> >>
> >> *Regarding #6*: Yes. Most of the commands should belong to the table
> >> environment. In the Summary section, I use the <NOTE> tag to identify
> which
> >> commands should belong to the sql client and which commands should
> belong
> >> to the table environment. I also add a new section about implementation
> >> details in the FLIP.
> >>
> >> Best,
> >> Shengkai
> >>
> >> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
> >>
> >>> Thanks for this great proposal Shengkai. This will give the SQL Client
> a
> >>> very good update and make it production ready.
> >>>
> >>> Here is some feedback from my side:
> >>>
> >>> 1) SQL client specific options
> >>>
> >>> I don't think that `sql-client.planner` and `sql-client.execution.mode`
> >>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> >>> `ExecutionConfig#configure` that have been added recently, we should
> >>> offer a possibility for TableEnvironment. How about we offer
> >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> >>> `table.execution-mode` to
> >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> >>>
> >>> 2) Execution file
> >>>
> >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> mailing
> >>> list thread at that time? Could you further elaborate how the
> >>> multi-statement execution should work for a unified batch/streaming
> >>> story? According to our past discussions, each line in an execution
> file
> >>> should be executed blocking which means a streaming query needs a
> >>> statement set to execute multiple INSERT INTO statement, correct? We
> >>> should also offer this functionality in
> >>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach`
> is
> >>> SQL Client specific needs to be determined, it could also be a general
> >>> `table.multi-sql-async` option?
> >>>
> >>> 3) DELETE JAR
> >>>
> >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one
> is
> >>> actively deleting the JAR in the corresponding path.
> >>>
> >>> 4) LIST JAR
> >>>
> >>> This should be `SHOW JARS` according to other SQL commands such as
> `SHOW
> >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> >>>
> >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >>>
> >>> We should keep the details in sync with
> >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> >>> instead of `COST`. I'm sure the original author had a reason to call
> >>> it that way.
> >>>
> >>> 6) Implementation details
> >>>
> >>> It would be nice to understand how we plan to implement the given
> >>> features. Most of the commands and config options should go into
> >>> TableEnvironment and SqlParser directly, correct? This way users have a
> >>> unified way of using Flink SQL. TableEnvironment would provide a similar
> >>> user experience in notebooks or interactive programs as the SQL Client.
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>> [2]
> >>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 02.02.21 10:13, Shengkai Fang wrote:
> >>>> Sorry for the typo. I mean `RESET` is much better than `UNSET`.
> >>>>
> >>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> >>>>
> >>>>> Hi, Jingsong.
> >>>>>
> >>>>> Thanks for your reply. I think `UNSET` is much better.
> >>>>>
> >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> >>>>> supported in the current sql client now. Our proposal just extends
> its
> >>>>> grammar and allows users to reset the specified keys.
> >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> >>> value[1].
> >>>>> I think it is more friendly for batch users.
> >>>>>
> >>>>> Best,
> >>>>> Shengkai
> >>>>>
> >>>>> [1]
> >>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>>>>
> >>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
> >>>>>
> >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> >>>>>> improving it.
> >>>>>>
> >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong
> >>>>>>
> >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com>
> wrote:
> >>>>>>
> >>>>>>> Thanks Shengkai for the update! The proposed changes look good to
> >> me.
> >>>>>>>
> >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi, Rui.
> >>>>>>>> You are right. I have already modified the FLIP.
> >>>>>>>>
> >>>>>>>> The main changes:
> >>>>>>>>
> >>>>>>>> # -f parameter has no restriction about the statement type.
> >>>>>>>> Sometimes, users use the pipe to redirect the result of queries to
> >>>>>>>> debug when submitting a job with the -f parameter. It's much more
> >>>>>>>> convenient compared to writing INSERT INTO statements.
> >>>>>>>>
> >>>>>>>> # Add a new sql client option `sql-client.job.detach`.
> >>>>>>>> Users prefer to execute jobs one by one in the batch mode. Users can
> >>>>>>>> set this option to false and the client will not process the next
> >>>>>>>> job until the current job finishes. The default value of this option
> >>>>>>>> is true, which means the client will execute the next job as soon as
> >>>>>>>> the current job is submitted.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Shengkai
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
> >>>>>>>>
> >>>>>>>>> Hi Shengkai,
> >>>>>>>>>
> >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> >> different
> >>>>>>>>> implications, and we should clarify the behavior. For example, if
> >>> the
> >>>>>>>>> client just submits the job and exits, what happens if the file
> >>>>>>> contains
> >>>>>>>>> two INSERT statements? I don't think we should treat them as a
> >>>>>>> statement
> >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
> >>> that
> >>>>>>>>> case. And the client shouldn't asynchronously submit the two
> jobs,
> >>>>>>> because
> >>>>>>>>> the 2nd may depend on the 1st, right?
> >>>>>>>>>
> >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskmine@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Rui,
> >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> >>>>>>>>>>
> >>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET command.
> >>>>>>>>>> In the implementation, it will just put the key-value into the
> >>>>>>>>>> `Configuration`, which will be used to generate the table config.
> >>>>>>>>>> If Hive supports reading the settings from the table config, users
> >>>>>>>>>> are able to set the hive-related settings.
> >>>>>>>>>>
> >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job and
> >>> exit.
> >>>>>>> If
> >>>>>>>>>> the queries never end, users have to cancel the job by
> >> themselves,
> >>>>>>> which is
> >>>>>>>>>> not reliable (people may forget their jobs). In most cases,
> >>>>>>>>>> queries are used to analyze the data. Users should use queries in
> >>>>>>>>>> the interactive mode.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Shengkai
> >>>>>>>>>>
> >>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> >>> covers a
> >>>>>>>>>>> lot of useful features which will dramatically improve the
> >>>>>>> usability of our
> >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations
> >> via
> >>>>>>> the
> >>>>>>>>>>> SET command? A connector may have its own configurations and we
> >>>>>>> don't have
> >>>>>>>>>>> a way to dynamically change such configurations in SQL Client.
> >> For
> >>>>>>> example,
> >>>>>>>>>>> users may want to be able to change hive conf when using hive
> >>>>>>> connector [1].
> >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> >> specified
> >>>>>>> with
> >>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
> >>> queries
> >>>>>>> in the
> >>>>>>>>>>> file. And a common use case is to run some query and redirect
> >> the
> >>>>>>> results
> >>>>>>>>>>> to a file. So I think maybe flink users would like to do the
> >> same,
> >>>>>>>>>>> especially in batch scenarios.
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>>>>> liuyang0704@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> >>>>>>> suggestions:
> >>>>>>>>>>>>
> >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> >>>>>>>>>>>> #2. Improve the way of result retrieval: at present the sql
> >>>>>>>>>>>> client collects the results locally all at once using
> >>>>>>>>>>>> accumulators, which may cause memory issues in the JM or locally
> >>>>>>>>>>>> for big query results. Accumulators are only suitable for testing
> >>>>>>>>>>>> purposes. We may change to use SelectTableSink, which is based
> >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway, which is in
> >>>>>>>>>>>> FLIP-91? It seems that FLIP has not moved forward for a long
> >>>>>>>>>>>> time. Providing a long-running service out of the box to
> >>>>>>>>>>>> facilitate sql submission is necessary.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think of these?
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
> >> Client
> >>>>>>>>>>>>> Improvements.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Many users have complained about the problems of the sql
> >> client.
> >>>>>>> For
> >>>>>>>>>>>>> example, users cannot register the tables proposed by FLIP-95.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - use -i parameter to specify the sql file to initialize the
> >>>>>>> table
> >>>>>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> >>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> *With kind regards
> >>>>>>>>>>>> ------------------------------------------------------------
> >>>>>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> >>>>>>>>>>>> QQ: 3239559*
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Best regards!
> >>>>>>>>>>> Rui Li
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best regards!
> >>>>>>>>> Rui Li
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards!
> >>>>>>> Rui Li
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best, Jingsong Lee
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

-- 
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

Some feedback regarding the open questions. Maybe we can discuss the 
`TableEnvironment.executeMultiSql` story offline to determine how we 
proceed with this in the near future.

1) "whether the table environment has the ability to update itself"

Maybe there was some misunderstanding. I don't think that we should 
support `tEnv.getConfig.getConfiguration.setString("table.planner", 
"old")`. Instead I'm proposing to support 
`TableEnvironment.create(Configuration)`, where planner and execution 
mode are read immediately and subsequent changes to these options will 
have no effect. We do something similar in `new 
StreamExecutionEnvironment(Configuration)`. These two ConfigOptions 
need not be SQL Client specific but can be part of the core table code 
base. Many users would like to get a 100% preconfigured environment from 
just a Configuration, and this is not possible right now. We can solve 
both use cases in one change.
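[Editor's sketch] The create-time snapshot semantics proposed here can be illustrated without any Flink dependency. The class name, option keys, and defaults below (`ConfiguredEnvironment`, `"blink"`, `"streaming"`) are invented stand-ins for the proposed `TableEnvironment.create(Configuration)`, not the actual Flink API:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfiguredEnvironment {
    private final String planner;
    private final String executionMode;
    private final Map<String, String> runtimeOptions = new HashMap<>();

    private ConfiguredEnvironment(String planner, String executionMode) {
        this.planner = planner;
        this.executionMode = executionMode;
    }

    // Planner and execution mode are read once, at creation time.
    public static ConfiguredEnvironment create(Map<String, String> config) {
        return new ConfiguredEnvironment(
                config.getOrDefault("table.planner", "blink"),
                config.getOrDefault("table.execution-mode", "streaming"));
    }

    // Later SET calls only touch runtime options; the two creation-time
    // options are silently ignored (a real implementation might warn).
    public void set(String key, String value) {
        if (!key.equals("table.planner") && !key.equals("table.execution-mode")) {
            runtimeOptions.put(key, value);
        }
    }

    public String planner() { return planner; }
    public String executionMode() { return executionMode; }
}
```

Once `create()` has run, a later `SET table.planner=...` is a no-op on the environment, which is exactly the behavior argued for above.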

2) "the sql client, we will maintain two parsers"

I remember we had some discussion about this and decided that we would 
like to maintain only one parser. In the end it is "One Flink SQL", 
where commands influence each other also with respect to keywords. It 
should be fine to include the SQL Client commands in the Flink parser. 
Of course the table environment would not be able to handle the 
resulting `Operation` instances, but we can introduce hooks to handle 
those `Operation`s, or we introduce parser extensions.
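[Editor's sketch] The single-parser-plus-hook idea can be modeled in a few lines; `OperationDispatcher`, its `Operation` class, and the string results are made-up names for illustration, not real Flink classes:

```java
import java.util.Locale;

public class OperationDispatcher {
    static class Operation {
        final String kind;    // e.g. "ADD_JAR" or "SQL"
        final String payload;
        Operation(String kind, String payload) { this.kind = kind; this.payload = payload; }
    }

    // One parser for every statement, including client commands.
    static Operation parse(String stmt) {
        String s = stmt.trim();
        if (s.toUpperCase(Locale.ROOT).startsWith("ADD JAR ")) {
            return new Operation("ADD_JAR", s.substring("ADD JAR ".length()));
        }
        return new Operation("SQL", s);
    }

    // The client hook handles its own Operations; everything else falls
    // through to the regular executor (the table environment).
    static String execute(Operation op) {
        if (op.kind.equals("ADD_JAR")) {
            return "client: registered " + op.payload;   // SQL Client hook
        }
        return "environment: executed " + op.payload;    // TableEnvironment
    }
}
```

The point of the design is that parsing stays in one place, and only dispatch differs between the SQL Client and a plain TableEnvironment.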

Can we skip `table.job.async` in the first version? We should further 
discuss whether we introduce a special SQL clause for wrapping async 
behavior or whether we use a config option. Especially for streaming 
queries we need to be careful and should force users to either "one 
INSERT INTO" or "one STATEMENT SET".

3) 4) "HIVE also uses these commands"

In general, Hive is not a good reference. Aligning these commands with 
the remaining commands should be our goal. We just had a MODULE 
discussion where we selected SHOW instead of LIST. But it is true that 
JARs are not part of the catalog, which is why I would not use 
CREATE/DROP. ADD/REMOVE are common siblings in the English language. 
Take a look at the Java collection API as another example.
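[Editor's sketch] As an illustration of that ADD/REMOVE pairing: since jars are session resources rather than catalog objects, a registry backing `ADD JAR` / `REMOVE JAR` / `SHOW JARS` can be little more than a list. `SessionJars` is an invented name for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class SessionJars {
    private final List<String> jars = new ArrayList<>();

    public void addJar(String path) {        // ADD JAR '<path>'
        if (!jars.contains(path)) {
            jars.add(path);
        }
    }

    public void removeJar(String path) {     // REMOVE JAR '<path>'
        jars.remove(path);
    }

    public List<String> showJars() {         // SHOW JARS
        return new ArrayList<>(jars);        // defensive copy for display
    }
}
```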

6) "Most of the commands should belong to the table environment"

Thanks for updating the FLIP, this makes things easier to understand. It 
is good to see that most commands will be available in TableEnvironment. 
However, I would also support SET and RESET for consistency. Again, from 
an architectural point of view, if we would allow some kind of 
`Operation` hook in the table environment, we could check for SQL Client 
specific options and forward to the regular `TableConfig.getConfiguration` 
otherwise. What do you think?
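[Editor's sketch] The SET/RESET forwarding described above could look roughly like this; `SettingsRouter` and the `sql-client.` prefix check are illustrative assumptions, not an existing Flink class:

```java
import java.util.HashMap;
import java.util.Map;

public class SettingsRouter {
    private final Map<String, String> clientOptions = new HashMap<>();
    private final Map<String, String> tableConfig = new HashMap<>();

    public void set(String key, String value) {
        if (key.startsWith("sql-client.")) {
            clientOptions.put(key, value);    // handled by the SQL Client
        } else {
            tableConfig.put(key, value);      // forwarded to TableConfig
        }
    }

    // RESET <key>: drop the override so the built-in default applies again.
    public void reset(String key) {
        clientOptions.remove(key);
        tableConfig.remove(key);
    }

    public String get(String key, String defaultValue) {
        if (clientOptions.containsKey(key)) {
            return clientOptions.get(key);
        }
        return tableConfig.getOrDefault(key, defaultValue);
    }
}
```

With this split, SET and RESET behave identically from the user's point of view regardless of where a key ends up, which is the consistency argument made above.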

Regards,
Timo


On 03.02.21 08:58, Jark Wu wrote:
> Hi Timo,
> 
> I will respond to some of the questions:
> 
> 1) SQL client specific options
> 
> Whether it starts with "table" or "sql-client" depends on where the
> configuration takes effect.
> If it is a table configuration, we should make clear what's the behavior
> when users change
> the configuration in the lifecycle of TableEnvironment.
> 
> I agree with Shengkai `sql-client.planner` and `sql-client.execution.mode`
> are something special
> that can't be changed after TableEnvironment has been initialized. You can
> see
> `StreamExecutionEnvironment` provides `configure()`  method to override
> configuration after
> StreamExecutionEnvironment has been initialized.
> 
> Therefore, I think it would be better to still use  `sql-client.planner`
> and `sql-client.execution.mode`.
> 
> 2) Execution file
> 
> From my point of view, there is a big difference between
> `sql-client.job.detach` and
> `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
> affect every single DML statement
> in the terminal, not only the statements in SQL files. I think the single
> DML statement in the interactive
> terminal is something like tEnv#executeSql() instead of
> tEnv#executeMultiSql.
> So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
> I just find that runtime provides a configuration called
> "execution.attached" [1] which is false by default
> which specifies if the pipeline is submitted in attached or detached mode.
> It provides exactly the same
> functionality as `sql-client.job.detach`. What do you think about using
> this option?
> 
> If we also want to support this config in TableEnvironment, I think it
> should also affect the DML execution
>   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> Therefore, the behavior may look like this:
> 
> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
> tableResult.await()   ==> manually block until finish
> tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't need
> to wait on the TableResult
> tEnv.executeMultiSql(
> """
> CREATE TABLE ....  ==> always sync
> INSERT INTO ...  => sync, because we set configuration above
> SET execution.attached = false;
> INSERT INTO ...  => async
> """)
> 
> On the other hand, I think `sql-client.job.detach`
> and `TableEnvironment.executeMultiSql()` should be two separate topics,
> as Shengkai mentioned above, SQL CLI only depends on
> `TableEnvironment#executeSql()` to support multi-line statements.
> I'm fine with making `executeMultiSql()` clear but don't want it to block
> this FLIP, maybe we can discuss this in another thread.
> 
> 
> Best,
> Jark
> 
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> 
> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com> wrote:
> 
>> Hi, Timo.
>> Thanks for your detailed feedback. I have some thoughts about your
>> feedback.
>>
>> *Regarding #1*: I think the main problem is whether the table environment
>> has the ability to update itself. Let's take a simple program as an
>> example.
>>
>>
>> ```
>> TableEnvironment tEnv = TableEnvironment.create(...);
>>
>> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>>
>>
>> tEnv.executeSql("...");
>>
>> ```
>>
>> If we regard this option as a table option, users don't have to create
>> another table environment manually. In that case, tEnv needs to check
>> whether the current mode and planner are the same as before when executeSql
>> or explainSql. I don't think it's easy work for the table environment,
>> especially if users have a StreamExecutionEnvironment but set old planner
>> and batch mode. But when we make this option a sql client option, users
>> only need the SET command to change the setting. We can rebuild the table
>> environment when the SET succeeds.
>>
>>
>> *Regarding #2*: I think we need to discuss the implementation before
>> continuing this topic. In the sql client, we will maintain two parsers. The
>> first parser (the client parser) will only match the sql client commands. If the
>> client parser can't parse the statement, we will leverage the power of the
>> table environment to execute. According to our blueprint,
>> TableEnvironment#executeSql is enough for the sql client. Therefore,
>> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
>>
>> But if we need to introduce the `TableEnvironment.executeMultiSql` in the
>> future, I think it's OK to use the option `table.multi-sql-async` rather
>> than the option `sql-client.job.detach`. But we think the name is not
>> suitable because it is confusing for others. When setting the option to false, we
>> just mean it will block the execution of the INSERT INTO statement, not DDL
>> or others(other sql statements are always executed synchronously). So how
>> about `table.job.async`? It only works for the sql-client and the
>> executeMultiSql. If we set this value false, the table environment will
>> return the result until the job finishes.
>>
>>
>> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
>> because HIVE also uses these commands to add the jar into the classpath or
>> delete the jar. If we use  such commands, it can reduce our work for hive
>> compatibility.
>>
>> For SHOW JAR, I think the main concern is the jars are not maintained by
>> the Catalog. If we really need to keep consistent with SQL grammar, maybe
>> we should use
>>
>> `ADD JAR` -> `CREATE JAR`,
>> `DELETE JAR` -> `DROP JAR`,
>> `LIST JAR` -> `SHOW JAR`.
>>
>> *Regarding #5*: I agree with you that we'd better keep consistent.
>>
>> *Regarding #6*: Yes. Most of the commands should belong to the table
>> environment. In the Summary section, I use the <NOTE> tag to identify which
>> commands should belong to the sql client and which commands should belong
>> to the table environment. I also add a new section about implementation
>> details in the FLIP.
>>
>> Best,
>> Shengkai
>>
>> Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:
>>
>>> Thanks for this great proposal Shengkai. This will give the SQL Client a
>>> very good update and make it production ready.
>>>
>>> Here is some feedback from my side:
>>>
>>> 1) SQL client specific options
>>>
>>> I don't think that `sql-client.planner` and `sql-client.execution.mode`
>>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
>>> `ExecutionConfig#configure` that have been added recently, we should
>>> offer a possibility for TableEnvironment. How about we offer
>>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
>>> `table.execution-mode` to
>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>
>>> 2) Execution file
>>>
>>> Did you have a look at the Appendix of FLIP-84 [1] including the mailing
>>> list thread at that time? Could you further elaborate how the
>>> multi-statement execution should work for a unified batch/streaming
>>> story? According to our past discussions, each line in an execution file
>>> should be executed blocking which means a streaming query needs a
>>> statement set to execute multiple INSERT INTO statement, correct? We
>>> should also offer this functionality in
>>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
>>> SQL Client specific needs to be determined, it could also be a general
>>> `table.multi-sql-async` option?
>>>
>>> 3) DELETE JAR
>>>
>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
>>> actively deleting the JAR in the corresponding path.
>>>
>>> 4) LIST JAR
>>>
>>> This should be `SHOW JARS` according to other SQL commands such as `SHOW
>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>>>
>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>
>>> We should keep the details in sync with
>>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
>>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
>>> instead of `COST`. I'm sure the original author had a reason to call
>>> it that way.
>>>
>>> 6) Implementation details
>>>
>>> It would be nice to understand how we plan to implement the given
>>> features. Most of the commands and config options should go into
>>> TableEnvironment and SqlParser directly, correct? This way users have a
>>> unified way of using Flink SQL. TableEnvironment would provide a similar
>>> user experience in notebooks or interactive programs as the SQL Client.
>>>
>>> [1]
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>> [2]
>>>
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>> Sorry for the typo. I mean `RESET` is much better than `UNSET`.
>>>>
>>>> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
>>>>
>>>>> Hi, Jingsong.
>>>>>
>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>>>
>>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
>>>>> supported in the current sql client now. Our proposal just extends its
>>>>> grammar and allows users to reset the specified keys.
>>>>> 2. Hive beeline also uses `RESET` to set the key to the default
>>> value[1].
>>>>> I think it is more friendly for batch users.
>>>>>
>>>>> Best,
>>>>> Shengkai
>>>>>
>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>>>>
>>>>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
>>>>>
>>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
>>>>>> improving it.
>>>>>>
>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>>>>>
>>>>>> Best,
>>>>>> Jingsong
>>>>>>
>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Shengkai for the update! The proposed changes look good to
>> me.
>>>>>>>
>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com>
>>> wrote:
>>>>>>>
>>>>>>>> Hi, Rui.
>>>>>>>> You are right. I have already modified the FLIP.
>>>>>>>>
>>>>>>>> The main changes:
>>>>>>>>
>>>>>>>> # -f parameter has no restriction about the statement type.
>>>>>>>> Sometimes, users use the pipe to redirect the result of queries to
>>>>>>> debug
>>>>>>>> when submitting a job by the -f parameter. It's much more convenient
>>>>>>>> compared to writing INSERT INTO statements.
>>>>>>>>
>>>>>>>> # Add a new sql client option `sql-client.job.detach` .
>>>>>>>> Users prefer to execute jobs one by one in the batch mode. Users
>> can
>>>>>>> set
>>>>>>>> this option false and the client will process the next job until
>> the
>>>>>>>> current job finishes. The default value of this option is false,
>>> which
>>>>>>>> means the client will execute the next job when the current job is
>>>>>>>> submitted.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Shengkai
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>>>>>>>
>>>>>>>>> Hi Shengkai,
>>>>>>>>>
>>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
>> different
>>>>>>>>> implications, and we should clarify the behavior. For example, if
>>> the
>>>>>>>>> client just submits the job and exits, what happens if the file
>>>>>>> contains
>>>>>>>>> two INSERT statements? I don't think we should treat them as a
>>>>>>> statement
>>>>>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
>>> that
>>>>>>>>> case. And the client shouldn't asynchronously submit the two jobs,
>>>>>>> because
>>>>>>>>> the 2nd may depend on the 1st, right?
>>>>>>>>>
>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fs...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rui,
>>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
>>>>>>>>>>
>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the set
>>>>>>> command. In
>>>>>>>>>> the implementation, it will just put the key-value into the
>>>>>>>>>> `Configuration`, which will be used to generate the table config.
>>> If
>>>>>>> hive
>>>>>>>>>> supports reading the setting from the table config, users are able
>>>>>>>>>> to set the hive-related settings.
>>>>>>>>>>
>>>>>>>>>> For the suggestion 2: The -f parameter will submit the job and
>>> exit.
>>>>>>> If
>>>>>>>>>> the queries never end, users have to cancel the job by
>> themselves,
>>>>>>> which is
>>>>>>>>>> not reliable (people may forget their jobs). In most cases, queries
>>>>>>> are used
>>>>>>>>>> to analyze the data. Users should use queries in the interactive
>>>>>>> mode.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Shengkai
>>>>>>>>>>
>>>>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>>>>>>>>>>
>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
>>> covers a
>>>>>>>>>>> lot of useful features which will dramatically improve the
>>>>>>> usability of our
>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>>>>>>>>>>>
>>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations
>> via
>>>>>>> the
>>>>>>>>>>> SET command? A connector may have its own configurations and we
>>>>>>> don't have
>>>>>>>>>>> a way to dynamically change such configurations in SQL Client.
>> For
>>>>>>> example,
>>>>>>>>>>> users may want to be able to change hive conf when using hive
>>>>>>> connector [1].
>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
>> specified
>>>>>>> with
>>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
>>> queries
>>>>>>> in the
>>>>>>>>>>> file. And a common use case is to run some query and redirect
>> the
>>>>>>> results
>>>>>>>>>>> to a file. So I think maybe flink users would like to do the
>> same,
>>>>>>>>>>> especially in batch scenarios.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>>>>> liuyang0704@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Shengkai,
>>>>>>>>>>>>
>>>>>>>>>>>> Glad to see this improvement. And I have some additional
>>>>>>> suggestions:
>>>>>>>>>>>>
>>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
>>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
>>>>>>>>>>>> #2. Improve the way of result retrieval: the sql client collects
>>>>>>>>>>>> the results locally all at once using accumulators at present,
>>>>>>>>>>>>         which may cause memory issues in the JM or locally for big
>>>>>>>>>>>> query results. Accumulators are only suitable for testing purposes.
>>>>>>>>>>>>         We may change to use SelectTableSink, which is based
>>>>>>>>>>>> on CollectSinkOperatorCoordinator.
>>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway in FLIP-91? It
>>>>>>>>>>>> seems that FLIP has not moved forward for a long time.
>>>>>>>>>>>>         Providing a long-running service out of the box to
>>>>>>>>>>>> facilitate sql submission is necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think of these?
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi devs,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
>> Client
>>>>>>>>>>>>> Improvements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Many users have complained about the problems of the sql
>> client.
>>>>>>> For
>>>>>>>>>>>>> example, users can not register the table proposed by FLIP-95.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The main changes in this FLIP:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - use -i parameter to specify the sql file to initialize the
>>>>>>> table
>>>>>>>>>>>>> environment and deprecated YAML file;
>>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
>>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
>>>>>>>>>>>>> - support statement set syntax;
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> Look forward to your feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Shengkai
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> *With kind regards
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
>>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
>>>>>>>>>>>> QQ: 3239559*
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards!
>>>>>>>>>>> Rui Li
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards!
>>>>>>>>> Rui Li
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards!
>>>>>>> Rui Li
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best, Jingsong Lee
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jark Wu <im...@gmail.com>.
Hi Timo,

I will respond to some of the questions:

1) SQL client specific options

Whether it starts with "table" or "sql-client" depends on where the
configuration takes effect.
If it is a table configuration, we should make clear what the behavior is
when users change the configuration during the lifecycle of a
TableEnvironment.

I agree with Shengkai that `sql-client.planner` and `sql-client.execution.mode`
are special in that they can't be changed after the TableEnvironment has been
initialized. By contrast, `StreamExecutionEnvironment` provides a `configure()`
method to override its configuration after it has been initialized.

Therefore, I think it would be better to still use  `sql-client.planner`
and `sql-client.execution.mode`.

2) Execution file

From my point of view, there is a big difference between
`sql-client.job.detach` and `TableEnvironment.executeMultiSql()`:
`sql-client.job.detach` will affect every single DML statement in the
terminal, not only the statements in SQL files. A single DML statement in the
interactive terminal corresponds to tEnv#executeSql() rather than
tEnv#executeMultiSql(), so I don't like the "multi" and "sql" keywords in
`table.multi-sql-async`.
I just found that the runtime provides a configuration called
"execution.attached" [1], which is false by default and specifies whether the
pipeline is submitted in attached or detached mode. It provides exactly the
same functionality as `sql-client.job.detach`. What do you think about using
this option?

If we also want to support this config in TableEnvironment, I think it
should also affect the DML execution of `tEnv#executeSql()`, not only DMLs
in `tEnv#executeMultiSql()`.
Therefore, the behavior may look like this:

```
val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
tableResult.await()  ==> manually block until finish
tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, no need to wait on the TableResult
tEnv.executeMultiSql(
"""
CREATE TABLE ...   ==> always sync
INSERT INTO ...    ==> sync, because we set the configuration above
SET execution.attached = false;
INSERT INTO ...    ==> async
""")
```

On the other hand, I think `sql-client.job.detach`
and `TableEnvironment.executeMultiSql()` should be two separate topics:
as Shengkai mentioned above, the SQL CLI only depends on
`TableEnvironment#executeSql()` to support multi-line statements.
I'm fine with making `executeMultiSql()` clearer, but I don't want it to
block this FLIP; maybe we can discuss it in another thread.


Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached

On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fs...@gmail.com> wrote:


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, Timo.
Thanks for your detailed feedback. Here are my thoughts on each point.

*Regarding #1*: I think the main problem is whether the table environment
has the ability to update itself. Let's take a simple program as an example.


```
TableEnvironment tEnv = TableEnvironment.create(...);

tEnv.getConfig().getConfiguration().setString("table.planner", "old");


tEnv.executeSql("...");

```

If we regard this option as a table option, users don't have to create
another table environment manually. In that case, tEnv needs to check
whether the current mode and planner are the same as before whenever
executeSql or explainSql is called. I don't think that's easy work for the
table environment, especially if users have a StreamExecutionEnvironment but
set the old planner and batch mode. But if we make this option a sql client
option, users only need the SET command to change the setting, and we can
rebuild a new table environment when the SET succeeds.


*Regarding #2*: I think we need to discuss the implementation before
continuing this topic. In the sql client, we will maintain two parsers. The
first parser (the client parser) will only match the sql client commands. If
the client parser can't parse the statement, we will leverage the table
environment to execute it. According to our blueprint,
TableEnvironment#executeSql is enough for the sql client. Therefore,
TableEnvironment#executeMultiSql is out of scope for this FLIP.
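The two-parser dispatch above can be sketched as follows (a hypothetical illustration only; `ClientDispatcher`, the regexes, and the `Function` standing in for `TableEnvironment#executeSql` are assumptions, not the planned implementation):

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two-parser dispatch: the client parser only
// matches client commands; anything it cannot parse is delegated to the
// table environment.
public class ClientDispatcher {
    private static final Pattern SET =
            Pattern.compile("SET\\s+(\\S+)\\s*=\\s*(\\S+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern RESET =
            Pattern.compile("RESET\\s+(\\S+)", Pattern.CASE_INSENSITIVE);

    // Stand-in for TableEnvironment#executeSql.
    private final Function<String, String> tableEnv;

    public ClientDispatcher(Function<String, String> tableEnv) {
        this.tableEnv = tableEnv;
    }

    public String execute(String statement) {
        String s = statement.trim().replaceAll(";$", "");
        Matcher m = SET.matcher(s);
        if (m.matches()) {
            return "client: set " + m.group(1) + "=" + m.group(2);
        }
        m = RESET.matcher(s);
        if (m.matches()) {
            return "client: reset " + m.group(1);
        }
        // Client parser can't handle it: fall through to the table environment.
        return tableEnv.apply(s);
    }

    public static void main(String[] args) {
        ClientDispatcher d = new ClientDispatcher(sql -> "tenv: " + sql);
        System.out.println(d.execute("SET table.planner = blink;"));
        System.out.println(d.execute("SELECT 1"));
    }
}
```

The key property is that the client never needs `executeMultiSql`: it splits and classifies statements itself and only calls the single-statement entry point.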

But if we need to introduce `TableEnvironment.executeMultiSql` in the
future, I think it's OK to use the option `table.multi-sql-async` rather
than `sql-client.job.detach`. Still, we think the name is not suitable
because it is confusing: setting the option to false only means blocking the
execution of INSERT INTO statements, not DDL or others (other sql statements
are always executed synchronously). So how about `table.job.async`? It only
works for the sql client and executeMultiSql. If we set this value to false,
the table environment will not return the result until the job finishes.


*Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
because Hive also uses these commands to add a jar to the classpath or
delete it. Using the same commands reduces our work for Hive compatibility.

For SHOW JAR, I think the main concern is that the jars are not maintained
by the Catalog. If we really need to stay consistent with SQL grammar, maybe
we should use

`ADD JAR` -> `CREATE JAR`,
`DELETE JAR` -> `DROP JAR`,
`LIST JAR` -> `SHOW JAR`.
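Whatever the final syntax, the underlying semantics is a session-level registry of jar paths, not catalog objects. A minimal sketch of that registry (hypothetical; `JarRegistry` and its method names are illustrative only):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the session-level jar registry behind
// ADD JAR / LIST JAR / DELETE JAR. Jars are session state, which is why
// catalog-style SHOW/CREATE/DROP syntax is debatable.
public class JarRegistry {
    private final Set<String> jars = new LinkedHashSet<>();

    /** ADD JAR: registers a path; the file itself is untouched. */
    public boolean add(String path) {
        return jars.add(path);
    }

    /** DELETE JAR: unregisters the path; nothing is deleted on disk. */
    public boolean remove(String path) {
        return jars.remove(path);
    }

    /** LIST JAR: returns the registered paths in registration order. */
    public List<String> show() {
        return Collections.unmodifiableList(new ArrayList<>(jars));
    }

    public static void main(String[] args) {
        JarRegistry r = new JarRegistry();
        r.add("/path/to/udf.jar");    // ADD JAR
        System.out.println(r.show()); // LIST JAR
        r.remove("/path/to/udf.jar"); // DELETE JAR
    }
}
```

Note that the delete operation only unregisters a path from the session; the jar on disk is untouched, which is the ambiguity behind the "DELETE" wording discussed above.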

*Regarding #5*: I agree with you that we'd better keep consistent.

*Regarding #6*: Yes. Most of the commands should belong to the table
environment. In the Summary section, I use the <NOTE> tag to identify which
commands should belong to the sql client and which commands should belong
to the table environment. I also add a new section about implementation
details in the FLIP.

Best,
Shengkai

Timo Walther <tw...@apache.org> 于2021年2月2日周二 下午6:43写道:

> Thanks for this great proposal Shengkai. This will give the SQL Client a
> very good update and make it production ready.
>
> Here is some feedback from my side:
>
> 1) SQL client specific options
>
> I don't think that `sql-client.planner` and `sql-client.execution.mode`
> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> `ExecutionConfig#configure` that have been added recently, we should
> offer a possibility for TableEnvironment. How about we offer
> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> `table.execution-mode` to
> `org.apache.flink.table.api.config.TableConfigOptions`?
>
> 2) Execution file
>
> Did you have a look at the Appendix of FLIP-84 [1] including the mailing
> list thread at that time? Could you further elaborate how the
> multi-statement execution should work for a unified batch/streaming
> story? According to our past discussions, each line in an execution file
> should be executed blocking which means a streaming query needs a
> statement set to execute multiple INSERT INTO statement, correct? We
> should also offer this functionality in
> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
> SQL Client specific needs to be determined, it could also be a general
> `table.multi-sql-async` option?
>
> 3) DELETE JAR
>
> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
> actively deleting the JAR in the corresponding path.
>
> 4) LIST JAR
>
> This should be `SHOW JARS` according to other SQL commands such as `SHOW
> CATALOGS`, `SHOW TABLES`, etc. [2].
>
> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>
> We should keep the details in sync with
> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> instead of `COST`. I'm sure the original author had a reason why to call
> it that way.
>
> 6) Implementation details
>
> It would be nice to understand how we plan to implement the given
> features. Most of the commands and config options should go into
> TableEnvironment and SqlParser directly, correct? This way users have a
> unified way of using Flink SQL. TableEnvironment would provide a similar
> user experience in notebooks or interactive programs to the SQL Client.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> [2]
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>
> Regards,
> Timo
>
>
> covers a
> >>>>>>>> lot of useful features which will dramatically improve the
> >>>> usability of our
> >>>>>>>> SQL Client. I have two questions regarding the FLIP.
> >>>>>>>>
> >>>>>>>> 1. Do you think we can let users set arbitrary configurations via
> >>>> the
> >>>>>>>> SET command? A connector may have its own configurations and we
> >>>> don't have
> >>>>>>>> a way to dynamically change such configurations in SQL Client. For
> >>>> example,
> >>>>>>>> users may want to be able to change hive conf when using hive
> >>>> connector [1].
> >>>>>>>> 2. Any reason why we have to forbid queries in SQL files specified
> >>>> with
> >>>>>>>> the -f option? Hive supports a similar -f option but allows
> queries
> >>>> in the
> >>>>>>>> file. And a common use case is to run some query and redirect the
> >>>> results
> >>>>>>>> to a file. So I think maybe flink users would like to do the same,
> >>>>>>>> especially in batch scenarios.
> >>>>>>>>
> >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>
> >>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>> liuyang0704@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Shengkai,
> >>>>>>>>>
> >>>>>>>>> Glad to see this improvement. And I have some additional
> >>>> suggestions:
> >>>>>>>>>
> >>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> >>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> >>>>>>>>> #2. Improve the way of results retrieval: sql client collect the
> >>>>>>>>> results
> >>>>>>>>> locally all at once using accumulators at present,
> >>>>>>>>>        which may have memory issues in JM or Local for the big
> query
> >>>>>>>>> result.
> >>>>>>>>> Accumulator is only suitable for testing purpose.
> >>>>>>>>>        We may change to use SelectTableSink, which is based
> >>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in FLIP-91.
> >>>> Seems
> >>>>>>>>> that this FLIP has not moved forward for a long time.
> >>>>>>>>>        Provide a long running service out of the box to
> facilitate
> >>>> the
> >>>>>>>>> sql
> >>>>>>>>> submission is necessary.
> >>>>>>>>>
> >>>>>>>>> What do you think of these?
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
> >>>>>>>>>
> >>>>>>>>>> Hi devs,
> >>>>>>>>>>
> >>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL Client
> >>>>>>>>>> Improvements.
> >>>>>>>>>>
> >>>>>>>>>> Many users have complained about the problems of the sql client.
> >>>> For
> >>>>>>>>>> example, users can not register the table proposed by FLIP-95.
> >>>>>>>>>>
> >>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>
> >>>>>>>>>> - use -i parameter to specify the sql file to initialize the
> >>>> table
> >>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> >>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> >>>>>>>>>>
> >>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Shengkai
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> *With kind regards
> >>>>>>>>> ------------------------------------------------------------
> >>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> >>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> >>>>>>>>> QQ: 3239559*
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best regards!
> >>>>>>>> Rui Li
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best regards!
> >>>>>> Rui Li
> >>>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Best regards!
> >>>> Rui Li
> >>>>
> >>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>>
> >>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Timo Walther <tw...@apache.org>.
Thanks for this great proposal Shengkai. This will give the SQL Client a 
very good update and make it production ready.

Here is some feedback from my side:

1) SQL client specific options

I don't think that `sql-client.planner` and `sql-client.execution.mode` 
are SQL Client specific. Similar to `StreamExecutionEnvironment` and 
`ExecutionConfig#configure`, which have been added recently, we should 
offer the same possibility for TableEnvironment. How about we offer 
`TableEnvironment.create(ReadableConfig)` and add a `table.planner` and 
`table.execution-mode` to 
`org.apache.flink.table.api.config.TableConfigOptions`?
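
For illustration, with such options the client-specific settings could be
expressed as ordinary table options (a sketch; `table.planner` and
`table.execution-mode` are the names suggested above, not part of any release):

```sql
-- in an SQL Client session or an -i init file (option names as proposed, not final)
SET table.planner = blink;         -- would replace sql-client.planner
SET table.execution-mode = batch;  -- would replace sql-client.execution.mode
```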

2) Execution file

Did you have a look at the Appendix of FLIP-84 [1] including the mailing 
list thread at that time? Could you further elaborate how the 
multi-statement execution should work for a unified batch/streaming 
story? According to our past discussions, each statement in an execution file 
should be executed in a blocking fashion, which means a streaming query needs a 
statement set to execute multiple INSERT INTO statements, correct? We 
should also offer this functionality in 
`TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is 
SQL Client specific needs to be determined; it could also be a general 
`table.multi-sql-async` option?
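
As a sketch of what such an execution file could look like with the statement
set syntax (table names are hypothetical; the exact grammar is defined in
FLIP-163):

```sql
-- execution file passed via -f; both INSERTs are submitted as one job
BEGIN STATEMENT SET;
INSERT INTO pageview_sink SELECT page, COUNT(*) FROM clicks GROUP BY page;
INSERT INTO uv_sink SELECT page, COUNT(DISTINCT user_id) FROM clicks GROUP BY page;
END;
```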

3) DELETE JAR

Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is 
actively deleting the JAR in the corresponding path.

4) LIST JAR

This should be `SHOW JARS` according to other SQL commands such as `SHOW 
CATALOGS`, `SHOW TABLES`, etc. [2].
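
For example, aligned with the existing SHOW commands, the jar management could
look like this (a sketch with a hypothetical path; the final syntax is decided
in the FLIP):

```sql
ADD JAR '/path/to/udf.jar';     -- register a jar for the current session
SHOW JARS;                      -- list registered jars, analogous to SHOW TABLES
REMOVE JAR '/path/to/udf.jar';  -- the opposite of ADD, as suggested above
```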

5) EXPLAIN [ExplainDetail[, ExplainDetail]*]

We should keep the details in sync with 
`org.apache.flink.table.api.ExplainDetail` and avoid confusion about 
differently named ExplainDetails. I would vote for `ESTIMATED_COST` 
instead of `COST`. I'm sure the original author had a reason to call 
it that way.
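
A sketch of the syntax with the detail name suggested above (the query is
hypothetical; the keyword set should mirror `ExplainDetail`):

```sql
EXPLAIN ESTIMATED_COST, CHANGELOG_MODE
SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id;
```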

6) Implementation details

It would be nice to understand how we plan to implement the given 
features. Most of the commands and config options should go into 
TableEnvironment and SqlParser directly, correct? This way users have a 
unified way of using Flink SQL. TableEnvironment would provide a similar 
user experience in notebooks or interactive programs to the SQL Client.

[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2] 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html

Regards,
Timo


On 02.02.21 10:13, Shengkai Fang wrote:
> Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
> 
> Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:
> 
>> Hi, Jingsong.
>>
>> Thanks for your reply. I think `UNSET` is much better.
>>
>> 1. We don't need to introduce another command `UNSET`. `RESET` is
>> supported in the current sql client now. Our proposal just extends its
>> grammar and allow users to reset the specified keys.
>> 2. Hive beeline also uses `RESET` to set the key to the default value[1].
>> I think it is more friendly for batch users.
>>
>> Best,
>> Shengkai
>>
>> [1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>
>> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
>>
>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
>>> improving it.
>>>
>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com> wrote:
>>>
>>>> Thanks Shengkai for the update! The proposed changes look good to me.
>>>>
>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com> wrote:
>>>>
>>>>> Hi, Rui.
>>>>> You are right. I have already modified the FLIP.
>>>>>
>>>>> The main changes:
>>>>>
>>>>> # -f parameter has no restriction about the statement type.
>>>>> Sometimes, users use the pipe to redirect the result of queries to
>>>> debug
>>>>> when submitting job by -f parameter. It's much convenient comparing to
>>>>> writing INSERT INTO statements.
>>>>>
>>>>> # Add a new sql client option `sql-client.job.detach` .
>>>>> Users prefer to execute jobs one by one in the batch mode. Users can
>>>> set
>>>>> this option false and the client will process the next job until the
>>>>> current job finishes. The default value of this option is false, which
>>>>> means the client will execute the next job when the current job is
>>>>> submitted.
>>>>>
>>>>> Best,
>>>>> Shengkai
>>>>>
>>>>>
>>>>>
>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>>>>
>>>>>> Hi Shengkai,
>>>>>>
>>>>>> Regarding #2, maybe the -f options in flink and hive have different
>>>>>> implications, and we should clarify the behavior. For example, if the
>>>>>> client just submits the job and exits, what happens if the file
>>>> contains
>>>>>> two INSERT statements? I don't think we should treat them as a
>>>> statement
>>>>>> set, because users should explicitly write BEGIN STATEMENT SET in that
>>>>>> case. And the client shouldn't asynchronously submit the two jobs,
>>>> because
>>>>>> the 2nd may depend on the 1st, right?
>>>>>>
>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fs...@gmail.com>
>>>> wrote:
>>>>>>
>>>>>>> Hi Rui,
>>>>>>> Thanks for your feedback. I agree with your suggestions.
>>>>>>>
>>>>>>> For the suggestion 1: Yes. we are plan to strengthen the set
>>>> command. In
>>>>>>> the implementation, it will just put the key-value into the
>>>>>>> `Configuration`, which will be used to generate the table config. If
>>>> hive
>>>>>>> supports to read the setting from the table config, users are able
>>>> to set
>>>>>>> the hive-related settings.
>>>>>>>
>>>>>>> For the suggestion 2: The -f parameter will submit the job and exit.
>>>> If
>>>>>>> the queries never end, users have to cancel the job by themselves,
>>>> which is
>>>>>>> not reliable(people may forget their jobs). In most case, queries
>>>> are used
>>>>>>> to analyze the data. Users should use queries in the interactive
>>>> mode.
>>>>>>>
>>>>>>> Best,
>>>>>>> Shengkai
>>>>>>>
>>>>>>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>>>>>>>
>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it covers a
>>>>>>>> lot of useful features which will dramatically improve the
>>>> usability of our
>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>>>>>>>>
>>>>>>>> 1. Do you think we can let users set arbitrary configurations via
>>>> the
>>>>>>>> SET command? A connector may have its own configurations and we
>>>> don't have
>>>>>>>> a way to dynamically change such configurations in SQL Client. For
>>>> example,
>>>>>>>> users may want to be able to change hive conf when using hive
>>>> connector [1].
>>>>>>>> 2. Any reason why we have to forbid queries in SQL files specified
>>>> with
>>>>>>>> the -f option? Hive supports a similar -f option but allows queries
>>>> in the
>>>>>>>> file. And a common use case is to run some query and redirect the
>>>> results
>>>>>>>> to a file. So I think maybe flink users would like to do the same,
>>>>>>>> especially in batch scenarios.
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>
>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>> liuyang0704@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Shengkai,
>>>>>>>>>
>>>>>>>>> Glad to see this improvement. And I have some additional
>>>> suggestions:
>>>>>>>>>
>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
>>>>>>>>> #2. Improve the way of results retrieval: sql client collect the
>>>>>>>>> results
>>>>>>>>> locally all at once using accumulators at present,
>>>>>>>>>        which may have memory issues in JM or Local for the big query
>>>>>>>>> result.
>>>>>>>>> Accumulator is only suitable for testing purpose.
>>>>>>>>>        We may change to use SelectTableSink, which is based
>>>>>>>>> on CollectSinkOperatorCoordinator.
>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in FLIP-91.
>>>> Seems
>>>>>>>>> that this FLIP has not moved forward for a long time.
>>>>>>>>>        Provide a long running service out of the box to facilitate
>>>> the
>>>>>>>>> sql
>>>>>>>>> submission is necessary.
>>>>>>>>>
>>>>>>>>> What do you think of these?
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
>>>>>>>>>
>>>>>>>>>> Hi devs,
>>>>>>>>>>
>>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL Client
>>>>>>>>>> Improvements.
>>>>>>>>>>
>>>>>>>>>> Many users have complained about the problems of the sql client.
>>>> For
>>>>>>>>>> example, users can not register the table proposed by FLIP-95.
>>>>>>>>>>
>>>>>>>>>> The main changes in this FLIP:
>>>>>>>>>>
>>>>>>>>>> - use -i parameter to specify the sql file to initialize the
>>>> table
>>>>>>>>>> environment and deprecated YAML file;
>>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
>>>>>>>>>> - support statement set syntax;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
>>>>>>>>>>
>>>>>>>>>> Look forward to your feedback.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Shengkai
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *With kind regards
>>>>>>>>> ------------------------------------------------------------
>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
>>>>>>>>> QQ: 3239559*
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards!
>>>>>>>> Rui Li
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>>
>>>
>>>
>>> --
>>> Best, Jingsong Lee
>>>
>>
> 


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Sorry for the typo. I meant `RESET` is much better, not `UNSET`.

Shengkai Fang <fs...@gmail.com> 于2021年2月2日周二 下午4:44写道:

> Hi, Jingsong.
>
> Thanks for your reply. I think `UNSET` is much better.
>
> 1. We don't need to introduce another command `UNSET`. `RESET` is
> supported in the current sql client now. Our proposal just extends its
> grammar and allow users to reset the specified keys.
> 2. Hive beeline also uses `RESET` to set the key to the default value[1].
> I think it is more friendly for batch users.
>
> Best,
> Shengkai
>
> [1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>
> Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:
>
>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
>> improving it.
>>
>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>
>> Best,
>> Jingsong
>>
>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com> wrote:
>>
>>> Thanks Shengkai for the update! The proposed changes look good to me.
>>>
>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com> wrote:
>>>
>>> > Hi, Rui.
>>> > You are right. I have already modified the FLIP.
>>> >
>>> > The main changes:
>>> >
>>> > # -f parameter has no restriction about the statement type.
>>> > Sometimes, users use the pipe to redirect the result of queries to
>>> debug
>>> > when submitting job by -f parameter. It's much convenient comparing to
>>> > writing INSERT INTO statements.
>>> >
>>> > # Add a new sql client option `sql-client.job.detach` .
>>> > Users prefer to execute jobs one by one in the batch mode. Users can
>>> set
>>> > this option false and the client will process the next job until the
>>> > current job finishes. The default value of this option is false, which
>>> > means the client will execute the next job when the current job is
>>> > submitted.
>>> >
>>> > Best,
>>> > Shengkai
>>> >
>>> >
>>> >
>>> > Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>>> >
>>> >> Hi Shengkai,
>>> >>
>>> >> Regarding #2, maybe the -f options in flink and hive have different
>>> >> implications, and we should clarify the behavior. For example, if the
>>> >> client just submits the job and exits, what happens if the file
>>> contains
>>> >> two INSERT statements? I don't think we should treat them as a
>>> statement
>>> >> set, because users should explicitly write BEGIN STATEMENT SET in that
>>> >> case. And the client shouldn't asynchronously submit the two jobs,
>>> because
>>> >> the 2nd may depend on the 1st, right?
>>> >>
>>> >> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fs...@gmail.com>
>>> wrote:
>>> >>
>>> >>> Hi Rui,
>>> >>> Thanks for your feedback. I agree with your suggestions.
>>> >>>
>>> >>> For the suggestion 1: Yes. we are plan to strengthen the set
>>> command. In
>>> >>> the implementation, it will just put the key-value into the
>>> >>> `Configuration`, which will be used to generate the table config. If
>>> hive
>>> >>> supports to read the setting from the table config, users are able
>>> to set
>>> >>> the hive-related settings.
>>> >>>
>>> >>> For the suggestion 2: The -f parameter will submit the job and exit.
>>> If
>>> >>> the queries never end, users have to cancel the job by themselves,
>>> which is
>>> >>> not reliable(people may forget their jobs). In most case, queries
>>> are used
>>> >>> to analyze the data. Users should use queries in the interactive
>>> mode.
>>> >>>
>>> >>> Best,
>>> >>> Shengkai
>>> >>>
>>> >>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>>> >>>
>>> >>>> Thanks Shengkai for bringing up this discussion. I think it covers a
>>> >>>> lot of useful features which will dramatically improve the
>>> usability of our
>>> >>>> SQL Client. I have two questions regarding the FLIP.
>>> >>>>
>>> >>>> 1. Do you think we can let users set arbitrary configurations via
>>> the
>>> >>>> SET command? A connector may have its own configurations and we
>>> don't have
>>> >>>> a way to dynamically change such configurations in SQL Client. For
>>> example,
>>> >>>> users may want to be able to change hive conf when using hive
>>> connector [1].
>>> >>>> 2. Any reason why we have to forbid queries in SQL files specified
>>> with
>>> >>>> the -f option? Hive supports a similar -f option but allows queries
>>> in the
>>> >>>> file. And a common use case is to run some query and redirect the
>>> results
>>> >>>> to a file. So I think maybe flink users would like to do the same,
>>> >>>> especially in batch scenarios.
>>> >>>>
>>> >>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>> >>>>
>>> >>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>> liuyang0704@gmail.com>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> Hi Shengkai,
>>> >>>>>
>>> >>>>> Glad to see this improvement. And I have some additional
>>> suggestions:
>>> >>>>>
>>> >>>>> #1. Unify the TableEnvironment in ExecutionContext to
>>> >>>>> StreamTableEnvironment for both streaming and batch sql.
>>> >>>>> #2. Improve the way of results retrieval: sql client collect the
>>> >>>>> results
>>> >>>>> locally all at once using accumulators at present,
>>> >>>>>       which may have memory issues in JM or Local for the big query
>>> >>>>> result.
>>> >>>>> Accumulator is only suitable for testing purpose.
>>> >>>>>       We may change to use SelectTableSink, which is based
>>> >>>>> on CollectSinkOperatorCoordinator.
>>> >>>>> #3. Do we need to consider Flink SQL gateway which is in FLIP-91.
>>> Seems
>>> >>>>> that this FLIP has not moved forward for a long time.
>>> >>>>>       Provide a long running service out of the box to facilitate
>>> the
>>> >>>>> sql
>>> >>>>> submission is necessary.
>>> >>>>>
>>> >>>>> What do you think of these?
>>> >>>>>
>>> >>>>> [1]
>>> >>>>>
>>> >>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>> >>>>>
>>> >>>>>
>>> >>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
>>> >>>>>
>>> >>>>> > Hi devs,
>>> >>>>> >
>>> >>>>> > Jark and I want to start a discussion about FLIP-163:SQL Client
>>> >>>>> > Improvements.
>>> >>>>> >
>>> >>>>> > Many users have complained about the problems of the sql client.
>>> For
>>> >>>>> > example, users can not register the table proposed by FLIP-95.
>>> >>>>> >
>>> >>>>> > The main changes in this FLIP:
>>> >>>>> >
>>> >>>>> > - use -i parameter to specify the sql file to initialize the
>>> table
>>> >>>>> > environment and deprecated YAML file;
>>> >>>>> > - add -f to submit sql file and deprecated '-u' parameter;
>>> >>>>> > - add more interactive commands, e.g ADD JAR;
>>> >>>>> > - support statement set syntax;
>>> >>>>> >
>>> >>>>> >
>>> >>>>> > For more detailed changes, please refer to FLIP-163[1].
>>> >>>>> >
>>> >>>>> > Look forward to your feedback.
>>> >>>>> >
>>> >>>>> >
>>> >>>>> > Best,
>>> >>>>> > Shengkai
>>> >>>>> >
>>> >>>>> > [1]
>>> >>>>> >
>>> >>>>> >
>>> >>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>>> >>>>> >
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>>
>>> >>>>> *With kind regards
>>> >>>>> ------------------------------------------------------------
>>> >>>>> Sebastian Liu 刘洋
>>> >>>>> Institute of Computing Technology, Chinese Academy of Science
>>> >>>>> Mobile\WeChat: +86—15201613655
>>> >>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
>>> >>>>> QQ: 3239559*
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Best regards!
>>> >>>> Rui Li
>>> >>>>
>>> >>>
>>> >>
>>> >> --
>>> >> Best regards!
>>> >> Rui Li
>>> >>
>>> >
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, Jingsong.

Thanks for your reply. I think `UNSET` is much better.

1. We don't need to introduce another command `UNSET`. `RESET` is already
supported in the current sql client. Our proposal just extends its grammar and
allows users to reset specified keys.
2. Hive beeline also uses `RESET` to reset a key to its default value [1]. I
think it is friendlier for batch users.
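
A sketch of the extended grammar discussed here (the single-key form is the
proposed extension; `table.sql-dialect` is just an example key):

```sql
SET table.sql-dialect = hive;  -- set a session property
RESET table.sql-dialect;       -- proposed: reset only this key to its default
RESET;                         -- existing behavior: reset all session properties
```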

Best,
Shengkai

[1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Jingsong Li <ji...@gmail.com> 于2021年2月2日周二 下午1:56写道:

> Thanks for the proposal, yes, sql-client is too outdated. +1 for improving
> it.
>
> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>
> Best,
> Jingsong
>
> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com> wrote:
>
>> Thanks Shengkai for the update! The proposed changes look good to me.
>>
>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com> wrote:
>>
>> > Hi, Rui.
>> > You are right. I have already modified the FLIP.
>> >
>> > The main changes:
>> >
>> > # -f parameter has no restriction about the statement type.
>> > Sometimes, users use the pipe to redirect the result of queries to debug
>> > when submitting job by -f parameter. It's much convenient comparing to
>> > writing INSERT INTO statements.
>> >
>> > # Add a new sql client option `sql-client.job.detach` .
>> > Users prefer to execute jobs one by one in the batch mode. Users can set
>> > this option false and the client will process the next job until the
>> > current job finishes. The default value of this option is false, which
>> > means the client will execute the next job when the current job is
>> > submitted.
>> >
>> > Best,
>> > Shengkai
>> >
>> >
>> >
>> > Rui Li <li...@gmail.com> 于2021年1月29日周五 下午4:52写道:
>> >
>> >> Hi Shengkai,
>> >>
>> >> Regarding #2, maybe the -f options in flink and hive have different
>> >> implications, and we should clarify the behavior. For example, if the
>> >> client just submits the job and exits, what happens if the file
>> contains
>> >> two INSERT statements? I don't think we should treat them as a
>> statement
>> >> set, because users should explicitly write BEGIN STATEMENT SET in that
>> >> case. And the client shouldn't asynchronously submit the two jobs,
>> because
>> >> the 2nd may depend on the 1st, right?
>> >>
>> >> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fs...@gmail.com>
>> wrote:
>> >>
>> >>> Hi Rui,
>> >>> Thanks for your feedback. I agree with your suggestions.
>> >>>
>> >>> For the suggestion 1: Yes. we are plan to strengthen the set command.
>> In
>> >>> the implementation, it will just put the key-value into the
>> >>> `Configuration`, which will be used to generate the table config. If
>> hive
>> >>> supports to read the setting from the table config, users are able to
>> set
>> >>> the hive-related settings.
>> >>>
>> >>> For the suggestion 2: The -f parameter will submit the job and exit.
>> If
>> >>> the queries never end, users have to cancel the job by themselves,
>> which is
>> >>> not reliable(people may forget their jobs). In most case, queries are
>> used
>> >>> to analyze the data. Users should use queries in the interactive mode.
>> >>>
>> >>> Best,
>> >>> Shengkai
>> >>>
>> >>> Rui Li <li...@gmail.com> 于2021年1月29日周五 下午3:18写道:
>> >>>
>> >>>> Thanks Shengkai for bringing up this discussion. I think it covers a
>> >>>> lot of useful features which will dramatically improve the usability
>> of our
>> >>>> SQL Client. I have two questions regarding the FLIP.
>> >>>>
>> >>>> 1. Do you think we can let users set arbitrary configurations via the
>> >>>> SET command? A connector may have its own configurations and we
>> don't have
>> >>>> a way to dynamically change such configurations in SQL Client. For
>> example,
>> >>>> users may want to be able to change hive conf when using hive
>> connector [1].
>> >>>> 2. Any reason why we have to forbid queries in SQL files specified
>> with
>> >>>> the -f option? Hive supports a similar -f option but allows queries
>> in the
>> >>>> file. And a common use case is to run some query and redirect the
>> results
>> >>>> to a file. So I think maybe flink users would like to do the same,
>> >>>> especially in batch scenarios.
>> >>>>
>> >>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>> >>>>
>> >>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>> liuyang0704@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi Shengkai,
>> >>>>>
>> >>>>> Glad to see this improvement. And I have some additional
>> suggestions:
>> >>>>>
>> >>>>> #1. Unify the TableEnvironment in ExecutionContext to
>> >>>>> StreamTableEnvironment for both streaming and batch sql.
>> >>>>> #2. Improve the way of results retrieval: sql client collect the
>> >>>>> results
>> >>>>> locally all at once using accumulators at present,
>> >>>>>       which may have memory issues in JM or Local for the big query
>> >>>>> result.
>> >>>>> Accumulator is only suitable for testing purpose.
>> >>>>>       We may change to use SelectTableSink, which is based
>> >>>>> on CollectSinkOperatorCoordinator.
>> >>>>> #3. Do we need to consider Flink SQL gateway which is in FLIP-91.
>> Seems
>> >>>>> that this FLIP has not moved forward for a long time.
>> >>>>>       Provide a long running service out of the box to facilitate
>> the
>> >>>>> sql
>> >>>>> submission is necessary.
>> >>>>>
>> >>>>> What do you think of these?
>> >>>>>
>> >>>>> [1]
>> >>>>>
>> >>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>> >>>>>
>> >>>>>
>> >>>>> Shengkai Fang <fs...@gmail.com> 于2021年1月28日周四 下午8:54写道:
>> >>>>>
>> >>>>> > Hi devs,
>> >>>>> >
>> >>>>> > Jark and I want to start a discussion about FLIP-163:SQL Client
>> >>>>> > Improvements.
>> >>>>> >
>> >>>>> > Many users have complained about the problems of the sql client.
>> For
>> >>>>> > example, users can not register the table proposed by FLIP-95.
>> >>>>> >
>> >>>>> > The main changes in this FLIP:
>> >>>>> >
>> >>>>> > - use -i parameter to specify the sql file to initialize the table
>> >>>>> > environment and deprecated YAML file;
>> >>>>> > - add -f to submit sql file and deprecated '-u' parameter;
>> >>>>> > - add more interactive commands, e.g ADD JAR;
>> >>>>> > - support statement set syntax;
>> >>>>> >
>> >>>>> >
>> >>>>> > For more detailed changes, please refer to FLIP-163[1].
>> >>>>> >
>> >>>>> > Look forward to your feedback.
>> >>>>> >
>> >>>>> >
>> >>>>> > Best,
>> >>>>> > Shengkai
>> >>>>> >
>> >>>>> > [1]
>> >>>>> >
>> >>>>> >
>> >>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
>> >>>>> >
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> *With kind regards
>> >>>>> ------------------------------------------------------------
>> >>>>> Sebastian Liu 刘洋
>> >>>>> Institute of Computing Technology, Chinese Academy of Science
>> >>>>> Mobile\WeChat: +86—15201613655
>> >>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
>> >>>>> QQ: 3239559*
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Best regards!
>> >>>> Rui Li
>> >>>>
>> >>>
>> >>
>> >> --
>> >> Best regards!
>> >> Rui Li
>> >>
>> >
>>
>> --
>> Best regards!
>> Rui Li
>>
>
>
> --
> Best, Jingsong Lee
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Posted by Jingsong Li <ji...@gmail.com>.
Thanks for the proposal. Yes, the sql client is too outdated. +1 for improving
it.

About "SET" and "RESET": why not "SET" and "UNSET"?

Best,
Jingsong

On Mon, Feb 1, 2021 at 2:46 PM Rui Li <li...@gmail.com> wrote:

> Thanks Shengkai for the update! The proposed changes look good to me.
>
> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fs...@gmail.com> wrote:
>
> > Hi, Rui.
> > You are right. I have already modified the FLIP.
> >
> > The main changes:
> >
> > # The -f parameter has no restriction on the statement type.
> > Sometimes, users pipe the results of queries to a file for debugging
> > when submitting a job with the -f parameter. That is much more convenient
> > than writing INSERT INTO statements.
> >
> > # Add a new sql client option `sql-client.job.detach`.
> > Users prefer to execute jobs one by one in batch mode. Users can set
> > this option to false, and the client will not process the next job until
> > the current job finishes. The default value of this option is false, which
> > means the client will execute the next job as soon as the current job is
> > submitted.
> >
> > Best,
> > Shengkai
> >
> >
> >
> > Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 4:52 PM:
> >
> >> Hi Shengkai,
> >>
> >> Regarding #2, maybe the -f options in flink and hive have different
> >> implications, and we should clarify the behavior. For example, if the
> >> client just submits the job and exits, what happens if the file contains
> >> two INSERT statements? I don't think we should treat them as a statement
> >> set, because users should explicitly write BEGIN STATEMENT SET in that
> >> case. And the client shouldn't asynchronously submit the two jobs,
> because
> >> the 2nd may depend on the 1st, right?
> >>
> >> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fs...@gmail.com>
> wrote:
> >>
> >>> Hi Rui,
> >>> Thanks for your feedback. I agree with your suggestions.
> >>>
> >>> For suggestion 1: Yes, we plan to strengthen the SET command. In the
> >>> implementation, it will just put the key-value pair into the
> >>> `Configuration`, which will be used to generate the table config. If hive
> >>> supports reading settings from the table config, users are able to set
> >>> the hive-related settings.
> >>>
> >>> For suggestion 2: The -f parameter will submit the job and exit. If the
> >>> queries never end, users have to cancel the jobs themselves, which is not
> >>> reliable (people may forget their jobs). In most cases, queries are used
> >>> to analyze data. Users should run queries in the interactive mode.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> Rui Li <li...@gmail.com> wrote on Fri, Jan 29, 2021 at 3:18 PM:
> >>>
> >>>> Thanks Shengkai for bringing up this discussion. I think it covers a
> >>>> lot of useful features which will dramatically improve the usability
> of our
> >>>> SQL Client. I have two questions regarding the FLIP.
> >>>>
> >>>> 1. Do you think we can let users set arbitrary configurations via the
> >>>> SET command? A connector may have its own configurations and we don't
> have
> >>>> a way to dynamically change such configurations in SQL Client. For
> example,
> >>>> users may want to be able to change hive conf when using hive
> connector [1].
> >>>> 2. Any reason why we have to forbid queries in SQL files specified
> with
> >>>> the -f option? Hive supports a similar -f option but allows queries
> in the
> >>>> file. And a common use case is to run some query and redirect the
> results
> >>>> to a file. So I think maybe flink users would like to do the same,
> >>>> especially in batch scenarios.
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>
> >>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <liuyang0704@gmail.com
> >
> >>>> wrote:
> >>>>
> >>>>> Hi Shengkai,
> >>>>>
> >>>>> Glad to see this improvement. And I have some additional suggestions:
> >>>>>
> >>>>> #1. Unify the TableEnvironment in ExecutionContext to
> >>>>> StreamTableEnvironment for both streaming and batch sql.
> >>>>> #2. Improve the way results are retrieved: the sql client currently
> >>>>> collects the results locally all at once using accumulators, which may
> >>>>> cause memory issues in the JM or locally for big query results.
> >>>>> Accumulators are only suitable for testing purposes. We may change to
> >>>>> use SelectTableSink, which is based on CollectSinkOperatorCoordinator.
> >>>>> #3. Do we need to consider the Flink SQL gateway proposed in FLIP-91?
> >>>>> It seems that FLIP has not moved forward for a long time. Providing a
> >>>>> long-running service out of the box to facilitate sql submission is
> >>>>> necessary.
> >>>>>
> >>>>> What do you think of these?
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>
> >>>>>
> >>>>> Shengkai Fang <fs...@gmail.com> wrote on Thu, Jan 28, 2021 at 8:54 PM:
> >>>>>
> >>>>> > Hi devs,
> >>>>> >
> >>>>> > Jark and I want to start a discussion about FLIP-163:SQL Client
> >>>>> > Improvements.
> >>>>> >
> >>>>> > Many users have complained about the problems of the sql client. For
> >>>>> > example, users cannot register tables as proposed by FLIP-95.
> >>>>> >
> >>>>> > The main changes in this FLIP:
> >>>>> >
> >>>>> > - use the -i parameter to specify a sql file that initializes the
> >>>>> > table environment, deprecating the YAML file;
> >>>>> > - add the -f parameter to submit a sql file, deprecating the '-u'
> >>>>> > parameter;
> >>>>> > - add more interactive commands, e.g. ADD JAR;
> >>>>> > - support statement set syntax;
> >>>>> >
> >>>>> >
> >>>>> > For more detailed changes, please refer to FLIP-163[1].
> >>>>> >
> >>>>> > Look forward to your feedback.
> >>>>> >
> >>>>> >
> >>>>> > Best,
> >>>>> > Shengkai
> >>>>> >
> >>>>> > [1]
> >>>>> >
> >>>>> >
> >>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>> >
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> *With kind regards
> >>>>> ------------------------------------------------------------
> >>>>> Sebastian Liu 刘洋
> >>>>> Institute of Computing Technology, Chinese Academy of Science
> >>>>> Mobile\WeChat: +86—15201613655
> >>>>> E-mail: liuyang0704@gmail.com <li...@gmail.com>
> >>>>> QQ: 3239559*
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards!
> >>>> Rui Li
> >>>>
> >>>
> >>
> >> --
> >> Best regards!
> >> Rui Li
> >>
> >
>
> --
> Best regards!
> Rui Li
>


-- 
Best, Jingsong Lee
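
For reference, the -i and -f parameters proposed in the FLIP would be invoked
roughly like this (a sketch; the script path follows the usual Flink
distribution layout, and the file names are hypothetical):

```shell
# Initialize the table environment from init.sql (replacing the YAML file),
# then execute the statements in job.sql (replacing the deprecated -u parameter)
./bin/sql-client.sh -i init.sql -f job.sql
```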