Posted to dev@flink.apache.org by Jark Wu <im...@gmail.com> on 2022/07/04 09:32:20 UTC

Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Hi Mang,

I'm not sure whether your response has addressed Yuxia's concerns.
It would be better to get confirmation from the participants before starting
the vote.

Actually, I share Yuxia's concerns.

1) RTAS
If it's hard to have consistent behavior for RTAS between streaming mode and
batch mode, it's quite possible that "table.ctas-rtas.atomicity-enabled" is
not suitable and may need to change in the future. If RTAS will not be
supported in this version and the configuration may not be suitable in the
future, how about removing "rtas" from the config? We can still evolve the
config to "table.ctas-rtas" later if the semantics are the same, and still
keep backward compatibility.
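Jark's point about renaming the option later without breaking users can be illustrated with a small, self-contained Java sketch. It is an illustration under stated assumptions, not Flink code: it reads a plain `Map` instead of Flink's configuration classes, and the two key names are simply the ones floated in this thread. A later release prefers the newer key but still honours the older one, so existing jobs keep working after the rename. A real implementation would more likely rely on Flink's `ConfigOption#withDeprecatedKeys`.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve a boolean option under a newer key while still
// honouring an older key, so renaming the option does not break existing jobs.
// Plain Map lookups stand in for Flink's configuration classes.
final class OptionKeyEvolutionSketch {

    // Key without "rtas", as suggested in this thread (illustrative).
    static final String CTAS_KEY = "table.ctas.atomicity-enabled";
    // Possible later key if RTAS gets the same semantics (illustrative).
    static final String CTAS_RTAS_KEY = "table.ctas-rtas.atomicity-enabled";

    private OptionKeyEvolutionSketch() {}

    // Prefers the newer key, falls back to the older key, else returns the default.
    static boolean atomicityEnabled(Map<String, String> conf, boolean defaultValue) {
        return Optional.ofNullable(conf.get(CTAS_RTAS_KEY))
                .or(() -> Optional.ofNullable(conf.get(CTAS_KEY)))
                .map(Boolean::parseBoolean)
                .orElse(defaultValue);
    }

    public static void main(String[] args) {
        // A job configured with the older key keeps working after the rename.
        System.out.println(atomicityEnabled(Map.of(CTAS_KEY, "true"), false)); // true
        // When both keys are set, the newer key wins.
        System.out.println(
                atomicityEnabled(Map.of(CTAS_KEY, "true", CTAS_RTAS_KEY, "false"), false)); // false
        // Neither key set: the default applies.
        System.out.println(atomicityEnabled(Map.of(), false)); // false
    }
}
```

Putting the fallback in the lookup keeps the rename invisible to users: only the documentation has to point at the new key.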

2) AtomicCatalog
We won't add other methods to `AtomicCatalog` in the future, because new
methods required for isolation don't belong to `AtomicCatalog`; they would
more likely go into a new interface such as `IsolateCatalog`,
`TransactionalCatalog` or `StagingCatalog`.
So I think Yuxia's concern is reasonable: it's confusing that an atomic
catalog is just a serializable catalog.
How about just adding Javadoc to the `Catalog` interface stating that a
catalog which wants to support CTAS should implement `Serializable`, so that
its instances can be serialized and deserialized using Java serialization?
The planner should check whether the catalog can be serialized and, if not,
throw an exception telling users how to adapt the catalog to support CTAS.
In this way, we don't need to introduce a new interface like `AtomicCatalog`.
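The planner-side check described above could look roughly like the following Java sketch: it attempts a Java serialization round trip and, if that fails, throws an exception with actionable guidance. `SketchCatalog` and the two catalog classes are hypothetical stand-ins defined here, not Flink's `Catalog` interface, and the error message wording is illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical planner-side check: before allowing atomic CTAS, verify that the
// catalog instance survives a Java serialization round trip.
final class CatalogSerializabilityCheckSketch {

    // Stand-in for a catalog; this is NOT Flink's Catalog interface.
    interface SketchCatalog {
        String name();
    }

    // A catalog that follows the proposed Javadoc contract.
    static final class SerializableSketchCatalog implements SketchCatalog, Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;

        SerializableSketchCatalog(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    // A catalog that does not implement Serializable.
    static final class PlainSketchCatalog implements SketchCatalog {
        @Override
        public String name() {
            return "plain";
        }
    }

    private CatalogSerializabilityCheckSketch() {}

    // Returns a deserialized copy, or throws with instructions for the user.
    static SketchCatalog checkSerializable(SketchCatalog catalog) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(catalog); // NotSerializableException is an IOException
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchCatalog) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Catalog '" + catalog.name() + "' cannot be serialized, so atomic CTAS is"
                            + " not supported. Implement java.io.Serializable and make sure the"
                            + " catalog can be serialized and deserialized with Java serialization.",
                    e);
        }
    }

    public static void main(String[] args) {
        SketchCatalog copy = checkSerializable(new SerializableSketchCatalog("hive"));
        System.out.println("accepted: " + copy.name()); // accepted: hive
        try {
            checkSerializable(new PlainSketchCatalog());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Doing a full round trip, rather than only writing the object, also catches catalogs whose custom deserialization logic fails.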


Best,
Jark


On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:

> Hi Martijn,
> Thank you for your reply; these are two good questions.
> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >in the query of the sink table, it will be assumed that the user wants to
> >create a managed table. What will happen if the user doesn't have Table
> >Store configured/installed? Will we throw an error?
>
> If the catalog does not support managed tables and no `connector` is
> specified, the corresponding TableSink cannot be created, so the statement
> will fail.
>
> If the catalog supports managed tables and no `connector` is specified, the
> statement will also fail, because the Table Store configuration is not set
> and the Table Store jar is not available.
>
>
> >2. Will there be support included for FLIP-190 (version upgrades)?
> FLIP-190 mainly addresses upgrades in streaming mode, while FLIP-218 is
> mostly used in batch mode.
> The atomic CTAS implementation requires the catalog and the hook to be
> serializable, but they currently cannot be serialized into JSON, so atomic
> CTAS cannot support FLIP-190.
> The non-atomic implementation can support FLIP-190.
>
>
>
>
>
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
> >Hi Mang,
> >
> >I have two questions/remarks:
> >
> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >in the query of the sink table, it will be assumed that the user wants to
> >create a managed table. What will happen if the user doesn't have Table
> >Store configured/installed? Will we throw an error?
> >
> >2. Will there be support included for FLIP-190 (version upgrades)?
> >
> >Best regards,
> >
> >Martijn
> >
> >On Wed, 29 Jun 2022 at 05:18, Mang Zhang <zh...@163.com> wrote:
> >
> >> Hi everyone,
> >> Thank you to everyone who participated in the discussion. Over many
> >> rounds of discussion, the proposal has been gradually revised and
> >> improved. Looking forward to further feedback; we will start a vote in
> >> the next day or two.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >>
> >>
> >>
> >>
> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
> >> >Hi Yuxia,
> >> >Thank you very much for your reply.
> >> >
> >> >
> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says
> >> >>nothing about RTAS but suddenly refers to it in the configuration. If
> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
> >> >RTAS is not supported for now because its semantics are hard to unify
> >> >between streaming and batch mode, and the concrete business scenarios are
> >> >not yet clear. We will support it in the future; if we only renamed the
> >> >option once RTAS is supported, users would bear the cost of changing
> >> >their configuration.
> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> >>Could you please explain it? Some pseudocode would be much better if
> >> >>possible. I'm lost in this part.
> >> >
> >> >
> >> >
> >> >
> >> >This part is too much of an implementation detail for the FLIP, and of
> >> >course we had to make some changes to achieve it. The FLIP focuses on
> >> >semantic consistency between streaming and batch mode, and on providing
> >> >optional atomicity support.
> >> >
> >> >
> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for
> >> >>the naming is that, to make CTAS atomic, we propose an interface that
> >> >>lets a catalog be serialized, and then name it `AtomicCatalog`. At least,
> >> >>the interface is for the atomicity of CTAS. But if we want to implement
> >> >>other features, like isolation, that may also require a serializable
> >> >>catalog in the future, should we introduce a new interface named
> >> >>`IsolateCatalog`? Have you considered other names like
> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
> >> >>careful about the name.
> >> >Regarding the name, we also discussed `SerializableCatalog`, but it is
> >> >too specific and does not relate to the atomicity we want to express.
> >> >For CTAS/RTAS to support atomicity, the catalog needs to implement
> >> >`AtomicCatalog`, which is more straightforward to understand.
> >> >
> >> >
> >> >Hope this answers your question.
> >> >
> >> >
> >> >
> >> >
> >> >--
> >> >
> >> >Best regards,
> >> >Mang Zhang
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
> >> >>Thanks for updating. The FLIP looks generally good to me. I have only
> >> >>minor questions:
> >> >>
> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says
> >> >>nothing about RTAS but suddenly refers to it in the configuration. If
> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
> >> >>
> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> >>Could you please explain it? Some pseudocode would be much better if
> >> >>possible. I'm lost in this part.
> >> >>
> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for
> >> >>the naming is that, to make CTAS atomic, we propose an interface that
> >> >>lets a catalog be serialized, and then name it `AtomicCatalog`. At least,
> >> >>the interface is for the atomicity of CTAS. But if we want to implement
> >> >>other features, like isolation, that may also require a serializable
> >> >>catalog in the future, should we introduce a new interface named
> >> >>`IsolateCatalog`? Have you considered other names like
> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
> >> >>careful about the name.
> >> >>
> >> >>
> >> >>Best regards,
> >> >>Yuxia
> >> >>
> >> >>----- Original Message -----
> >> >>From: "Mang Zhang" <zh...@163.com>
> >> >>To: "dev" <de...@flink.apache.org>
> >> >>Cc: imjark@gmail.com
> >> >>Sent: Monday, June 27, 2022, 5:43:50 PM
> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >>
> >> >>Hi Jark,
> >> >>First of all, thank you for your very good advice!
> >> >>The RTAS point you mentioned is a good one, and we should support it as
> >> >>well.
> >> >>However, after investigating the semantics of RTAS and how RTAS is used
> >> >>within our company, I found that:
> >> >>1. The semantics of RTAS are that, if the table exists, its old data is
> >> >>deleted and replaced with the new data.
> >> >>These semantics are easier to implement in batch mode; for example, if
> >> >>the target table is a Hive table, the old data files can be deleted
> >> >>directly.
> >> >>But in streaming mode, the target table is probably a Kafka topic, whose
> >> >>data we can't delete.
> >> >>So consistent semantics between streaming and batch scenarios cannot be
> >> >>guaranteed.
> >> >>2. I checked the big data SQL jobs in our company over the last week and
> >> >>found that RTAS was not used.
> >> >>No users in the company have asked for RTAS yet, so its use cases are not
> >> >>very clear.
> >> >>
> >> >>
> >> >>It is not clear what semantics RTAS should provide in streaming mode,
> >> >>and users' business scenarios are not very clear either.
> >> >>Maybe we don't have to support RTAS soon, but we can leave room in the
> >> >>interface definition for supporting RTAS in the future.
> >> >>What do you think? Looking forward to your response!
> >> >>
> >> >>
> >> >>By the way, the FLIP has been updated for the other points you raised.
> >> >>Thanks.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>--
> >> >>
> >> >>Best regards,
> >> >>Mang Zhang
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
> >> >>>Thanks for the update, Mang and Ron,
> >> >>>
> >> >>>The new proposal looks good to me in general, especially keeping the
> >> >>>behavior consistent between batch and streaming mode by default. This is
> >> >>>how we did it for the previous "table.dml-sync" option on the ML [1].
> >> >>>
> >> >>>Besides that, I just have some final minor comments regarding some
> >> >>>interfaces.
> >> >>>
> >> >>>1) table.ctas-or-rtas.atomicity-enabled
> >> >>>The "OR" keyword sounds like this configuration can only take effect on
> >> >>>one of CTAS and RTAS.
> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
> >> >>>
> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to
> >> >>>support it.
> >> >>>RTAS is another widely used statement similar to CTAS. It seems there
> >> >>>is not much difference between CTAS and RTAS. Considering we are
> >> >>>introducing RTAS configurations, is it possible to support RTAS in this
> >> >>>FLIP as well?
> >> >>>
> >> >>>3) connector.type
> >> >>>"connector.type" has been deprecated since FLIP-95; could you replace it
> >> >>>with 'connector'?
> >> >>>
> >> >>>4) SupportsAtomicCatalog
> >> >>>I have some concerns about using the "Supports.." prefix, which is known
> >> >>>as the ability extension for DynamicTableSource and DynamicTableSink.
> >> >>>Maybe "AtomicCatalog" is enough?
> >> >>>
> >> >>>Best,
> >> >>>Jark
> >> >>>
> >> >>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
> >> >>>
> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
> >> >>>
> >> >>>> Hi all,
> >> >>>> Thank you to everyone who participated in the discussion and made
> >> >>>> suggestions!
> >> >>>> After several rounds of online and offline discussion, the solution in
> >> >>>> the FLIP has been updated.
> >> >>>> Looking forward to more feedback from everyone.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Mang Zhang
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
> >> >>>> >Hi godfrey and ron,
> >> >>>> >Thank you very much for your replies and suggestions.
> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
> >> >>>> >Looking forward to further feedback from others.
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >--
> >> >>>> >
> >> >>>> >Best regards,
> >> >>>> >Mang Zhang
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
> >> >>>> >>Thanks for the further feedback, godfrey. Your suggestions look very
> >> >>>> >>good to me, and the FLIP has been updated accordingly. It would be
> >> >>>> >>great if you could take another look.
> >> >>>> >>
> >> >>>> >>Also looking forward to further feedback from others.
> >> >>>> >>
> >> >>>> >>
> >> >>>> >>> -----Original Message-----
> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
> >> >>>> >>> To: dev <de...@flink.apache.org>
> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >>>> >>>
> >> >>>> >>> Hi all,
> >> >>>> >>>
> >> >>>> >>> Sorry for the late reply.
> >> >>>> >>>
> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
> >> >>>> >>> Regarding `cor`, this abbreviation is not commonly used.
> >> >>>> >>>
> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the serializability
> >> >>>> >>> >of the catalog. To quickly see if the catalog supports CTAS, we need
> >> >>>> >>> >to try to serialize the catalog when compiling SQL in the planner, and
> >> >>>> >>> >if it fails, an exception will be thrown to indicate to the user that
> >> >>>> >>> >the catalog does not support CTAS because it cannot be serialized.
> >> >>>> >>> This behavior is too cryptic, and will break the current catalog
> >> >>>> >>> behavior when using 1.16.
> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs which
> >> >>>> >>> extends Serializable.
> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog
> >> >>>> >>> interface.
> >> >>>> >>>
> >> >>>> >>> > Catalog#inferTableOptions
> >> >>>> >>> I strongly recommend not introducing this feature now, because the
> >> >>>> >>> behavior is unclear.
> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is
> >> >>>> >>> empty. But if a user forgets to set the connector option for a CTAS
> >> >>>> >>> statement, the created table will be a managed table.
> >> >>>> >>> 2) The options and their values for the catalog and for the connector
> >> >>>> >>> may be different, so using the catalog options may cause unexpected
> >> >>>> >>> errors.
> >> >>>> >>>
> >> >>>> >>> > StreamGraph#addJobStatusHook
> >> >>>> >>> I prefer `registerJobStatusHook`.
> >> >>>> >>>
> >> >>>> >>> Best,
> >> >>>> >>> Godfrey
> >> >>>> >>>
> >> >>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
> >> >>>> >>> >
> >> >>>> >>> > Hi Yun,
> >> >>>> >>> > Thanks for your reply!
> >> >>>> >>> > After an offline discussion with Dalong, I have added the
> >> >>>> >>> > JobStatusHook part to the FLIP. Looking forward to your feedback.
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> > --
> >> >>>> >>> >
> >> >>>> >>> > Best regards,
> >> >>>> >>> > Mang Zhang
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yungao.gy@aliyun.com.INVALID> wrote:
> >> >>>> >>> > >Hi,
> >> >>>> >>> > >
> >> >>>> >>> > >Regarding the drop operation, after some offline discussion with
> >> >>>> >>> > >Dalong and Zhu, we think that listening on the client side might be
> >> >>>> >>> > >problematic, since the client would exit after submitting the job in
> >> >>>> >>> > >detached mode; thus the operation might need to be on the JobMaster
> >> >>>> >>> > >side.
> >> >>>> >>> > >
> >> >>>> >>> > >For the listener interface, JobListener currently resides only on
> >> >>>> >>> > >the client side and contains methods like onJobSubmitted that are
> >> >>>> >>> > >unsuitable for this scenario, and the internal JobStatusListener is
> >> >>>> >>> > >designed to be used inside the JM and is not serializable. Thus we
> >> >>>> >>> > >tend to add a new interface, JobStatusHook, which could be attached
> >> >>>> >>> > >to the JobGraph and executed in the JobMaster.
> >> >>>> >>> > >The interface will also be marked as Internal.
> >> >>>> >>> > >
> >> >>>> >>> > >Best,
> >> >>>> >>> > >Yun
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >------------------------------------------------------------------
> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
> >> >>>> >>> > >To:dev <de...@flink.apache.org>
> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >>>> >>> > >
> >> >>>> >>> > >Hi, Martijn
> >> >>>> >>> > >Thanks for your reply!
> >> >>>> >>> > >I looked at the SQL standard: CTAS is part of the SQL standard.
> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >--
> >> >>>> >>> > >
> >> >>>> >>> > >Best regards,
> >> >>>> >>> > >Mang Zhang
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <martijnvisser@apache.org> wrote:
> >> >>>> >>> > >>Hi everyone,
> >> >>>> >>> > >>
> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL standard?
> >> >>>> >>> > >>
> >> >>>> >>> > >>Best regards,
> >> >>>> >>> > >>
> >> >>>> >>> > >>Martijn Visser
> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
> >> >>>> >>> > >>https://github.com/MartijnVisser
> >> >>>> >>> > >>
> >> >>>> >>> > >>
> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <luoyuxia@alumni.sjtu.edu.cn> wrote:
> >> >>>> >>> > >>
> >> >>>> >>> > >>> Thanks for driving this work; it will be a useful feature.
> >> >>>> >>> > >>> About FLIP-218, I have some questions.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's schema,
> >> >>>> >>> > >>> including column names and data types? I think it may be a useful
> >> >>>> >>> > >>> feature in case we want to change the data types in the target table
> >> >>>> >>> > >>> instead of always copying the source table's schema. It'll be more
> >> >>>> >>> > >>> flexible with this feature.
> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this
> >> >>>> >>> > >>> feature.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> 2: It seems it'll require the sink to implement a public interface to
> >> >>>> >>> > >>> drop the table, so what will the interface look like?
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> Best regards,
> >> >>>> >>> > >>> Yuxia
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> ----- Original Message -----
> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
> >> >>>> >>> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> Hi, everyone
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause in
> >> >>>> >>> > >>> CREATE TABLE (CTAS).
> >> >>>> >>> > >>> With the development of business and the enhancement of Flink SQL
> >> >>>> >>> > >>> capabilities, queries are becoming more and more complex.
> >> >>>> >>> > >>> Currently, users need to use a CREATE TABLE statement to create the
> >> >>>> >>> > >>> target table first, and then execute an INSERT statement.
> >> >>>> >>> > >>> However, the target table may have many columns, which brings users a
> >> >>>> >>> > >>> lot of work outside the business logic.
> >> >>>> >>> > >>> At the same time, users must ensure that the schema of the created
> >> >>>> >>> > >>> target table is consistent with the schema of the query result.
> >> >>>> >>> > >>> A CTAS syntax like Hive's and Spark's can greatly help users.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking forward to your
> >> >>>> >>> > >>> feedback.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> [1]
> >> >>>> >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> --
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> Best regards,
> >> >>>> >>> > >>> Mang Zhang
> >> >>>> >>> > >>>
> >> >>>> >>> > >
> >> >>>> >>
> >> >>>> >>
> >> >>>> >>------------------------------
> >> >>>> >>Best,
> >> >>>> >>Ron
> >> >>>>
> >>
>
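Yuxia's quoted question about how the CTAS hook reaches the job was answered only as an implementation detail. As a rough illustration of the idea in Yun Gao's quoted message (a serializable hook that travels with the job and runs on the JobMaster rather than the client), here is a minimal Java sketch. The `SketchJobStatusHook` interface, its callback names, the in-memory catalog, and the choice to create the table when the job is created are all assumptions made for illustration; they are not Flink's actual `JobStatusHook` API.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a CTAS job status hook; not Flink's real API.
final class CtasJobStatusHookSketch {

    // Hook contract; Serializable so it could be shipped with the job graph.
    interface SketchJobStatusHook extends Serializable {
        void onCreated();

        void onFinished();

        void onFailed(Throwable cause);

        void onCanceled();
    }

    // Minimal serializable stand-in for a catalog that can create/drop tables.
    static final class InMemoryCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Set<String> tables = new HashSet<>();

        void createTable(String name) { tables.add(name); }

        void dropTable(String name) { tables.remove(name); }

        boolean tableExists(String name) { return tables.contains(name); }
    }

    // CTAS hook: create the target table up front, drop it if the job does not finish.
    static final class CtasHook implements SketchJobStatusHook {
        private static final long serialVersionUID = 1L;
        private final InMemoryCatalog catalog;
        private final String table;

        CtasHook(InMemoryCatalog catalog, String table) {
            this.catalog = catalog;
            this.table = table;
        }

        @Override public void onCreated() { catalog.createTable(table); }

        @Override public void onFinished() { /* keep the table and its data */ }

        @Override public void onFailed(Throwable cause) { catalog.dropTable(table); }

        @Override public void onCanceled() { catalog.dropTable(table); }
    }

    private CtasJobStatusHookSketch() {}

    // Simulates the JobMaster invoking the hook over a job's lifecycle.
    static void runJob(SketchJobStatusHook hook, boolean succeeds) {
        hook.onCreated();
        if (succeeds) {
            hook.onFinished();
        } else {
            hook.onFailed(new RuntimeException("simulated failure"));
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        runJob(new CtasHook(catalog, "ok_table"), true);
        runJob(new CtasHook(catalog, "failed_table"), false);
        System.out.println(catalog.tableExists("ok_table"));     // true
        System.out.println(catalog.tableExists("failed_table")); // false
    }
}
```

Running the drop on the JobMaster side matters because, as Yun notes, a client submitting in detached mode may exit before the job completes, so a client-side listener would never observe the failure.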

Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi everyone,
Thank you to everyone who participated in the discussion. The proposal has been revised and improved over several rounds, and everyone has reached a consensus.
I will relaunch the vote soon.







--

Best regards,
Mang Zhang




At 2022-07-05 11:54:07, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:

Thanks for updating.
Now, the FLIP looks good to me.


Best regards,
Yuxia


From: "zhangmang1" <zh...@163.com>
To: luoyuxia@alumni.sjtu.edu.cn
Cc: "dev" <de...@flink.apache.org>, "Martijn Visser" <ma...@apache.org>, imjark@gmail.com
Sent: Tuesday, July 5, 2022, 11:35:35 AM
Subject: Re:Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)



Hi yuxia,
I have updated the FLIP to address your concerns about RTAS and AtomicCatalog. I'm not sure whether it resolves them; looking forward to your reply!







--

Best regards,
Mang Zhang





At 2022-07-05 11:26:22, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, the FLIP looks good to me now.
>
>Best,
>Jark
>
>On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Jark,
>> Regarding yuxia's two concerns, we had some offline discussions and
>> adjusted the implementation plan.
>>
>> >1) RTAS
>> RTAS is not supported in this FLIP, so we will remove "rtas" from the option name and keep the option backward compatible when RTAS is supported in the future.
>>
>> >2) AtomicCatalog
>>
>> AtomicCatalog was introduced to solve the catalog serialization problem, but its purpose is to make CTAS atomic, so we named it AtomicCatalog to help users understand the feature. At present, though, the name seems to confuse developers.
>> So we changed the plan: catalogs that support atomic CTAS only need to implement Java Serializable and make sure they can be serialized and deserialized. A user-defined catalog that wants to support atomic CTAS must follow the same requirement. We will do the check in the planner and update the Javadoc of Catalog.
>>
>>
>> What do you think? Looking forward to your feedback!
>>
>> --
>>
>> Best regards,
>>
>> Mang Zhang
>>
>>
>>
>> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>> >Hi Mang,
>> >
>> >I'm not sure whether your response has addressed Yuxia's concern or not.
>> >Would be better to receive a confirmation from participants before starting
>> >the vote.
>> >
>> >Actually, I have the same feeling with Yuxia's reply.
>> >
>> >1) RTAS
>> >If it's hard to have a consistent behavior for RTAS between streaming mode
>> >and batch mode,
>> >it's very possible that the "table.ctas-rtas.atomicity-enabled" is not
>> >suitable and may need to
>> >change in the future. If the RTAS will not be supported in this version and
>> >the configuration
>> >may be not suitable in the future, how about removing the "rtas" from the
>> >config? We can
>> >still evolve the config to "table.ctas-rtas" if the semantics are the same,
>> >and still keeps backward compatibility.
>> >
>> >2) AtomicCatalog
>> >We won't add other methods to `AtomicCatalog` in the future, because new
>> >methods required for isolation doesn't
>> >belong to `AtomicCatalog`, maybe a new interface `IsolateCatalog`,
>> >`TransactionalCatalog` or `StagingCalalog`.
>> >So, I think Yuxia's concern is reasonable that it's confusing an atomic
>> >catalog is just a serializable catalog.
>> >How about just adding more javadocs on the `Catalog` interface to implement
>> >`Serializable` and make the catalog
>> >instances can be de/serialized using Java Serialization in case of
>> >supporting CTAS for the catalog. The planner
>> >should check the serialization for the catalog and throw an instruction for
>> >users on how to adapt the catalog to support
>> >CTAS. In this way, we don't need to introduce a new interface
>> >`AtomicCatalog` or else.
>> >
>> >
>> >Best,
>> >Jark
>> >
>> >
>> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi Martijn,
>> >> Thank you for your reply, these are two good questions.
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >>
>> >> If it is a Catalog that does not support managed table and no `connector`
>> >> is specified, then the corresponding TableSink cannot be generated, will
>> >> fail.
>> >>
>> >> If it is a Catalog that supports managed table and no `connector` is
>> >> specified, then it will fail because the table store related configuration
>> >> is not set and there is no table store related jar.
>> >>
>> >>
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> FLIP-190 mainly solves the problem of Streaming mode upgrade. FLIP-218 use
>> >> scenarios more in Batch mode.
>> >> CTAS atomicity implementation requires serialization support for Catalog
>> >> and hook, which currently cannot be serialized into json, so they cannot be
>> >> supported FLIP-190.
>> >> Non-atomic implementations are able to support FLIP-190.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >> >Hi Mang,
>> >> >
>> >> >I have two questions/remarks:
>> >> >
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >> >
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
>> >> >
>> >> >> Hi everyone,
>> >> >> Thank you to all those who participated in the discussion, we have
>> >> >> discussed many rounds, the program has been gradually revised and
>> >> improved,
>> >> >> looking forward to further feedback, we will launch a vote in the next
>> >> day
>> >> >> or two.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best regards,
>> >> >> Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >Hi Yuxia,
>> >> >> >Thank you very much for your reply.
>> >> >> >
>> >> >> >
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
>> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
>> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
>> >> it
>> >> >> and the `rtas` shouldn't exposed to user as a configuration.
>> >> >> >Currently does not support RTAS because in the stream mode and batch
>> >> mode
>> >> >> semantic unification issues and specific business scenarios are not very
>> >> >> clear, the future we will support, if in support of rtas and then modify
>> >> >> the option name, then it will bring the cost of modifying the
>> >> configuration
>> >> >> to the user.
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> Could you please explain about it. Some pseudocode will be much better
>> >> if
>> >> >> it's possible. I'm lost in this part.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >This part is too much of an implementation detail, and of course we had
>> >> >> to make some changes to achieve this. FLIP focuses on semantic
>> >> consistency
>> >> >> in stream and batch mode, and can provide optional atomicity support.
>> >> >> >
>> >> >> >
>> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
>> >> >> naming is to implement atomic for ctas, we propose a interface for
>> >> catalog
>> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
>> >> the
>> >> >> interface is for the atomic of ctas. But if we want to implement other
>> >> >> features like isolate which may also require serializable catalog in the
>> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
>> >> Have
>> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
>> >> >> public interface, maybe we should be careful about the name.
>> >> >> >Regarding the definition of the Catalog name, we have also discussed
>> >> the
>> >> >> name `SerializableCatalog`, which is too specific and does not relate to
>> >> >> the atomic functionality we want to express. CTAS/RTAS want to support
>> >> >> atomicity, need Catalog to implement `AtomicCatalog`, so it's more
>> >> >> straightforward to understand.
>> >> >> >
>> >> >> >
>> >> >> >Hope this answers your question.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >
>> >> >> >Best regards,
>> >> >> >Mang Zhang
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >> >>Thanks for updating. The FLIP looks generall good to me. I have only
>> >> >> minor questions:
>> >> >> >>
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
>> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
>> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
>> >> it
>> >> >> and the `rtas` shouldn't exposed to user as a configuration.
>> >> >> >>
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> Could you please explain about it. Some pseudocode will be much better
>> >> if
>> >> >> it's possible.  I'm lost in this part.
>> >> >> >>
>> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
>> >> >> naming is to implement atomic for ctas, we propose a interface for
>> >> catalog
>> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
>> >> the
>> >> >> interface is for the atomic of ctas. But if we want to implement other
>> >> >> features like isolate which may also require serializable catalog in the
>> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
>> >> Have
>> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
>> >> >> public interface, maybe we should be careful about the name.
>> >> >> >>
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Yuxia
>> >> >> >>
>> >> >> >>----- Original Message -----
>> >> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >> >>To: "dev" <de...@flink.apache.org>
>> >> >> >>Cc: imjark@gmail.com
>> >> >> >>Sent: Monday, June 27, 2022 5:43:50 PM
>> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>
>> >> >> >>Hi Jark,
>> >> >> >>First of all, thank you for your very good advice!
>> >> >> >>The RTAS point you mentioned is a good one, and we should support it as well.
>> >> >> >>However, after investigating the semantics of RTAS and how RTAS is used within the company, I found that:
>> >> >> >>1. The semantics of RTAS say that if the table exists, the old data must be deleted and replaced with the new data.
>> >> >> >>These semantics are easier to implement in batch mode; for example, if the target table is a Hive table, the old data files can be deleted directly.
>> >> >> >>But in streaming mode, the target table is probably a Kafka topic, and we can't delete its data.
>> >> >> >>So the semantics cannot easily be kept consistent between streaming and batch scenarios.
>> >> >> >>2. I checked the big-data SQL jobs in the company over the last week and found that RTAS was not used.
>> >> >> >>No users in the company have asked for RTAS yet, so the application scenario is not very clear.
>> >> >> >>
>> >> >> >>
>> >> >> >>It is not clear what semantics RTAS should provide in streaming mode, and users' business scenarios are not very clear.
>> >> >> >>Maybe we don't have to support RTAS soon, but we can leave room for supporting RTAS in the future in the interface definition.
>> >> >> >>What do you think? Looking forward to your response!
>> >> >> >>
>> >> >> >>
>> >> >> >>By the way, the other points you raised have been addressed in the updated FLIP. Thanks.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>--
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Mang Zhang
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >> >>>Thanks for the update, Mang and Ron,
>> >> >> >>>
>> >> >> >>>The new proposal looks good to me in general, especially keeping the behavior consistent between batch and streaming mode by default. This is how we did it for the previous "table.dml-sync" option on the ML [1].
>> >> >> >>>
>> >> >> >>>Besides that, I just have some final minor comments regarding some interfaces.
>> >> >> >>>
>> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >> >>>The "OR" keyword sounds like this configuration can only take effect on one of CTAS and RTAS.
>> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >> >>>
>> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to support it.
>> >> >> >>>RTAS is another widely used statement similar to CTAS, and there seems to be not much difference between CTAS and RTAS. Considering we are introducing RTAS configurations, is it possible to support RTAS in this FLIP as well?
>> >> >> >>>
>> >> >> >>>3) connector.type
>> >> >> >>>"connector.type" has been deprecated since FLIP-95; could you replace it with 'connector'?
>> >> >> >>>
>> >> >> >>>4) SupportsAtomicCatalog
>> >> >> >>>I have some concerns about using the "Supports..." prefix, which is known as the ability-extension convention for DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is enough?
>> >> >> >>>
>> >> >> >>>Best,
>> >> >> >>>Jark
>> >> >> >>>
>> >> >> >>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >> >>>
>> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>
>> >> >> >>>> Hi all,
>> >> >> >>>> Thank you to all those who participated in the discussion and made suggestions!
>> >> >> >>>> After several rounds of online and offline discussions, the solution in the FLIP has been updated.
>> >> >> >>>> Looking forward to more feedback from everyone.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Mang Zhang
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >>>> >Hi Godfrey and Ron,
>> >> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >> >>>> >Special thanks to Ron for helping to review and improve the FLIP.
>> >> >> >>>> >Looking forward to further feedback from others.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >--
>> >> >> >>>> >
>> >> >> >>>> >Best regards,
>> >> >> >>>> >Mang Zhang
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >> >>>> >>Thanks, Godfrey, for the further feedback. Your suggestions look very good to me, and the FLIP has been updated accordingly. It would be great if you could take another look.
>> >> >> >>>> >>
>> >> >> >>>> >>Also looking forward to further feedback from others.
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>> -----Original Message-----
>> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>>
>> >> >> >>>> >>> Hi all,
>> >> >> >>>> >>>
>> >> >> >>>> >>> Sorry for the late reply.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >The Create Table As Select (CTAS) feature depends on the serializability of the catalog. To quickly check whether the catalog supports CTAS, we need to try to serialize the catalog when compiling SQL in the planner; if that fails, an exception will be thrown to indicate to the user that the catalog does not support CTAS because it cannot be serialized.
>> >> >> >>>> >>> This behavior is too cryptic, and will break the current catalog behavior when using 1.16.
>> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs which extends Serializable.
>> >> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog interface.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > Catalog#inferTableOptions
>> >> >> >>>> >>> I strongly recommend not introducing this feature now, because the behavior is unclear.
>> >> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is empty; but if the user forgets to set the connector option for a CTAS statement, the created table will be a managed table.
>> >> >> >>>> >>> 2) The options and their values for the catalog and for the connector may be different, so using the catalog options may cause unexpected errors.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >> >>>> >>>
>> >> >> >>>> >>> Best,
>> >> >> >>>> >>> Godfrey
>> >> >> >>>> >>>
>> >> >> >>>> >>> On Mon, Jun 13, 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Hi Yun,
>> >> >> >>>> >>> > Thanks for your reply!
>> >> >> >>>> >>> > Through offline communication with Dalong, I added the JobStatusHook part to the FLIP. Looking forward to your feedback.
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > --
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Best regards,
>> >> >> >>>> >>> > Mang Zhang
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yungao.gy@aliyun.com.INVALID> wrote:
>> >> >> >>>> >>> > >Hi,
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Regarding the drop operation, after some offline discussion with Dalong and Zhu, we think that listening on the client side might be problematic, since the client would exit after submitting the job in detached mode; thus the operation might need to happen on the JobMaster side.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >For the listener interface, JobListener currently only resides on the client side and contains methods like onJobSubmitted that are unsuitable for this scenario, and the internal JobStatusListener is designed to be used inside the JM and is not serializable. Thus we tend to add a new interface, JobStatusHook, which could be attached to the JobGraph and executed in the JobMaster.
>> >> >> >>>> >>> > >The interface will also be marked as Internal.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best,
>> >> >> >>>> >>> > >Yun
>> >> >> >>>> >>> > >
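A minimal sketch of the JobStatusHook idea described above, for readers who (like Yuxia later in the thread) would find pseudocode helpful: a serializable hook is registered on the job graph, creates the CTAS target table when the job starts, and drops it if the job fails or is cancelled. Everything here is hypothetical and self-contained (`JobStatusHookSketch`, `CtasJobStatusHookSketch`, `JobGraphSketch`, `InMemoryCatalogSketch`); it is not Flink's API. The method name `registerJobStatusHook` borrows Godfrey's naming suggestion, and creating the table in `onCreated` is an assumption about where that step happens.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical hook contract; Serializable so it could be shipped with the job to the JobMaster. */
interface JobStatusHookSketch extends Serializable {
    void onCreated(String jobId);

    void onFinished(String jobId);

    void onFailed(String jobId, Throwable cause);

    void onCanceled(String jobId);
}

/** Hypothetical in-memory stand-in for a catalog that can create and drop tables. */
class InMemoryCatalogSketch implements Serializable {
    private final Set<String> tables = new HashSet<>();

    void createTable(String name) { tables.add(name); }

    void dropTable(String name) { tables.remove(name); }

    boolean tableExists(String name) { return tables.contains(name); }
}

/** Creates the CTAS target table when the job starts and drops it if the job does not finish. */
class CtasJobStatusHookSketch implements JobStatusHookSketch {
    private final InMemoryCatalogSketch catalog;
    private final String tableName;

    CtasJobStatusHookSketch(InMemoryCatalogSketch catalog, String tableName) {
        this.catalog = catalog;
        this.tableName = tableName;
    }

    @Override public void onCreated(String jobId) { catalog.createTable(tableName); }

    @Override public void onFinished(String jobId) { /* success: keep the table */ }

    @Override public void onFailed(String jobId, Throwable cause) { catalog.dropTable(tableName); }

    @Override public void onCanceled(String jobId) { catalog.dropTable(tableName); }
}

/** Hypothetical job graph carrying hooks; simulateRun stands in for the JobMaster lifecycle. */
class JobGraphSketch {
    private final String jobId;
    private final List<JobStatusHookSketch> hooks = new ArrayList<>();

    JobGraphSketch(String jobId) { this.jobId = jobId; }

    void registerJobStatusHook(JobStatusHookSketch hook) { hooks.add(hook); }

    // In a real cluster the hooks would be serialized with the graph; here they run in-process.
    void simulateRun(boolean succeeds) {
        hooks.forEach(h -> h.onCreated(jobId));
        if (succeeds) {
            hooks.forEach(h -> h.onFinished(jobId));
        } else {
            RuntimeException cause = new RuntimeException("simulated failure");
            hooks.forEach(h -> h.onFailed(jobId, cause));
        }
    }
}
```

Because the hook runs where the job lifecycle is driven (the JobMaster, in Yun's proposal) rather than in a client-side listener, the cleanup can still happen in detached mode after the client has exited.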
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> >
>> >> >> >------------------------------------------------------------------
>> >> >> >>>> >>> > >From: Mang Zhang <zh...@163.com>
>> >> >> >>>> >>> > >Sent: 2022 May 25 (Wed.) 10:24
>> >> >> >>>> >>> > >To: dev <de...@flink.apache.org>
>> >> >> >>>> >>> > >Subject: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Hi, Martijn
>> >> >> >>>> >>> > >Thanks for your reply!
>> >> >> >>>> >>> > >I looked at the SQL standard: CTAS is part of the SQL standard.
>> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >--
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best regards,
>> >> >> >>>> >>> > >Mang Zhang
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <martijnvisser@apache.org> wrote:
>> >> >> >>>> >>> > >>Hi everyone,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Can we identify whether this proposed syntax is part of the SQL standard?
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Best regards,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Martijn Visser
>> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <luoyuxia@alumni.sjtu.edu.cn> wrote:
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>> Thanks for driving this work; it will be a useful feature.
>> >> >> >>>> >>> > >>> About FLIP-218, I have some questions.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's schema, including column names and data types? I think it may be a useful feature in case we want to change the data types in the target table instead of always copying the source table's schema. It'll be more flexible with this feature.
>> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 2: It seems it'll require the sink to implement a public interface to drop the table, so what will the interface look like?
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Yuxia
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> ----- Original Message -----
>> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >> >>>> >>> > >>> Sent: Thursday, April 28, 2022 4:57:24 PM
>> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Hi, everyone
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause in CREATE TABLE (CTAS).
>> >> >> >>>> >>> > >>> With the development of business and the enhancement of Flink SQL capabilities, queries become more and more complex.
>> >> >> >>>> >>> > >>> Currently, the user needs to use a CREATE TABLE statement to create the target table first, and then execute an INSERT statement.
>> >> >> >>>> >>> > >>> However, the target table may have many columns, which brings the user a lot of work outside the business logic.
>> >> >> >>>> >>> > >>> At the same time, the user must ensure that the schema of the created target table is consistent with the schema of the query result.
>> >> >> >>>> >>> > >>> A CTAS syntax like Hive/Spark's can greatly facilitate the user.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> You can find more details in FLIP-218 [1]. Looking forward to your feedback.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> --
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Mang Zhang
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>------------------------------
>> >> >> >>>> >>Best,
>> >> >> >>>> >>Ron
>> >> >> >>>>
>> >> >>
>> >>
>>
>>



Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Thanks for updating. 
Now, the FLIP looks good to me. 

Best regards, 
Yuxia 


From: "zhangmang1" <zh...@163.com> 
To: luoyuxia@alumni.sjtu.edu.cn 
Cc: "dev" <de...@flink.apache.org>, "Martijn Visser" <ma...@apache.org>, imjark@gmail.com 
Sent: Tuesday, July 5, 2022 11:35:35 AM 
Subject: Re:Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS) 

Hi yuxia, 
I updated the FLIP to address your concerns about RTAS and AtomicCatalog. I'm not sure whether it fully resolves them; looking forward to your reply! 








-- 
Best regards, 
Mang Zhang 




At 2022-07-05 11:26:22, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, the FLIP looks good to me now.
>
>Best,
>Jark
>
>On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Jark,
>> Regarding the two issues Yuxia raised, we had some offline discussions and adjusted the implementation plan.
>>
>> >1) RTAS
>> RTAS is not supported in this FLIP, so we will remove `rtas` from the option name and keep the option backward compatible when RTAS is supported in the future.
>>
>> >2) AtomicCatalog
>>
>> AtomicCatalog was introduced to solve the catalog serialization problem, but because its purpose is to make CTAS atomic, we named it AtomicCatalog to make the function easy for users to understand. At present, this name seems to confuse developers.
>> So we changed the plan: we only require Java Serializable support for catalogs that support CTAS atomicity, and such catalogs must make sure they can be serialized and deserialized. A user-defined catalog that wants to support CTAS atomicity must also follow this requirement. We will do the check in the planner and update the Catalog Javadoc accordingly.
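The planner check described above could look roughly like the following sketch, which round-trips a catalog instance through plain Java serialization and fails with instructions for the user. The class and method names (`CtasCatalogCheckSketch`, `checkCatalogSerializable`) and the message wording are hypothetical; the actual check in Flink's planner may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical planner-side check; not Flink's actual planner code. */
final class CtasCatalogCheckSketch {
    private CtasCatalogCheckSketch() {}

    /** Round-trips the catalog through Java serialization and fails with guidance if that is impossible. */
    static void checkCatalogSerializable(String catalogName, Object catalog) {
        if (!(catalog instanceof Serializable)) {
            throw new IllegalStateException(guidance(catalogName, "does not implement java.io.Serializable"));
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(catalog); // fails if any non-transient field is not serializable
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject(); // confirms the instance can also be deserialized
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(
                    guidance(catalogName, "cannot be serialized and deserialized: " + e), e);
        }
    }

    private static String guidance(String catalogName, String reason) {
        return "Catalog '" + catalogName + "' " + reason
                + ". To support atomic CTAS, implement java.io.Serializable and make sure every"
                + " non-transient field can be serialized with Java serialization.";
    }
}
```

A write-then-read round trip is used instead of a bare `instanceof Serializable` test because a class can declare `Serializable` and still hold a non-serializable field, which only fails when the object is actually written.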
>>
>>
>> What do you think? Looking forward to your feedback!
>>
>> --
>>
>> Best regards,
>>
>> Mang Zhang
>>
>>
>>
>> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>> >Hi Mang,
>> >
>> >I'm not sure whether your response has addressed Yuxia's concerns.
>> >It would be better to receive a confirmation from participants before starting the vote.
>> >
>> >Actually, I feel the same way as Yuxia.
>> >
>> >1) RTAS
>> >If it's hard to have consistent behavior for RTAS between streaming mode and batch mode, it's very possible that "table.ctas-rtas.atomicity-enabled" is not suitable and may need to change in the future. If RTAS will not be supported in this version and the configuration may not be suitable in the future, how about removing "rtas" from the config? We can still evolve the config to "table.ctas-rtas" if the semantics are the same, and still keep backward compatibility.
>> >
>> >2) AtomicCatalog
>> >We won't add other methods to `AtomicCatalog` in the future, because the new methods required for isolation don't belong to `AtomicCatalog`; they would go into a new interface such as `IsolateCatalog`, `TransactionalCatalog` or `StagingCatalog`.
>> >So I think Yuxia's concern is reasonable: it's confusing that an atomic catalog is just a serializable catalog.
>> >How about just adding more Javadoc on the `Catalog` interface, stating that a catalog which supports CTAS must implement `Serializable` and that its instances must be de/serializable using Java serialization? The planner should check the serializability of the catalog and throw an exception instructing users on how to adapt the catalog to support CTAS. This way, we don't need to introduce a new `AtomicCatalog` interface or anything else.
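Jark's first point relies on being able to rename an option later without breaking existing jobs. A minimal sketch of that kind of key fallback follows, assuming the option starts as `table.ctas.atomicity-enabled` and may later become `table.ctas-rtas.atomicity-enabled`; both key names and the `false` default are illustrative assumptions, and `BooleanOptionSketch` is hypothetical. In Flink itself, `ConfigOption#withDeprecatedKeys` is the built-in mechanism for this.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical boolean option that also accepts older key names. */
final class BooleanOptionSketch {
    private final String key;
    private final List<String> deprecatedKeys;
    private final boolean defaultValue;

    BooleanOptionSketch(String key, List<String> deprecatedKeys, boolean defaultValue) {
        this.key = key;
        this.deprecatedKeys = List.copyOf(deprecatedKeys);
        this.defaultValue = defaultValue;
    }

    /** The current key wins; otherwise the first deprecated key present is used; otherwise the default. */
    boolean resolve(Map<String, String> conf) {
        String raw = conf.get(key);
        for (int i = 0; raw == null && i < deprecatedKeys.size(); i++) {
            raw = conf.get(deprecatedKeys.get(i));
        }
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }
}
```

With this shape, jobs that already set the old key keep working after the rename, and an explicitly set new key always takes precedence over the old one.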
>> >
>> >
>> >Best,
>> >Jark
>> >
>> >
>> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi Martijn,
>> >> Thank you for your reply, these are two good questions.
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >>
>> >> If it is a catalog that does not support managed tables and no `connector` is specified, then the corresponding TableSink cannot be generated, and the statement will fail.
>> >>
>> >> If it is a catalog that supports managed tables and no `connector` is specified, then it will fail because the Table Store configuration is not set and the Table Store jar is missing.
>> >>
>> >>
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> FLIP-190 mainly solves the problem of upgrades in streaming mode, while FLIP-218 is used more in batch mode.
>> >> The atomic CTAS implementation requires serialization support for the catalog and the hook, which currently cannot be serialized into JSON, so it cannot support FLIP-190.
>> >> The non-atomic implementation is able to support FLIP-190.
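The two failure cases in the answer to Martijn's first question can be summarized as a small decision function. This is a hypothetical sketch of the described behavior, not Flink planner code; `CtasSinkResolutionSketch`, its parameters, and the `managed-table` return value are invented for illustration.

```java
import java.util.Map;

/** Hypothetical summary of how a CTAS sink is resolved when WITH options may omit 'connector'. */
final class CtasSinkResolutionSketch {
    private CtasSinkResolutionSketch() {}

    static String resolveConnector(Map<String, String> withOptions,
                                   boolean catalogSupportsManagedTables,
                                   boolean tableStoreAvailable) {
        String connector = withOptions.get("connector");
        if (connector != null) {
            return connector; // an explicit WITH ('connector' = '...') is used as-is
        }
        if (!catalogSupportsManagedTables) {
            throw new IllegalArgumentException(
                    "No 'connector' option and the catalog does not support managed tables:"
                            + " no table sink can be created.");
        }
        if (!tableStoreAvailable) {
            throw new IllegalStateException(
                    "No 'connector' option, so a managed table is implied, but Table Store"
                            + " is not configured or its jar is missing.");
        }
        return "managed-table"; // hypothetical marker meaning "create a managed table"
    }
}
```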
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >> >Hi Mang,
>> >> >
>> >> >I have two questions/remarks:
>> >> >
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >> >
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >On Wed, 29 Jun 2022 at 05:18, Mang Zhang <zh...@163.com> wrote:
>> >> >
>> >> >> Hi everyone,
>> >> >> Thank you to all those who participated in the discussion. After many rounds of discussion, the proposal has been gradually revised and improved.
>> >> >> Looking forward to further feedback; we will start a vote in the next day or two.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best regards,
>> >> >> Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >Hi Yuxia,
>> >> >> >Thank you very much for your reply.
>> >> >> >
>> >> >> >
>> >> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says nothing about RTAS but then suddenly refers to it in the configuration. If we're not going to implement RTAS in this FLIP, it may be better not to refer to it, and `rtas` shouldn't be exposed to users as a configuration.
>> >> >> >RTAS is currently not supported because of semantic unification issues between streaming and batch mode, and because the specific business scenarios are not very clear. We will support it in the future; if we only changed the option name when adding RTAS support, users would have to bear the cost of modifying their configuration.
>> >> >> >>2: How will the CTASJobStatusHook be passed to the StreamGraph as a hook? Could you please explain it? Some pseudocode would be much better, if possible. I'm lost in this part.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >This part is too much of an implementation detail, and of course we had to make some changes to achieve it. The FLIP focuses on semantic consistency between streaming and batch mode, and can provide optional atomicity support.
>> >> >> >
>> >> >> >
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the naming is that, to implement atomicity for CTAS, we propose an interface for catalogs to support serialization, and then we name it `AtomicCatalog`. That is, the interface exists only for CTAS atomicity. But if in the future we want to implement other features, like isolation, which may also require a serializable catalog, should we introduce a new interface named `IsolateCatalog`? Have you considered other names like `SerializableCatalog`? As it's a public interface, maybe we should be careful about the name.
>> >> >> >Regarding the name of the catalog interface, we also discussed `SerializableCatalog`, but it is too specific and does not relate to the atomic functionality we want to express. CTAS/RTAS need the catalog to implement `AtomicCatalog` to support atomicity, so this name is more straightforward to understand.
>> >> >> >
>> >> >> >
>> >> >> >Hope this answers your question.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >
>> >> >> >Best regards,
>> >> >> >Mang Zhang
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >> >>Thanks for updating. The FLIP looks generally good to me. I have only minor questions:
>> >> >> >>
>> >> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says nothing about RTAS but then suddenly refers to it in the configuration. If we're not going to implement RTAS in this FLIP, it may be better not to refer to it, and `rtas` shouldn't be exposed to users as a configuration.
>> >> >> >>
>> >> >> >>2: How will the CTASJobStatusHook be passed to the StreamGraph as a hook? Could you please explain it? Some pseudocode would be much better, if possible. I'm lost in this part.
>> >> >> >>
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the naming is that, to implement atomicity for CTAS, we propose an interface for catalogs to support serialization, and then we name it `AtomicCatalog`. That is, the interface exists only for CTAS atomicity. But if in the future we want to implement other features, like isolation, which may also require a serializable catalog, should we introduce a new interface named `IsolateCatalog`? Have you considered other names like `SerializableCatalog`? As it's a public interface, maybe we should be careful about the name.
>> >> >> >>
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Yuxia
>> >> >> >>
>> >> >> >>----- Original Message -----
>> >> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >> >>To: "dev" <de...@flink.apache.org>
>> >> >> >>Cc: imjark@gmail.com
>> >> >> >>Sent: Monday, June 27, 2022 5:43:50 PM
>> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>
>> >> >> >>Hi Jark,
>> >> >> >>First of all, thank you for your very good advice!
>> >> >> >>The RTAS point you mentioned is a good one, and we should support it as well.
>> >> >> >>However, after investigating the semantics of RTAS and how RTAS is used within the company, I found that:
>> >> >> >>1. The semantics of RTAS say that if the table exists, the old data must be deleted and replaced with the new data.
>> >> >> >>These semantics are easier to implement in batch mode; for example, if the target table is a Hive table, the old data files can be deleted directly.
>> >> >> >>But in streaming mode, the target table is probably a Kafka topic, and we can't delete its data.
>> >> >> >>So the semantics cannot easily be kept consistent between streaming and batch scenarios.
>> >> >> >>2. I checked the big-data SQL jobs in the company over the last week and found that RTAS was not used.
>> >> >> >>No users in the company have asked for RTAS yet, so the application scenario is not very clear.
>> >> >> >>
>> >> >> >>
>> >> >> >>It is not clear what semantics RTAS should provide in streaming mode, and users' business scenarios are not very clear.
>> >> >> >>Maybe we don't have to support RTAS soon, but we can leave room for supporting RTAS in the future in the interface definition.
>> >> >> >>What do you think? Looking forward to your response!
>> >> >> >>
>> >> >> >>
>> >> >> >>By the way, the other points you raised have been addressed in the updated FLIP. Thanks.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>--
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Mang Zhang
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >> >>>Thanks for the update, Mang and Ron,
>> >> >> >>>
>> >> >> >>>The new proposal looks good to me in general, especially keeping the behavior consistent between batch and streaming mode by default. This is how we did it for the previous "table.dml-sync" option on the ML [1].
>> >> >> >>>
>> >> >> >>>Besides that, I just have some final minor comments regarding some interfaces.
>> >> >> >>>
>> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >> >>>The "OR" keyword sounds like this configuration can only take effect on one of CTAS and RTAS.
>> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >> >>>
>> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to support it.
>> >> >> >>>RTAS is another widely used statement similar to CTAS, and there seems to be not much difference between CTAS and RTAS. Considering we are introducing RTAS configurations, is it possible to support RTAS in this FLIP as well?
>> >> >> >>>
>> >> >> >>>3) connector.type
>> >> >> >>>"connector.type" has been deprecated since FLIP-95; could you replace it with 'connector'?
>> >> >> >>>
>> >> >> >>>4) SupportsAtomicCatalog
>> >> >> >>>I have some concerns about using the "Supports..." prefix, which is known as the ability-extension convention for DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is enough?
>> >> >> >>>
>> >> >> >>>Best,
>> >> >> >>>Jark
>> >> >> >>>
>> >> >> >>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >> >>>
>> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>
>> >> >> >>>> Hi all,
>> >> >> >>>> Thank you to all those who participated in the discussion and made suggestions!
>> >> >> >>>> After several rounds of online and offline discussions, the solution in the FLIP has been updated.
>> >> >> >>>> Looking forward to more feedback from everyone.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Mang Zhang
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >>>> >Hi Godfrey and Ron,
>> >> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >> >>>> >Special thanks to Ron for helping to review and improve the FLIP.
>> >> >> >>>> >Looking forward to further feedback from others.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >--
>> >> >> >>>> >
>> >> >> >>>> >Best regards,
>> >> >> >>>> >Mang Zhang
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >> >>>> >>Thanks, Godfrey, for the further feedback. Your suggestions look very good to me, and the FLIP has been updated accordingly. It would be great if you could take another look.
>> >> >> >>>> >>
>> >> >> >>>> >>Also looking forward to further feedback from others.
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>> -----Original Message-----
>> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>>
>> >> >> >>>> >>> Hi all,
>> >> >> >>>> >>>
>> >> >> >>>> >>> Sorry for the late reply.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >The Create Table As Select (CTAS) feature depends on the serializability of the catalog. To quickly check whether the catalog supports CTAS, we need to try to serialize the catalog when compiling SQL in the planner; if that fails, an exception will be thrown to indicate to the user that the catalog does not support CTAS because it cannot be serialized.
>> >> >> >>>> >>> This behavior is too cryptic, and will break the current catalog behavior when using 1.16.
>> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs which extends Serializable.
>> >> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog interface.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > Catalog#inferTableOptions
>> >> >> >>>> >>> I strongly recommend not introducing this feature now, because the
>> >> >> >>>> >>> behavior is unclear:
>> >> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is
>> >> >> >>>> >>> empty, so if the user forgets to set the connector option in a CTAS
>> >> >> >>>> >>> statement, the created table will be a managed table.
>> >> >> >>>> >>> 2) The options (and their values) for the catalog and for the connector
>> >> >> >>>> >>> may differ, so using the catalog options may cause unexpected errors.
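Godfrey's first point can be made concrete with a small Java sketch of how the kind of table a CTAS statement creates would hinge on the `connector` option alone. The class `CtasTableKind`, its `resolve` method and the `Kind` enum are hypothetical names for illustration, not Flink APIs; the sketch assumes the catalog simply reports whether it supports managed tables.

```java
import java.util.Map;

// Hypothetical sketch, not a Flink API: which kind of table would a CTAS statement
// create, given its WITH options and whether the catalog supports managed tables?
final class CtasTableKind {

    enum Kind { CONNECTOR_TABLE, MANAGED_TABLE }

    private CtasTableKind() {}

    static Kind resolve(Map<String, String> withOptions, boolean catalogSupportsManagedTables) {
        if (withOptions.containsKey("connector")) {
            // An explicit 'connector' option selects a connector-backed table.
            return Kind.CONNECTOR_TABLE;
        }
        if (catalogSupportsManagedTables) {
            // No 'connector' option: the table silently becomes a managed table.
            return Kind.MANAGED_TABLE;
        }
        // No 'connector' option and no managed-table support: no sink can be created.
        throw new IllegalArgumentException(
                "CTAS target table has no 'connector' option and the catalog "
                        + "does not support managed tables");
    }
}
```

The middle branch is the case godfrey calls unclear: nothing in the statement itself says the user wanted a managed table.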
>> >> >> >>>> >>>
>> >> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >> >>>> >>>
>> >> >> >>>> >>> Best,
>> >> >> >>>> >>> Godfrey
>> >> >> >>>> >>>
>> >> >> >>>> >>> Mang Zhang <zh...@163.com> wrote on Mon, 13 Jun 2022 at 16:43:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Hi Yun,
>> >> >> >>>> >>> > Thanks for your reply!
>> >> >> >>>> >>> > After an offline discussion with Dalong, I have added the JobStatusHook part to
>> >> >> >>>> >>> > the FLIP. Looking forward to your feedback.
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > --
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Best regards,
>> >> >> >>>> >>> > Mang Zhang
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yungao.gy@aliyun.com.INVALID> wrote:
>> >> >> >>>> >>> > >Hi,
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Regarding the drop operation: after some offline discussion with Dalong and Zhu,
>> >> >> >>>> >>> > >we think that listening on the client side might be problematic, since the client
>> >> >> >>>> >>> > >exits after submitting the job in detached mode. Thus the operation might need to
>> >> >> >>>> >>> > >happen on the JobMaster side.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >For the listener interface: currently JobListener only resides on the client side
>> >> >> >>>> >>> > >and contains methods, like onJobSubmitted, that are unsuitable for this scenario,
>> >> >> >>>> >>> > >and the internal JobStatusListener is designed to be used inside the JM and is not
>> >> >> >>>> >>> > >serializable. Thus we tend to add a new interface, JobStatusHook, which could be
>> >> >> >>>> >>> > >attached to the JobGraph and executed in the JobMaster.
>> >> >> >>>> >>> > >The interface will also be marked as Internal.
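A rough Java sketch of the JobStatusHook idea: a serializable hook that travels with the job and reacts to job status changes in the JobMaster instead of on the client. Everything below is hypothetical: `JobStatusHookSketch`, `InMemoryCatalog` and `CtasHookSketch` are illustrative names, job IDs are plain strings to keep it self-contained, and creating the table when the job is created and dropping it if the job fails or is cancelled is an assumed behavior, not the FLIP's confirmed interface.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hook interface: serializable so it can be attached to the job and
// executed in the JobMaster rather than on the (possibly detached) client.
interface JobStatusHookSketch extends Serializable {
    void onCreated(String jobId);

    void onFinished(String jobId);

    void onFailed(String jobId, Throwable cause);

    void onCanceled(String jobId);
}

// Stand-in for a serializable catalog; a real one would talk to an external metastore.
final class InMemoryCatalog implements Serializable {
    private final Map<String, String> tables = new HashMap<>();

    void createTable(String name, String schema) {
        tables.put(name, schema);
    }

    void dropTable(String name) {
        tables.remove(name);
    }

    boolean tableExists(String name) {
        return tables.containsKey(name);
    }
}

// Assumed atomic CTAS behavior: the target table only survives a successful job.
final class CtasHookSketch implements JobStatusHookSketch {
    private final InMemoryCatalog catalog;
    private final String tableName;
    private final String schema;

    CtasHookSketch(InMemoryCatalog catalog, String tableName, String schema) {
        this.catalog = catalog;
        this.tableName = tableName;
        this.schema = schema;
    }

    @Override
    public void onCreated(String jobId) {
        catalog.createTable(tableName, schema);
    }

    @Override
    public void onFinished(String jobId) {
        // Success: keep the table that the job has written to.
    }

    @Override
    public void onFailed(String jobId, Throwable cause) {
        catalog.dropTable(tableName);
    }

    @Override
    public void onCanceled(String jobId) {
        catalog.dropTable(tableName);
    }
}
```

Because the hook holds a reference to the catalog and is shipped to the JobMaster, the catalog itself must be serializable, which is why the rest of the thread debates how catalogs should declare serializability.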
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best,
>> >> >> >>>> >>> > >Yun
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >------------------------------------------------------------------
>> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Hi, Martijn
>> >> >> >>>> >>> > >Thanks for your reply!
>> >> >> >>>> >>> > >I looked at the SQL standard: CTAS is part of it, as feature T172,
>> >> >> >>>> >>> > >"AS subquery clause in table definition".
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >--
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best regards,
>> >> >> >>>> >>> > >Mang Zhang
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <martijnvisser@apache.org> wrote:
>> >> >> >>>> >>> > >>Hi everyone,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL standard?
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Best regards,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Martijn Visser
>> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <luoyuxia@alumni.sjtu.edu.cn> wrote:
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>> Thanks for driving this work; it will be a useful feature.
>> >> >> >>>> >>> > >>> I have some questions about FLIP-218.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's schema,
>> >> >> >>>> >>> > >>> including column names and data types? I think it may be a useful feature
>> >> >> >>>> >>> > >>> in case we want to change the data types in the target table instead of
>> >> >> >>>> >>> > >>> always copying the source table's schema. It would make CTAS more flexible.
>> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement" [1] supports this feature.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 2: It seems this will require the sink to implement a public interface to
>> >> >> >>>> >>> > >>> drop the table. What will that interface look like?
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Yuxia
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> ----- Original Message -----
>> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >> >>>> >>> > >>> Sent: Thursday, 28 April 2022 4:57:24 PM
>> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Hi, everyone
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause in
>> >> >> >>>> >>> > >>> CREATE TABLE (CTAS).
>> >> >> >>>> >>> > >>> As businesses grow and Flink SQL becomes more capable, queries become
>> >> >> >>>> >>> > >>> more and more complex.
>> >> >> >>>> >>> > >>> Currently, the user has to create the target table with a CREATE TABLE
>> >> >> >>>> >>> > >>> statement first, and then execute an INSERT statement.
>> >> >> >>>> >>> > >>> However, the target table may have many columns, which brings the user a
>> >> >> >>>> >>> > >>> lot of work outside the business logic; the user also has to make sure
>> >> >> >>>> >>> > >>> that the schema of the created target table is consistent with the schema
>> >> >> >>>> >>> > >>> of the query result.
>> >> >> >>>> >>> > >>> A CTAS syntax like that of Hive/Spark can make this much easier for the user.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> You can find more details in FLIP-218 [1]. Looking forward to your feedback.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> --
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Mang Zhang
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>------------------------------
>> >> >> >>>> >>Best,
>> >> >> >>>> >>Ron
>> >> >> >>>>
>> >> >>
>> >>
>>
>> 


Re:Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi yuxia,
I have updated the FLIP to address your concerns about RTAS and AtomicCatalog. I'm not sure whether this resolves them; looking forward to your reply!







--

Best regards,
Mang Zhang





At 2022-07-05 11:26:22, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, the FLIP looks good to me now.
>
>Best,
>Jark
>
>On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Jark,
>> Regarding the two issues raised by yuxia, we had some offline
>> discussions and adjusted the implementation plan.
>>
>> >1) RTAS
>> RTAS is not supported in this FLIP, so we will remove `rtas` from the option name and keep the option compatible when RTAS is supported in the future.
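Dropping `rtas` from the option name now and evolving it later relies on an option still being found under its older key after a rename. The Java sketch below shows that lookup with plain maps; it assumes the current key is `table.ctas.atomicity-enabled` and uses `table.ctas-rtas.atomicity-enabled` purely as a hypothetical future name. Flink's own `ConfigOption` builder offers fallback and deprecated keys for this purpose, but the class `OptionLookup` here is a self-contained illustration that does not use it.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink's ConfigOption: read a boolean option by its current key,
// then by fallback keys, so renaming the key later does not break existing configurations.
final class OptionLookup {

    private OptionLookup() {}

    static boolean getBoolean(
            Map<String, String> conf, String key, List<String> fallbackKeys, boolean defaultValue) {
        String value = conf.get(key);
        // Only consult the fallback keys when the current key is not set.
        for (int i = 0; value == null && i < fallbackKeys.size(); i++) {
            value = conf.get(fallbackKeys.get(i));
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }
}
```

For example, after a hypothetical rename, `getBoolean(conf, "table.ctas-rtas.atomicity-enabled", List.of("table.ctas.atomicity-enabled"), false)` still honours a user configuration that only sets the old key.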
>>
>> >2) AtomicCatalog
>>
>> AtomicCatalog was introduced to solve the Catalog serialization problem. Since its purpose is to make CTAS atomic, we named it AtomicCatalog to make the feature easier for users to understand, but the name seems to confuse developers.
>> So we changed the plan: instead of a new interface, Catalogs that support atomic CTAS only need to implement Java Serializable and make sure they can be serialized and deserialized. A user-defined Catalog that wants to support atomic CTAS must follow the same requirement. We will do the check in the Planner and update the Catalog's Javadoc accordingly.
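A minimal sketch of the planner-side check described above, assuming the planner simply round-trips the catalog instance through Java serialization (`ObjectOutputStream`/`ObjectInputStream`) and turns any failure into an error that tells the user how to adapt the catalog. The class name `CatalogSerializabilityCheck` and the error text are hypothetical; Flink's actual check may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical planner-side check: round-trip a catalog instance through Java
// serialization and report an actionable error if that fails.
final class CatalogSerializabilityCheck {

    private CatalogSerializabilityCheck() {}

    static void checkSerializable(String catalogName, Object catalog) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // Fails with NotSerializableException (an IOException) for non-serializable state.
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                // Deserializing as well catches catalogs that serialize but cannot be restored.
                in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Catalog '" + catalogName + "' cannot be serialized/deserialized with Java "
                            + "serialization, so atomic CTAS is not supported for it. Make the "
                            + "catalog implement java.io.Serializable and ensure all of its "
                            + "fields are serializable.",
                    e);
        }
    }
}
```

Checking at compile time in the planner, rather than when the job reaches the JobMaster, means an unsupported catalog is reported before anything is submitted.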
>>
>>
>> What do you think? Looking forward to your feedback!
>>
>> --
>>
>> Best regards,
>>
>> Mang Zhang
>>
>>
>>
>> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>> >Hi Mang,
>> >
>> >I'm not sure whether your response has addressed Yuxia's concern or not.
>> >It would be better to receive a confirmation from the participants before
>> >starting the vote.
>> >
>> >Actually, I have the same feeling as Yuxia.
>> >
>> >1) RTAS
>> >If it's hard to have consistent RTAS behavior between streaming mode and batch mode,
>> >it's quite possible that "table.ctas-rtas.atomicity-enabled" is not suitable and may need
>> >to change in the future. If RTAS will not be supported in this version and the
>> >configuration may not be suitable in the future, how about removing "rtas" from the
>> >config? We can still evolve the config to "table.ctas-rtas" if the semantics are the
>> >same, while keeping backward compatibility.
>> >
>> >2) AtomicCatalog
>> >We won't add other methods to `AtomicCatalog` in the future, because new methods
>> >required for isolation don't belong to `AtomicCatalog`; they would perhaps go into a new
>> >interface such as `IsolateCatalog`, `TransactionalCatalog` or `StagingCatalog`.
>> >So I think Yuxia's concern is reasonable: it's confusing that an atomic catalog is just
>> >a serializable catalog.
>> >How about just adding more Javadoc on the `Catalog` interface saying that, to support
>> >CTAS, a catalog should implement `Serializable` so that its instances can be serialized
>> >and deserialized with Java serialization? The planner should check the serializability
>> >of the catalog and throw an error that tells users how to adapt the catalog to support
>> >CTAS. In this way, we don't need to introduce a new `AtomicCatalog` interface or anything
>> >else.
>> >
>> >
>> >Best,
>> >Jark
>> >
>> >
>> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi Martijn,
>> >> Thank you for your reply, these are two good questions.
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >>
>> >> If it is a Catalog that does not support managed table and no `connector`
>> >> is specified, then the corresponding TableSink cannot be generated, will
>> >> fail.
>> >>
>> >> If it is a Catalog that supports managed table and no `connector` is
>> >> specified, then it will fail because the table store related configuration
>> >> is not set and there is no table store related jar.
>> >>
>> >>
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> FLIP-190 mainly solves the problem of Streaming mode upgrade. FLIP-218 use
>> >> scenarios more in Batch mode.
>> >> CTAS atomicity implementation requires serialization support for Catalog
>> >> and hook, which currently cannot be serialized into json, so they cannot be
>> >> supported FLIP-190.
>> >> Non-atomic implementations are able to support FLIP-190.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >> >Hi Mang,
>> >> >
>> >> >I have two questions/remarks:
>> >> >
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >> >
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
>> >> >
>> >> >> Hi everyone,
>> >> >> Thank you to all those who participated in the discussion, we have
>> >> >> discussed many rounds, the program has been gradually revised and
>> >> improved,
>> >> >> looking forward to further feedback, we will launch a vote in the next
>> >> day
>> >> >> or two.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best regards,
>> >> >> Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >Hi Yuxia,
>> >> >> >Thank you very much for your reply.
>> >> >> >
>> >> >> >
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
>> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
>> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
>> >> it
>> >> >> and the `rtas` shouldn't exposed to user as a configuration.
>> >> >> >Currently does not support RTAS because in the stream mode and batch
>> >> mode
>> >> >> semantic unification issues and specific business scenarios are not very
>> >> >> clear, the future we will support, if in support of rtas and then modify
>> >> >> the option name, then it will bring the cost of modifying the
>> >> configuration
>> >> >> to the user.
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> Could you please explain about it. Some pseudocode will be much better
>> >> if
>> >> >> it's possible. I'm lost in this part.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >This part is too much of an implementation detail, and of course we had
>> >> >> to make some changes to achieve this. FLIP focuses on semantic
>> >> consistency
>> >> >> in stream and batch mode, and can provide optional atomicity support.
>> >> >> >
>> >> >> >
>> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
>> >> >> naming is to implement atomic for ctas, we propose a interface for
>> >> catalog
>> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
>> >> the
>> >> >> interface is for the atomic of ctas. But if we want to implement other
>> >> >> features like isolate which may also require serializable catalog in the
>> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
>> >> Have
>> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
>> >> >> public interface, maybe we should be careful about the name.
>> >> >> >Regarding the definition of the Catalog name, we have also discussed
>> >> the
>> >> >> name `SerializableCatalog`, which is too specific and does not relate to
>> >> >> the atomic functionality we want to express. CTAS/RTAS want to support
>> >> >> atomicity, need Catalog to implement `AtomicCatalog`, so it's more
>> >> >> straightforward to understand.
>> >> >> >
>> >> >> >
>> >> >> >Hope this answers your question.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >
>> >> >> >Best regards,
>> >> >> >Mang Zhang
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >> >>Thanks for updating. The FLIP looks generall good to me. I have only
>> >> >> minor questions:
>> >> >> >>
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
>> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
>> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
>> >> it
>> >> >> and the `rtas` shouldn't exposed to user as a configuration.
>> >> >> >>
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> Could you please explain about it. Some pseudocode will be much better
>> >> if
>> >> >> it's possible.  I'm lost in this part.
>> >> >> >>
>> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
>> >> >> naming is to implement atomic for ctas, we propose a interface for
>> >> catalog
>> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
>> >> the
>> >> >> interface is for the atomic of ctas. But if we want to implement other
>> >> >> features like isolate which may also require serializable catalog in the
>> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
>> >> Have
>> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
>> >> >> public interface, maybe we should be careful about the name.
>> >> >> >>
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Yuxia
>> >> >> >>
>> >> >> >>----- 原始邮件 -----
>> >> >> >>发件人: "Mang Zhang" <zh...@163.com>
>> >> >> >>收件人: "dev" <de...@flink.apache.org>
>> >> >> >>抄送: imjark@gmail.com
>> >> >> >>发送时间: 星期一, 2022年 6 月 27日 下午 5:43:50
>> >> >> >>主题: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT
>> >> clause
>> >> >> in CREATE TABLE(CTAS)
>> >> >> >>
>> >> >> >>Hi Jark,
>> >> >> >>First of all, thank you for your very good advice!
>> >> >> >>The RTAS point you mentioned is a good one, and we should support it
>> >> as
>> >> >> well.
>> >> >> >>However, by investigating the semantics of RTAS and how RTAS is used
>> >> >> within the company, I found that:
>> >> >> >>1. The semantics of RTAS says that if the table exists, need to delete
>> >> >> the old data and use the new data.
>> >> >> >>This semantics is better implemented in Batch mode, for example, if
>> >> the
>> >> >> target table is a Hive table, old data file can be deleted directly.
>> >> >> >>But in Streaming mode, the target table is probably a Kafka topic, we
>> >> >> can't delete the data.
>> >> >> >>So the semantics in streaming and batch scenarios are not well
>> >> >> guaranteed to be consistent.
>> >> >> >>2. I checked the SQL for big data in the company in the last week and
>> >> >> found that RTAS was not used.
>> >> >> >>No users in the company have mentioned the need for RTAS yet. So this
>> >> >> application scenario is not very clear.
>> >> >> >>
>> >> >> >>
>> >> >> >>It is not clear what kind of semantics RTAS should provide in
>> >> streaming
>> >> >> mode, and the user's business scenarios are not very clear.
>> >> >> >>Maybe We don't have to support RTAS soon, but we can leave the
>> >> >> possibility of supporting RTAS in the future in the interface
>> >> definition.
>> >> >> >>What do you think? Looking forward to your response!
>> >> >> >>
>> >> >> >>
>> >> >> >>By the way, the other points raised have been updated. thanks.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>--
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Mang Zhang
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >> >>>Thanks for the update, Mang and Ron,
>> >> >> >>>
>> >> >> >>>The new proposal looks good to me in general, especially keeping the
>> >> >> >>>behavior
>> >> >> >>>consistent between batch and streaming mode by default. This is how
>> >> we
>> >> >> do
>> >> >> >>>it
>> >> >> >>>in the previous "table.dml-sync" option on ML [1].
>> >> >> >>>
>> >> >> >>>Besides that, I just have some final minor comments regarding some
>> >> >> >>>interfaces.
>> >> >> >>>
>> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >> >>>The "OR" keyword sounds like this configuration can only take effect
>> >> on
>> >> >> one
>> >> >> >>>of CTAS and RTAS.
>> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >> >>>
>> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
>> >> to
>> >> >> >>>support it.
>> >> >> >>>RTAS is another widely used statement similar to CTAS. It seems
>> >> there is
>> >> >> >>>not much difference
>> >> >> >>>between CTAS and RTAS. Considering we are introducing RTAS
>> >> >> configurations,
>> >> >> >>>is it possible
>> >> >> >>> to support RTAS in this FLIP as well?
>> >> >> >>>
>> >> >> >>>3) connector.type
>> >> >> >>>"connector.type" has been deprecated since FLIP-95, could you replace
>> >> >> them
>> >> >> >>>with 'connector'?
>> >> >> >>>
>> >> >> >>>4) SupportsAtomicCatalog
>> >> >> >>>I have some concerns about using "Supports.." prefix which is known
>> >> as
>> >> >> the
>> >> >> >>>ability extension for
>> >> >> >>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is
>> >> >> enough?
>> >> >> >>>
>> >> >> >>>Best,
>> >> >> >>>Jark
>> >> >> >>>
>> >> >> >>>[1]:
>> >> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >> >>>
>> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>
>> >> >> >>>> Hi all,
>> >> >> >>>> Thank you to all those who participated in the discussion and made
>> >> >> >>>> suggestions!
>> >> >> >>>> After several rounds of online and offline discussions, the
>> >> solution
>> >> >> in
>> >> >> >>>> FLIP has been updated.
>> >> >> >>>> Looking forward to more feedback from everyone.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Mang Zhang
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >>>> >Hi godfrey and ron,
>> >> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
>> >> >> >>>> >Looking forward to further feedback from others.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >--
>> >> >> >>>> >
>> >> >> >>>> >Best regards,
>> >> >> >>>> >Mang Zhang
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >> >>>> >>Thanks for godfrey further feedback, your suggestions are very
>> >> good
>> >> >> to
>> >> >> >>>> me, the FLIP has updated according to your feedback. It will be
>> >> very
>> >> >> good
>> >> >> >>>> if you look at it again。
>> >> >> >>>> >>
>> >> >> >>>> >>Also looking forward to further feedback from others.
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>> -----原始邮件-----
>> >> >> >>>> >>> 发件人: "godfrey he" <go...@gmail.com>
>> >> >> >>>> >>> 发送时间: 2022-06-24 17:00:51 (星期五)
>> >> >> >>>> >>> 收件人: dev <de...@flink.apache.org>
>> >> >> >>>> >>> 抄送: "Yun Gao" <yu...@aliyun.com>
>> >> >> >>>> >>> 主题: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>>
>> >> >> >>>> >>> Hi all,
>> >> >> >>>> >>>
>> >> >> >>>> >>> Sorry for the late reply.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the
>> >> >> serializability
>> >> >> >>>> of the catalog. To quickly see if the catalog supports CTAS, we
>> >> need
>> >> >> to try
>> >> >> >>>> to serialize the catalog when compile SQL in planner and if it
>> >> fails,
>> >> >> an
>> >> >> >>>> exception will be >thrown to indicate to user that the catalog does
>> >> >> not
>> >> >> >>>> support CTAS because it cannot be serialized.
>> >> >> >>>> >>> This behavior is too cryptic, and will break the current
>> >> catalog
>> >> >> >>>> >>> behavior when using 1.16.
>> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalog which
>> >> >> >>>> >>> implements Serializable.
>> >> >> >>>> >>>  The existent catalogs can choose whether implements the new
>> >> >> catalog
>> >> >> >>>> interface.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > Catalog#inferTableOptions
>> >> >> >>>> >>> I strongly recommend not introducing this feature now, because
>> >> the
>> >> >> >>>> >>> behavior is unclear.
>> >> >> >>>> >>> 1) if the catalog support managed table, the connector option
>> >> is
>> >> >> >>>> >>> empty. but if user forget to
>> >> >> >>>> >>> set connector option for CTAS statement, the created table
>> >> will be
>> >> >> >>>> >>> managed table.
>> >> >> >>>> >>> 2) the options and its values for catalog and for connector
>> >> may be
>> >> >> >>>> different,
>> >> >> >>>> >>> so use the catalog option may cause expected errors.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >> >>>> >>>
>> >> >> >>>> >>> Best,
>> >> >> >>>> >>> Godfrey
>> >> >> >>>> >>>
>> >> >> >>>> >>> Mang Zhang <zh...@163.com> 于2022年6月13日周一 16:43写道:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Hi Yun,
>> >> >> >>>> >>> > Thanks for your reply!
>> >> >> >>>> >>> > Through offline communication with Dalong, I updated the
>> >> >> >>>> JobStatusHook part to FLIP, looking forward to your feedback.
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > --
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Best regards,
>> >> >> >>>> >>> > Mang Zhang
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
>> >> <yungao.gy@aliyun.com.INVALID
>> >> >> >
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >Hi,
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Regarding the drop operation, with some offline discussion
>> >> with
>> >> >> >>>> Dalong and Zhu,
>> >> >> >>>> >>> > >we think that listening in the client side might be
>> >> problematic
>> >> >> >>>> since it would exit
>> >> >> >>>> >>> > >after submitting the jobs in detached mode, thus the
>> >> operation
>> >> >> >>>> might need to
>> >> >> >>>> >>> > >be in the JobMaster side.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >For the listener interface, currently JobListener only
>> >> resides
>> >> >> in
>> >> >> >>>> the client side
>> >> >> >>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>> >> >> >>>> scenario, and
>> >> >> >>>> >>> > >the internal JobStatusListener is designed to be used inside
>> >> >> JM and
>> >> >> >>>> is not
>> >> >> >>>> >>> > >serializable, thus we tend to add a new interface
>> >> >> JobStatusHook,
>> >> >> >>>> >>> > >which could be attached to the JobGraph and executed in the
>> >> >> >>>> JobMaster.
>> >> >> >>>> >>> > >The interface will also be marked as Internal.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best,
>> >> >> >>>> >>> > >Yun
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> >
>> >> >> >------------------------------------------------------------------
>> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Hi, Martijn
>> >> >> >>>> >>> > >Thanks for your reply!
>> >> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
>> >> standard.
>> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >--
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best regards,
>> >> >> >>>> >>> > >Mang Zhang
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
>> >> >> martijnvisser@apache.org>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>Hi everyone,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>> >> >> >>>> standard?
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Best regards,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Martijn Visser
>> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
>> >> >> luoyuxia@alumni.sjtu.edu.cn>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>> Thanks for for driving this work, it's to be a useful
>> >> >> feature.
>> >> >> >>>> >>> > >>> About the flip-218, I have some questions.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specify target table's
>> >> >> schema
>> >> >> >>>> including
>> >> >> >>>> >>> > >>> column name and data type? I think it maybe a useful
>> >> fature
>> >> >> in
>> >> >> >>>> case we want
>> >> >> >>>> >>> > >>> to change the data types in target table instead of
>> >> always
>> >> >> copy
>> >> >> >>>> the source
>> >> >> >>>> >>> > >>> table's schema. It'll be more flexible with this feature.
>> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1]
>> >> support
>> >> >> this
>> >> >> >>>> feature.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 2: Seems it'll requre sink to implement an public
>> >> interface
>> >> >> to
>> >> >> >>>> drop table,
>> >> >> >>>> >>> > >>> so what's the interface will look like?
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Yuxia
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> ----- 原始邮件 -----
>> >> >> >>>> >>> > >>> 发件人: "Mang Zhang" <zh...@163.com>
>> >> >> >>>> >>> > >>> 收件人: "dev" <de...@flink.apache.org>
>> >> >> >>>> >>> > >>> 发送时间: 星期四, 2022年 4 月 28日 下午 4:57:24
>> >> >> >>>> >>> > >>> 主题: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Hi, everyone
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause in
>> >> >> >>>> >>> > >>> CREATE TABLE (CTAS).
>> >> >> >>>> >>> > >>> With the development of business and the enhancement of Flink SQL
>> >> >> >>>> >>> > >>> capabilities, queries are becoming more and more complex.
>> >> >> >>>> >>> > >>> Currently, the user needs to use the CREATE TABLE statement to create the
>> >> >> >>>> >>> > >>> target table first, and then execute the INSERT statement.
>> >> >> >>>> >>> > >>> However, the target table may have many columns, which brings a lot of
>> >> >> >>>> >>> > >>> work outside the business logic to the user.
>> >> >> >>>> >>> > >>> At the same time, the user has to ensure that the schema of the created
>> >> >> >>>> >>> > >>> target table is consistent with the schema of the query result.
>> >> >> >>>> >>> > >>> A CTAS syntax like that of Hive/Spark can greatly facilitate the user.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> --
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Mang Zhang
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>------------------------------
>> >> >> >>>> >>Best,
>> >> >> >>>> >>Ron
>> >> >> >>>>
>> >> >>
>> >>
>>
>>

Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Jark Wu <im...@gmail.com>.
Thanks for the update. The FLIP looks good to me now.

Best,
Jark

On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:

> Hi Jark,
> Regarding the two issues Yuxia raised, we had some offline
> discussions and adjusted the implementation plan.
>
> >1) RTAS
> RTAS is not supported in this FLIP, so we will remove "rtas" from the option name and keep the option forward compatible once RTAS is supported in the future.
>
> >2) AtomicCatalog
>
> AtomicCatalog was introduced to solve the Catalog serialization problem, but its purpose is to make CTAS atomic, so we named it AtomicCatalog to help users understand the feature; at present, however, the name seems to confuse developers.
> So we changed the plan: a Catalog that supports CTAS atomicity only needs to implement Java Serializable and must be serializable/deserializable. If a user-defined Catalog wants to support CTAS atomicity, it must also follow this requirement. We will do the check in the planner and update the Catalog's Javadoc.
>
>
> What do you think? Looking forward to your feedback!
>
> --
>
> Best regards,
>
> Mang Zhang
>
>
>
> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
> >Hi Mang,
> >
> >I'm not sure whether your response has addressed Yuxia's concern or not.
> >It would be better to receive a confirmation from participants before starting
> >the vote.
> >
> >Actually, I have the same feeling as Yuxia.
> >
> >1) RTAS
> >If it's hard to have consistent behavior for RTAS between streaming mode
> >and batch mode, it's very possible that "table.ctas-rtas.atomicity-enabled"
> >is not suitable and may need to change in the future. If RTAS will not be
> >supported in this version and the configuration may not be suitable in the
> >future, how about removing "rtas" from the config? We can still evolve the
> >config to "table.ctas-rtas" if the semantics are the same, and still keep
> >backward compatibility.
> >
> >2) AtomicCatalog
> >We won't add other methods to `AtomicCatalog` in the future, because new
> >methods required for isolation don't belong to `AtomicCatalog` but perhaps
> >to a new interface such as `IsolateCatalog`, `TransactionalCatalog` or
> >`StagingCatalog`.
> >So I think Yuxia's concern is reasonable: it's confusing that an atomic
> >catalog is just a serializable catalog.
> >How about just adding more Javadoc on the `Catalog` interface, stating that a
> >catalog must implement `Serializable` and its instances must be
> >de/serializable using Java serialization in order to support CTAS? The
> >planner should check the serializability of the catalog and throw an error
> >that instructs users how to adapt the catalog to support CTAS. In this way,
> >we don't need to introduce a new interface such as `AtomicCatalog`.
> >
> >
> >Best,
> >Jark
> >
> >
> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
> >
> >> Hi Martijn,
> >> Thank you for your reply, these are two good questions.
> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >> >in the query of the sink table, it will be assumed that the user wants to
> >> >create a managed table. What will happen if the user doesn't have Table
> >> >Store configured/installed? Will we throw an error?
> >>
> >> If it is a Catalog that does not support managed tables and no `connector`
> >> is specified, then the corresponding TableSink cannot be generated, and the
> >> statement will fail.
> >>
> >> If it is a Catalog that supports managed tables and no `connector` is
> >> specified, then it will fail because the Table Store configuration is not
> >> set and the Table Store jar is not available.
> >>
> >>
> >> >2. Will there be support included for FLIP-190 (version upgrades)?
> >> FLIP-190 mainly solves the problem of upgrades in streaming mode, while
> >> FLIP-218 is mostly used in batch mode scenarios.
> >> The atomic CTAS implementation requires serialization support for the
> >> Catalog and the hook, which currently cannot be serialized into JSON, so it
> >> cannot support FLIP-190.
> >> The non-atomic implementation is able to support FLIP-190.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >>
> >>
> >>
> >>
> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
> >> >Hi Mang,
> >> >
> >> >I have two questions/remarks:
> >> >
> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >> >in the query of the sink table, it will be assumed that the user wants to
> >> >create a managed table. What will happen if the user doesn't have Table
> >> >Store configured/installed? Will we throw an error?
> >> >
> >> >2. Will there be support included for FLIP-190 (version upgrades)?
> >> >
> >> >Best regards,
> >> >
> >> >Martijn
> >> >
> >> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
> >> >
> >> >> Hi everyone,
> >> >> Thank you to all those who participated in the discussion. We have
> >> >> gone through many rounds of discussion, and the proposal has been
> >> >> gradually revised and improved. Looking forward to further feedback; we
> >> >> will launch a vote in the next day or two.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Best regards,
> >> >> Mang Zhang
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
> >> >> >Hi Yuxia,
> >> >> >Thank you very much for your reply.
> >> >> >
> >> >> >
> >> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says
> >> >> >>nothing about RTAS but suddenly refers to it in the configuration. And if
> >> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
> >> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
> >> >> >RTAS is currently not supported because the semantic unification between
> >> >> >stream mode and batch mode and the specific business scenarios are not
> >> >> >very clear yet; we will support it in the future. If we only modify the
> >> >> >option name when adding RTAS support, it will bring the cost of modifying
> >> >> >the configuration to users.
> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> >> >>Could you please explain it? Some pseudocode would be much better if
> >> >> >>possible. I'm lost in this part.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >This part is too much of an implementation detail for the FLIP, and of
> >> >> >course we had to make some changes to achieve it. The FLIP focuses on
> >> >> >semantic consistency between stream and batch mode, and can provide
> >> >> >optional atomicity support.
> >> >> >
> >> >> >
> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
> >> >> >>naming is that, to implement atomicity for CTAS, we propose an interface
> >> >> >>for catalogs to support serialization and then name it `AtomicCatalog`.
> >> >> >>At least for now, the interface only serves CTAS atomicity. But if we want
> >> >> >>to implement other features like isolation, which may also require a
> >> >> >>serializable catalog in the future, should we introduce a new interface
> >> >> >>named `IsolateCatalog`? Have you considered other names like
> >> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
> >> >> >>careful about the name.
> >> >> >Regarding the name of the Catalog interface, we also discussed the name
> >> >> >`SerializableCatalog`, which is too specific and does not relate to the
> >> >> >atomicity we want to express. If CTAS/RTAS is to support atomicity, the
> >> >> >Catalog needs to implement `AtomicCatalog`, which is more straightforward
> >> >> >to understand.
> >> >> >
> >> >> >
> >> >> >Hope this answers your question.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >--
> >> >> >
> >> >> >Best regards,
> >> >> >Mang Zhang
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
> >> >> >>Thanks for the update. The FLIP looks generally good to me. I only have
> >> >> >>minor questions:
> >> >> >>
> >> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says
> >> >> >>nothing about RTAS but suddenly refers to it in the configuration. And if
> >> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
> >> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
> >> >> >>
> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> >> >>Could you please explain it? Some pseudocode would be much better if
> >> >> >>possible. I'm lost in this part.
> >> >> >>
> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
> >> >> >>naming is that, to implement atomicity for CTAS, we propose an interface
> >> >> >>for catalogs to support serialization and then name it `AtomicCatalog`.
> >> >> >>At least for now, the interface only serves CTAS atomicity. But if we want
> >> >> >>to implement other features like isolation, which may also require a
> >> >> >>serializable catalog in the future, should we introduce a new interface
> >> >> >>named `IsolateCatalog`? Have you considered other names like
> >> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
> >> >> >>careful about the name.
> >> >> >>
> >> >> >>
> >> >> >>Best regards,
> >> >> >>Yuxia
> >> >> >>
> >> >> >>----- Original Message -----
> >> >> >>From: "Mang Zhang" <zh...@163.com>
> >> >> >>To: "dev" <de...@flink.apache.org>
> >> >> >>Cc: imjark@gmail.com
> >> >> >>Sent: Monday, June 27, 2022 5:43:50 PM
> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>
> >> >> >>Hi Jark,
> >> >> >>First of all, thank you for your very good advice!
> >> >> >>The RTAS point you mentioned is a good one, and we should support it as
> >> >> >>well.
> >> >> >>However, after investigating the semantics of RTAS and how RTAS is used
> >> >> >>within the company, I found that:
> >> >> >>1. The semantics of RTAS say that if the table exists, the old data needs
> >> >> >>to be deleted and replaced with the new data.
> >> >> >>These semantics are easier to implement in batch mode; for example, if the
> >> >> >>target table is a Hive table, the old data files can be deleted directly.
> >> >> >>But in streaming mode, the target table is probably a Kafka topic, and we
> >> >> >>can't delete the data.
> >> >> >>So consistent semantics between the streaming and batch scenarios cannot
> >> >> >>be well guaranteed.
> >> >> >>2. I checked the big data SQL jobs in the company over the last week and
> >> >> >>found that RTAS was not used.
> >> >> >>No users in the company have asked for RTAS yet, so this application
> >> >> >>scenario is not very clear.
> >> >> >>
> >> >> >>
> >> >> >>It is not clear what kind of semantics RTAS should provide in streaming
> >> >> >>mode, and the users' business scenarios are not very clear either.
> >> >> >>Maybe we don't have to support RTAS soon, but we can leave room for
> >> >> >>supporting RTAS in the future in the interface definition.
> >> >> >>What do you think? Looking forward to your response!
> >> >> >>
> >> >> >>
> >> >> >>By the way, the other points raised have been updated. Thanks.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>--
> >> >> >>
> >> >> >>Best regards,
> >> >> >>Mang Zhang
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
> >> >> >>>Thanks for the update, Mang and Ron,
> >> >> >>>
> >> >> >>>The new proposal looks good to me in general, especially keeping the
> >> >> >>>behavior consistent between batch and streaming mode by default. This is
> >> >> >>>how we did it for the previous "table.dml-sync" option on the ML [1].
> >> >> >>>
> >> >> >>>Besides that, I just have some final minor comments regarding some interfaces.
> >> >> >>>
> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
> >> >> >>>The "OR" keyword makes it sound like this configuration can only take
> >> >> >>>effect on one of CTAS and RTAS.
> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
> >> >> >>>
> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to
> >> >> >>>support it.
> >> >> >>>RTAS is another widely used statement similar to CTAS. There seems to be
> >> >> >>>not much difference between CTAS and RTAS. Considering we are introducing
> >> >> >>>RTAS configurations, is it possible to support RTAS in this FLIP as well?
> >> >> >>>
> >> >> >>>3) connector.type
> >> >> >>>"connector.type" has been deprecated since FLIP-95; could you replace it
> >> >> >>>with 'connector'?
> >> >> >>>
> >> >> >>>4) SupportsAtomicCatalog
> >> >> >>>I have some concerns about using the "Supports.." prefix, which is known
> >> >> >>>as the ability extension for DynamicTableSource and DynamicTableSink.
> >> >> >>>Maybe "AtomicCatalog" is enough?
> >> >> >>>
> >> >> >>>Best,
> >> >> >>>Jark
> >> >> >>>
> >> >> >>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
> >> >> >>>
> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
> >> >> >>>
> >> >> >>>> Hi all,
> >> >> >>>> Thank you to all those who participated in the discussion and made
> >> >> >>>> suggestions!
> >> >> >>>> After several rounds of online and offline discussions, the solution in
> >> >> >>>> the FLIP has been updated.
> >> >> >>>> Looking forward to more feedback from everyone.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>>
> >> >> >>>> Best regards,
> >> >> >>>> Mang Zhang
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
> >> >> >>>> >Hi Godfrey and Ron,
> >> >> >>>> >Thank you very much for your replies and suggestions.
> >> >> >>>> >Special thanks to Ron for helping to review and improve the FLIP.
> >> >> >>>> >Looking forward to further feedback from others.
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >--
> >> >> >>>> >
> >> >> >>>> >Best regards,
> >> >> >>>> >Mang Zhang
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
> >> >> >>>> >>Thanks to Godfrey for the further feedback; your suggestions are very
> >> >> >>>> >>good. The FLIP has been updated according to your feedback. It would be
> >> >> >>>> >>great if you could take another look.
> >> >> >>>> >>
> >> >> >>>> >>Also looking forward to further feedback from others.
> >> >> >>>> >>
> >> >> >>>> >>
> >> >> >>>> >>> -----Original Message-----
> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
> >> >> >>>> >>> To: dev <de...@flink.apache.org>
> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>>> >>>
> >> >> >>>> >>> Hi all,
> >> >> >>>> >>>
> >> >> >>>> >>> Sorry for the late reply.
> >> >> >>>> >>>
> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
> >> >> >>>> >>>
> >> >> >>>> >>> >The Create Table As Select (CTAS) feature depends on the serializability
> >> >> >>>> >>> >of the catalog. To quickly see if the catalog supports CTAS, we need to
> >> >> >>>> >>> >try to serialize the catalog when compiling SQL in the planner, and if it
> >> >> >>>> >>> >fails, an exception will be thrown to indicate to the user that the
> >> >> >>>> >>> >catalog does not support CTAS because it cannot be serialized.
> >> >> >>>> >>> This behavior is too cryptic, and will break the current catalog
> >> >> >>>> >>> behavior when using 1.16.
> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs which
> >> >> >>>> >>> extends Serializable.
> >> >> >>>> >>> The existing catalogs can choose whether to implement the new catalog
> >> >> >>>> >>> interface.
> >> >> >>>> >>>
> >> >> >>>> >>> > Catalog#inferTableOptions
> >> >> >>>> >>> I strongly recommend not introducing this feature now, because the
> >> >> >>>> >>> behavior is unclear.
> >> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is
> >> >> >>>> >>> empty. But if the user forgets to set the connector option for a CTAS
> >> >> >>>> >>> statement, the created table will be a managed table.
> >> >> >>>> >>> 2) The options and their values for the catalog and for the connector
> >> >> >>>> >>> may be different, so using the catalog options may cause unexpected
> >> >> >>>> >>> errors.
> >> >> >>>> >>>
> >> >> >>>> >>> > StreamGraph#addJobStatusHook
> >> >> >>>> >>> I prefer `registerJobStatusHook`
> >> >> >>>> >>>
> >> >> >>>> >>> Best,
> >> >> >>>> >>> Godfrey
> >> >> >>>> >>>
> >> >> >>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
> >> >> >>>> >>> >
> >> >> >>>> >>> > Hi Yun,
> >> >> >>>> >>> > Thanks for your reply!
> >> >> >>>> >>> > Through offline communication with Dalong, I added the JobStatusHook
> >> >> >>>> >>> > part to the FLIP. Looking forward to your feedback.
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> > --
> >> >> >>>> >>> >
> >> >> >>>> >>> > Best regards,
> >> >> >>>> >>> > Mang Zhang
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yungao.gy@aliyun.com.INVALID> wrote:
> >> >> >>>> >>> > >Hi,
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Regarding the drop operation, after some offline discussion with
> >> >> >>>> >>> > >Dalong and Zhu, we think that listening on the client side might be
> >> >> >>>> >>> > >problematic, since the client would exit after submitting the job in
> >> >> >>>> >>> > >detached mode; thus the operation might need to happen on the
> >> >> >>>> >>> > >JobMaster side.
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >For the listener interface: currently JobListener only resides on the
> >> >> >>>> >>> > >client side and contains methods like onJobSubmitted that are
> >> >> >>>> >>> > >unsuitable for this scenario, and the internal JobStatusListener is
> >> >> >>>> >>> > >designed to be used inside the JM and is not serializable. Thus we
> >> >> >>>> >>> > >tend to add a new interface, JobStatusHook, which could be attached
> >> >> >>>> >>> > >to the JobGraph and executed in the JobMaster.
> >> >> >>>> >>> > >The interface will also be marked as Internal.
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Best,
> >> >> >>>> >>> > >Yun
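
To make Yun's proposal (and the later question about how the CTAS hook reaches the job) more concrete, here is a single-JVM sketch of the idea: a serializable hook is attached to a job graph on the client, shipped in serialized form, and driven by the "JobMaster", so it still runs after a detached client has exited. Every type below (`JobHook`, `ToyCatalog`, `CtasHook`, `ToyJobGraph`, `ToyJobMaster`) is hypothetical and is not Flink's API; the create-on-start / drop-on-failure behavior is an assumption about how a CTAS hook would use such an interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical hook contract; Serializable so it can travel with the job graph. */
interface JobHook extends Serializable {
    void onCreated(String jobId);
    void onFinished(String jobId);
    void onFailed(String jobId, Throwable cause);
    void onCanceled(String jobId);
}

/** Toy serializable catalog that only tracks table names. */
final class ToyCatalog implements Serializable {
    private static final long serialVersionUID = 1L;
    final Set<String> tables = new HashSet<>();
}

/** Assumed CTAS behavior: create the target table on start, drop it if the job does not finish. */
final class CtasHook implements JobHook {
    private static final long serialVersionUID = 1L;
    final ToyCatalog catalog;
    final String table;

    CtasHook(ToyCatalog catalog, String table) {
        this.catalog = catalog;
        this.table = table;
    }

    @Override public void onCreated(String jobId) { catalog.tables.add(table); }
    @Override public void onFinished(String jobId) { /* keep the table and its data */ }
    @Override public void onFailed(String jobId, Throwable cause) { catalog.tables.remove(table); }
    @Override public void onCanceled(String jobId) { catalog.tables.remove(table); }
}

/** Stand-in for the job graph built on the client. */
final class ToyJobGraph implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<JobHook> hooks = new ArrayList<>();
}

final class ToyJobMaster {
    /** Client side: serialize the graph, hooks included, for submission. */
    static byte[] ship(ToyJobGraph graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** JobMaster side: deserialize the graph and fire the hooks as the job changes state. */
    static ToyJobGraph run(byte[] shipped, String jobId, boolean jobSucceeds) {
        ToyJobGraph graph;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(shipped))) {
            graph = (ToyJobGraph) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize job graph", e);
        }
        for (JobHook hook : graph.hooks) {
            hook.onCreated(jobId);
        }
        for (JobHook hook : graph.hooks) {
            if (jobSucceeds) {
                hook.onFinished(jobId);
            } else {
                hook.onFailed(jobId, new RuntimeException("simulated failure"));
            }
        }
        return graph;
    }
}
```

Because the JobMaster side works on a deserialized copy, every object the hook references (here the catalog) must itself be serializable, which is the constraint behind the catalog-serialization discussion elsewhere in this thread.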
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >------------------------------------------------------------------
> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Hi, Martijn
> >> >> >>>> >>> > >Thanks for your reply!
> >> >> >>>> >>> > >I looked at the SQL standard: CTAS is part of the SQL standard.
> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >--
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Best regards,
> >> >> >>>> >>> > >Mang Zhang
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <martijnvisser@apache.org> wrote:
> >> >> >>>> >>> > >>Hi everyone,
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL standard?
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>Best regards,
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>Martijn Visser
> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
> >> >> >>>> >>> > >>https://github.com/MartijnVisser
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <luoyuxia@alumni.sjtu.edu.cn> wrote:
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>> Thanks for driving this work; it's going to be a useful feature.
> >> >> >>>> >>> > >>> About FLIP-218, I have some questions.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's schema,
> >> >> >>>> >>> > >>> including column names and data types? I think it may be a useful
> >> >> >>>> >>> > >>> feature in case we want to change the data types in the target table
> >> >> >>>> >>> > >>> instead of always copying the source table's schema. It'll be more
> >> >> >>>> >>> > >>> flexible with this feature.
> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this
> >> >> >>>> >>> > >>> feature.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> 2: It seems this will require the sink to implement a public interface
> >> >> >>>> >>> > >>> to drop the table, so what will the interface look like?
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> Best regards,
> >> >> >>>> >>> > >>> Yuxia
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> ----- Original Message -----
> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
> >> >> >>>> >>> > >>> Sent: Thursday, April 28, 2022 4:57:24 PM
> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> Hi, everyone
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause in
> >> >> >>>> >>> > >>> CREATE TABLE (CTAS).
> >> >> >>>> >>> > >>> With the development of business and the enhancement of Flink SQL
> >> >> >>>> >>> > >>> capabilities, queries are becoming more and more complex.
> >> >> >>>> >>> > >>> Currently, the user needs to use the CREATE TABLE statement to create the
> >> >> >>>> >>> > >>> target table first, and then execute the INSERT statement.
> >> >> >>>> >>> > >>> However, the target table may have many columns, which brings a lot of
> >> >> >>>> >>> > >>> work outside the business logic to the user.
> >> >> >>>> >>> > >>> At the same time, the user has to ensure that the schema of the created
> >> >> >>>> >>> > >>> target table is consistent with the schema of the query result.
> >> >> >>>> >>> > >>> A CTAS syntax like that of Hive/Spark can greatly facilitate the user.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> --
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> Best regards,
> >> >> >>>> >>> > >>> Mang Zhang
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >
> >> >> >>>> >>
> >> >> >>>> >>
> >> >> >>>> >>------------------------------
> >> >> >>>> >>Best,
> >> >> >>>> >>Ron
> >> >> >>>>
> >> >>
> >>
>
>

Re:Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Jark,
Regarding the two issues Yuxia raised, we had some offline discussions and adjusted the implementation plan.


>1) RTAS

RTAS is not supported in this FLIP, so we will remove "rtas" from the option name and keep the option forward compatible once RTAS is supported in the future.
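
One way to keep such an option forward compatible is a fallback-key lookup: ship a CTAS-only key now, and if the option is later renamed to cover RTAS, keep reading the old key as a fallback. The sketch below is JDK-only and illustrative, not Flink's `ConfigOption` API (which, as far as I know, offers a comparable fallback/deprecated-key facility that a real implementation would use instead). The key names `table.ctas.atomicity-enabled` and `table.ctas-rtas.atomicity-enabled` and the default of `false` are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;

/** Illustrative boolean option with fallback keys (not Flink's ConfigOption). */
final class BooleanOption {
    private final String key;
    private final List<String> fallbackKeys;
    private final boolean defaultValue;

    BooleanOption(String key, boolean defaultValue, String... fallbackKeys) {
        this.key = key;
        this.defaultValue = defaultValue;
        this.fallbackKeys = List.of(fallbackKeys);
    }

    /** Looks up the primary key first, then each fallback key in order, then the default. */
    boolean resolve(Map<String, String> userConfig) {
        String value = userConfig.get(key);
        if (value != null) {
            return Boolean.parseBoolean(value);
        }
        for (String fallback : fallbackKeys) {
            String oldValue = userConfig.get(fallback);
            if (oldValue != null) {
                return Boolean.parseBoolean(oldValue);
            }
        }
        return defaultValue;
    }
}

final class CtasAtomicityOptions {
    /** Option as it might ship while only CTAS is supported (hypothetical name). */
    static final BooleanOption CTAS_ONLY =
            new BooleanOption("table.ctas.atomicity-enabled", false);

    /** A possible future option once RTAS is supported; the CTAS-only key stays readable. */
    static final BooleanOption CTAS_AND_RTAS =
            new BooleanOption("table.ctas-rtas.atomicity-enabled", false,
                    "table.ctas.atomicity-enabled");
}
```

Reading the new key first means a user who later sets the renamed option overrides any value still left under the old key, while existing configurations keep working unchanged.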


>2) AtomicCatalog
AtomicCatalog was introduced to solve the Catalog serialization problem, but its purpose is to make CTAS atomic, so we named it AtomicCatalog to help users understand the feature; at present, however, the name seems to confuse developers.
So we changed the plan: a Catalog that supports CTAS atomicity only needs to implement Java Serializable and must be serializable/deserializable. If a user-defined Catalog wants to support CTAS atomicity, it must also follow this requirement. We will do the check in the planner and update the Catalog's Javadoc.
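
The planner-side check described above (try Java serialization on the catalog and, if it fails, tell the user how to adapt the catalog) might look roughly like the following. `SimpleCatalog`, `InMemoryCatalog`, `ClientBackedCatalog` and `CtasCatalogCheck` are hypothetical stand-ins, not Flink classes, and the error wording is illustrative. The sketch performs a full round trip because the requirement is that the catalog be both serializable and deserializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical minimal catalog abstraction (not Flink's Catalog interface). */
interface SimpleCatalog {
    String name();
}

/** A catalog whose state is plain data, so Java serialization works. */
final class InMemoryCatalog implements SimpleCatalog, Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    InMemoryCatalog(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}

/** A catalog that does not implement Serializable, e.g. because it wraps a live client. */
final class ClientBackedCatalog implements SimpleCatalog {
    @Override
    public String name() {
        return "client_backed";
    }
}

final class CtasCatalogCheck {
    /** Throws with an actionable message if the catalog cannot survive a Java serialization round trip. */
    static void checkSupportsAtomicCtas(SimpleCatalog catalog) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(catalog); // NotSerializableException if the class is not Serializable
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Catalog '" + catalog.name() + "' cannot be used for atomic CTAS: it must implement "
                            + "java.io.Serializable and be de/serializable with Java serialization.",
                    e);
        }
    }
}
```

Running the check at compile time surfaces the problem before any job is submitted, instead of as a failure on the cluster.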


What do you think? Looking forward to your feedback!

--

Best regards,
Mang Zhang





At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>Hi Mang,
>
>I'm not sure whether your response has addressed Yuxia's concern or not.
>It would be better to receive a confirmation from participants before starting
>the vote.
>
>Actually, I have the same feeling as Yuxia.
>
>1) RTAS
>If it's hard to have consistent behavior for RTAS between streaming mode
>and batch mode, it's very possible that "table.ctas-rtas.atomicity-enabled"
>is not suitable and may need to change in the future. If RTAS will not be
>supported in this version and the configuration may not be suitable in the
>future, how about removing "rtas" from the config? We can still evolve the
>config to "table.ctas-rtas" if the semantics are the same, and still keep
>backward compatibility.
>
>2) AtomicCatalog
>We won't add other methods to `AtomicCatalog` in the future, because new
>methods required for isolation don't belong to `AtomicCatalog` but perhaps
>to a new interface such as `IsolateCatalog`, `TransactionalCatalog` or
>`StagingCatalog`.
>So I think Yuxia's concern is reasonable: it's confusing that an atomic
>catalog is just a serializable catalog.
>How about just adding more Javadoc on the `Catalog` interface, stating that a
>catalog must implement `Serializable` and its instances must be
>de/serializable using Java serialization in order to support CTAS? The
>planner should check the serializability of the catalog and throw an error
>that instructs users how to adapt the catalog to support CTAS. In this way,
>we don't need to introduce a new interface such as `AtomicCatalog`.
>
>
>Best,
>Jark
>
>
>On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Martijn,
>> Thank you for your reply, these are two good questions.
>> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >in the query of the sink table, it will be assumed that the user wants to
>> >create a managed table. What will happen if the user doesn't have Table
>> >Store configured/installed? Will we throw an error?
>>
>> If it is a Catalog that does not support managed tables and no `connector`
>> is specified, then the corresponding TableSink cannot be generated, and the
>> statement will fail.
>>
>> If it is a Catalog that supports managed tables and no `connector` is
>> specified, then it will fail because the Table Store configuration is not
>> set and the Table Store jar is not available.
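
The two failure cases Mang describes can be summarized as a small decision function. This is a hypothetical sketch: `CtasSinkResolution`, its boolean parameters, its return strings and its error messages are illustrative and are not the planner's actual code. It only encodes the rule from the answer: an explicit `connector` option selects that connector, while its absence implies a managed table, which fails if the catalog does not support managed tables or if Table Store is not set up.

```java
import java.util.Map;

final class CtasSinkResolution {
    /**
     * Decides what kind of sink a CTAS target would get (hypothetical logic).
     *
     * @param withOptions the WITH options given in the CTAS statement
     * @param catalogSupportsManagedTables whether the current catalog supports managed tables
     * @param tableStoreAvailable whether Table Store is configured and its jar is on the classpath
     */
    static String resolve(
            Map<String, String> withOptions,
            boolean catalogSupportsManagedTables,
            boolean tableStoreAvailable) {
        String connector = withOptions.get("connector");
        if (connector != null) {
            return "connector:" + connector;
        }
        if (!catalogSupportsManagedTables) {
            throw new IllegalArgumentException(
                    "No 'connector' option given and the catalog does not support managed tables;"
                            + " no table sink can be created.");
        }
        if (!tableStoreAvailable) {
            throw new IllegalStateException(
                    "A managed table was implied (no 'connector' option), but Table Store is not"
                            + " configured or its jar is missing.");
        }
        return "managed";
    }
}
```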
>>
>>
>> >2. Will there be support included for FLIP-190 (version upgrades)?
>> FLIP-190 mainly solves the problem of upgrades in streaming mode, while
>> FLIP-218 is mostly used in batch mode scenarios.
>> The atomic CTAS implementation requires serialization support for the
>> Catalog and the hook, which currently cannot be serialized into JSON, so it
>> cannot support FLIP-190.
>> The non-atomic implementation is able to support FLIP-190.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Best regards,
>> Mang Zhang
>>
>>
>>
>>
>>
>> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >Hi Mang,
>> >
>> >I have two questions/remarks:
>> >
>> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >in the query of the sink table, it will be assumed that the user wants to
>> >create a managed table. What will happen if the user doesn't have Table
>> >Store configured/installed? Will we throw an error?
>> >
>> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >
>> >Best regards,
>> >
>> >Martijn
>> >
>> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
>> >
>> >> Hi everyone,
>> >> Thank you to all those who participated in the discussion. We have
>> >> gone through many rounds of discussion, and the proposal has been
>> >> gradually revised and improved. Looking forward to further feedback; we
>> >> will launch a vote in the next day or two.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >Hi Yuxia,
>> >> >Thank you very much for your reply.
>> >> >
>> >> >
>> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says
>> >> >>nothing about RTAS but suddenly refers to it in the configuration. And if
>> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
>> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
>> >> >RTAS is currently not supported because the semantic unification between
>> >> >stream mode and batch mode and the specific business scenarios are not
>> >> >very clear yet; we will support it in the future. If we only modify the
>> >> >option name when adding RTAS support, it will bring the cost of modifying
>> >> >the configuration to users.
>> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >>Could you please explain it? Some pseudocode would be much better if
>> >> >>possible. I'm lost in this part.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >This part is too much of an implementation detail for the FLIP, and of
>> >> >course we had to make some changes to achieve it. The FLIP focuses on
>> >> >semantic consistency between stream and batch mode, and can provide
>> >> >optional atomicity support.
>> >> >
>> >> >
>> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
>> >> >>naming is that, to implement atomicity for CTAS, we propose an interface
>> >> >>for catalogs to support serialization and then name it `AtomicCatalog`.
>> >> >>At least for now, the interface only serves CTAS atomicity. But if we want
>> >> >>to implement other features like isolation, which may also require a
>> >> >>serializable catalog in the future, should we introduce a new interface
>> >> >>named `IsolateCatalog`? Have you considered other names like
>> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
>> >> >>careful about the name.
>> >> >Regarding the interface name, we also discussed `SerializableCatalog`,
>> >> >but it is too specific and does not convey the atomicity we want to
>> >> >express. For CTAS/RTAS to support atomicity, the catalog needs to
>> >> >implement `AtomicCatalog`, which is more straightforward to understand.
>> >> >
>> >> >
>> >> >Hope this answers your question.
>> >> >
>> >> >--
>> >> >
>> >> >Best regards,
>> >> >Mang Zhang
>> >> >
>> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >>Thanks for updating. The FLIP looks generally good to me. I have only
>> >> >>minor questions:
>> >> >>
>> >> >>1: Also, the mixture of CTAS and RTAS confuses me, as the FLIP says
>> >> >>nothing about RTAS but suddenly refers to it in the configuration. And
>> >> >>if we're not going to implement RTAS in this FLIP, it may be better not
>> >> >>to refer to it, and `rtas` shouldn't be exposed to users as a
>> >> >>configuration.
>> >> >>
>> >> >>2: How will the CTASJobStatusHook be passed to the StreamGraph as a
>> >> >>hook? Could you please explain it? Some pseudocode would be much
>> >> >>better, if possible. I'm lost in this part.
>> >> >>
>> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for
>> >> >>the naming is that, to implement atomicity for CTAS, we propose an
>> >> >>interface for catalogs to support serialization, and then we name it
>> >> >>`AtomicCatalog`. At least, the interface is for the atomicity of CTAS.
>> >> >>But if we want to implement other features like isolation, which may
>> >> >>also require a serializable catalog in the future, should we introduce
>> >> >>a new interface named `IsolateCatalog`? Have you ever considered other
>> >> >>names like `SerializableCatalog`? As it's a public interface, maybe we
>> >> >>should be careful about the name.
>> >> >>
>> >> >>Best regards,
>> >> >>Yuxia
>> >> >>
>> >> >>----- Original Message -----
>> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >>To: "dev" <de...@flink.apache.org>
>> >> >>Cc: imjark@gmail.com
>> >> >>Sent: Monday, June 27, 2022 5:43:50 PM
>> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >>
>> >> >>Hi Jark,
>> >> >>First of all, thank you for your very good advice!
>> >> >>The RTAS point you mentioned is a good one, and we should support it
>> >> >>as well.
>> >> >>However, after investigating the semantics of RTAS and how RTAS is used
>> >> >>within our company, I found that:
>> >> >>1. The semantics of RTAS is that if the table exists, its old data must
>> >> >>be deleted and replaced with the new data.
>> >> >>These semantics are easy to implement in batch mode. For example, if
>> >> >>the target table is a Hive table, the old data files can be deleted
>> >> >>directly.
>> >> >>But in streaming mode, the target table is probably a Kafka topic, and
>> >> >>we can't delete its data.
>> >> >>So the semantics in streaming and batch scenarios cannot be guaranteed
>> >> >>to be consistent.
>> >> >>2. I checked our company's big data SQL jobs from the last week and
>> >> >>found that RTAS was not used.
>> >> >>No users in our company have mentioned a need for RTAS yet, so its
>> >> >>application scenarios are not very clear.
>> >> >>
>> >> >>
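The asymmetry described in point 1 can be illustrated with a minimal Java sketch. It models two hypothetical RTAS targets: a file-based batch table and an append-only log standing in for a Kafka topic. The names `RtasTarget`, `FileTable` and `AppendOnlyLog` are illustrative assumptions, not Flink or Kafka APIs. "Replace" is trivial when old files can be deleted, but has no faithful implementation on a log whose records cannot be removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an RTAS target; not a Flink API. */
interface RtasTarget {
    /** RTAS on an existing table: drop the old rows and keep only the new ones. */
    void replaceAll(List<String> newRows);

    List<String> rows();
}

/** Batch-style target, e.g. a Hive table backed by files that can be deleted. */
final class FileTable implements RtasTarget {
    private final List<String> files = new ArrayList<>();

    @Override
    public void replaceAll(List<String> newRows) {
        files.clear();         // delete the old data files
        files.addAll(newRows); // write the new data files
    }

    @Override
    public List<String> rows() {
        return List.copyOf(files);
    }
}

/** Streaming-style target, e.g. a Kafka topic: records already written cannot be deleted. */
final class AppendOnlyLog implements RtasTarget {
    private final List<String> records = new ArrayList<>();

    void append(String record) {
        records.add(record);
    }

    @Override
    public void replaceAll(List<String> newRows) {
        // There is no way to remove records that were already written.
        throw new UnsupportedOperationException("append-only log: old records cannot be deleted");
    }

    @Override
    public List<String> rows() {
        return List.copyOf(records);
    }
}
```

Under this model, a streaming RTAS could at best append the new rows after the old ones, which is why its result would differ from the batch result.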
>> >> >>It is not clear what semantics RTAS should provide in streaming mode,
>> >> >>and users' business scenarios for it are not very clear either.
>> >> >>Maybe we don't have to support RTAS soon, but we can leave room for
>> >> >>supporting RTAS in the future in the interface definitions.
>> >> >>What do you think? Looking forward to your response!
>> >> >>
>> >> >>
>> >> >>By the way, the FLIP has been updated for the other points you raised. Thanks.
>> >> >>
>> >> >>--
>> >> >>
>> >> >>Best regards,
>> >> >>Mang Zhang
>> >> >>
>> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >>>Thanks for the update, Mang and Ron,
>> >> >>>
>> >> >>>The new proposal looks good to me in general, especially keeping the
>> >> >>>behavior consistent between batch and streaming mode by default. This
>> >> >>>is how we did it for the previous "table.dml-sync" option on the ML [1].
>> >> >>>
>> >> >>>Besides that, I just have some final minor comments regarding some
>> >> >>>interfaces.
>> >> >>>
>> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >>>The "OR" keyword sounds like this configuration can only take effect
>> >> >>>on one of CTAS and RTAS.
>> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >>>
>> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
>> >> >>>to support it.
>> >> >>>RTAS is another widely used statement similar to CTAS. It seems there
>> >> >>>is not much difference between CTAS and RTAS. Considering we are
>> >> >>>introducing RTAS configurations, is it possible to support RTAS in
>> >> >>>this FLIP as well?
>> >> >>>
>> >> >>>3) connector.type
>> >> >>>"connector.type" has been deprecated since FLIP-95; could you replace
>> >> >>>it with 'connector'?
>> >> >>>
>> >> >>>4) SupportsAtomicCatalog
>> >> >>>I have some concerns about using the "Supports..." prefix, which is
>> >> >>>known as the ability-extension convention for DynamicTableSource and
>> >> >>>DynamicTableSink. Maybe "AtomicCatalog" is enough?
>> >> >>>
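To illustrate the naming point in item 1, here is a minimal sketch of one boolean switch that covers both statements, read from a plain key/value map. The key uses the `table.ctas-rtas` spelling suggested above, the `false` default is an assumption matching the "consistent between batch and streaming by default" behavior, and the class `CtasRtasAtomicity` is hypothetical; it does not use Flink's `ConfigOptions` API.

```java
import java.util.Map;

/** Hypothetical holder for a single switch shared by CTAS and RTAS. */
final class CtasRtasAtomicity {
    /** One of the key spellings proposed in this thread, not a released Flink option. */
    static final String KEY = "table.ctas-rtas.atomicity-enabled";

    /** Assumed default: off, so streaming and batch behave the same unless users opt in. */
    static final boolean DEFAULT_VALUE = false;

    private CtasRtasAtomicity() {}

    /** Governs both CTAS and RTAS, hence "ctas-rtas" rather than "ctas-or-rtas". */
    static boolean isEnabled(Map<String, String> conf) {
        String value = conf.get(KEY);
        return value == null ? DEFAULT_VALUE : Boolean.parseBoolean(value.trim());
    }
}
```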
>> >> >>>Best,
>> >> >>>Jark
>> >> >>>
>> >> >>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >>>
>> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >>>
>> >> >>>> Hi all,
>> >> >>>> Thank you to all those who participated in the discussion and made
>> >> >>>> suggestions!
>> >> >>>> After several rounds of online and offline discussions, the
>> >> >>>> solution in the FLIP has been updated.
>> >> >>>> Looking forward to more feedback from everyone.
>> >> >>>>
>> >> >>>> --
>> >> >>>>
>> >> >>>> Best regards,
>> >> >>>> Mang Zhang
>> >> >>>>
>> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >>>> >Hi godfrey and ron,
>> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
>> >> >>>> >Looking forward to further feedback from others.
>> >> >>>> >
>> >> >>>> >--
>> >> >>>> >
>> >> >>>> >Best regards,
>> >> >>>> >Mang Zhang
>> >> >>>> >
>> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >>>> >>Thanks to godfrey for the further feedback. Your suggestions look
>> >> >>>> >>very good to me, and the FLIP has been updated accordingly. It would
>> >> >>>> >>be great if you could take another look.
>> >> >>>> >>
>> >> >>>> >>Also looking forward to further feedback from others.
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >>> -----Original Message-----
>> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >>>> >>>
>> >> >>>> >>> Hi all,
>> >> >>>> >>>
>> >> >>>> >>> Sorry for the late reply.
>> >> >>>> >>>
>> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >>>> >>>
>> >> >>>> >>> >The Create Table As Select (CTAS) feature depends on the
>> >> >>>> >>> >serializability of the catalog. To quickly see whether the catalog
>> >> >>>> >>> >supports CTAS, we need to try to serialize the catalog when
>> >> >>>> >>> >compiling SQL in the planner; if that fails, an exception will be
>> >> >>>> >>> >thrown to indicate to the user that the catalog does not support
>> >> >>>> >>> >CTAS because it cannot be serialized.
>> >> >>>> >>> This behavior is too cryptic, and will break the current catalog
>> >> >>>> >>> behavior when using 1.16.
>> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs that
>> >> >>>> >>> extends Serializable.
>> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog
>> >> >>>> >>> interface.
>> >> >>>> >>>
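godfrey's suggestion, combined with the planner-side serialization check quoted above, might look roughly like the following sketch. `CatalogStub` is a hypothetical one-method stand-in for Flink's much larger `Catalog` interface, `AtomicCatalog` follows the name used in this thread, and `ExampleAtomicCatalog` and `CtasCatalogCheck` are illustrative only. The check first rejects catalogs that do not opt in, then attempts Java serialization, because implementing `Serializable` does not guarantee that every field can be serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical one-method stand-in for Flink's (much larger) Catalog interface. */
interface CatalogStub {
    String name();
}

/** Opt-in marker in the spirit of the proposal: catalogs supporting atomic CTAS are serializable. */
interface AtomicCatalog extends CatalogStub, Serializable {}

/** Example opt-in catalog; `client` stands for whatever connection state a real catalog holds. */
final class ExampleAtomicCatalog implements AtomicCatalog {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final Object client; // must itself be serializable, or the check below fails

    ExampleAtomicCatalog(String name, Object client) {
        this.name = name;
        this.client = client;
    }

    @Override
    public String name() {
        return name;
    }
}

/** Hypothetical planner-side check, run while compiling a CTAS statement. */
final class CtasCatalogCheck {
    private CtasCatalogCheck() {}

    static void validate(CatalogStub catalog) {
        if (!(catalog instanceof AtomicCatalog)) {
            throw new IllegalStateException("Catalog '" + catalog.name()
                    + "' does not implement AtomicCatalog, so atomic CTAS is not supported.");
        }
        // Implementing Serializable is only a promise; fields may still be unserializable.
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(catalog);
        } catch (IOException e) {
            throw new IllegalStateException("Catalog '" + catalog.name()
                    + "' implements AtomicCatalog but cannot be serialized: " + e, e);
        }
    }
}
```

With an explicit opt-in interface, a catalog that never declared support gets a clear error message instead of a cryptic serialization failure, while catalogs that do opt in are still checked for unserializable state.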
>> >> >>>> >>> > Catalog#inferTableOptions
>> >> >>>> >>> I strongly recommend not introducing this feature now, because the
>> >> >>>> >>> behavior is unclear.
>> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is
>> >> >>>> >>> empty. But if the user forgets to set the connector option for a
>> >> >>>> >>> CTAS statement, the created table will be a managed table.
>> >> >>>> >>> 2) The options and their values for the catalog and for the
>> >> >>>> >>> connector may be different, so using the catalog options may cause
>> >> >>>> >>> unexpected errors.
>> >> >>>> >>>
>> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >>>> >>>
>> >> >>>> >>> Best,
>> >> >>>> >>> Godfrey
>> >> >>>> >>>
>> >> >>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>> >> >>>> >>> >
>> >> >>>> >>> > Hi Yun,
>> >> >>>> >>> > Thanks for your reply!
>> >> >>>> >>> > After an offline discussion with Dalong, I have updated the
>> >> >>>> >>> > JobStatusHook part of the FLIP. Looking forward to your feedback.
>> >> >>>> >>> >
>> >> >>>> >>> > --
>> >> >>>> >>> >
>> >> >>>> >>> > Best regards,
>> >> >>>> >>> > Mang Zhang
>> >> >>>> >>> >
>> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yungao.gy@aliyun.com.INVALID> wrote:
>> >> >>>> >>> > >Hi,
>> >> >>>> >>> > >
>> >> >>>> >>> > >Regarding the drop operation, after some offline discussion with
>> >> >>>> >>> > >Dalong and Zhu, we think that listening on the client side might
>> >> >>>> >>> > >be problematic, since the client exits after submitting the job in
>> >> >>>> >>> > >detached mode; thus the operation might need to happen on the
>> >> >>>> >>> > >JobMaster side.
>> >> >>>> >>> > >
>> >> >>>> >>> > >For the listener interface: currently JobListener only resides on
>> >> >>>> >>> > >the client side and contains methods unsuitable for this scenario,
>> >> >>>> >>> > >such as onJobSubmitted, while the internal JobStatusListener is
>> >> >>>> >>> > >designed to be used inside the JM and is not serializable. Thus we
>> >> >>>> >>> > >tend to add a new interface, JobStatusHook, which could be
>> >> >>>> >>> > >attached to the JobGraph and executed in the JobMaster.
>> >> >>>> >>> > >The interface will also be marked as Internal.
>> >> >>>> >>> > >
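Since Yuxia asked for pseudocode on how a CTASJobStatusHook reaches the StreamGraph, here is a hedged, self-contained Java sketch of the design described above: a serializable hook registered on the graph (using the `registerJobStatusHook` name godfrey prefers) and invoked on the JobMaster side. Every type here (`JobStatusHookSketch`, `CtasJobStatusHookSketch`, `StreamGraphSketch`, `JobMasterSketch`, `SketchCatalog`) is a hypothetical stand-in, not Flink code. The four callbacks, and the choice to create the table when the job is created and drop it on failure or cancellation, are assumptions about what atomic CTAS needs. The sketch does not actually serialize the hook into a JobGraph.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hook contract; the callback set is an assumption, not a Flink signature. */
interface JobStatusHookSketch extends Serializable {
    void onCreated(String jobId);

    void onFinished(String jobId);

    void onFailed(String jobId, Throwable cause);

    void onCanceled(String jobId);
}

/** Stand-in for a serializable catalog that travels with the hook to the JobMaster. */
final class SketchCatalog implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, String> tables = new HashMap<>();
}

/** CTAS hook: create the target table when the job is created, drop it if the job fails or is canceled. */
final class CtasJobStatusHookSketch implements JobStatusHookSketch {
    private static final long serialVersionUID = 1L;
    private final SketchCatalog catalog;
    private final String table;
    private final String ddl;

    CtasJobStatusHookSketch(SketchCatalog catalog, String table, String ddl) {
        this.catalog = catalog;
        this.table = table;
        this.ddl = ddl;
    }

    @Override
    public void onCreated(String jobId) {
        catalog.tables.put(table, ddl);
    }

    @Override
    public void onFinished(String jobId) {
        // Job succeeded: keep the table and its data.
    }

    @Override
    public void onFailed(String jobId, Throwable cause) {
        catalog.tables.remove(table);
    }

    @Override
    public void onCanceled(String jobId) {
        catalog.tables.remove(table);
    }
}

/** Stand-in for StreamGraph, reduced to the hook registration discussed in this thread. */
final class StreamGraphSketch {
    private final List<JobStatusHookSketch> hooks = new ArrayList<>();

    void registerJobStatusHook(JobStatusHookSketch hook) {
        hooks.add(hook);
    }

    List<JobStatusHookSketch> getJobStatusHooks() {
        return hooks;
    }
}

/** Simulates the JobMaster firing the hooks that were shipped with the graph. */
final class JobMasterSketch {
    private JobMasterSketch() {}

    static void run(StreamGraphSketch graph, String jobId, Runnable job) {
        List<JobStatusHookSketch> hooks = graph.getJobStatusHooks();
        hooks.forEach(h -> h.onCreated(jobId));
        try {
            job.run();
            hooks.forEach(h -> h.onFinished(jobId));
        } catch (RuntimeException e) {
            hooks.forEach(h -> h.onFailed(jobId, e));
        }
    }
}
```

Because the hooks fire inside the (simulated) JobMaster rather than in the client, the drop still happens when the client has already exited after a detached submission, which is the concern raised above.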
>> >> >>>> >>> > >Best,
>> >> >>>> >>> > >Yun
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> >
>> >> >>>> >>> > >------------------------------------------------------------------
>> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >>>> >>> > >
>> >> >>>> >>> > >Hi, Martijn
>> >> >>>> >>> > >Thanks for your reply!
>> >> >>>> >>> > >I looked at the SQL standard: CTAS is part of it.
>> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >>>> >>> > >
>> >> >>>> >>> > >--
>> >> >>>> >>> > >
>> >> >>>> >>> > >Best regards,
>> >> >>>> >>> > >Mang Zhang
>> >> >>>> >>> > >
>> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <martijnvisser@apache.org> wrote:
>> >> >>>> >>> > >>Hi everyone,
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL standard?
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>Best regards,
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>Martijn Visser
>> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <luoyuxia@alumni.sjtu.edu.cn> wrote:
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>> Thanks for driving this work; it's going to be a useful feature.
>> >> >>>> >>> > >>> About FLIP-218, I have some questions.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's
>> >> >>>> >>> > >>> schema, including column names and data types? I think it may be a
>> >> >>>> >>> > >>> useful feature in case we want to change the data types in the
>> >> >>>> >>> > >>> target table instead of always copying the source table's schema.
>> >> >>>> >>> > >>> It would be more flexible with this feature.
>> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement" [1] supports this
>> >> >>>> >>> > >>> feature.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> 2: It seems this will require the sink to implement a public
>> >> >>>> >>> > >>> interface to drop the table, so what will that interface look like?
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> Best regards,
>> >> >>>> >>> > >>> Yuxia
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> ----- Original Message -----
>> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >>>> >>> > >>> Sent: Thursday, April 28, 2022 4:57:24 PM
>> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> Hi, everyone
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause
>> >> >>>> >>> > >>> in CREATE TABLE (CTAS).
>> >> >>>> >>> > >>> With the development of business and the enhancement of Flink SQL
>> >> >>>> >>> > >>> capabilities, queries are becoming more and more complex.
>> >> >>>> >>> > >>> Currently, users need to create the target table with a CREATE
>> >> >>>> >>> > >>> TABLE statement first, and then execute an INSERT statement.
>> >> >>>> >>> > >>> However, the target table may have many columns, which brings users
>> >> >>>> >>> > >>> a lot of work outside the business logic.
>> >> >>>> >>> > >>> At the same time, users must ensure that the schema of the created
>> >> >>>> >>> > >>> target table is consistent with the schema of the query result.
>> >> >>>> >>> > >>> A CTAS syntax like Hive's or Spark's can greatly simplify this for
>> >> >>>> >>> > >>> users.
>> >> >>>> >>> > >>>
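The manual step that CTAS removes can be sketched as follows: the DDL for the target table is derived from the query's result schema instead of being written by hand. `CtasSchemaSketch.deriveCreateTable` is a hypothetical helper, the result schema is passed as an ordered map of column name to SQL type, and the generated DDL only approximates Flink SQL; in a real planner the schema would come from the resolved query.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: derive the CTAS target table's DDL from the query's result schema. */
final class CtasSchemaSketch {
    private CtasSchemaSketch() {}

    /**
     * @param resultSchema column name to SQL type, in query output order (use an ordered map)
     * @param options      WITH options for the new table; omitted from the DDL when empty
     */
    static String deriveCreateTable(String table, Map<String, String> resultSchema, Map<String, String> options) {
        StringJoiner columns = new StringJoiner(",\n  ", "(\n  ", "\n)");
        resultSchema.forEach((name, type) -> columns.add("`" + name + "` " + type));

        StringJoiner with = new StringJoiner(", ", " WITH (", ")");
        options.forEach((key, value) -> with.add("'" + key + "' = '" + value + "'"));

        return "CREATE TABLE `" + table + "` " + columns + (options.isEmpty() ? "" : with.toString());
    }
}
```

With CTAS, a user would instead write roughly one statement, such as `CREATE TABLE t WITH ('connector' = '...') AS SELECT id, name FROM src`, rather than maintaining the generated DDL plus a separate INSERT INTO ... SELECT.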
>> >> >>>> >>> > >>> You can find more details in FLIP-218 [1]. Looking forward to your feedback.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> --
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> Best regards,
>> >> >>>> >>> > >>> Mang Zhang
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >>------------------------------
>> >> >>>> >>Best,
>> >> >>>> >>Ron
>> >> >>>>
>> >>
>>