Posted to dev@flink.apache.org by Martijn Visser <ma...@apache.org> on 2022/05/04 13:49:00 UTC

Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Hi everyone,

Can we identify if this proposed syntax is part of the SQL standard?

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:

> Thanks for driving this work; it will be a useful feature.
> About FLIP-218, I have some questions.
>
> 1: Does our CTAS syntax support specifying the target table's schema,
> including column names and data types? I think it may be a useful feature
> in case we want to change the data types in the target table instead of
> always copying the source table's schema. It would be more flexible with
> this feature.
> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>
> 2: It seems this will require the sink to implement a public interface to
> drop the table, so what will the interface look like?
>
> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Mang Zhang" <zh...@163.com>
> To: "dev" <de...@flink.apache.org>
> Sent: Thursday, April 28, 2022, 4:57:24 PM
> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>
> Hi, everyone
>
>
> I would like to open a discussion on supporting the SELECT clause in CREATE
> TABLE (CTAS).
> With the development of business and the enhancement of Flink SQL
> capabilities, queries become more and more complex.
> Currently the user needs to use a CREATE TABLE statement to create the target
> table first, and then execute the INSERT statement.
> However, the target table may have many columns, which brings a lot of
> work outside the business logic to the user.
> At the same time, the user has to ensure that the schema of the created
> target table is consistent with the schema of the query result.
> Using a CTAS syntax like the one in Hive/Spark can greatly help the user.
>
>
>
> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>
>
>
>
> --
>
> Best regards,
> Mang Zhang
>

Re:Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Jing,



Thanks for your reply!
I recently updated the FLIP documentation; when you have time, please take another look.
Regarding atomicity and isolation, the main discussion point is batch mode, more specifically Hive/Spark.
Considering the needs of actual business scenarios and the current implementation in Spark,
I decided to stay close to the Spark implementation first.
In our company, Spark SQL is currently the main engine for offline computing, so for now I chose the Spark DataSource V1 approach.



--

Best regards,
Mang Zhang





At 2022-06-04 03:35:59, "Jing Ge" <ji...@ververica.com> wrote:
>Hi Mang,
>
>Thanks for driving this! Finally there is a discussion about CTAS. It is
>one of the most important features.
>
>I agree with Jark that it would be good to be able to take care of both
>atomicity and isolation, which leads to Spark DataSource V2. Could you
>help me understand the connection between the reasons and your decision
>to pick Spark DataSource V1?
>
>*Reasons:*
>
>   - *Streaming mode requires the table to be created first, downstream
>   jobs can consume in real time. *(both Spark DataSource V1 and V2)
>   - *In most cases, Streaming jobs do not need to be cleaned up even if
>   the job fails. * (both Spark DataSource V1 and V2)
>   - *Flink has a rich connector ecosystem, and the capabilities provided
>   by external storage systems are different, Flink needs to behave
>   consistently. *(how will it lead to Spark DataSource V1)
>   - *Batch jobs try to ensure final atomicity. *(means to choose Spark
>   DataSource V2 right?)
>
>
>I think there are some differences between Spark DataSource V1 and V2, e.g.
>when the sink table becomes visible, whether the result is written to
>a temporary directory or directly to the sink table, etc.
>
>It would be great if you could update the reasons that led to your
>decision. thanks!
>
>Best regards,
>Jing
>
>On Tue, May 31, 2022 at 8:34 AM Yun Gao <yu...@aliyun.com.invalid>
>wrote:
>
>> Hi,
>>
>> Regarding the drop operation, after some offline discussion with Dalong and
>> Zhu, we think that listening on the client side might be problematic, since
>> the client exits after submitting the job in detached mode; thus the
>> operation might need to be on the JobMaster side.
>>
>> For the listener interface, JobListener currently resides only on the
>> client side and contains methods unsuitable for this scenario, such as
>> onJobSubmitted, and the internal JobStatusListener is designed to be used
>> inside the JM and is not serializable. Thus we tend to add a new interface,
>> JobStatusHook, which could be attached to the JobGraph and executed in the
>> JobMaster. The interface will also be marked as Internal.
>>
>> Best,
>> Yun
>>
>>
>> ------------------------------------------------------------------
>> From:Mang Zhang <zh...@163.com>
>> Send Time:2022 May 25 (Wed.) 10:24
>> To:dev <de...@flink.apache.org>
>> Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> TABLE(CTAS)
>>
>> Hi, Martijn
>> Thanks for your reply!
>> I looked at the SQL standard, CTAS is part of the SQL standard.
>> Feature T172 is "AS subquery clause in table definition".
>>
>>
>>
>> --
>>
>> Best regards,
>> Mang Zhang
>>
>>
>>
>>
>>

Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Jing Ge <ji...@ververica.com>.
Hi Mang,

Thanks for driving this! Finally there is a discussion about CTAS. It is
one of the most important features.

I agree with Jark that it would be good to be able to take care of both
atomicity and isolation, which leads to Spark DataSource V2. Could you
help me understand the connection between the reasons and your decision
to pick Spark DataSource V1?

*Reasons:*

   - *Streaming mode requires the table to be created first, downstream
   jobs can consume in real time. *(both Spark DataSource V1 and V2)
   - *In most cases, Streaming jobs do not need to be cleaned up even if
   the job fails. * (both Spark DataSource V1 and V2)
   - *Flink has a rich connector ecosystem, and the capabilities provided
   by external storage systems are different, Flink needs to behave
   consistently. *(how will it lead to Spark DataSource V1)
   - *Batch jobs try to ensure final atomicity. *(means to choose Spark
   DataSource V2 right?)


I think there are some differences between Spark DataSource V1 and V2, e.g.
when the sink table becomes visible, whether the result is written to
a temporary directory or directly to the sink table, etc.
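
The difference mentioned above, writing directly to the sink table versus writing to a temporary location first, can be illustrated with a minimal local-filesystem sketch. This is only an assumption-laden illustration: `StagedTableWriter` and its methods are hypothetical names, and this is not how Spark DataSource V1/V2 or any Flink sink is actually implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical illustration only; not the Spark or Flink implementation. */
final class StagedTableWriter {

    /** Direct write: data lands in the final location, so readers may see partial output. */
    static void writeDirect(Path tableFile, List<String> rows) throws IOException {
        Files.createDirectories(tableFile.getParent());
        Files.write(tableFile, rows, StandardCharsets.UTF_8);
    }

    /** Staged write: data goes to a temporary file and becomes visible only after the move. */
    static void writeStaged(Path tableFile, List<String> rows) throws IOException {
        Files.createDirectories(tableFile.getParent());
        Path staging = Files.createTempFile(tableFile.getParent(), ".staging-", ".tmp");
        try {
            Files.write(staging, rows, StandardCharsets.UTF_8);
            // Publish in one step; ATOMIC_MOVE throws instead of silently falling back to a copy.
            Files.move(staging, tableFile, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if the write or the move failed.
            Files.deleteIfExists(staging);
        }
    }
}
```

With the staged variant a failed job leaves no visible table file behind, which is the kind of property the atomicity discussion is about; the direct variant makes the table visible earlier but offers no such guarantee.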

It would be great if you could update the reasons that led to your
decision. thanks!

Best regards,
Jing

On Tue, May 31, 2022 at 8:34 AM Yun Gao <yu...@aliyun.com.invalid>
wrote:

> Hi,
>
> Regarding the drop operation, after some offline discussion with Dalong and
> Zhu, we think that listening on the client side might be problematic, since
> the client exits after submitting the job in detached mode; thus the
> operation might need to be on the JobMaster side.
>
> For the listener interface, JobListener currently resides only on the
> client side and contains methods unsuitable for this scenario, such as
> onJobSubmitted, and the internal JobStatusListener is designed to be used
> inside the JM and is not serializable. Thus we tend to add a new interface,
> JobStatusHook, which could be attached to the JobGraph and executed in the
> JobMaster. The interface will also be marked as Internal.
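
As a rough sketch of the idea described above, a serializable hook could be shipped with the job and react to terminal job states by dropping the target table. Everything here is an assumption for illustration: the method names, the `String` job id, `InMemoryCatalog`, and `CtasJobStatusHook` are hypothetical, and the interface actually added to Flink may look different.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical hook shape; the real Flink interface may differ. */
interface JobStatusHook extends Serializable {
    void onCreated(String jobId);
    void onFinished(String jobId);
    void onFailed(String jobId, Throwable cause);
    void onCanceled(String jobId);
}

/** Stand-in for an external catalog; it only tracks table names. */
final class InMemoryCatalog implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Set<String> tables = new HashSet<>();

    void createTable(String name) { tables.add(name); }
    void dropTable(String name) { tables.remove(name); }
    boolean tableExists(String name) { return tables.contains(name); }
}

/** Drops the CTAS target table when the job fails or is canceled. */
final class CtasJobStatusHook implements JobStatusHook {
    private static final long serialVersionUID = 1L;
    private final InMemoryCatalog catalog;
    private final String tableName;

    CtasJobStatusHook(InMemoryCatalog catalog, String tableName) {
        this.catalog = catalog;
        this.tableName = tableName;
    }

    @Override public void onCreated(String jobId) { /* table is assumed to exist already */ }
    @Override public void onFinished(String jobId) { /* keep the table on success */ }
    @Override public void onFailed(String jobId, Throwable cause) { catalog.dropTable(tableName); }
    @Override public void onCanceled(String jobId) { catalog.dropTable(tableName); }
}
```

Because the hook and the catalog it references are `Serializable`, they could in principle travel with the job graph to the JobMaster, which is the reason the thread later discusses catalog serializability.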
>
> Best,
> Yun
>
>
> ------------------------------------------------------------------
> From:Mang Zhang <zh...@163.com>
> Send Time:2022 May 25 (Wed.) 10:24
> To:dev <de...@flink.apache.org>
> Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
> TABLE(CTAS)
>
> Hi, Martijn
> Thanks for your reply!
> I looked at the SQL standard, CTAS is part of the SQL standard.
> Feature T172 is "AS subquery clause in table definition".
>
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>

Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi everyone,
Thank you to all those who participated in the discussion. The proposal has been gradually revised and improved, and everyone has reached a consensus.
I will relaunch the vote soon.







--

Best regards,
Mang Zhang




At 2022-07-05 11:54:07, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:

Thanks for updating.
Now, the FLIP looks good to me.


Best regards,
Yuxia


From: "zhangmang1" <zh...@163.com>
To: luoyuxia@alumni.sjtu.edu.cn
Cc: "dev" <de...@flink.apache.org>, "Martijn Visser" <ma...@apache.org>, imjark@gmail.com
Sent: Tuesday, July 5, 2022, 11:35:35 AM
Subject: Re:Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)



Hi yuxia,
I updated the FLIP to address your concerns about RTAS and AtomicCatalog. I'm not sure whether it resolves them; looking forward to your reply!







--

Best regards,
Mang Zhang





At 2022-07-05 11:26:22, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, the FLIP looks good to me now.
>
>Best,
>Jark
>
>On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Jark,
>> Regarding the two issues of concern to yuxia, we did some offline
>> discussions and adjusted the implementation plan.
>>
>> >1) RTAS
>> RTAS is not supported in this FLIP, so we will remove rtas from the option name and keep the option forward compatible when RTAS is supported in the future.
>>
>> >2) AtomicCatalog
>>
>> AtomicCatalog was introduced to solve the Catalog serialization problem, but its purpose is to let CTAS support atomicity. We named it AtomicCatalog to make that purpose easier for users to understand, which currently seems to confuse developers.
>> So we changed the design to only require Java Serializable support from Catalogs that support CTAS atomicity, making sure they can be serialized and deserialized. A user-defined Catalog that wants to support CTAS atomicity must follow the same requirement. We will do the check in the Planner and update the Catalog's Javadoc.
>>
>>
>> What do you think? Looking forward to your feedback!
>>
>> --
>>
>> Best regards,
>>
>> Mang Zhang
>>
>>
>>
>> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>> >Hi Mang,
>> >
>> >I'm not sure whether your response has addressed Yuxia's concern or not.
>> >Would be better to receive a confirmation from participants before starting
>> >the vote.
>> >
>> >Actually, I have the same feeling with Yuxia's reply.
>> >
>> >1) RTAS
>> >If it's hard to have consistent behavior for RTAS between streaming mode
>> >and batch mode, it's very possible that "table.ctas-rtas.atomicity-enabled"
>> >is not suitable and may need to change in the future. If RTAS will not be
>> >supported in this version and the configuration may not be suitable in the
>> >future, how about removing "rtas" from the config? We can still evolve the
>> >config to "table.ctas-rtas" if the semantics are the same, and still keep
>> >backward compatibility.
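
One way to read the backward-compatibility remark above: if the option is later renamed to cover RTAS, the new key can be looked up first and the older key used as a fallback. The sketch below assumes plain string maps and two illustrative key names; `table.ctas.atomicity-enabled` and the default of `false` are assumptions, not names or values confirmed by the thread.

```java
import java.util.Map;

/** Hypothetical lookup; the key names and the default are assumptions for illustration. */
final class AtomicityOption {
    // Possible future key if CTAS and RTAS end up sharing the same semantics.
    static final String FUTURE_KEY = "table.ctas-rtas.atomicity-enabled";
    // Assumed CTAS-only key that existing user configurations would contain.
    static final String LEGACY_KEY = "table.ctas.atomicity-enabled";

    /** Prefers the newer key and falls back to the older one; defaults to false. */
    static boolean isAtomicityEnabled(Map<String, String> conf) {
        String value = conf.get(FUTURE_KEY);
        if (value == null) {
            value = conf.get(LEGACY_KEY);
        }
        return value != null && Boolean.parseBoolean(value.trim());
    }
}
```

With such a fallback, users who already set the older key would not have to touch their configuration after a rename.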
>> >
>> >2) AtomicCatalog
>> >We won't add other methods to `AtomicCatalog` in the future, because new
>> >methods required for isolation don't belong to `AtomicCatalog`, but maybe
>> >to a new interface such as `IsolateCatalog`, `TransactionalCatalog` or
>> >`StagingCatalog`.
>> >So, I think Yuxia's concern is reasonable: it's confusing that an atomic
>> >catalog is just a serializable catalog.
>> >How about just adding more Javadoc to the `Catalog` interface, asking
>> >implementations to implement `Serializable` and to make sure catalog
>> >instances can be de/serialized using Java Serialization if the catalog is
>> >to support CTAS? The planner should check the serialization of the catalog
>> >and throw an exception with instructions for users on how to adapt the
>> >catalog to support CTAS. In this way, we don't need to introduce a new
>> >interface `AtomicCatalog` or anything else.
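
The planner-side check suggested above could look roughly like the following sketch: try a Java Serialization round trip and fail with an actionable message otherwise. `CatalogSerializationCheck` and the message texts are hypothetical, and the catalog is typed as `Object` only to keep the example self-contained; this is not Flink's actual planner code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical planner-side check; not Flink's actual implementation. */
final class CatalogSerializationCheck {

    /** Throws IllegalStateException unless the catalog survives a serialization round trip. */
    static void checkSerializable(Object catalog) {
        if (!(catalog instanceof Serializable)) {
            throw new IllegalStateException(
                    "Catalog " + catalog.getClass().getName()
                            + " must implement java.io.Serializable to support atomic CTAS.");
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Covers Serializable classes that still hold non-serializable fields.
            throw new IllegalStateException(
                    "Catalog " + catalog.getClass().getName()
                            + " could not be serialized; mark non-serializable fields transient"
                            + " or make them Serializable to support atomic CTAS.",
                    e);
        }
    }
}
```

Checking the round trip, rather than only `instanceof Serializable`, also catches catalogs that declare the interface but hold a non-serializable client or connection object.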
>> >
>> >
>> >Best,
>> >Jark
>> >
>> >
>> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi Martijn,
>> >> Thank you for your reply, these are two good questions.
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >>
>> >> If the Catalog does not support managed tables and no `connector` is
>> >> specified, then the corresponding TableSink cannot be generated and the
>> >> statement will fail.
>> >>
>> >> If the Catalog supports managed tables and no `connector` is specified,
>> >> then it will fail because the Table Store related configuration is not set
>> >> and there is no Table Store related jar.
>> >>
>> >>
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> FLIP-190 mainly solves the problem of upgrades in streaming mode, while
>> >> FLIP-218 is used more in batch mode.
>> >> The atomic CTAS implementation requires serialization support for the
>> >> Catalog and the hook, which currently cannot be serialized into JSON, so
>> >> it cannot support FLIP-190.
>> >> Non-atomic implementations are able to support FLIP-190.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >> >Hi Mang,
>> >> >
>> >> >I have two questions/remarks:
>> >> >
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >> >
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >On Wed, 29 Jun 2022 at 05:18, Mang Zhang <zh...@163.com> wrote:
>> >> >
>> >> >> Hi everyone,
>> >> >> Thank you to all those who participated in the discussion. We have
>> >> >> gone through many rounds, and the proposal has been gradually revised
>> >> >> and improved. We look forward to further feedback, and we will launch a
>> >> >> vote in the next day or two.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best regards,
>> >> >> Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >Hi Yuxia,
>> >> >> >Thank you very much for your reply.
>> >> >> >
>> >> >> >
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me, as the FLIP says
>> >> >> >>nothing about rtas but suddenly refers to it in the configuration. And
>> >> >> >>if we're not going to implement rtas in this FLIP, it may be better not
>> >> >> >>to refer to it, and `rtas` shouldn't be exposed to users as a
>> >> >> >>configuration.
>> >> >> >RTAS is currently not supported because the unification of its semantics
>> >> >> >across streaming and batch mode is unresolved and the specific business
>> >> >> >scenarios are not very clear. We will support it in the future; if we
>> >> >> >changed the option name only once rtas is supported, that would impose
>> >> >> >the cost of modifying the configuration on users.
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> >>Could you please explain it? Some pseudocode would be much better if
>> >> >> >>possible. I'm lost in this part.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >This part is too much of an implementation detail, and of course we had
>> >> >> >to make some changes to achieve this. The FLIP focuses on semantic
>> >> >> >consistency between streaming and batch mode, and can provide optional
>> >> >> >atomicity support.
>> >> >> >
>> >> >> >
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
>> >> >> >>naming is that, to implement atomicity for ctas, we propose an interface
>> >> >> >>for catalogs to support serialization, and then we name it
>> >> >> >>`AtomicCatalog`. At least, the interface is for the atomicity of ctas.
>> >> >> >>But if we want to implement other features like isolation, which may
>> >> >> >>also require a serializable catalog in the future, should we introduce a
>> >> >> >>new interface named `IsolateCatalog`? Have you considered other names
>> >> >> >>like `SerializableCatalog`? As it's a public interface, maybe we should
>> >> >> >>be careful about the name.
>> >> >> >Regarding the name of the Catalog interface, we also discussed the name
>> >> >> >`SerializableCatalog`, but it is too specific and does not relate to the
>> >> >> >atomic functionality we want to express. If CTAS/RTAS are to support
>> >> >> >atomicity, the Catalog needs to implement `AtomicCatalog`, so the name is
>> >> >> >more straightforward to understand.
>> >> >> >
>> >> >> >
>> >> >> >Hope this answers your question.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >
>> >> >> >Best regards,
>> >> >> >Mang Zhang
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >> >>Thanks for updating. The FLIP looks generally good to me. I have only
>> >> >> >>minor questions:
>> >> >> >>
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me, as the FLIP says
>> >> >> >>nothing about rtas but suddenly refers to it in the configuration. And
>> >> >> >>if we're not going to implement rtas in this FLIP, it may be better not
>> >> >> >>to refer to it, and `rtas` shouldn't be exposed to users as a
>> >> >> >>configuration.
>> >> >> >>
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> >>Could you please explain it? Some pseudocode would be much better if
>> >> >> >>possible. I'm lost in this part.
>> >> >> >>
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
>> >> >> >>naming is that, to implement atomicity for ctas, we propose an interface
>> >> >> >>for catalogs to support serialization, and then we name it
>> >> >> >>`AtomicCatalog`. At least, the interface is for the atomicity of ctas.
>> >> >> >>But if we want to implement other features like isolation, which may
>> >> >> >>also require a serializable catalog in the future, should we introduce a
>> >> >> >>new interface named `IsolateCatalog`? Have you considered other names
>> >> >> >>like `SerializableCatalog`? As it's a public interface, maybe we should
>> >> >> >>be careful about the name.
>> >> >> >>
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Yuxia
>> >> >> >>
>> >> >> >>----- Original Message -----
>> >> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >> >>To: "dev" <de...@flink.apache.org>
>> >> >> >>Cc: imjark@gmail.com
>> >> >> >>Sent: Monday, June 27, 2022, 5:43:50 PM
>> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>
>> >> >> >>Hi Jark,
>> >> >> >>First of all, thank you for your very good advice!
>> >> >> >>The RTAS point you mentioned is a good one, and we should support it as
>> >> >> >>well.
>> >> >> >>However, after investigating the semantics of RTAS and how RTAS is used
>> >> >> >>within our company, I found that:
>> >> >> >>1. The semantics of RTAS say that if the table exists, the old data must
>> >> >> >>be deleted and replaced with the new data.
>> >> >> >>These semantics are easier to implement in batch mode; for example, if
>> >> >> >>the target table is a Hive table, the old data files can be deleted
>> >> >> >>directly.
>> >> >> >>But in streaming mode, the target table is probably a Kafka topic, and we
>> >> >> >>can't delete the data.
>> >> >> >>So consistent semantics across streaming and batch scenarios cannot be
>> >> >> >>guaranteed.
>> >> >> >>2. I checked the big data SQL in our company from the last week and
>> >> >> >>found that RTAS was not used.
>> >> >> >>No users in the company have mentioned the need for RTAS yet, so this
>> >> >> >>application scenario is not very clear.
>> >> >> >>
>> >> >> >>
>> >> >> >>It is not clear what kind of semantics RTAS should provide in streaming
>> >> >> >>mode, and the users' business scenarios are not very clear.
>> >> >> >>Maybe we don't have to support RTAS soon, but we can leave room for
>> >> >> >>supporting RTAS in the future in the interface definition.
>> >> >> >>What do you think? Looking forward to your response!
>> >> >> >>
>> >> >> >>
>> >> >> >>By the way, the other points raised have been updated. thanks.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>--
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Mang Zhang
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >> >>>Thanks for the update, Mang and Ron,
>> >> >> >>>
>> >> >> >>>The new proposal looks good to me in general, especially keeping the
>> >> >> >>>behavior consistent between batch and streaming mode by default. This is
>> >> >> >>>how we did it for the earlier "table.dml-sync" option on the ML [1].
>> >> >> >>>
>> >> >> >>>Besides that, I just have some final minor comments regarding some
>> >> >> >>>interfaces.
>> >> >> >>>
>> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >> >>>The "OR" keyword sounds like this configuration can only take effect on
>> >> >> >>>one of CTAS and RTAS.
>> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >> >>>
>> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to
>> >> >> >>>support it.
>> >> >> >>>RTAS is another widely used statement similar to CTAS. It seems there is
>> >> >> >>>not much difference between CTAS and RTAS. Considering we are introducing
>> >> >> >>>RTAS configurations, is it possible to support RTAS in this FLIP as well?
>> >> >> >>>
>> >> >> >>>3) connector.type
>> >> >> >>>"connector.type" has been deprecated since FLIP-95; could you replace it
>> >> >> >>>with 'connector'?
>> >> >> >>>
>> >> >> >>>4) SupportsAtomicCatalog
>> >> >> >>>I have some concerns about using the "Supports.." prefix, which is known
>> >> >> >>>as the ability extension for DynamicTableSource and DynamicTableSink.
>> >> >> >>>Maybe "AtomicCatalog" is enough?
>> >> >> >>>
>> >> >> >>>Best,
>> >> >> >>>Jark
>> >> >> >>>
>> >> >> >>>[1]:
>> >> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >> >>>
>> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>
>> >> >> >>>> Hi all,
>> >> >> >>>> Thank you to all those who participated in the discussion and made
>> >> >> >>>> suggestions!
>> >> >> >>>> After several rounds of online and offline discussions, the
>> >> solution
>> >> >> in
>> >> >> >>>> FLIP has been updated.
>> >> >> >>>> Looking forward to more feedback from everyone.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Mang Zhang
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >>>> >Hi godfrey and ron,
>> >> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
>> >> >> >>>> >Looking forward to further feedback from others.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >--
>> >> >> >>>> >
>> >> >> >>>> >Best regards,
>> >> >> >>>> >Mang Zhang
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >> >>>> >>Thanks, godfrey, for the further feedback. Your suggestions look very
>> >> >> >>>> >>good to me, and the FLIP has been updated accordingly. It would be
>> >> >> >>>> >>great if you could look at it again.
>> >> >> >>>> >>
>> >> >> >>>> >>Also looking forward to further feedback from others.
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>> ----- Original Message -----
>> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >> >>>> >>>
>> >> >> >>>> >>> Hi all,
>> >> >> >>>> >>>
>> >> >> >>>> >>> Sorry for the late reply.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >> >>>> >>> Regarding `cor`, this abbreviation is not commonly used.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the serializability
>> >> >> >>>> >>> >of the catalog. To quickly see if the catalog supports CTAS, we need
>> >> >> >>>> >>> >to try to serialize the catalog when compiling SQL in the planner, and
>> >> >> >>>> >>> >if it fails, an exception will be thrown to indicate to the user that
>> >> >> >>>> >>> >the catalog does not support CTAS because it cannot be serialized.
>> >> >> >>>> >>> This behavior is too cryptic, and will break the current catalog
>> >> >> >>>> >>> behavior when using 1.16.
>> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs which
>> >> >> >>>> >>> implements Serializable.
>> >> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog
>> >> >> >>>> >>> interface.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > Catalog#inferTableOptions
>> >> >> >>>> >>> I strongly recommend not introducing this feature now, because the
>> >> >> >>>> >>> behavior is unclear.
>> >> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is
>> >> >> >>>> >>> empty, but if the user forgets to set the connector option for the
>> >> >> >>>> >>> CTAS statement, the created table will be a managed table.
>> >> >> >>>> >>> 2) The options and their values for the catalog and for the connector
>> >> >> >>>> >>> may be different, so using the catalog options may cause unexpected
>> >> >> >>>> >>> errors.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >> >>>> >>>
>> >> >> >>>> >>> Best,
>> >> >> >>>> >>> Godfrey
>> >> >> >>>> >>>
>> >> >> >>>> >>> On Mon, Jun 13, 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Hi Yun,
>> >> >> >>>> >>> > Thanks for your reply!
>> >> >> >>>> >>> > Through offline communication with Dalong, I added the
>> >> >> >>>> >>> > JobStatusHook part to the FLIP; looking forward to your feedback.
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > --
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Best regards,
>> >> >> >>>> >>> > Mang Zhang
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
>> >> <yungao.gy@aliyun.com.INVALID
>> >> >> >
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >Hi,
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Regarding the drop operation, with some offline discussion
>> >> with
>> >> >> >>>> Dalong and Zhu,
>> >> >> >>>> >>> > >we think that listening in the client side might be
>> >> problematic
>> >> >> >>>> since it would exit
>> >> >> >>>> >>> > >after submitting the jobs in detached mode, thus the
>> >> operation
>> >> >> >>>> might need to
>> >> >> >>>> >>> > >be in the JobMaster side.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >For the listener interface, currently JobListener only
>> >> resides
>> >> >> in
>> >> >> >>>> the client side
>> >> >> >>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>> >> >> >>>> scenario, and
>> >> >> >>>> >>> > >the internal JobStatusListener is designed to be used inside
>> >> >> JM and
>> >> >> >>>> is not
>> >> >> >>>> >>> > >serializable, thus we tend to add a new interface
>> >> >> JobStatusHook,
>> >> >> >>>> >>> > >which could be attached to the JobGraph and executed in the
>> >> >> >>>> JobMaster.
>> >> >> >>>> >>> > >The interface will also be marked as Internal.
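The hook described above (serializable, attached to the job graph, executed on the JobMaster side) could look roughly like the following. This is only a minimal sketch under stated assumptions: the names `JobStatusHookSketch` and `CtasHookSketch`, the method names, and the use of a plain `String` job id are all hypothetical and are not taken from the FLIP. Creating the table in `onCreated` is likewise an assumption; the thread itself only discusses the drop operation. The hook records the actions it would take instead of calling a real catalog.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook shape; the method names are assumptions, not the FLIP's final API.
// It extends Serializable so that it can be shipped together with the job graph.
interface JobStatusHookSketch extends Serializable {
    void onCreated(String jobId);

    void onFinished(String jobId);

    void onFailed(String jobId, Throwable cause);

    void onCanceled(String jobId);
}

// Illustrative CTAS hook. A real hook would call the catalog; this one only records
// the intended actions so that the control flow is visible.
final class CtasHookSketch implements JobStatusHookSketch {
    private static final long serialVersionUID = 1L;

    private final String tablePath;
    private final List<String> actions = new ArrayList<>();

    CtasHookSketch(String tablePath) {
        this.tablePath = tablePath;
    }

    @Override
    public void onCreated(String jobId) {
        // Assumption: the target table is created when the job starts.
        actions.add("create " + tablePath);
    }

    @Override
    public void onFinished(String jobId) {
        // The job succeeded, so the created table is kept as-is.
    }

    @Override
    public void onFailed(String jobId, Throwable cause) {
        // The job failed, so the target table is dropped again.
        actions.add("drop " + tablePath);
    }

    @Override
    public void onCanceled(String jobId) {
        // A canceled job is treated like a failed one in this sketch.
        actions.add("drop " + tablePath);
    }

    List<String> actions() {
        return actions;
    }
}
```

Because the hook runs where the job is managed rather than in the client, the drop still happens when the client has already exited after a detached submission, which is the problem raised above.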
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best,
>> >> >> >>>> >>> > >Yun
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> >
>> >> >> >------------------------------------------------------------------
>> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Hi, Martijn
>> >> >> >>>> >>> > >Thanks for your reply!
>> >> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
>> >> standard.
>> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >--
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best regards,
>> >> >> >>>> >>> > >Mang Zhang
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
>> >> >> martijnvisser@apache.org>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>Hi everyone,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>> >> >> >>>> standard?
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Best regards,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Martijn Visser
>> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
>> >> >> luoyuxia@alumni.sjtu.edu.cn>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>> Thanks for driving this work; it will be a useful feature.
>> >> >> >>>> >>> > >>> About FLIP-218, I have some questions.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's
>> >> >> >>>> >>> > >>> schema, including column names and data types? I think it may be
>> >> >> >>>> >>> > >>> a useful feature in case we want to change the data types in the
>> >> >> >>>> >>> > >>> target table instead of always copying the source table's schema.
>> >> >> >>>> >>> > >>> It'll be more flexible with this feature.
>> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this
>> >> >> >>>> >>> > >>> feature.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 2: It seems this will require the sink to implement a public
>> >> >> >>>> >>> > >>> interface to drop the table, so what will the interface look like?
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Yuxia
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> ----- Original Message -----
>> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >> >>>> >>> > >>> Sent: Thursday, 28 April 2022, 4:57:24 PM
>> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Hi, everyone
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> I would like to open a discussion for support select
>> >> clause
>> >> >> in
>> >> >> >>>> CREATE
>> >> >> >>>> >>> > >>> TABLE(CTAS),
>> >> >> >>>> >>> > >>> With the development of business and the enhancement of
>> >> >> flink sql
>> >> >> >>>> >>> > >>> capabilities, queries become more and more complex.
>> >> >> >>>> >>> > >>> Now the user needs to use the Create Table statement to
>> >> >> create
>> >> >> >>>> the target
>> >> >> >>>> >>> > >>> table first, and then execute the insert statement.
>> >> >> >>>> >>> > >>> However, the target table may have many columns, which
>> >> will
>> >> >> >>>> bring a lot of
>> >> >> >>>> >>> > >>> work outside the business logic to the user.
>> >> >> >>>> >>> > >>> At the same time, ensure that the schema of the created
>> >> >> target
>> >> >> >>>> table is
>> >> >> >>>> >>> > >>> consistent with the schema of the query result.
>> >> >> >>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly
>> >> facilitate
>> >> >> the
>> >> >> >>>> user.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking
>> >> forward to
>> >> >> >>>> your feedback.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> >>> > >>>
>> >> >> >>>>
>> >> >>
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> --
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Mang Zhang
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>------------------------------
>> >> >> >>>> >>Best,
>> >> >> >>>> >>Ron
>> >> >> >>>>
>> >> >>
>> >>
>>
>>



Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Thanks for updating. 
Now, the FLIP looks good to me. 

Best regards, 
Yuxia 


From: "zhangmang1" <zh...@163.com>
To: luoyuxia@alumni.sjtu.edu.cn
Cc: "dev" <de...@flink.apache.org>, "Martijn Visser" <ma...@apache.org>, imjark@gmail.com
Sent: Tuesday, 5 July 2022, 11:35:35 AM
Subject: Re:Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Hi yuxia, 
I updated the FLIP to address your concerns about RTAS and AtomicCatalog. I'm not sure whether the changes fully resolve them; looking forward to your reply!








-- 
Best regards, 
Mang Zhang 




At 2022-07-05 11:26:22, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, the FLIP looks good to me now.
>
>Best,
>Jark
>
>On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Jark,
>> Regarding the two issues of concern to yuxia, we did some offline
>> discussions and adjusted the implementation plan.
>>
>> >1) RTAS
>> RTAS is not supported in this FLIP, so we will remove `rtas` from the option name and handle option compatibility when RTAS is supported in the future.
>>
>> >2) AtomicCatalog
>>
>> AtomicCatalog was introduced to solve the Catalog serialization problem, but its purpose is to let CTAS support atomicity. We named it AtomicCatalog to make that purpose easier for users to understand, but the name currently seems to confuse developers.
>> So we changed the plan: a Catalog that supports CTAS atomicity only needs to implement Java `Serializable` and must be serializable and deserializable. A user-defined Catalog that wants to support CTAS atomicity must follow the same requirement. We will do the check in the Planner and update the Catalog's Javadoc.
>>
>>
>> What do you think? Looking forward to your feedback!
>>
>> --
>>
>> Best regards,
>>
>> Mang Zhang
>>
>>
>>
>> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>> >Hi Mang,
>> >
>> >I'm not sure whether your response has addressed Yuxia's concern or not.
>> >Would be better to receive a confirmation from participants before starting
>> >the vote.
>> >
>> >Actually, I have the same feeling with Yuxia's reply.
>> >
>> >1) RTAS
>> >If it's hard to have a consistent behavior for RTAS between streaming mode
>> >and batch mode,
>> >it's very possible that the "table.ctas-rtas.atomicity-enabled" is not
>> >suitable and may need to
>> >change in the future. If the RTAS will not be supported in this version and
>> >the configuration
>> >may be not suitable in the future, how about removing the "rtas" from the
>> >config? We can
>> >still evolve the config to "table.ctas-rtas" if the semantics are the same,
>> >and still keeps backward compatibility.
>> >
>> >2) AtomicCatalog
>> >We won't add other methods to `AtomicCatalog` in the future, because new
>> >methods required for isolation don't belong to `AtomicCatalog`; they would
>> >go into a new interface such as `IsolateCatalog`, `TransactionalCatalog`,
>> >or `StagingCatalog`.
>> >So I think Yuxia's concern is reasonable: it's confusing that an atomic
>> >catalog is just a serializable catalog.
>> >How about just adding more Javadoc to the `Catalog` interface, stating that
>> >a catalog supporting CTAS must implement `Serializable` and that its
>> >instances must be de/serializable using Java Serialization? The planner
>> >should check the serialization of the catalog and throw an exception that
>> >tells users how to adapt the catalog to support CTAS. In this way, we don't
>> >need to introduce a new interface such as `AtomicCatalog`.
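The planner-side check suggested above can be illustrated with a plain Java Serialization round trip. This is a minimal sketch, not Flink code: `CatalogSerializationCheck`, `SerializableToyCatalog`, and `PlainToyCatalog` are hypothetical names introduced only for illustration, and the error messages are assumptions about what an instruction to the user might say.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper, not part of Flink: shows the kind of check a planner could run.
final class CatalogSerializationCheck {

    private CatalogSerializationCheck() {}

    // Round-trips the object through Java Serialization and returns the copy.
    // Throws IllegalStateException with a hint for the user if that is not possible.
    static <T> T roundTrip(T catalog) {
        if (!(catalog instanceof Serializable)) {
            throw new IllegalStateException(
                    catalog.getClass().getName()
                            + " must implement java.io.Serializable to support atomic CTAS");
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // NotSerializableException for a non-serializable field also ends up here.
            throw new IllegalStateException(
                    catalog.getClass().getName()
                            + " could not be (de)serialized; check its non-transient fields",
                    e);
        }
    }
}

// Toy catalogs used only for illustration.
final class SerializableToyCatalog implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;

    SerializableToyCatalog(String name) {
        this.name = name;
    }
}

final class PlainToyCatalog {}
```

Calling `CatalogSerializationCheck.roundTrip(new PlainToyCatalog())` fails fast with a message naming the class, while a serializable catalog comes back as an equivalent copy. Running the check at planning time surfaces the problem before the job is submitted rather than when the hook is shipped.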
>> >
>> >
>> >Best,
>> >Jark
>> >
>> >
>> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi Martijn,
>> >> Thank you for your reply, these are two good questions.
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >>
>> >> If the Catalog does not support managed tables and no `connector` is
>> >> specified, the corresponding TableSink cannot be generated and the
>> >> statement will fail.
>> >>
>> >> If the Catalog supports managed tables and no `connector` is specified,
>> >> the statement will fail because the Table Store configuration is not
>> >> set and the Table Store jar is not available.
>> >>
>> >>
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> FLIP-190 mainly solves the upgrade problem in Streaming mode. FLIP-218 is
>> >> used more in Batch mode scenarios.
>> >> The atomic CTAS implementation requires serialization support for the
>> >> Catalog and the hook, which currently cannot be serialized into JSON, so
>> >> it cannot support FLIP-190.
>> >> Non-atomic implementations are able to support FLIP-190.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >> >Hi Mang,
>> >> >
>> >> >I have two questions/remarks:
>> >> >
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >> >
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
>> >> >
>> >> >> Hi everyone,
>> >> >> Thank you to all those who participated in the discussion, we have
>> >> >> discussed many rounds, the program has been gradually revised and
>> >> improved,
>> >> >> looking forward to further feedback, we will launch a vote in the next
>> >> day
>> >> >> or two.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best regards,
>> >> >> Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >Hi Yuxia,
>> >> >> >Thank you very much for your reply.
>> >> >> >
>> >> >> >
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me, as the FLIP says
>> >> >> >>nothing about rtas but suddenly refers to it in the configuration. If
>> >> >> >>we're not going to implement rtas in this FLIP, it may be better not
>> >> >> >>to refer to it, and `rtas` shouldn't be exposed to users as a
>> >> >> >>configuration.
>> >> >> >RTAS is currently not supported because unifying its semantics across
>> >> >> >stream mode and batch mode is an open issue and the specific business
>> >> >> >scenarios are not very clear. We will support it in the future. If we
>> >> >> >changed the option name only when adding RTAS support, users would
>> >> >> >bear the cost of modifying their configuration.
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> >>Could you please explain it? Some pseudocode would be even better, if
>> >> >> >>possible. I'm lost in this part.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >This part is largely an implementation detail, and of course we had to
>> >> >> >make some changes to achieve it. The FLIP focuses on semantic
>> >> >> >consistency between stream and batch mode, with optional atomicity
>> >> >> >support.
>> >> >> >
>> >> >> >
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for
>> >> >> >>the naming is that, to implement atomicity for ctas, we propose an
>> >> >> >>interface for catalogs to support serialization, and then we name it
>> >> >> >>`AtomicCatalog`. At least, the interface is for the atomicity of ctas.
>> >> >> >>But if we want to implement other features, like isolation, which may
>> >> >> >>also require a serializable catalog in the future, should we introduce
>> >> >> >>a new interface named `IsolateCatalog`? Have you considered other
>> >> >> >>names like `SerializableCatalog`? As it's a public interface, maybe we
>> >> >> >>should be careful about the name.
>> >> >> >Regarding the Catalog name, we also discussed `SerializableCatalog`,
>> >> >> >which is too specific and does not relate to the atomic functionality
>> >> >> >we want to express. If CTAS/RTAS is to support atomicity, the Catalog
>> >> >> >needs to implement `AtomicCatalog`, so the name is more
>> >> >> >straightforward to understand.
>> >> >> >
>> >> >> >
>> >> >> >Hope this answers your question.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >
>> >> >> >Best regards,
>> >> >> >Mang Zhang
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >> >>Thanks for updating. The FLIP looks generally good to me. I have only
>> >> >> >>minor questions:
>> >> >> >>
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me, as the FLIP says
>> >> >> >>nothing about rtas but suddenly refers to it in the configuration. If
>> >> >> >>we're not going to implement rtas in this FLIP, it may be better not
>> >> >> >>to refer to it, and `rtas` shouldn't be exposed to users as a
>> >> >> >>configuration.
>> >> >> >>
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> >>Could you please explain it? Some pseudocode would be even better, if
>> >> >> >>possible. I'm lost in this part.
>> >> >> >>
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for
>> >> >> >>the naming is that, to implement atomicity for ctas, we propose an
>> >> >> >>interface for catalogs to support serialization, and then we name it
>> >> >> >>`AtomicCatalog`. At least, the interface is for the atomicity of ctas.
>> >> >> >>But if we want to implement other features, like isolation, which may
>> >> >> >>also require a serializable catalog in the future, should we introduce
>> >> >> >>a new interface named `IsolateCatalog`? Have you considered other
>> >> >> >>names like `SerializableCatalog`? As it's a public interface, maybe we
>> >> >> >>should be careful about the name.
>> >> >> >>
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Yuxia
>> >> >> >>
>> >> >> >>----- Original Message -----
>> >> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >> >>To: "dev" <de...@flink.apache.org>
>> >> >> >>Cc: imjark@gmail.com
>> >> >> >>Sent: Monday, 27 June 2022, 5:43:50 PM
>> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT
>> >> clause
>> >> >> in CREATE TABLE(CTAS)
>> >> >> >>
>> >> >> >>Hi Jark,
>> >> >> >>First of all, thank you for your very good advice!
>> >> >> >>The RTAS point you mentioned is a good one, and we should support it
>> >> as
>> >> >> well.
>> >> >> >>However, by investigating the semantics of RTAS and how RTAS is used
>> >> >> within the company, I found that:
>> >> >> >>1. The semantics of RTAS says that if the table exists, need to delete
>> >> >> the old data and use the new data.
>> >> >> >>This semantics is better implemented in Batch mode, for example, if
>> >> the
>> >> >> target table is a Hive table, old data file can be deleted directly.
>> >> >> >>But in Streaming mode, the target table is probably a Kafka topic, we
>> >> >> can't delete the data.
>> >> >> >>So the semantics in streaming and batch scenarios are not well
>> >> >> guaranteed to be consistent.
>> >> >> >>2. I checked the SQL for big data in the company in the last week and
>> >> >> found that RTAS was not used.
>> >> >> >>No users in the company have mentioned the need for RTAS yet. So this
>> >> >> application scenario is not very clear.
>> >> >> >>
>> >> >> >>
>> >> >> >>It is not clear what kind of semantics RTAS should provide in
>> >> streaming
>> >> >> mode, and the user's business scenarios are not very clear.
>> >> >> >>Maybe We don't have to support RTAS soon, but we can leave the
>> >> >> possibility of supporting RTAS in the future in the interface
>> >> definition.
>> >> >> >>What do you think? Looking forward to your response!
>> >> >> >>
>> >> >> >>
>> >> >> >>By the way, the other points raised have been updated. thanks.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>--
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Mang Zhang
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >> >>>Thanks for the update, Mang and Ron,
>> >> >> >>>
>> >> >> >>>The new proposal looks good to me in general, especially keeping the
>> >> >> >>>behavior
>> >> >> >>>consistent between batch and streaming mode by default. This is how
>> >> we
>> >> >> do
>> >> >> >>>it
>> >> >> >>>in the previous "table.dml-sync" option on ML [1].
>> >> >> >>>
>> >> >> >>>Besides that, I just have some final minor comments regarding some
>> >> >> >>>interfaces.
>> >> >> >>>
>> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >> >>>The "OR" keyword sounds like this configuration can only take effect
>> >> on
>> >> >> one
>> >> >> >>>of CTAS and RTAS.
>> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >> >>>
>> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
>> >> to
>> >> >> >>>support it.
>> >> >> >>>RTAS is another widely used statement similar to CTAS. It seems
>> >> there is
>> >> >> >>>not much difference
>> >> >> >>>between CTAS and RTAS. Considering we are introducing RTAS
>> >> >> configurations,
>> >> >> >>>is it possible
>> >> >> >>> to support RTAS in this FLIP as well?
>> >> >> >>>
>> >> >> >>>3) connector.type
>> >> >> >>>"connector.type" has been deprecated since FLIP-95, could you replace
>> >> >> them
>> >> >> >>>with 'connector'?
>> >> >> >>>
>> >> >> >>>4) SupportsAtomicCatalog
>> >> >> >>>I have some concerns about using "Supports.." prefix which is known
>> >> as
>> >> >> the
>> >> >> >>>ability extension for
>> >> >> >>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is
>> >> >> enough?
>> >> >> >>>
>> >> >> >>>Best,
>> >> >> >>>Jark
>> >> >> >>>
>> >> >> >>>[1]:
>> >> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >> >>>
>> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>
>> >> >> >>>> Hi all,
>> >> >> >>>> Thank you to all those who participated in the discussion and made
>> >> >> >>>> suggestions!
>> >> >> >>>> After several rounds of online and offline discussions, the
>> >> solution
>> >> >> in
>> >> >> >>>> FLIP has been updated.
>> >> >> >>>> Looking forward to more feedback from everyone.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Mang Zhang
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >>>> >Hi godfrey and ron,
>> >> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
>> >> >> >>>> >Looking forward to further feedback from others.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >--
>> >> >> >>>> >
>> >> >> >>>> >Best regards,
>> >> >> >>>> >Mang Zhang
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >> >>>> >>Thanks, godfrey, for the further feedback. Your suggestions look very
>> >> >> >>>> >>good to me, and the FLIP has been updated accordingly. It would be
>> >> >> >>>> >>great if you could look at it again.
>> >> >> >>>> >>
>> >> >> >>>> >>Also looking forward to further feedback from others.
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>> -----Original Message-----
>> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>>
>> >> >> >>>> >>> Hi all,
>> >> >> >>>> >>>
>> >> >> >>>> >>> Sorry for the late reply.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the
>> >> >> serializability
>> >> >> >>>> of the catalog. To quickly see if the catalog supports CTAS, we
>> >> need
>> >> >> to try
>> >> >> >>>> to serialize the catalog when compile SQL in planner and if it
>> >> fails,
>> >> >> an
>> >> >> >>>> exception will be >thrown to indicate to user that the catalog does
>> >> >> not
>> >> >> >>>> support CTAS because it cannot be serialized.
>> >> >> >>>> >>> This behavior is too cryptic, and will break the current catalog
>> >> >> >>>> >>> behavior when using 1.16.
>> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs which
>> >> >> >>>> >>> implements Serializable.
>> >> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog
>> >> >> >>>> >>> interface.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > Catalog#inferTableOptions
>> >> >> >>>> >>> I strongly recommend not introducing this feature now, because the
>> >> >> >>>> >>> behavior is unclear.
>> >> >> >>>> >>> 1) if the catalog supports managed tables, the connector option is
>> >> >> >>>> >>> empty. But if the user forgets to set the connector option for a
>> >> >> >>>> >>> CTAS statement, the created table will be a managed table.
>> >> >> >>>> >>> 2) the options and their values for the catalog and for the
>> >> >> >>>> >>> connector may be different, so using the catalog options may cause
>> >> >> >>>> >>> unexpected errors.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >> >>>> >>>
>> >> >> >>>> >>> Best,
>> >> >> >>>> >>> Godfrey
>> >> >> >>>> >>>
>> >> >> >>>> >>> Mang Zhang <zh...@163.com> wrote on Mon, 13 Jun 2022 at 16:43:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Hi Yun,
>> >> >> >>>> >>> > Thanks for your reply!
>> >> >> >>>> >>> > Through offline communication with Dalong, I updated the
>> >> >> >>>> JobStatusHook part to FLIP, looking forward to your feedback.
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > --
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Best regards,
>> >> >> >>>> >>> > Mang Zhang
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
>> >> <yungao.gy@aliyun.com.INVALID
>> >> >> >
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >Hi,
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Regarding the drop operation, with some offline discussion
>> >> with
>> >> >> >>>> Dalong and Zhu,
>> >> >> >>>> >>> > >we think that listening in the client side might be
>> >> problematic
>> >> >> >>>> since it would exit
>> >> >> >>>> >>> > >after submitting the jobs in detached mode, thus the
>> >> operation
>> >> >> >>>> might need to
>> >> >> >>>> >>> > >be in the JobMaster side.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >For the listener interface, currently JobListener only
>> >> resides
>> >> >> in
>> >> >> >>>> the client side
>> >> >> >>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>> >> >> >>>> scenario, and
>> >> >> >>>> >>> > >the internal JobStatusListener is designed to be used inside
>> >> >> JM and
>> >> >> >>>> is not
>> >> >> >>>> >>> > >serializable, thus we tend to add a new interface
>> >> >> JobStatusHook,
>> >> >> >>>> >>> > >which could be attached to the JobGraph and executed in the
>> >> >> >>>> JobMaster.
>> >> >> >>>> >>> > >The interface will also be marked as Internal.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best,
>> >> >> >>>> >>> > >Yun
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> >
>> >> >> >------------------------------------------------------------------
>> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Hi, Martijn
>> >> >> >>>> >>> > >Thanks for your reply!
>> >> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
>> >> standard.
>> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >--
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best regards,
>> >> >> >>>> >>> > >Mang Zhang
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
>> >> >> martijnvisser@apache.org>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>Hi everyone,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>> >> >> >>>> standard?
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Best regards,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Martijn Visser
>> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
>> >> >> luoyuxia@alumni.sjtu.edu.cn>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>> Thanks for driving this work; it will be a useful feature.
>> >> >> >>>> >>> > >>> About FLIP-218, I have some questions.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's
>> >> >> >>>> >>> > >>> schema, including column names and data types? I think it may be
>> >> >> >>>> >>> > >>> a useful feature in case we want to change the data types in the
>> >> >> >>>> >>> > >>> target table instead of always copying the source table's schema.
>> >> >> >>>> >>> > >>> It'll be more flexible with this feature.
>> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this
>> >> >> >>>> >>> > >>> feature.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 2: It seems this will require the sink to implement a public
>> >> >> >>>> >>> > >>> interface to drop the table, so what will the interface look like?
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Yuxia
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> ----- Original Message -----
>> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >> >>>> >>> > >>> Sent: Thursday, 28 April 2022, 4:57:24 PM
>> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Hi, everyone
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> I would like to open a discussion for support select
>> >> clause
>> >> >> in
>> >> >> >>>> CREATE
>> >> >> >>>> >>> > >>> TABLE(CTAS),
>> >> >> >>>> >>> > >>> With the development of business and the enhancement of
>> >> >> flink sql
>> >> >> >>>> >>> > >>> capabilities, queries become more and more complex.
>> >> >> >>>> >>> > >>> Now the user needs to use the Create Table statement to
>> >> >> create
>> >> >> >>>> the target
>> >> >> >>>> >>> > >>> table first, and then execute the insert statement.
>> >> >> >>>> >>> > >>> However, the target table may have many columns, which
>> >> will
>> >> >> >>>> bring a lot of
>> >> >> >>>> >>> > >>> work outside the business logic to the user.
>> >> >> >>>> >>> > >>> At the same time, ensure that the schema of the created
>> >> >> target
>> >> >> >>>> table is
>> >> >> >>>> >>> > >>> consistent with the schema of the query result.
>> >> >> >>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly
>> >> facilitate
>> >> >> the
>> >> >> >>>> user.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking
>> >> forward to
>> >> >> >>>> your feedback.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> >>> > >>>
>> >> >> >>>>
>> >> >>
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> --
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Mang Zhang
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>------------------------------
>> >> >> >>>> >>Best,
>> >> >> >>>> >>Ron
>> >> >> >>>>
>> >> >>
>> >>
>>
>> 


Re:Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi yuxia,
I have updated the FLIP to address your concerns about RTAS and AtomicCatalog. I'm not sure whether the changes fully resolve them, so I look forward to your reply!
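
As a rough illustration of the adjusted AtomicCatalog plan (a catalog that wants atomic CTAS must survive Java serialization, and the planner checks this up front), here is a minimal sketch. It is not the FLIP's implementation: `CatalogSerializationCheck`, `SerializableDemoCatalog`, `BrokenDemoCatalog` and both methods are hypothetical names used only for illustration, and a real check would operate on Flink `Catalog` instances.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch; none of these names are Flink APIs.
class CatalogSerializationCheck {

    /** Stand-in for a catalog whose whole state is serializable. */
    static class SerializableDemoCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        SerializableDemoCatalog(String name) {
            this.name = name;
        }
    }

    /** Declares Serializable but holds a field that is not, so writing it fails. */
    static class BrokenDemoCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object clientHandle = new Object();
    }

    /** Returns true only if the object survives a Java serialization round trip. */
    static boolean isJavaSerializable(Object catalog) {
        if (!(catalog instanceof Serializable)) {
            return false;
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (IOException | ClassNotFoundException e) {
            // NotSerializableException is an IOException, so it ends up here.
            return false;
        }
    }

    /** Planner-style validation: fail early with a message telling users what to fix. */
    static void validateForAtomicCtas(String catalogName, Object catalog) {
        if (!isJavaSerializable(catalog)) {
            throw new IllegalStateException(
                    "Catalog '" + catalogName + "' cannot be used for atomic CTAS: it must "
                            + "implement java.io.Serializable and all of its non-transient "
                            + "fields must be serializable.");
        }
    }

    public static void main(String[] args) {
        validateForAtomicCtas("ok", new SerializableDemoCatalog("ok"));
        System.out.println("serializable catalog accepted");
        try {
            validateForAtomicCtas("broken", new BrokenDemoCatalog());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Doing a full write/read round trip, rather than only an `instanceof Serializable` test, also catches catalogs that declare `Serializable` but hold non-serializable fields.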







--

Best regards,
Mang Zhang





At 2022-07-05 11:26:22, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, the FLIP looks good to me now.
>
>Best,
>Jark
>
>On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Jark,
>> Regarding the two issues of concern to yuxia, we did some offline
>> discussions and adjusted the implementation plan.
>>
>> >1) RTAS
>> RTAS is not supported in this FLIP, so we will remove "rtas" from the option name and handle option compatibility when RTAS is supported in the future.
>>
>> >2) AtomicCatalog
>>
>> AtomicCatalog was introduced to solve the Catalog serialization problem, but its purpose is to let CTAS support atomicity. We named it AtomicCatalog to help users understand that purpose, but the name currently seems to confuse developers.
>> So we changed the plan to only require Java `Serializable` support for Catalogs that support CTAS atomicity, and to make sure such a Catalog can be serialized and deserialized. A user-defined Catalog that wants to support CTAS atomicity must follow the same requirement. We will do the check in the Planner and update the Catalog's Javadoc.
>>
>>
>> What do you think? Looking forward to your feedback!
>>
>> --
>>
>> Best regards,
>>
>> Mang Zhang
>>
>>
>>
>> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>> >Hi Mang,
>> >
>> >I'm not sure whether your response has addressed Yuxia's concern or not.
>> >Would be better to receive a confirmation from participants before starting
>> >the vote.
>> >
>> >Actually, I have the same feeling with Yuxia's reply.
>> >
>> >1) RTAS
>> >If it's hard to have consistent behavior for RTAS between streaming mode
>> >and batch mode, it's quite possible that
>> >"table.ctas-rtas.atomicity-enabled" is not suitable and may need to change
>> >in the future. If RTAS will not be supported in this version and the
>> >configuration may not be suitable in the future, how about removing
>> >"rtas" from the config? We can still evolve the config to
>> >"table.ctas-rtas" if the semantics are the same, and still keep backward
>> >compatibility.
>> >
>> >2) AtomicCatalog
>> >We won't add other methods to `AtomicCatalog` in the future, because new
>> >methods required for isolation don't belong to `AtomicCatalog`; they would
>> >go into a new interface such as `IsolateCatalog`, `TransactionalCatalog`
>> >or `StagingCatalog`.
>> >So I think Yuxia's concern is reasonable: it is confusing that an atomic
>> >catalog is just a serializable catalog.
>> >How about just adding more Javadoc to the `Catalog` interface, asking
>> >catalogs that want to support CTAS to implement `Serializable` and to make
>> >sure their instances can be de/serialized using Java Serialization? The
>> >planner should check the serialization of the catalog and throw an
>> >exception that tells users how to adapt the catalog to support CTAS. In
>> >this way, we don't need to introduce a new interface such as
>> >`AtomicCatalog`.
>> >
>> >
>> >Best,
>> >Jark
>> >
>> >
>> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi Martijn,
>> >> Thank you for your reply, these are two good questions.
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >>
>> >> If the Catalog does not support managed tables and no `connector` is
>> >> specified, the corresponding TableSink cannot be generated and the
>> >> statement will fail.
>> >>
>> >> If the Catalog supports managed tables and no `connector` is specified,
>> >> the statement will fail because the Table Store configuration is not set
>> >> and the Table Store jar is not available.
>> >>
>> >>
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> FLIP-190 mainly solves the upgrade problem in streaming mode, while
>> >> FLIP-218 is used more in batch mode.
>> >> The atomic CTAS implementation requires serialization support for the
>> >> Catalog and the hook, which currently cannot be serialized into JSON, so
>> >> it cannot support FLIP-190.
>> >> The non-atomic implementation is able to support FLIP-190.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >> >Hi Mang,
>> >> >
>> >> >I have two questions/remarks:
>> >> >
>> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >> >in the query of the sink table, it will be assumed that the user wants to
>> >> >create a managed table. What will happen if the user doesn't have Table
>> >> >Store configured/installed? Will we throw an error?
>> >> >
>> >> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >> >
>> >> >Best regards,
>> >> >
>> >> >Martijn
>> >> >
>> >> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
>> >> >
>> >> >> Hi everyone,
>> >> >> Thank you to all those who participated in the discussion. After many
>> >> >> rounds, the proposal has been gradually revised and improved. We look
>> >> >> forward to further feedback and will launch a vote in the next day or
>> >> >> two.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best regards,
>> >> >> Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >Hi Yuxia,
>> >> >> >Thank you very much for your reply.
>> >> >> >
>> >> >> >
>> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
>> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
>> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
>> >> it
>> >> >> and the `rtas` shouldn't exposed to user as a configuration.
>> >> >> >RTAS is not supported for now because unifying its semantics between
>> >> >> >streaming mode and batch mode is an open issue and the concrete
>> >> >> >business scenarios are not very clear. We will support it in the
>> >> >> >future. If we changed the option name only when RTAS is added, users
>> >> >> >would have to pay the cost of modifying their configuration.
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> Could you please explain about it. Some pseudocode will be much better
>> >> if
>> >> >> it's possible. I'm lost in this part.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >This part is too much of an implementation detail, and of course we
>> >> >> >had to make some changes to achieve it. The FLIP focuses on semantic
>> >> >> >consistency between streaming and batch mode, with optional atomicity
>> >> >> >support.
>> >> >> >
>> >> >> >
>> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
>> >> >> naming is to implement atomic for ctas, we propose a interface for
>> >> catalog
>> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
>> >> the
>> >> >> interface is for the atomic of ctas. But if we want to implement other
>> >> >> features like isolate which may also require serializable catalog in the
>> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
>> >> Have
>> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
>> >> >> public interface, maybe we should be careful about the name.
>> >> >> >Regarding the name of the Catalog interface, we also discussed
>> >> >> >`SerializableCatalog`, but it is too specific and does not relate to
>> >> >> >the atomic functionality we want to express. If CTAS/RTAS is to support
>> >> >> >atomicity, the Catalog needs to implement `AtomicCatalog`, which is
>> >> >> >more straightforward to understand.
>> >> >> >
>> >> >> >
>> >> >> >Hope this answers your question.
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >
>> >> >> >Best regards,
>> >> >> >Mang Zhang
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >> >>Thanks for updating. The FLIP looks generally good to me. I have only
>> >> >> >>minor questions:
>> >> >> >>
>> >> >> >>1: The mixture of CTAS and RTAS confuses me, as the FLIP says nothing
>> >> >> >>about RTAS but suddenly refers to it in the configuration. If we are
>> >> >> >>not going to implement RTAS in this FLIP, it may be better not to refer
>> >> >> >>to it, and `rtas` shouldn't be exposed to users as a configuration.
>> >> >> >>
>> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >> >>Could you please explain it? Some pseudocode would be much better if
>> >> >> >>possible. I'm lost in this part.
>> >> >> >>
>> >> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
>> >> >> >>naming is that, to implement atomicity for CTAS, we propose an interface
>> >> >> >>for catalogs to support serialization, and then name it
>> >> >> >>`AtomicCatalog`. At least, the interface is for the atomicity of CTAS.
>> >> >> >>But if we want to implement other features like isolation, which may
>> >> >> >>also require a serializable catalog in the future, should we introduce
>> >> >> >>a new interface named `IsolateCatalog`? Have you considered other names
>> >> >> >>like `SerializableCatalog`? As it's a public interface, maybe we should
>> >> >> >>be careful about the name.
>> >> >> >>
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Yuxia
>> >> >> >>
>> >> >> >>----- Original Message -----
>> >> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >> >>To: "dev" <de...@flink.apache.org>
>> >> >> >>Cc: imjark@gmail.com
>> >> >> >>Sent: Monday, June 27, 2022, 5:43:50 PM
>> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT
>> >> clause
>> >> >> in CREATE TABLE(CTAS)
>> >> >> >>
>> >> >> >>Hi Jark,
>> >> >> >>First of all, thank you for your very good advice!
>> >> >> >>The RTAS point you mentioned is a good one, and we should support it as
>> >> >> >>well.
>> >> >> >>However, after investigating the semantics of RTAS and how RTAS is used
>> >> >> >>within our company, I found that:
>> >> >> >>1. The semantics of RTAS say that if the table exists, the old data must
>> >> >> >>be deleted and replaced with the new data.
>> >> >> >>This is easier to implement in batch mode: for example, if the target
>> >> >> >>table is a Hive table, the old data files can be deleted directly.
>> >> >> >>But in streaming mode the target table is probably a Kafka topic, and
>> >> >> >>we can't delete the data.
>> >> >> >>So the semantics cannot easily be kept consistent between streaming and
>> >> >> >>batch scenarios.
>> >> >> >>2. I checked the big data SQL in our company over the last week and
>> >> >> >>found that RTAS was not used.
>> >> >> >>No users in the company have asked for RTAS yet, so the application
>> >> >> >>scenario is not very clear.
>> >> >> >>
>> >> >> >>
>> >> >> >>In short, it is not clear what semantics RTAS should provide in
>> >> >> >>streaming mode, and the users' business scenarios are not very clear
>> >> >> >>either.
>> >> >> >>Maybe we don't have to support RTAS soon, but we can leave room for
>> >> >> >>supporting RTAS in the future in the interface definition.
>> >> >> >>What do you think? Looking forward to your response!
>> >> >> >>
>> >> >> >>
>> >> >> >>By the way, the other points raised have been updated. thanks.
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>--
>> >> >> >>
>> >> >> >>Best regards,
>> >> >> >>Mang Zhang
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >> >>>Thanks for the update, Mang and Ron,
>> >> >> >>>
>> >> >> >>>The new proposal looks good to me in general, especially keeping the
>> >> >> >>>behavior
>> >> >> >>>consistent between batch and streaming mode by default. This is how
>> >> we
>> >> >> do
>> >> >> >>>it
>> >> >> >>>in the previous "table.dml-sync" option on ML [1].
>> >> >> >>>
>> >> >> >>>Besides that, I just have some final minor comments regarding some
>> >> >> >>>interfaces.
>> >> >> >>>
>> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >> >>>The "OR" keyword sounds like this configuration can only take effect
>> >> on
>> >> >> one
>> >> >> >>>of CTAS and RTAS.
>> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >> >>>
>> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
>> >> to
>> >> >> >>>support it.
>> >> >> >>>RTAS is another widely used statement similar to CTAS. It seems
>> >> there is
>> >> >> >>>not much difference
>> >> >> >>>between CTAS and RTAS. Considering we are introducing RTAS
>> >> >> configurations,
>> >> >> >>>is it possible
>> >> >> >>> to support RTAS in this FLIP as well?
>> >> >> >>>
>> >> >> >>>3) connector.type
>> >> >> >>>"connector.type" has been deprecated since FLIP-95, could you replace
>> >> >> them
>> >> >> >>>with 'connector'?
>> >> >> >>>
>> >> >> >>>4) SupportsAtomicCatalog
>> >> >> >>>I have some concerns about using "Supports.." prefix which is known
>> >> as
>> >> >> the
>> >> >> >>>ability extension for
>> >> >> >>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is
>> >> >> enough?
>> >> >> >>>
>> >> >> >>>Best,
>> >> >> >>>Jark
>> >> >> >>>
>> >> >> >>>[1]:
>> >> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >> >>>
>> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >> >>>
>> >> >> >>>> Hi all,
>> >> >> >>>> Thank you to all those who participated in the discussion and made
>> >> >> >>>> suggestions!
>> >> >> >>>> After several rounds of online and offline discussions, the
>> >> solution
>> >> >> in
>> >> >> >>>> FLIP has been updated.
>> >> >> >>>> Looking forward to more feedback from everyone.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> --
>> >> >> >>>>
>> >> >> >>>> Best regards,
>> >> >> >>>> Mang Zhang
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >> >>>> >Hi godfrey and ron,
>> >> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
>> >> >> >>>> >Looking forward to further feedback from others.
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >--
>> >> >> >>>> >
>> >> >> >>>> >Best regards,
>> >> >> >>>> >Mang Zhang
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >
>> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >> >>>> >>Thanks for your further feedback, godfrey. Your suggestions look very
>> >> >> >>>> >>good to me, and the FLIP has been updated accordingly. It would be
>> >> >> >>>> >>great if you could take another look.
>> >> >> >>>> >>
>> >> >> >>>> >>Also looking forward to further feedback from others.
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>> -----Original Message-----
>> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>>
>> >> >> >>>> >>> Hi all,
>> >> >> >>>> >>>
>> >> >> >>>> >>> Sorry for the late reply.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >> >>>> >>>
>> >> >> >>>> >>> >Create Table As Select (CTAS) feature depends on the serializability
>> >> >> >>>> >>> >of the catalog. To quickly see if the catalog supports CTAS, we need
>> >> >> >>>> >>> >to try to serialize the catalog when compiling SQL in the planner; if
>> >> >> >>>> >>> >it fails, an exception will be thrown to indicate to the user that
>> >> >> >>>> >>> >the catalog does not support CTAS because it cannot be serialized.
>> >> >> >>>> >>> This behavior is too cryptic, and it will break the current catalog
>> >> >> >>>> >>> behavior when using 1.16.
>> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalogs that is
>> >> >> >>>> >>> Serializable.
>> >> >> >>>> >>> Existing catalogs can choose whether to implement the new catalog
>> >> >> >>>> >>> interface.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > Catalog#inferTableOptions
>> >> >> >>>> >>> I strongly recommend not introducing this feature now, because the
>> >> >> >>>> >>> behavior is unclear.
>> >> >> >>>> >>> 1) If the catalog supports managed tables, the connector option is
>> >> >> >>>> >>> empty. But if a user forgets to set the connector option for a CTAS
>> >> >> >>>> >>> statement, the created table will be a managed table.
>> >> >> >>>> >>> 2) The options and their values may differ between the catalog and
>> >> >> >>>> >>> the connector, so using the catalog options may cause unexpected
>> >> >> >>>> >>> errors.
>> >> >> >>>> >>>
>> >> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >> >>>> >>>
>> >> >> >>>> >>> Best,
>> >> >> >>>> >>> Godfrey
>> >> >> >>>> >>>
>> >> >> >>>> >>> Mang Zhang <zh...@163.com> wrote on Mon, Jun 13, 2022 at 16:43:
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Hi Yun,
>> >> >> >>>> >>> > Thanks for your reply!
>> >> >> >>>> >>> > Through offline communication with Dalong, I updated the
>> >> >> >>>> JobStatusHook part to FLIP, looking forward to your feedback.
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > --
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > Best regards,
>> >> >> >>>> >>> > Mang Zhang
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> >
>> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
>> >> <yungao.gy@aliyun.com.INVALID
>> >> >> >
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >Hi,
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Regarding the drop operation: after some offline discussion with
>> >> >> >>>> >>> > >Dalong and Zhu, we think that listening on the client side might be
>> >> >> >>>> >>> > >problematic, since the client exits after submitting the job in
>> >> >> >>>> >>> > >detached mode. The operation might therefore need to run on the
>> >> >> >>>> >>> > >JobMaster side.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >For the listener interface, JobListener currently resides only on
>> >> >> >>>> >>> > >the client side and contains methods that are unsuitable for this
>> >> >> >>>> >>> > >scenario, such as onJobSubmitted, and the internal JobStatusListener
>> >> >> >>>> >>> > >is designed to be used inside the JM and is not serializable. We
>> >> >> >>>> >>> > >therefore tend to add a new interface, JobStatusHook, which could be
>> >> >> >>>> >>> > >attached to the JobGraph and executed in the JobMaster.
>> >> >> >>>> >>> > >The interface will also be marked as Internal.
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best,
>> >> >> >>>> >>> > >Yun
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> >
>> >> >> >------------------------------------------------------------------
>> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> >> CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Hi, Martijn
>> >> >> >>>> >>> > >Thanks for your reply!
>> >> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
>> >> standard.
>> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >--
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >Best regards,
>> >> >> >>>> >>> > >Mang Zhang
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >
>> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
>> >> >> martijnvisser@apache.org>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>Hi everyone,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>> >> >> >>>> standard?
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Best regards,
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>Martijn Visser
>> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
>> >> >> luoyuxia@alumni.sjtu.edu.cn>
>> >> >> >>>> wrote:
>> >> >> >>>> >>> > >>
>> >> >> >>>> >>> > >>> Thanks for driving this work, it will be a useful feature.
>> >> >> >>>> >>> > >>> About FLIP-218, I have some questions.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specifying the target table's
>> >> >> >>>> >>> > >>> schema, including column names and data types? I think it may be a
>> >> >> >>>> >>> > >>> useful feature in case we want to change the data types in the
>> >> >> >>>> >>> > >>> target table instead of always copying the source table's schema.
>> >> >> >>>> >>> > >>> It would be more flexible with this feature.
>> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this
>> >> >> >>>> >>> > >>> feature.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> 2: It seems this will require the sink to implement a public
>> >> >> >>>> >>> > >>> interface to drop the table, so what will the interface look like?
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Yuxia
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> ----- Original Message -----
>> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >> >>>> >>> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> >> >> >>>> TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Hi, everyone
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> I would like to open a discussion on supporting the SELECT clause in
>> >> >> >>>> >>> > >>> CREATE TABLE (CTAS).
>> >> >> >>>> >>> > >>> As business grows and Flink SQL becomes more capable, queries are
>> >> >> >>>> >>> > >>> becoming more and more complex.
>> >> >> >>>> >>> > >>> Today the user has to create the target table with a CREATE TABLE
>> >> >> >>>> >>> > >>> statement first and then execute an INSERT statement.
>> >> >> >>>> >>> > >>> However, the target table may have many columns, which adds a lot of
>> >> >> >>>> >>> > >>> work outside the business logic for the user.
>> >> >> >>>> >>> > >>> The user also has to ensure that the schema of the created target
>> >> >> >>>> >>> > >>> table is consistent with the schema of the query result.
>> >> >> >>>> >>> > >>> A CTAS syntax like the one in Hive/Spark would make this much easier.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking
>> >> forward to
>> >> >> >>>> your feedback.
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> [1]
>> >> >> >>>> >>> > >>>
>> >> >> >>>>
>> >> >>
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> --
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >>> Best regards,
>> >> >> >>>> >>> > >>> Mang Zhang
>> >> >> >>>> >>> > >>>
>> >> >> >>>> >>> > >
>> >> >> >>>> >>
>> >> >> >>>> >>
>> >> >> >>>> >>------------------------------
>> >> >> >>>> >>Best,
>> >> >> >>>> >>Ron
>> >> >> >>>>
>> >> >>
>> >>
>>
>>

Re: Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Jark Wu <im...@gmail.com>.
Thanks for the update, the FLIP looks good to me now.
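
For readers catching up on the atomicity part of the design: earlier in this thread, Yun Gao proposed a `JobStatusHook` that is attached to the JobGraph and executed in the JobMaster, so that the drop operation for the CTAS target table does not depend on the client staying alive. The sketch below only illustrates that idea. The interface shape, the `String` job id, and every name in it (`DemoJobStatusHook`, `DemoCatalog`, `DemoCtasHook`, `runJob`) are assumptions made for illustration, not the FLIP's or Flink's actual API.

```java
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; the interface shape is assumed for illustration only.
class CtasHookSketch {

    /** Assumed callback interface; Serializable so it could travel with the job graph. */
    interface DemoJobStatusHook extends Serializable {
        void onCreated(String jobId);

        void onFinished(String jobId);

        void onFailed(String jobId, Throwable cause);

        void onCanceled(String jobId);
    }

    /** In-memory stand-in for a serializable catalog. */
    static class DemoCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Set<String> tables = new HashSet<>();

        void createTable(String name) {
            tables.add(name);
        }

        void dropTable(String name) {
            tables.remove(name);
        }

        boolean tableExists(String name) {
            return tables.contains(name);
        }
    }

    /** Creates the target table when the job starts and drops it if the job does not succeed. */
    static class DemoCtasHook implements DemoJobStatusHook {
        private static final long serialVersionUID = 1L;
        private final DemoCatalog catalog;
        private final String tableName;

        DemoCtasHook(DemoCatalog catalog, String tableName) {
            this.catalog = catalog;
            this.tableName = tableName;
        }

        @Override
        public void onCreated(String jobId) {
            catalog.createTable(tableName);
        }

        @Override
        public void onFinished(String jobId) {
            // Job succeeded: keep the table.
        }

        @Override
        public void onFailed(String jobId, Throwable cause) {
            catalog.dropTable(tableName);
        }

        @Override
        public void onCanceled(String jobId) {
            catalog.dropTable(tableName);
        }
    }

    /** Simulates the side that owns the job lifecycle and invokes the hook. */
    static void runJob(String jobId, DemoJobStatusHook hook, boolean succeed) {
        hook.onCreated(jobId);
        if (succeed) {
            hook.onFinished(jobId);
        } else {
            hook.onFailed(jobId, new RuntimeException("simulated failure"));
        }
    }

    public static void main(String[] args) {
        DemoCatalog catalog = new DemoCatalog();
        runJob("job-1", new DemoCtasHook(catalog, "t_ok"), true);
        runJob("job-2", new DemoCtasHook(catalog, "t_failed"), false);
        System.out.println("t_ok exists: " + catalog.tableExists("t_ok"));
        System.out.println("t_failed exists: " + catalog.tableExists("t_failed"));
    }
}
```

Because the hook holds a reference to the catalog and has to be shipped with the job, this also shows why the catalog itself needs to be serializable for the atomic mode.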

Best,
Jark

On Tue, 5 Jul 2022 at 10:57, Mang Zhang <zh...@163.com> wrote:

> Hi Jark,
> Regarding the two issues of concern to yuxia, we did some offline
> discussions and adjusted the implementation plan.
>
> >1) RTAS
> RTAS is not supported in this FLIP, so we will remove "rtas" from the option name and handle option compatibility when RTAS is supported in the future.
>
> >2) AtomicCatalog
>
> AtomicCatalog was introduced to solve the Catalog serialization problem, but its purpose is to let CTAS support atomicity. We named it AtomicCatalog to help users understand that purpose, but the name currently seems to confuse developers.
> So we changed the plan to only require Java `Serializable` support for Catalogs that support CTAS atomicity, and to make sure such a Catalog can be serialized and deserialized. A user-defined Catalog that wants to support CTAS atomicity must follow the same requirement. We will do the check in the Planner and update the Catalog's Javadoc.
>
>
> What do you think? Looking forward to your feedback!
>
> --
>
> Best regards,
>
> Mang Zhang
>
>
>
> At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
> >Hi Mang,
> >
> >I'm not sure whether your response has addressed Yuxia's concern or not.
> >Would be better to receive a confirmation from participants before starting
> >the vote.
> >
> >Actually, I have the same feeling with Yuxia's reply.
> >
> >1) RTAS
> >If it's hard to have consistent behavior for RTAS between streaming mode
> >and batch mode, it's quite possible that
> >"table.ctas-rtas.atomicity-enabled" is not suitable and may need to change
> >in the future. If RTAS will not be supported in this version and the
> >configuration may not be suitable in the future, how about removing
> >"rtas" from the config? We can still evolve the config to
> >"table.ctas-rtas" if the semantics are the same, and still keep backward
> >compatibility.
> >
> >2) AtomicCatalog
> >We won't add other methods to `AtomicCatalog` in the future, because new
> >methods required for isolation don't belong to `AtomicCatalog`; they would
> >go into a new interface such as `IsolateCatalog`, `TransactionalCatalog`
> >or `StagingCatalog`.
> >So I think Yuxia's concern is reasonable: it is confusing that an atomic
> >catalog is just a serializable catalog.
> >How about just adding more Javadoc to the `Catalog` interface, asking
> >catalogs that want to support CTAS to implement `Serializable` and to make
> >sure their instances can be de/serialized using Java Serialization? The
> >planner should check the serialization of the catalog and throw an
> >exception that tells users how to adapt the catalog to support CTAS. In
> >this way, we don't need to introduce a new interface such as
> >`AtomicCatalog`.
> >
> >
> >Best,
> >Jark
> >
> >
> >On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
> >
> >> Hi Martijn,
> >> Thank you for your reply, these are two good questions.
> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >> >in the query of the sink table, it will be assumed that the user wants to
> >> >create a managed table. What will happen if the user doesn't have Table
> >> >Store configured/installed? Will we throw an error?
> >>
> >> If the Catalog does not support managed tables and no `connector` is
> >> specified, the corresponding TableSink cannot be generated and the
> >> statement will fail.
> >>
> >> If the Catalog supports managed tables and no `connector` is specified,
> >> the statement will fail because the Table Store configuration is not set
> >> and the Table Store jar is not available.
> >>
> >>
> >> >2. Will there be support included for FLIP-190 (version upgrades)?
> >> FLIP-190 mainly solves the upgrade problem in streaming mode, while
> >> FLIP-218 is used more in batch mode.
> >> The atomic CTAS implementation requires serialization support for the
> >> Catalog and the hook, which currently cannot be serialized into JSON, so
> >> it cannot support FLIP-190.
> >> The non-atomic implementation is able to support FLIP-190.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >>
> >>
> >>
> >>
> >> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
> >> >Hi Mang,
> >> >
> >> >I have two questions/remarks:
> >> >
> >> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >> >in the query of the sink table, it will be assumed that the user wants to
> >> >create a managed table. What will happen if the user doesn't have Table
> >> >Store configured/installed? Will we throw an error?
> >> >
> >> >2. Will there be support included for FLIP-190 (version upgrades)?
> >> >
> >> >Best regards,
> >> >
> >> >Martijn
> >> >
> >> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
> >> >
> >> >> Hi everyone,
> >> >> Thank you to all those who participated in the discussion. After many
> >> >> rounds, the proposal has been gradually revised and improved. We look
> >> >> forward to further feedback and will launch a vote in the next day or
> >> >> two.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Best regards,
> >> >> Mang Zhang
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
> >> >> >Hi Yuxia,
> >> >> >Thank you very much for your reply.
> >> >> >
> >> >> >
> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
> >> it
> >> >> and the `rtas` shouldn't exposed to user as a configuration.
> >> >> >RTAS is not supported for now because unifying its semantics between
> >> >> >streaming mode and batch mode is an open issue and the concrete
> >> >> >business scenarios are not very clear. We will support it in the
> >> >> >future. If we changed the option name only when RTAS is added, users
> >> >> >would have to pay the cost of modifying their configuration.
> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> >> Could you please explain about it. Some pseudocode will be much better
> >> if
> >> >> it's possible. I'm lost in this part.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >This part is too much of an implementation detail, and of course we
> >> >> >had to make some changes to achieve it. The FLIP focuses on semantic
> >> >> >consistency between streaming and batch mode, with optional atomicity
> >> >> >support.
> >> >> >
> >> >> >
> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
> >> >> naming is to implement atomic for ctas, we propose a interface for
> >> catalog
> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
> >> the
> >> >> interface is for the atomic of ctas. But if we want to implement other
> >> >> features like isolate which may also require serializable catalog in the
> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
> >> Have
> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
> >> >> public interface, maybe we should be careful about the name.
> >> >> >Regarding the definition of the Catalog name, we have also discussed
> >> the
> >> >> name `SerializableCatalog`, which is too specific and does not relate to
> >> >> the atomic functionality we want to express. CTAS/RTAS want to support
> >> >> atomicity, need Catalog to implement `AtomicCatalog`, so it's more
> >> >> straightforward to understand.
> >> >> >
> >> >> >
> >> >> >Hope this answers your question.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >--
> >> >> >
> >> >> >Best regards,
> >> >> >Mang Zhang
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
> >> >> >>Thanks for updating. The FLIP looks generall good to me. I have only
> >> >> minor questions:
> >> >> >>
> >> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
> >> >> nothing about rtas but refer it in the configuration suddenly.  And if
> >> >> we're not to implement rtas in this FLIP, it may be better not to refer
> >> it
> >> >> and the `rtas` shouldn't exposed to user as a configuration.
> >> >> >>
> >> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> >> Could you please explain about it. Some pseudocode will be much better
> >> if
> >> >> it's possible.  I'm lost in this part.
> >> >> >>
> >> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
> >> >> naming is to implement atomic for ctas, we propose a interface for
> >> catalog
> >> >> to support serializing, then we name it to `AtomicCatalog`. At least,
> >> the
> >> >> interface is for the atomic of ctas. But if we want to implement other
> >> >> features like isolate which may also require serializable catalog in the
> >> >> future, should we introduce a new interface naming `IsolateCatalog`?
> >> Have
> >> >> you ever considered other names like `SerializableCatalog`.  As it's a
> >> >> public interface, maybe we should be careful about the name.
> >> >> >>
> >> >> >>
> >> >> >>Best regards,
> >> >> >>Yuxia
> >> >> >>
> >> >> >>----- Original Message -----
> >> >> >>From: "Mang Zhang" <zh...@163.com>
> >> >> >>To: "dev" <de...@flink.apache.org>
> >> >> >>Cc: imjark@gmail.com
> >> >> >>Sent: Monday, 27 June 2022, 5:43:50 PM
> >> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>
> >> >> >>Hi Jark,
> >> >> >>First of all, thank you for your very good advice!
> >> >> >>The RTAS point you mentioned is a good one, and we should support it
> >> as
> >> >> well.
> >> >> >>However, by investigating the semantics of RTAS and how RTAS is used
> >> >> within the company, I found that:
> >> >> >>1. The semantics of RTAS says that if the table exists, need to delete
> >> >> the old data and use the new data.
> >> >> >>This semantics is better implemented in Batch mode, for example, if
> >> the
> >> >> target table is a Hive table, old data file can be deleted directly.
> >> >> >>But in Streaming mode, the target table is probably a Kafka topic, we
> >> >> can't delete the data.
> >> >> >>So the semantics in streaming and batch scenarios are not well
> >> >> guaranteed to be consistent.
> >> >> >>2. I checked the SQL for big data in the company in the last week and
> >> >> found that RTAS was not used.
> >> >> >>No users in the company have mentioned the need for RTAS yet. So this
> >> >> application scenario is not very clear.
> >> >> >>
> >> >> >>
> >> >> >>It is not clear what kind of semantics RTAS should provide in
> >> streaming
> >> >> mode, and the user's business scenarios are not very clear.
> >> >> >>Maybe We don't have to support RTAS soon, but we can leave the
> >> >> possibility of supporting RTAS in the future in the interface
> >> definition.
> >> >> >>What do you think? Looking forward to your response!
> >> >> >>
> >> >> >>
> >> >> >>By the way, the other points raised have been updated. thanks.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>--
> >> >> >>
> >> >> >>Best regards,
> >> >> >>Mang Zhang
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
> >> >> >>>Thanks for the update, Mang and Ron,
> >> >> >>>
> >> >> >>>The new proposal looks good to me in general, especially keeping the
> >> >> >>>behavior
> >> >> >>>consistent between batch and streaming mode by default. This is how
> >> we
> >> >> do
> >> >> >>>it
> >> >> >>>in the previous "table.dml-sync" option on ML [1].
> >> >> >>>
> >> >> >>>Besides that, I just have some final minor comments regarding some
> >> >> >>>interfaces.
> >> >> >>>
> >> >> >>>1) table.ctas-or-rtas.atomicity-enabled
> >> >> >>>The "OR" keyword sounds like this configuration can only take effect
> >> on
> >> >> one
> >> >> >>>of CTAS and RTAS.
> >> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
> >> >> >>>
> >> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
> >> to
> >> >> >>>support it.
> >> >> >>>RTAS is another widely used statement similar to CTAS. It seems
> >> there is
> >> >> >>>not much difference
> >> >> >>>between CTAS and RTAS. Considering we are introducing RTAS
> >> >> configurations,
> >> >> >>>is it possible
> >> >> >>> to support RTAS in this FLIP as well?
> >> >> >>>
> >> >> >>>3) connector.type
> >> >> >>>"connector.type" has been deprecated since FLIP-95, could you replace
> >> >> them
> >> >> >>>with 'connector'?
> >> >> >>>
> >> >> >>>4) SupportsAtomicCatalog
> >> >> >>>I have some concerns about using "Supports.." prefix which is known
> >> as
> >> >> the
> >> >> >>>ability extension for
> >> >> >>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is
> >> >> enough?
> >> >> >>>
> >> >> >>>Best,
> >> >> >>>Jark
> >> >> >>>
> >> >> >>>[1]:
> >> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
> >> >> >>>
> >> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
> >> >> >>>
> >> >> >>>> Hi all,
> >> >> >>>> Thank you to all those who participated in the discussion and made
> >> >> >>>> suggestions!
> >> >> >>>> After several rounds of online and offline discussions, the
> >> solution
> >> >> in
> >> >> >>>> FLIP has been updated.
> >> >> >>>> Looking forward to more feedback from everyone.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> --
> >> >> >>>>
> >> >> >>>> Best regards,
> >> >> >>>> Mang Zhang
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
> >> >> >>>> >Hi godfrey and ron,
> >> >> >>>> >Thank you very much for your replies and suggestions.
> >> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
> >> >> >>>> >Looking forward to further feedback from others.
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >--
> >> >> >>>> >
> >> >> >>>> >Best regards,
> >> >> >>>> >Mang Zhang
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >
> >> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
> >> >> >>>> >>Thanks for godfrey further feedback, your suggestions are very
> >> good
> >> >> to
> >> >> >>>> me, the FLIP has updated according to your feedback. It will be
> >> very
> >> >> good
> >> >> >>>> if you look at it again.
> >> >> >>>> >>
> >> >> >>>> >>Also looking forward to further feedback from others.
> >> >> >>>> >>
> >> >> >>>> >>
> >> >> >>>> >>> -----Original Message-----
> >> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
> >> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
> >> >> >>>> >>> To: dev <de...@flink.apache.org>
> >> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
> >> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>>> >>>
> >> >> >>>> >>> Hi all,
> >> >> >>>> >>>
> >> >> >>>> >>> Sorry for the late reply.
> >> >> >>>> >>>
> >> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
> >> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
> >> >> >>>> >>>
> >> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the
> >> >> serializability
> >> >> >>>> of the catalog. To quickly see if the catalog supports CTAS, we
> >> need
> >> >> to try
> >> >> >>>> to serialize the catalog when compile SQL in planner and if it
> >> fails,
> >> >> an
> >> >> >>>> exception will be >thrown to indicate to user that the catalog does
> >> >> not
> >> >> >>>> support CTAS because it cannot be serialized.
> >> >> >>>> >>> This behavior is too cryptic, and will break the current
> >> catalog
> >> >> >>>> >>> behavior when using 1.16.
> >> >> >>>> >>> I suggest we introduce a new interface for atomic catalog which
> >> >> >>>> >>> implements Serializable.
> >> >> >>>> >>>  The existent catalogs can choose whether implements the new
> >> >> catalog
> >> >> >>>> interface.
> >> >> >>>> >>>
> >> >> >>>> >>> > Catalog#inferTableOptions
> >> >> >>>> >>> I strongly recommend not introducing this feature now, because
> >> the
> >> >> >>>> >>> behavior is unclear.
> >> >> >>>> >>> 1) if the catalog support managed table, the connector option
> >> is
> >> >> >>>> >>> empty. but if user forget to
> >> >> >>>> >>> set connector option for CTAS statement, the created table
> >> will be
> >> >> >>>> >>> managed table.
> >> >> >>>> >>> 2) the options and its values for catalog and for connector
> >> may be
> >> >> >>>> different,
> >> >> >>>> >>> so use the catalog option may cause expected errors.
> >> >> >>>> >>>
> >> >> >>>> >>> > StreamGraph#addJobStatusHook
> >> >> >>>> >>> I prefer `registerJobStatusHook`
> >> >> >>>> >>>
> >> >> >>>> >>> Best,
> >> >> >>>> >>> Godfrey
> >> >> >>>> >>>
> >> >> >>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
> >> >> >>>> >>> >
> >> >> >>>> >>> > Hi Yun,
> >> >> >>>> >>> > Thanks for your reply!
> >> >> >>>> >>> > Through offline communication with Dalong, I updated the
> >> >> >>>> JobStatusHook part to FLIP, looking forward to your feedback.
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> > --
> >> >> >>>> >>> >
> >> >> >>>> >>> > Best regards,
> >> >> >>>> >>> > Mang Zhang
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> >
> >> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
> >> <yungao.gy@aliyun.com.INVALID
> >> >> >
> >> >> >>>> wrote:
> >> >> >>>> >>> > >Hi,
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Regarding the drop operation, with some offline discussion
> >> with
> >> >> >>>> Dalong and Zhu,
> >> >> >>>> >>> > >we think that listening in the client side might be
> >> problematic
> >> >> >>>> since it would exit
> >> >> >>>> >>> > >after submitting the jobs in detached mode, thus the
> >> operation
> >> >> >>>> might need to
> >> >> >>>> >>> > >be in the JobMaster side.
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >For the listener interface, currently JobListener only
> >> resides
> >> >> in
> >> >> >>>> the client side
> >> >> >>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
> >> >> >>>> scenario, and
> >> >> >>>> >>> > >the internal JobStatusListener is designed to be used inside
> >> >> JM and
> >> >> >>>> is not
> >> >> >>>> >>> > >serializable, thus we tend to add a new interface
> >> >> JobStatusHook,
> >> >> >>>> >>> > >which could be attached to the JobGraph and executed in the
> >> >> >>>> JobMaster.
> >> >> >>>> >>> > >The interface will also be marked as Internal.
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Best,
> >> >> >>>> >>> > >Yun
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> >
> >> >> >------------------------------------------------------------------
> >> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
> >> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
> >> >> >>>> >>> > >To:dev <de...@flink.apache.org>
> >> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
> >> >> CREATE
> >> >> >>>> TABLE(CTAS)
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Hi, Martijn
> >> >> >>>> >>> > >Thanks for your reply!
> >> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
> >> standard.
> >> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >--
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >Best regards,
> >> >> >>>> >>> > >Mang Zhang
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >
> >> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
> >> >> martijnvisser@apache.org>
> >> >> >>>> wrote:
> >> >> >>>> >>> > >>Hi everyone,
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
> >> >> >>>> standard?
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>Best regards,
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>Martijn Visser
> >> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
> >> >> >>>> >>> > >>https://github.com/MartijnVisser
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
> >> >> luoyuxia@alumni.sjtu.edu.cn>
> >> >> >>>> wrote:
> >> >> >>>> >>> > >>
> >> >> >>>> >>> > >>> Thanks for for driving this work, it's to be a useful
> >> >> feature.
> >> >> >>>> >>> > >>> About the flip-218, I have some questions.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> 1: Does our CTAS syntax support specify target table's
> >> >> schema
> >> >> >>>> including
> >> >> >>>> >>> > >>> column name and data type? I think it maybe a useful
> >> fature
> >> >> in
> >> >> >>>> case we want
> >> >> >>>> >>> > >>> to change the data types in target table instead of
> >> always
> >> >> copy
> >> >> >>>> the source
> >> >> >>>> >>> > >>> table's schema. It'll be more flexible with this feature.
> >> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1]
> >> support
> >> >> this
> >> >> >>>> feature.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> 2: Seems it'll requre sink to implement an public
> >> interface
> >> >> to
> >> >> >>>> drop table,
> >> >> >>>> >>> > >>> so what's the interface will look like?
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> [1]
> >> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> Best regards,
> >> >> >>>> >>> > >>> Yuxia
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> ----- Original Message -----
> >> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
> >> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
> >> >> >>>> >>> > >>> Sent: Thursday, 28 April 2022, 4:57:24 PM
> >> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> Hi, everyone
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> I would like to open a discussion for support select
> >> clause
> >> >> in
> >> >> >>>> CREATE
> >> >> >>>> >>> > >>> TABLE(CTAS),
> >> >> >>>> >>> > >>> With the development of business and the enhancement of
> >> >> flink sql
> >> >> >>>> >>> > >>> capabilities, queries become more and more complex.
> >> >> >>>> >>> > >>> Now the user needs to use the Create Table statement to
> >> >> create
> >> >> >>>> the target
> >> >> >>>> >>> > >>> table first, and then execute the insert statement.
> >> >> >>>> >>> > >>> However, the target table may have many columns, which
> >> will
> >> >> >>>> bring a lot of
> >> >> >>>> >>> > >>> work outside the business logic to the user.
> >> >> >>>> >>> > >>> At the same time, ensure that the schema of the created
> >> >> target
> >> >> >>>> table is
> >> >> >>>> >>> > >>> consistent with the schema of the query result.
> >> >> >>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly
> >> facilitate
> >> >> the
> >> >> >>>> user.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking
> >> forward to
> >> >> >>>> your feedback.
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> [1]
> >> >> >>>> >>> > >>>
> >> >> >>>>
> >> >>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> --
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >>> Best regards,
> >> >> >>>> >>> > >>> Mang Zhang
> >> >> >>>> >>> > >>>
> >> >> >>>> >>> > >
> >> >> >>>> >>
> >> >> >>>> >>
> >> >> >>>> >>------------------------------
> >> >> >>>> >>Best,
> >> >> >>>> >>Ron
> >> >> >>>>
> >> >>
> >>
>
>

Re:Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Jark,
Regarding Yuxia's two concerns, we discussed them offline and adjusted the implementation plan.


>1) RTAS

RTAS is not supported in this FLIP, so we will remove `rtas` from the option name and keep the option compatible when RTAS is supported in the future.
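
As a rough illustration of what such option compatibility could look like, here is a minimal sketch. It assumes the option in this FLIP ends up as `table.ctas.atomicity-enabled` (an assumed name, obtained by dropping `rtas` from the key discussed in this thread), that it defaults to `false`, and that a later release renames it to `table.ctas-rtas.atomicity-enabled`. `AtomicityOptionLookup` is a hypothetical helper, not a Flink class.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Flink.
final class AtomicityOptionLookup {

    // Assumed option name for this FLIP, with "rtas" removed.
    static final String CTAS_KEY = "table.ctas.atomicity-enabled";

    // Possible future name once RTAS is supported, as mentioned in this thread.
    static final String CTAS_RTAS_KEY = "table.ctas-rtas.atomicity-enabled";

    private AtomicityOptionLookup() {}

    // Prefers the newer key and falls back to the older one, so that
    // configurations written against the older key keep working.
    static boolean isAtomicityEnabled(Map<String, String> conf) {
        String value = conf.get(CTAS_RTAS_KEY);
        if (value == null) {
            value = conf.get(CTAS_KEY);
        }
        // Assumption: atomicity is disabled when neither key is set.
        return value != null && Boolean.parseBoolean(value.trim());
    }
}
```

In Flink itself this would more likely be declared through `ConfigOption#withDeprecatedKeys` than through a hand-written lookup; the sketch only shows the intended resolution order.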


>2) AtomicCatalog
AtomicCatalog was introduced to solve the Catalog serialization problem, while its purpose is to let CTAS support atomicity. We named it AtomicCatalog so that users could easily understand the feature, but the name currently seems to confuse developers.
So we changed the plan to only require Java `Serializable` support for Catalogs that support CTAS atomicity, and such a Catalog must be guaranteed to be serializable and deserializable. A user-defined Catalog that wants to support CTAS atomicity must follow the same requirement. We will do the check in the planner and update the Javadoc of `Catalog`.
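
To make the planner-side check more concrete, here is a minimal sketch under stated assumptions: `CatalogSerializationCheck`, `SerializableDemoCatalog` and `NonSerializableDemoCatalog` are hypothetical names that stand in for the real planner code and for real catalogs. It shows only a plain Java serialization round trip that fails with an explanatory error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper for illustration only; not the actual planner code.
final class CatalogSerializationCheck {

    private CatalogSerializationCheck() {}

    // Serializes and deserializes the catalog with Java Serialization.
    // Returns the deserialized copy, or throws with a hint for the user.
    static Object roundTrip(Object catalog) {
        if (!(catalog instanceof Serializable)) {
            throw new IllegalStateException(
                    catalog.getClass().getName()
                            + " must implement java.io.Serializable to support atomic CTAS.");
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Also covers a Serializable class that holds a non-serializable field.
            throw new IllegalStateException(
                    catalog.getClass().getName()
                            + " cannot be de/serialized, so atomic CTAS is not supported.",
                    e);
        }
    }
}

// Stand-in for a catalog that meets the requirement.
final class SerializableDemoCatalog implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;

    SerializableDemoCatalog(String name) {
        this.name = name;
    }
}

// Stand-in for a catalog that does not implement Serializable.
final class NonSerializableDemoCatalog {
    final String name = "demo";
}
```

Calling `CatalogSerializationCheck.roundTrip(new SerializableDemoCatalog("hive"))` returns a copy of the catalog, while passing a `NonSerializableDemoCatalog` raises an `IllegalStateException` whose message tells the user what to change.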


What do you think? Looking forward to your feedback!

--

Best regards,
Mang Zhang





At 2022-07-04 17:32:20, "Jark Wu" <im...@gmail.com> wrote:
>Hi Mang,
>
>I'm not sure whether your response has addressed Yuxia's concern or not.
>Would be better to receive a confirmation from participants before starting
>the vote.
>
>Actually, I have the same feeling with Yuxia's reply.
>
>1) RTAS
>If it's hard to have a consistent behavior for RTAS between streaming mode
>and batch mode,
>it's very possible that "table.ctas-rtas.atomicity-enabled" is not
>suitable and may need to
>change in the future. If RTAS will not be supported in this version and
>the configuration
>may not be suitable in the future, how about removing "rtas" from the
>config? We can
>still evolve the config to "table.ctas-rtas" if the semantics are the same,
>and still keep backward compatibility.
>
>2) AtomicCatalog
>We won't add other methods to `AtomicCatalog` in the future, because new
>methods required for isolation don't
>belong to `AtomicCatalog`, but maybe to a new interface such as `IsolateCatalog`,
>`TransactionalCatalog` or `StagingCatalog`.
>So, I think Yuxia's concern is reasonable: it's confusing that an atomic
>catalog is just a serializable catalog.
>How about just adding more Javadoc to the `Catalog` interface, asking catalogs
>that want to support CTAS to implement `Serializable` and to make sure the catalog
>instances can be de/serialized using Java Serialization? The planner
>should check the serialization of the catalog and throw an exception that tells
>users how to adapt the catalog to support
>CTAS. In this way, we don't need to introduce a new interface
>`AtomicCatalog` or anything else.
>
>
>Best,
>Jark
>
>
>On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:
>
>> Hi Martijn,
>> Thank you for your reply, these are two good questions.
>> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >in the query of the sink table, it will be assumed that the user wants to
>> >create a managed table. What will happen if the user doesn't have Table
>> >Store configured/installed? Will we throw an error?
>>
>> If the Catalog does not support managed tables and no `connector`
>> is specified, then the corresponding TableSink cannot be generated and
>> the statement will fail.
>>
>> If the Catalog supports managed tables and no `connector` is
>> specified, then it will fail because the table store related configuration
>> is not set and there is no table store related jar.
>>
>>
>> >2. Will there be support included for FLIP-190 (version upgrades)?
>> FLIP-190 mainly solves the upgrade problem in Streaming mode. FLIP-218 is
>> used mostly in Batch mode scenarios.
>> The atomic CTAS implementation requires serialization support for the Catalog
>> and the hook, which currently cannot be serialized into JSON, so it cannot
>> support FLIP-190.
>> Non-atomic implementations are able to support FLIP-190.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Best regards,
>> Mang Zhang
>>
>>
>>
>>
>>
>> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>> >Hi Mang,
>> >
>> >I have two questions/remarks:
>> >
>> >1. The FLIP mentions that if the user doesn't specify the WITH option part
>> >in the query of the sink table, it will be assumed that the user wants to
>> >create a managed table. What will happen if the user doesn't have Table
>> >Store configured/installed? Will we throw an error?
>> >
>> >2. Will there be support included for FLIP-190 (version upgrades)?
>> >
>> >Best regards,
>> >
>> >Martijn
>> >
>> >On Wed, 29 Jun 2022 at 05:18, Mang Zhang <zh...@163.com> wrote:
>> >
>> >> Hi everyone,
>> >> Thank you to all those who participated in the discussion. After many
>> >> rounds of discussion, the proposal has been gradually revised and improved.
>> >> We look forward to further feedback and will launch a vote in the next day
>> >> or two.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Mang Zhang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>> >> >Hi Yuxia,
>> >> >Thank you very much for your reply.
>> >> >
>> >> >
>> >> >>1: The mixture of CTAS and RTAS confuses me, as the FLIP says
>> >> >>nothing about RTAS but suddenly refers to it in the configuration. If
>> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
>> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
>> >> >RTAS is currently not supported because unifying its semantics across
>> >> >streaming and batch mode is unresolved and the specific business
>> >> >scenarios are not very clear. We will support it in the future; if we
>> >> >only changed the option name once RTAS is supported, users would bear
>> >> >the cost of modifying their configuration.
>> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >>Could you please explain it? Some pseudocode would be much better if
>> >> >>possible. I'm lost in this part.
>> >> >
>> >> >
>> >> >This part is too much of an implementation detail, and of course we had
>> >> >to make some changes to achieve it. The FLIP focuses on semantic
>> >> >consistency between streaming and batch mode, and can provide optional
>> >> >atomicity support.
>> >> >
>> >> >
>> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
>> >> >>naming is that, to implement atomicity for CTAS, we propose an interface
>> >> >>for catalogs to support serialization, and then we name it `AtomicCatalog`.
>> >> >>At least, the interface is for the atomicity of CTAS. But if we want to
>> >> >>implement other features like isolation, which may also require a
>> >> >>serializable catalog in the future, should we introduce a new interface
>> >> >>named `IsolateCatalog`? Have you ever considered other names like
>> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
>> >> >>careful about the name.
>> >> >Regarding the Catalog name, we also discussed the name
>> >> >`SerializableCatalog`, but it is too specific and does not relate to
>> >> >the atomicity we want to express. For CTAS/RTAS to support
>> >> >atomicity, the Catalog needs to implement `AtomicCatalog`, so the name is
>> >> >more straightforward to understand.
>> >> >
>> >> >
>> >> >Hope this answers your question.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >--
>> >> >
>> >> >Best regards,
>> >> >Mang Zhang
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>> >> >>Thanks for updating. The FLIP looks generally good to me. I have only
>> >> >>minor questions:
>> >> >>
>> >> >>1: The mixture of CTAS and RTAS confuses me, as the FLIP says
>> >> >>nothing about RTAS but suddenly refers to it in the configuration. If
>> >> >>we're not going to implement RTAS in this FLIP, it may be better not to
>> >> >>refer to it, and `rtas` shouldn't be exposed to users as a configuration.
>> >> >>
>> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
>> >> >>Could you please explain it? Some pseudocode would be much better if
>> >> >>possible. I'm lost in this part.
>> >> >>
>> >> >>3: The name `AtomicCatalog` confuses me. It seems the background for the
>> >> >>naming is that, to implement atomicity for CTAS, we propose an interface
>> >> >>for catalogs to support serialization, and then we name it `AtomicCatalog`.
>> >> >>At least, the interface is for the atomicity of CTAS. But if we want to
>> >> >>implement other features like isolation, which may also require a
>> >> >>serializable catalog in the future, should we introduce a new interface
>> >> >>named `IsolateCatalog`? Have you ever considered other names like
>> >> >>`SerializableCatalog`? As it's a public interface, maybe we should be
>> >> >>careful about the name.
>> >> >>
>> >> >>
>> >> >>Best regards,
>> >> >>Yuxia
>> >> >>
>> >> >>----- Original Message -----
>> >> >>From: "Mang Zhang" <zh...@163.com>
>> >> >>To: "dev" <de...@flink.apache.org>
>> >> >>Cc: imjark@gmail.com
>> >> >>Sent: Monday, 27 June 2022, 5:43:50 PM
>> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >>
>> >> >>Hi Jark,
>> >> >>First of all, thank you for your very good advice!
>> >> >>The RTAS point you mentioned is a good one, and we should support it
>> >> >>as well.
>> >> >>However, by investigating the semantics of RTAS and how RTAS is used
>> >> >>within the company, I found that:
>> >> >>1. The semantics of RTAS say that if the table exists, the old data
>> >> >>must be deleted and replaced with the new data.
>> >> >>These semantics are easier to implement in Batch mode; for example, if
>> >> >>the target table is a Hive table, the old data files can be deleted directly.
>> >> >>But in Streaming mode, the target table is probably a Kafka topic, and we
>> >> >>can't delete the data.
>> >> >>So the semantics in streaming and batch scenarios cannot easily be
>> >> >>guaranteed to be consistent.
>> >> >>2. I checked the SQL for big data in the company in the last week and
>> >> >>found that RTAS was not used.
>> >> >>No users in the company have mentioned the need for RTAS yet. So this
>> >> >>application scenario is not very clear.
>> >> >>
>> >> >>
>> >> >>It is not clear what kind of semantics RTAS should provide in
>> >> >>streaming mode, and the user's business scenarios are not very clear.
>> >> >>Maybe we don't have to support RTAS soon, but we can leave the
>> >> >>possibility of supporting RTAS in the future in the interface
>> >> >>definition.
>> >> >>What do you think? Looking forward to your response!
>> >> >>
>> >> >>
>> >> >>By the way, the other points raised have been updated. Thanks.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>--
>> >> >>
>> >> >>Best regards,
>> >> >>Mang Zhang
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>> >> >>>Thanks for the update, Mang and Ron,
>> >> >>>
>> >> >>>The new proposal looks good to me in general, especially keeping the
>> >> >>>behavior
>> >> >>>consistent between batch and streaming mode by default. This is how
>> we
>> >> do
>> >> >>>it
>> >> >>>in the previous "table.dml-sync" option on ML [1].
>> >> >>>
>> >> >>>Besides that, I just have some final minor comments regarding some
>> >> >>>interfaces.
>> >> >>>
>> >> >>>1) table.ctas-or-rtas.atomicity-enabled
>> >> >>>The "OR" keyword sounds like this configuration can only take effect
>> on
>> >> one
>> >> >>>of CTAS and RTAS.
>> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>> >> >>>
>> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
>> to
>> >> >>>support it.
>> >> >>>RTAS is another widely used statement similar to CTAS. It seems
>> there is
>> >> >>>not much difference
>> >> >>>between CTAS and RTAS. Considering we are introducing RTAS
>> >> configurations,
>> >> >>>is it possible
>> >> >>> to support RTAS in this FLIP as well?
>> >> >>>
>> >> >>>3) connector.type
>> >> >>>"connector.type" has been deprecated since FLIP-95, could you replace
>> >> them
>> >> >>>with 'connector'?
>> >> >>>
>> >> >>>4) SupportsAtomicCatalog
>> >> >>>I have some concerns about using "Supports.." prefix which is known
>> as
>> >> the
>> >> >>>ability extension for
>> >> >>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is
>> >> enough?
>> >> >>>
>> >> >>>Best,
>> >> >>>Jark
>> >> >>>
>> >> >>>[1]:
>> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>> >> >>>
>> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>> >> >>>
>> >> >>>> Hi all,
>> >> >>>> Thank you to all those who participated in the discussion and made
>> >> >>>> suggestions!
>> >> >>>> After several rounds of online and offline discussions, the
>> solution
>> >> in
>> >> >>>> FLIP has been updated.
>> >> >>>> Looking forward to more feedback from everyone.
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>>
>> >> >>>> Best regards,
>> >> >>>> Mang Zhang
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >> >>>> >Hi godfrey and ron,
>> >> >>>> >Thank you very much for your replies and suggestions.
>> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
>> >> >>>> >Looking forward to further feedback from others.
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >--
>> >> >>>> >
>> >> >>>> >Best regards,
>> >> >>>> >Mang Zhang
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >> >>>> >>Thanks, godfrey, for the further feedback. Your suggestions look very
>> >> >>>> >>good to me, and the FLIP has been updated accordingly. It would be
>> >> >>>> >>great if you could look at it again.
>> >> >>>> >>
>> >> >>>> >>Also looking forward to further feedback from others.
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >>> -----Original Message-----
>> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
>> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>> >> >>>> >>> To: dev <de...@flink.apache.org>
>> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> >> >>>> >>>
>> >> >>>> >>> Hi all,
>> >> >>>> >>>
>> >> >>>> >>> Sorry for the late reply.
>> >> >>>> >>>
>> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
>> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >> >>>> >>>
>> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the
>> >> serializability
>> >> >>>> of the catalog. To quickly see if the catalog supports CTAS, we
>> need
>> >> to try
>> >> >>>> to serialize the catalog when compile SQL in planner and if it
>> fails,
>> >> an
>> >> >>>> exception will be >thrown to indicate to user that the catalog does
>> >> not
>> >> >>>> support CTAS because it cannot be serialized.
>> >> >>>> >>> This behavior is too cryptic, and will break the current
>> catalog
>> >> >>>> >>> behavior when using 1.16.
>> >> >>>> >>> I suggest we introduce a new interface for atomic catalog which
>> >> >>>> >>> implements Serializable.
>> >> >>>> >>>  The existent catalogs can choose whether implements the new
>> >> catalog
>> >> >>>> interface.
>> >> >>>> >>>
>> >> >>>> >>> > Catalog#inferTableOptions
>> >> >>>> >>> I strongly recommend not introducing this feature now, because
>> the
>> >> >>>> >>> behavior is unclear.
>> >> >>>> >>> 1) if the catalog support managed table, the connector option
>> is
>> >> >>>> >>> empty. but if user forget to
>> >> >>>> >>> set connector option for CTAS statement, the created table
>> will be
>> >> >>>> >>> managed table.
>> >> >>>> >>> 2) the options and its values for catalog and for connector
>> may be
>> >> >>>> different,
>> >> >>>> >>> so use the catalog option may cause expected errors.
>> >> >>>> >>>
>> >> >>>> >>> > StreamGraph#addJobStatusHook
>> >> >>>> >>> I prefer `registerJobStatusHook`
>> >> >>>> >>>
>> >> >>>> >>> Best,
>> >> >>>> >>> Godfrey
>> >> >>>> >>>
>> >> >>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>> >> >>>> >>> >
>> >> >>>> >>> > Hi Yun,
>> >> >>>> >>> > Thanks for your reply!
>> >> >>>> >>> > Through offline communication with Dalong, I updated the
>> >> >>>> JobStatusHook part to FLIP, looking forward to your feedback.
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> > --
>> >> >>>> >>> >
>> >> >>>> >>> > Best regards,
>> >> >>>> >>> > Mang Zhang
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> >
>> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
>> <yungao.gy@aliyun.com.INVALID
>> >> >
>> >> >>>> wrote:
>> >> >>>> >>> > >Hi,
>> >> >>>> >>> > >
>> >> >>>> >>> > >Regarding the drop operation, with some offline discussion
>> with
>> >> >>>> Dalong and Zhu,
>> >> >>>> >>> > >we think that listening in the client side might be
>> problematic
>> >> >>>> since it would exit
>> >> >>>> >>> > >after submitting the jobs in detached mode, thus the
>> operation
>> >> >>>> might need to
>> >> >>>> >>> > >be in the JobMaster side.
>> >> >>>> >>> > >
>> >> >>>> >>> > >For the listener interface, currently JobListener only
>> resides
>> >> in
>> >> >>>> the client side
>> >> >>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>> >> >>>> scenario, and
>> >> >>>> >>> > >the internal JobStatusListener is designed to be used inside
>> >> JM and
>> >> >>>> is not
>> >> >>>> >>> > >serializable, thus we tend to add a new interface
>> >> JobStatusHook,
>> >> >>>> >>> > >which could be attached to the JobGraph and executed in the
>> >> >>>> JobMaster.
>> >> >>>> >>> > >The interface will also be marked as Internal.
>> >> >>>> >>> > >
>> >> >>>> >>> > >Best,
>> >> >>>> >>> > >Yun
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> >
>> >> >------------------------------------------------------------------
>> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
>> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >> >>>> >>> > >To:dev <de...@flink.apache.org>
>> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
>> >> CREATE
>> >> >>>> TABLE(CTAS)
>> >> >>>> >>> > >
>> >> >>>> >>> > >Hi, Martijn
>> >> >>>> >>> > >Thanks for your reply!
>> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
>> standard.
>> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> > >--
>> >> >>>> >>> > >
>> >> >>>> >>> > >Best regards,
>> >> >>>> >>> > >Mang Zhang
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> > >
>> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
>> >> martijnvisser@apache.org>
>> >> >>>> wrote:
>> >> >>>> >>> > >>Hi everyone,
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>> >> >>>> standard?
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>Best regards,
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>Martijn Visser
>> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
>> >> >>>> >>> > >>https://github.com/MartijnVisser
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
>> >> luoyuxia@alumni.sjtu.edu.cn>
>> >> >>>> wrote:
>> >> >>>> >>> > >>
>> >> >>>> >>> > >>> Thanks for for driving this work, it's to be a useful
>> >> feature.
>> >> >>>> >>> > >>> About the flip-218, I have some questions.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> 1: Does our CTAS syntax support specify target table's
>> >> schema
>> >> >>>> including
>> >> >>>> >>> > >>> column name and data type? I think it maybe a useful
>> fature
>> >> in
>> >> >>>> case we want
>> >> >>>> >>> > >>> to change the data types in target table instead of
>> always
>> >> copy
>> >> >>>> the source
>> >> >>>> >>> > >>> table's schema. It'll be more flexible with this feature.
>> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1]
>> support
>> >> this
>> >> >>>> feature.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> 2: Seems it'll requre sink to implement an public
>> interface
>> >> to
>> >> >>>> drop table,
>> >> >>>> >>> > >>> so what's the interface will look like?
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> [1]
>> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> Best regards,
>> >> >>>> >>> > >>> Yuxia
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> ----- Original Message -----
>> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>> >> >>>> >>> > >>> Sent: Thursday, 28 April 2022, 4:57:24 PM
>> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> >> >>>> >>> > >>> TABLE(CTAS)
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> Hi, everyone
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> I would like to open a discussion for support select
>> clause
>> >> in
>> >> >>>> CREATE
>> >> >>>> >>> > >>> TABLE(CTAS),
>> >> >>>> >>> > >>> With the development of business and the enhancement of
>> >> flink sql
>> >> >>>> >>> > >>> capabilities, queries become more and more complex.
>> >> >>>> >>> > >>> Now the user needs to use the Create Table statement to
>> >> create
>> >> >>>> the target
>> >> >>>> >>> > >>> table first, and then execute the insert statement.
>> >> >>>> >>> > >>> However, the target table may have many columns, which
>> will
>> >> >>>> bring a lot of
>> >> >>>> >>> > >>> work outside the business logic to the user.
>> >> >>>> >>> > >>> At the same time, ensure that the schema of the created
>> >> target
>> >> >>>> table is
>> >> >>>> >>> > >>> consistent with the schema of the query result.
>> >> >>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly
>> facilitate
>> >> the
>> >> >>>> user.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking
>> forward to
>> >> >>>> your feedback.
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> [1]
>> >> >>>> >>> > >>>
>> >> >>>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> --
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >>> Best regards,
>> >> >>>> >>> > >>> Mang Zhang
>> >> >>>> >>> > >>>
>> >> >>>> >>> > >
>> >> >>>> >>
>> >> >>>> >>
>> >> >>>> >>------------------------------
>> >> >>>> >>Best,
>> >> >>>> >>Ron
>> >> >>>>
>> >>
>>

Re: Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Jark Wu <im...@gmail.com>.
Hi Mang,

I'm not sure whether your response has addressed Yuxia's concerns.
It would be better to get confirmation from the participants before starting
the vote.

Actually, I share the concerns in Yuxia's reply.

1) RTAS
If it's hard to get consistent behavior for RTAS between streaming mode and
batch mode, it's quite possible that "table.ctas-rtas.atomicity-enabled" is
not suitable and may need to change in the future. If RTAS will not be
supported in this version and the configuration may not be suitable in the
future, how about removing "rtas" from the config? We can still evolve the
config to "table.ctas-rtas" later if the semantics turn out to be the same,
and still keep backward compatibility.
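
A minimal sketch of how renaming the option later could stay backward
compatible: look up the newer key first and fall back to the older one. This is
plain Java over a map, not Flink's configuration API, and the key
"table.ctas.atomicity-enabled" is an assumed name for the option without
"rtas"; the thread does not fix the final name.

```java
import java.util.HashMap;
import java.util.Map;

final class AtomicityOptionLookup {

    // Assumed name the option could evolve to once RTAS shares the same semantics.
    static final String CURRENT_KEY = "table.ctas-rtas.atomicity-enabled";

    // Assumed name of the option as first released, without "rtas".
    static final String DEPRECATED_KEY = "table.ctas.atomicity-enabled";

    /** Reads the newer key and falls back to the older key; defaults to false. */
    static boolean isAtomicityEnabled(Map<String, String> conf) {
        String value = conf.get(CURRENT_KEY);
        if (value == null) {
            value = conf.get(DEPRECATED_KEY);
        }
        // Boolean.parseBoolean(null) is false, so an unset option means "disabled".
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // A user configuration written against the older key keeps working.
        conf.put(DEPRECATED_KEY, "true");
        System.out.println(isAtomicityEnabled(conf));
    }
}
```

With this kind of fallback, existing user configurations do not have to be
rewritten when the key is renamed.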

2) AtomicCatalog
We won't add other methods to `AtomicCatalog` in the future, because new
methods required for isolation don't belong to `AtomicCatalog`; they would
maybe go into a new interface such as `IsolateCatalog`,
`TransactionalCatalog`, or `StagingCatalog`.
So I think Yuxia's concern is reasonable: it's confusing that an atomic
catalog is just a serializable catalog.
How about just adding more Javadoc to the `Catalog` interface, saying that a
catalog that wants to support CTAS should implement `Serializable` and that
its instances must be de/serializable using Java serialization? The planner
should check whether the catalog can be serialized and, if it cannot, throw an
exception that tells users how to adapt the catalog to support CTAS. In this
way, we don't need to introduce a new interface such as `AtomicCatalog`.
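
The planner-side check described above could be sketched roughly as follows.
This is not Flink code: the class, the method `checkSupportsAtomicCtas`, and
the two demo catalog classes are hypothetical names used only for
illustration. The sketch round-trips the object through Java serialization and
turns a failure into an error that tells the user what to change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CatalogSerializationCheck {

    /** Hypothetical stand-in for a catalog that can be shipped with the job. */
    static final class SerializableDemoCatalog implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        SerializableDemoCatalog(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for a catalog that does not implement Serializable. */
    static final class NonSerializableDemoCatalog {
        final Thread connectionThread = new Thread(() -> { });
    }

    /** Round-trips the catalog through Java serialization; fails with an instruction if impossible. */
    static void checkSupportsAtomicCtas(String catalogName, Object catalog) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // Throws NotSerializableException (an IOException) for non-serializable objects.
                out.writeObject(catalog);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Catalog '" + catalogName + "' does not support atomic CTAS because it cannot be "
                            + "serialized with Java serialization. Make the catalog class and all of "
                            + "its fields Serializable.",
                    e);
        }
    }

    public static void main(String[] args) {
        checkSupportsAtomicCtas("demo", new SerializableDemoCatalog("demo"));
        try {
            checkSupportsAtomicCtas("broken", new NonSerializableDemoCatalog());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running the check at planning time means the user sees the problem when the
statement is compiled instead of after the job has been submitted.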


Best,
Jark


On Thu, 30 Jun 2022 at 22:07, Mang Zhang <zh...@163.com> wrote:

> Hi Martijn,
> Thank you for your reply, these are two good questions.
> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >in the query of the sink table, it will be assumed that the user wants to
> >create a managed table. What will happen if the user doesn't have Table
> >Store configured/installed? Will we throw an error?
>
> If it is a Catalog that does not support managed table and no `connector`
> is specified, then the corresponding TableSink cannot be generated, will
> fail.
>
> If it is a Catalog that supports managed table and no `connector` is
> specified, then it will fail because the table store related configuration
> is not set and there is no table store related jar.
>
>
> >2. Will there be support included for FLIP-190 (version upgrades)?
> FLIP-190 mainly solves the problem of Streaming mode upgrade. FLIP-218 use
> scenarios more in Batch mode.
> CTAS atomicity implementation requires serialization support for Catalog
> and hook, which currently cannot be serialized into json, so they cannot be
> supported FLIP-190.
> Non-atomic implementations are able to support FLIP-190.
>
>
>
>
>
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
> >Hi Mang,
> >
> >I have two questions/remarks:
> >
> >1. The FLIP mentions that if the user doesn't specify the WITH option part
> >in the query of the sink table, it will be assumed that the user wants to
> >create a managed table. What will happen if the user doesn't have Table
> >Store configured/installed? Will we throw an error?
> >
> >2. Will there be support included for FLIP-190 (version upgrades)?
> >
> >Best regards,
> >
> >Martijn
> >
> >Op wo 29 jun. 2022 om 05:18 schreef Mang Zhang <zh...@163.com>:
> >
> >> Hi everyone,
> >> Thank you to all those who participated in the discussion, we have
> >> discussed many rounds, the program has been gradually revised and
> improved,
> >> looking forward to further feedback, we will launch a vote in the next
> day
> >> or two.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Mang Zhang
> >>
> >>
> >>
> >>
> >>
> >> At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
> >> >Hi Yuxia,
> >> >Thank you very much for your reply.
> >> >
> >> >
> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
> >> nothing about rtas but refer it in the configuration suddenly.  And if
> >> we're not to implement rtas in this FLIP, it may be better not to refer
> it
> >> and the `rtas` shouldn't exposed to user as a configuration.
> >> >Currently does not support RTAS because in the stream mode and batch
> mode
> >> semantic unification issues and specific business scenarios are not very
> >> clear, the future we will support, if in support of rtas and then modify
> >> the option name, then it will bring the cost of modifying the
> configuration
> >> to the user.
> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> Could you please explain about it. Some pseudocode will be much better
> if
> >> it's possible. I'm lost in this part.
> >> >
> >> >
> >> >
> >> >
> >> >This part is too much of an implementation detail, and of course we had
> >> to make some changes to achieve this. FLIP focuses on semantic
> consistency
> >> in stream and batch mode, and can provide optional atomicity support.
> >> >
> >> >
> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
> >> naming is to implement atomic for ctas, we propose a interface for
> catalog
> >> to support serializing, then we name it to `AtomicCatalog`. At least,
> the
> >> interface is for the atomic of ctas. But if we want to implement other
> >> features like isolate which may also require serializable catalog in the
> >> future, should we introduce a new interface naming `IsolateCatalog`?
> Have
> >> you ever considered other names like `SerializableCatalog`.  As it's a
> >> public interface, maybe we should be careful about the name.
> >> >Regarding the definition of the Catalog name, we have also discussed
> the
> >> name `SerializableCatalog`, which is too specific and does not relate to
> >> the atomic functionality we want to express. CTAS/RTAS want to support
> >> atomicity, need Catalog to implement `AtomicCatalog`, so it's more
> >> straightforward to understand.
> >> >
> >> >
> >> >Hope this answers your question.
> >> >
> >> >
> >> >
> >> >
> >> >--
> >> >
> >> >Best regards,
> >> >Mang Zhang
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
> >> >>Thanks for updating. The FLIP looks generall good to me. I have only
> >> minor questions:
> >> >>
> >> >>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks
> >> nothing about rtas but refer it in the configuration suddenly.  And if
> >> we're not to implement rtas in this FLIP, it may be better not to refer
> it
> >> and the `rtas` shouldn't exposed to user as a configuration.
> >> >>
> >> >>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook?
> >> Could you please explain about it. Some pseudocode will be much better
> if
> >> it's possible.  I'm lost in this part.
> >> >>
> >> >>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the
> >> naming is to implement atomic for ctas, we propose a interface for
> catalog
> >> to support serializing, then we name it to `AtomicCatalog`. At least,
> the
> >> interface is for the atomic of ctas. But if we want to implement other
> >> features like isolate which may also require serializable catalog in the
> >> future, should we introduce a new interface naming `IsolateCatalog`?
> Have
> >> you ever considered other names like `SerializableCatalog`.  As it's a
> >> public interface, maybe we should be careful about the name.
> >> >>
> >> >>
> >> >>Best regards,
> >> >>Yuxia
> >> >>
> >> >>----- Original Message -----
> >> >>From: "Mang Zhang" <zh...@163.com>
> >> >>To: "dev" <de...@flink.apache.org>
> >> >>Cc: imjark@gmail.com
> >> >>Sent: Monday, 27 June 2022, 5:43:50 PM
> >> >>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT
> >> >>clause in CREATE TABLE(CTAS)
> >> >>
> >> >>Hi Jark,
> >> >>First of all, thank you for your very good advice!
> >> >>The RTAS point you mentioned is a good one, and we should support it
> as
> >> well.
> >> >>However, by investigating the semantics of RTAS and how RTAS is used
> >> within the company, I found that:
> >> >>1. The semantics of RTAS says that if the table exists, need to delete
> >> the old data and use the new data.
> >> >>This semantics is better implemented in Batch mode, for example, if
> the
> >> target table is a Hive table, old data file can be deleted directly.
> >> >>But in Streaming mode, the target table is probably a Kafka topic, we
> >> can't delete the data.
> >> >>So the semantics in streaming and batch scenarios are not well
> >> guaranteed to be consistent.
> >> >>2. I checked the SQL for big data in the company in the last week and
> >> found that RTAS was not used.
> >> >>No users in the company have mentioned the need for RTAS yet. So this
> >> application scenario is not very clear.
> >> >>
> >> >>
> >> >>It is not clear what kind of semantics RTAS should provide in
> streaming
> >> mode, and the user's business scenarios are not very clear.
> >> >>Maybe We don't have to support RTAS soon, but we can leave the
> >> possibility of supporting RTAS in the future in the interface
> definition.
> >> >>What do you think? Looking forward to your response!
> >> >>
> >> >>
> >> >>By the way, the other points raised have been updated. thanks.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>--
> >> >>
> >> >>Best regards,
> >> >>Mang Zhang
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
> >> >>>Thanks for the update, Mang and Ron,
> >> >>>
> >> >>>The new proposal looks good to me in general, especially keeping the
> >> >>>behavior
> >> >>>consistent between batch and streaming mode by default. This is how
> we
> >> do
> >> >>>it
> >> >>>in the previous "table.dml-sync" option on ML [1].
> >> >>>
> >> >>>Besides that, I just have some final minor comments regarding some
> >> >>>interfaces.
> >> >>>
> >> >>>1) table.ctas-or-rtas.atomicity-enabled
> >> >>>The "OR" keyword sounds like this configuration can only take effect
> on
> >> one
> >> >>>of CTAS and RTAS.
> >> >>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
> >> >>>
> >> >>>2) In the FLIP, you have mentioned RTAS many times, but have no plan
> to
> >> >>>support it.
> >> >>>RTAS is another widely used statement similar to CTAS. It seems
> there is
> >> >>>not much difference
> >> >>>between CTAS and RTAS. Considering we are introducing RTAS
> >> configurations,
> >> >>>is it possible
> >> >>> to support RTAS in this FLIP as well?
> >> >>>
> >> >>>3) connector.type
> >> >>>"connector.type" has been deprecated since FLIP-95, could you replace
> >> them
> >> >>>with 'connector'?
> >> >>>
> >> >>>4) SupportsAtomicCatalog
> >> >>>I have some concerns about using "Supports.." prefix which is known
> as
> >> the
> >> >>>ability extension for
> >> >>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is
> >> enough?
> >> >>>
> >> >>>Best,
> >> >>>Jark
> >> >>>
> >> >>>[1]:
> https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
> >> >>>
> >> >>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
> >> >>>
> >> >>>> Hi all,
> >> >>>> Thank you to all those who participated in the discussion and made
> >> >>>> suggestions!
> >> >>>> After several rounds of online and offline discussions, the
> solution
> >> in
> >> >>>> FLIP has been updated.
> >> >>>> Looking forward to more feedback from everyone.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Mang Zhang
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
> >> >>>> >Hi godfrey and ron,
> >> >>>> >Thank you very much for your replies and suggestions.
> >> >>>> >Special thanks to ron for helping to review and improve the FLIP.
> >> >>>> >Looking forward to further feedback from others.
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >--
> >> >>>> >
> >> >>>> >Best regards,
> >> >>>> >Mang Zhang
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >
> >> >>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
> >> >>>> >>Thanks for godfrey further feedback, your suggestions are very
> good
> >> to
> >> >>>> me, the FLIP has updated according to your feedback. It will be
> very
> >> good
> >> >>>> if you look at it again。
> >> >>>> >>
> >> >>>> >>Also looking forward to further feedback from others.
> >> >>>> >>
> >> >>>> >>
> >> >>>> >>> -----Original Message-----
> >> >>>> >>> From: "godfrey he" <go...@gmail.com>
> >> >>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
> >> >>>> >>> To: dev <de...@flink.apache.org>
> >> >>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
> >> >>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in
> >> >>>> >>> CREATE TABLE(CTAS)
> >> >>>> >>>
> >> >>>> >>> Hi all,
> >> >>>> >>>
> >> >>>> >>> Sorry for the late reply.
> >> >>>> >>>
> >> >>>> >>> >table.cor-table-as-select.atomicity-enabled
> >> >>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
> >> >>>> >>>
> >> >>>> >>> >Create Table As Select(CTAS) feature depends on the
> >> serializability
> >> >>>> of the catalog. To quickly see if the catalog supports CTAS, we
> need
> >> to try
> >> >>>> to serialize the catalog when compile SQL in planner and if it
> fails,
> >> an
> >> >>>> exception will be >thrown to indicate to user that the catalog does
> >> not
> >> >>>> support CTAS because it cannot be serialized.
> >> >>>> >>> This behavior is too cryptic, and will break the current
> catalog
> >> >>>> >>> behavior when using 1.16.
> >> >>>> >>> I suggest we introduce a new interface for atomic catalog which
> >> >>>> >>> implements Serializable.
> >> >>>> >>>  The existent catalogs can choose whether implements the new
> >> catalog
> >> >>>> interface.
> >> >>>> >>>
> >> >>>> >>> > Catalog#inferTableOptions
> >> >>>> >>> I strongly recommend not introducing this feature now, because
> the
> >> >>>> >>> behavior is unclear.
> >> >>>> >>> 1) if the catalog support managed table, the connector option
> is
> >> >>>> >>> empty. but if user forget to
> >> >>>> >>> set connector option for CTAS statement, the created table
> will be
> >> >>>> >>> managed table.
> >> >>>> >>> 2) the options and its values for catalog and for connector
> may be
> >> >>>> different,
> >> >>>> >>> so use the catalog option may cause expected errors.
> >> >>>> >>>
> >> >>>> >>> > StreamGraph#addJobStatusHook
> >> >>>> >>> I prefer `registerJobStatusHook`
> >> >>>> >>>
> >> >>>> >>> Best,
> >> >>>> >>> Godfrey
> >> >>>> >>>
> >> >>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
> >> >>>> >>> >
> >> >>>> >>> > Hi Yun,
> >> >>>> >>> > Thanks for your reply!
> >> >>>> >>> > Through offline communication with Dalong, I updated the
> >> >>>> JobStatusHook part to FLIP, looking forward to your feedback.
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> > --
> >> >>>> >>> >
> >> >>>> >>> > Best regards,
> >> >>>> >>> > Mang Zhang
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> >
> >> >>>> >>> > At 2022-05-31 14:34:25, "Yun Gao"
> <yungao.gy@aliyun.com.INVALID
> >> >
> >> >>>> wrote:
> >> >>>> >>> > >Hi,
> >> >>>> >>> > >
> >> >>>> >>> > >Regarding the drop operation, with some offline discussion
> with
> >> >>>> Dalong and Zhu,
> >> >>>> >>> > >we think that listening in the client side might be
> problematic
> >> >>>> since it would exit
> >> >>>> >>> > >after submitting the jobs in detached mode, thus the
> operation
> >> >>>> might need to
> >> >>>> >>> > >be in the JobMaster side.
> >> >>>> >>> > >
> >> >>>> >>> > >For the listener interface, currently JobListener only
> resides
> >> in
> >> >>>> the client side
> >> >>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
> >> >>>> scenario, and
> >> >>>> >>> > >the internal JobStatusListener is designed to be used inside
> >> JM and
> >> >>>> is not
> >> >>>> >>> > >serializable, thus we tend to add a new interface
> >> JobStatusHook,
> >> >>>> >>> > >which could be attached to the JobGraph and executed in the
> >> >>>> JobMaster.
> >> >>>> >>> > >The interface will also be marked as Internal.
> >> >>>> >>> > >
> >> >>>> >>> > >Best,
> >> >>>> >>> > >Yun
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> >
> >> >------------------------------------------------------------------
> >> >>>> >>> > >From:Mang Zhang <zh...@163.com>
> >> >>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
> >> >>>> >>> > >To:dev <de...@flink.apache.org>
> >> >>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in
> >> CREATE
> >> >>>> TABLE(CTAS)
> >> >>>> >>> > >
> >> >>>> >>> > >Hi, Martijn
> >> >>>> >>> > >Thanks for your reply!
> >> >>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL
> standard.
> >> >>>> >>> > >Feature T172 is "AS subquery clause in table definition".
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >--
> >> >>>> >>> > >
> >> >>>> >>> > >Best regards,
> >> >>>> >>> > >Mang Zhang
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >
> >> >>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <
> >> martijnvisser@apache.org>
> >> >>>> wrote:
> >> >>>> >>> > >>Hi everyone,
> >> >>>> >>> > >>
> >> >>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
> >> >>>> standard?
> >> >>>> >>> > >>
> >> >>>> >>> > >>Best regards,
> >> >>>> >>> > >>
> >> >>>> >>> > >>Martijn Visser
> >> >>>> >>> > >>https://twitter.com/MartijnVisser82
> >> >>>> >>> > >>https://github.com/MartijnVisser
> >> >>>> >>> > >>
> >> >>>> >>> > >>
> >> >>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <
> >> luoyuxia@alumni.sjtu.edu.cn>
> >> >>>> wrote:
> >> >>>> >>> > >>
> >> >>>> >>> > >>> Thanks for for driving this work, it's to be a useful
> >> feature.
> >> >>>> >>> > >>> About the flip-218, I have some questions.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> 1: Does our CTAS syntax support specify target table's
> >> schema
> >> >>>> including
> >> >>>> >>> > >>> column name and data type? I think it maybe a useful
> fature
> >> in
> >> >>>> case we want
> >> >>>> >>> > >>> to change the data types in target table instead of
> always
> >> copy
> >> >>>> the source
> >> >>>> >>> > >>> table's schema. It'll be more flexible with this feature.
> >> >>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1]
> support
> >> this
> >> >>>> feature.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> 2: Seems it'll requre sink to implement an public
> interface
> >> to
> >> >>>> drop table,
> >> >>>> >>> > >>> so what's the interface will look like?
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> [1]
> >> >>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> Best regards,
> >> >>>> >>> > >>> Yuxia
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> ----- Original Message -----
> >> >>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
> >> >>>> >>> > >>> To: "dev" <de...@flink.apache.org>
> >> >>>> >>> > >>> Sent: Thursday, 28 April 2022, 4:57:24 PM
> >> >>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
> >> >>>> >>> > >>> TABLE(CTAS)
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> Hi, everyone
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> I would like to open a discussion for support select
> clause
> >> in
> >> >>>> CREATE
> >> >>>> >>> > >>> TABLE(CTAS),
> >> >>>> >>> > >>> With the development of business and the enhancement of
> >> flink sql
> >> >>>> >>> > >>> capabilities, queries become more and more complex.
> >> >>>> >>> > >>> Now the user needs to use the Create Table statement to
> >> create
> >> >>>> the target
> >> >>>> >>> > >>> table first, and then execute the insert statement.
> >> >>>> >>> > >>> However, the target table may have many columns, which
> will
> >> >>>> bring a lot of
> >> >>>> >>> > >>> work outside the business logic to the user.
> >> >>>> >>> > >>> At the same time, ensure that the schema of the created
> >> target
> >> >>>> table is
> >> >>>> >>> > >>> consistent with the schema of the query result.
> >> >>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly
> facilitate
> >> the
> >> >>>> user.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> You can find more details in FLIP-218[1]. Looking
> forward to
> >> >>>> your feedback.
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> [1]
> >> >>>> >>> > >>>
> >> >>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> --
> >> >>>> >>> > >>>
> >> >>>> >>> > >>> Best regards,
> >> >>>> >>> > >>> Mang Zhang
> >> >>>> >>> > >>>
> >> >>>> >>> > >
> >> >>>> >>
> >> >>>> >>
> >> >>>> >>------------------------------
> >> >>>> >>Best,
> >> >>>> >>Ron
> >> >>>>
> >>
>

Re:Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Martijn,
Thank you for your reply. These are two good questions.
>1. The FLIP mentions that if the user doesn't specify the WITH option part
>in the query of the sink table, it will be assumed that the user wants to
>create a managed table. What will happen if the user doesn't have Table
>Store configured/installed? Will we throw an error?

If the catalog does not support managed tables and no `connector` is specified, the corresponding TableSink cannot be generated and the statement will fail.

If the catalog supports managed tables and no `connector` is specified, the statement will fail in that case because the Table Store configuration is not set and the Table Store jar is not available.
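
The two cases above can be summarized as a small decision function. This is
only a sketch under stated assumptions: the class, the enum, and the three
boolean inputs are hypothetical and do not correspond to Flink APIs, and the
two successful outcomes are inferred rather than stated in the answer.

```java
final class CtasSinkResolution {

    /** Hypothetical outcomes of resolving the sink for a CTAS statement. */
    enum Outcome {
        USE_CONNECTOR,            // assumed: an explicit `connector` option wins
        USE_MANAGED_TABLE,        // assumed: managed table with Table Store available
        FAIL_NO_SINK,             // catalog has no managed tables and no connector was given
        FAIL_TABLE_STORE_MISSING  // managed table requested but Table Store is not set up
    }

    static Outcome resolve(
            boolean connectorSpecified,
            boolean catalogSupportsManagedTable,
            boolean tableStoreAvailable) {
        if (connectorSpecified) {
            return Outcome.USE_CONNECTOR;
        }
        if (!catalogSupportsManagedTable) {
            // No TableSink can be generated, so the statement fails.
            return Outcome.FAIL_NO_SINK;
        }
        // Without a connector the table is treated as a managed table,
        // which only works when Table Store is configured and its jar is present.
        return tableStoreAvailable
                ? Outcome.USE_MANAGED_TABLE
                : Outcome.FAIL_TABLE_STORE_MISSING;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, false, false));
        System.out.println(resolve(false, true, false));
    }
}
```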


>2. Will there be support included for FLIP-190 (version upgrades)?
FLIP-190 mainly addresses upgrades in streaming mode, while FLIP-218 is used mostly in batch mode.
The atomic CTAS implementation requires the Catalog and the hook to be serializable, and they currently cannot be serialized to JSON, so the atomic implementation cannot support FLIP-190.
The non-atomic implementation is able to support FLIP-190.







--

Best regards,
Mang Zhang





At 2022-06-30 16:47:38, "Martijn Visser" <ma...@apache.org> wrote:
>Hi Mang,
>
>I have two questions/remarks:
>
>1. The FLIP mentions that if the user doesn't specify the WITH option part
>in the query of the sink table, it will be assumed that the user wants to
>create a managed table. What will happen if the user doesn't have Table
>Store configured/installed? Will we throw an error?
>
>2. Will there be support included for FLIP-190 (version upgrades)?
>
>Best regards,
>
>Martijn
>

Re: Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Martijn Visser <ma...@apache.org>.
Hi Mang,

I have two questions/remarks:

1. The FLIP mentions that if the user doesn't specify the WITH option part
in the query of the sink table, it will be assumed that the user wants to
create a managed table. What will happen if the user doesn't have Table
Store configured/installed? Will we throw an error?

2. Will there be support included for FLIP-190 (version upgrades)?

Best regards,

Martijn

On Wed, 29 Jun 2022 at 05:18, Mang Zhang <zh...@163.com> wrote:

> Hi everyone,
> Thank you to everyone who participated in the discussion. After many
> rounds, the proposal has been gradually revised and improved. We look
> forward to further feedback and will launch a vote in the next day
> or two.

Re:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi everyone,
Thank you to everyone who participated in the discussion. After many rounds, the proposal has been gradually revised and improved.
We look forward to further feedback and will launch a vote in the next day or two.







--

Best regards,
Mang Zhang





At 2022-06-28 22:23:16, "Mang Zhang" <zh...@163.com> wrote:
>Hi Yuxia,
>Thank you very much for your reply.
>
>
>>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks nothing about rtas but refer it in the configuration suddenly.  And if we're not to implement rtas in this FLIP, it may be better not to refer it and the `rtas` shouldn't exposed to user as a configuration.
>Currently does not support RTAS because in the stream mode and batch mode semantic unification issues and specific business scenarios are not very clear, the future we will support, if in support of rtas and then modify the option name, then it will bring the cost of modifying the configuration to the user.
>>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook? Could you please explain about it. Some pseudocode will be much better if it's possible. I'm lost in this part.
>
>
>
>
>This part is too much of an implementation detail, and of course we had to make some changes to achieve this. FLIP focuses on semantic consistency in stream and batch mode, and can provide optional atomicity support.
>
>
>>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the naming is to implement atomic for ctas, we propose a interface for catalog to support serializing, then we name it to `AtomicCatalog`. At least, the interface is for the atomic of ctas. But if we want to implement other features like isolate which may also require serializable catalog in the future, should we introduce a new interface naming `IsolateCatalog`? Have you ever considered other names like `SerializableCatalog`.  As it's a public interface, maybe we should be careful about the name. 
>Regarding the definition of the Catalog name, we have also discussed the name `SerializableCatalog`, which is too specific and does not relate to the atomic functionality we want to express. CTAS/RTAS want to support atomicity, need Catalog to implement `AtomicCatalog`, so it's more straightforward to understand. 
>
>
>Hope this answers your question.
>
>
>
>
>--
>
>Best regards,
>Mang Zhang
>
>
>
>
>
>At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>>Thanks for updating. The FLIP looks generall good to me. I have only minor questions:
>>
>>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks nothing about rtas but refer it in the configuration suddenly.  And if we're not to implement rtas in this FLIP, it may be better not to refer it and the `rtas` shouldn't exposed to user as a configuration.
>>
>>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook? Could you please explain about it. Some pseudocode will be much better if it's possible.  I'm lost in this part.
>>
>>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the naming is to implement atomic for ctas, we propose a interface for catalog to support serializing, then we name it to `AtomicCatalog`. At least, the interface is for the atomic of ctas. But if we want to implement other features like isolate which may also require serializable catalog in the future, should we introduce a new interface naming `IsolateCatalog`? Have you ever considered other names like `SerializableCatalog`.  As it's a public interface, maybe we should be careful about the name. 
>>
>>
>>Best regards,
>>Yuxia
>>
>>----- Original Message -----
>>From: "Mang Zhang" <zh...@163.com>
>>To: "dev" <de...@flink.apache.org>
>>Cc: imjark@gmail.com
>>Sent: Monday, June 27, 2022, 5:43:50 PM
>>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>
>>Hi Jark,
>>First of all, thank you for your very good advice!
>>The RTAS point you mentioned is a good one, and we should support it as well. 
>>However, by investigating the semantics of RTAS and how RTAS is used within the company, I found that:
>>1. The semantics of RTAS says that if the table exists, need to delete the old data and use the new data.
>>This semantics is better implemented in Batch mode, for example, if the target table is a Hive table, old data file can be deleted directly.
>>But in Streaming mode, the target table is probably a Kafka topic, we can't delete the data.
>>So the semantics in streaming and batch scenarios are not well guaranteed to be consistent.
>>2. I checked the SQL for big data in the company in the last week and found that RTAS was not used.
>>No users in the company have mentioned the need for RTAS yet. So this application scenario is not very clear.
>>
>>
>>It is not clear what kind of semantics RTAS should provide in streaming mode, and the user's business scenarios are not very clear.
>>Maybe We don't have to support RTAS soon, but we can leave the possibility of supporting RTAS in the future in the interface definition.
>>What do you think? Looking forward to your response!
>>
>>
>>By the way, the other points raised have been updated. thanks.
>>
>>
>>
>>
>>--
>>
>>Best regards,
>>Mang Zhang
>>
>>
>>
>>
>>
>>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>>>Thanks for the update, Mang and Ron,
>>>
>>>The new proposal looks good to me in general, especially keeping the
>>>behavior
>>>consistent between batch and streaming mode by default. This is how we do
>>>it
>>>in the previous "table.dml-sync" option on ML [1].
>>>
>>>Besides that, I just have some final minor comments regarding some
>>>interfaces.
>>>
>>>1) table.ctas-or-rtas.atomicity-enabled
>>>The "OR" keyword sounds like this configuration can only take effect on one
>>>of CTAS and RTAS.
>>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>>>
>>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to
>>>support it.
>>>RTAS is another widely used statement similar to CTAS. It seems there is
>>>not much difference
>>>between CTAS and RTAS. Considering we are introducing RTAS configurations,
>>>is it possible
>>> to support RTAS in this FLIP as well?
>>>
>>>3) connector.type
>>>"connector.type" has been deprecated since FLIP-95, could you replace them
>>>with 'connector'?
>>>
>>>4) SupportsAtomicCatalog
>>>I have some concerns about using "Supports.." prefix which is known as the
>>>ability extension for
>>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is enough?
>>>
>>>Best,
>>>Jark
>>>
>>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>>>
>>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>>>
>>>> Hi all,
>>>> Thank you to all those who participated in the discussion and made
>>>> suggestions!
>>>> After several rounds of online and offline discussions, the solution in
>>>> FLIP has been updated.
>>>> Looking forward to more feedback from everyone.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Best regards,
>>>> Mang Zhang
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>>>> >Hi godfrey and ron,
>>>> >Thank you very much for your replies and suggestions.
>>>> >Special thanks to ron for helping to review and improve the FLIP.
>>>> >Looking forward to further feedback from others.
>>>> >
>>>> >
>>>> >
>>>> >--
>>>> >
>>>> >Best regards,
>>>> >Mang Zhang
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>>>> >>Thanks, godfrey, for the further feedback. Your suggestions look very good to me,
>>>> >>and the FLIP has been updated according to your feedback. It would be great
>>>> >>if you could look at it again.
>>>> >>
>>>> >>Also looking forward to further feedback from others.
>>>> >>
>>>> >>
>>>> >>> -----Original Message-----
>>>> >>> From: "godfrey he" <go...@gmail.com>
>>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>>>> >>> To: dev <de...@flink.apache.org>
>>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>>> >>>
>>>> >>> Hi all,
>>>> >>>
>>>> >>> Sorry for the late reply.
>>>> >>>
>>>> >>> >table.cor-table-as-select.atomicity-enabled
>>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>>>> >>>
>>>> >>> >Create Table As Select(CTAS) feature depends on the serializability
>>>> of the catalog. To quickly see if the catalog supports CTAS, we need to try
>>>> to serialize the catalog when compile SQL in planner and if it fails, an
>>>> exception will be >thrown to indicate to user that the catalog does not
>>>> support CTAS because it cannot be serialized.
>>>> >>> This behavior is too cryptic, and will break the current catalog
>>>> >>> behavior when using 1.16.
>>>> >>> I suggest we introduce a new interface for atomic catalog which
>>>> >>> implements Serializable.
>>>> >>>  The existent catalogs can choose whether implements the new catalog
>>>> interface.
>>>> >>>
>>>> >>> > Catalog#inferTableOptions
>>>> >>> I strongly recommend not introducing this feature now, because the
>>>> >>> behavior is unclear.
>>>> >>> 1) if the catalog support managed table, the connector option is
>>>> >>> empty. but if user forget to
>>>> >>> set connector option for CTAS statement, the created table will be
>>>> >>> managed table.
>>>> >>> 2) the options and its values for catalog and for connector may be
>>>> different,
>>>> >>> so use the catalog option may cause expected errors.
>>>> >>>
>>>> >>> > StreamGraph#addJobStatusHook
>>>> >>> I prefer `registerJobStatusHook`
>>>> >>>
>>>> >>> Best,
>>>> >>> Godfrey
>>>> >>>
>>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>>>> >>> >
>>>> >>> > Hi Yun,
>>>> >>> > Thanks for your reply!
>>>> >>> > Through offline communication with Dalong, I updated the
>>>> JobStatusHook part to FLIP, looking forward to your feedback.
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > --
>>>> >>> >
>>>> >>> > Best regards,
>>>> >>> > Mang Zhang
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> >
>>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID>
>>>> wrote:
>>>> >>> > >Hi,
>>>> >>> > >
>>>> >>> > >Regarding the drop operation, with some offline discussion with
>>>> Dalong and Zhu,
>>>> >>> > >we think that listening in the client side might be problematic
>>>> since it would exit
>>>> >>> > >after submitting the jobs in detached mode, thus the operation
>>>> might need to
>>>> >>> > >be in the JobMaster side.
>>>> >>> > >
>>>> >>> > >For the listener interface, currently JobListener only resides in
>>>> the client side
>>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>>>> scenario, and
>>>> >>> > >the internal JobStatusListener is designed to be used inside JM and
>>>> is not
>>>> >>> > >serializable, thus we tend to add a new interface JobStatusHook,
>>>> >>> > >which could be attached to the JobGraph and executed in the
>>>> JobMaster.
>>>> >>> > >The interface will also be marked as Internal.
>>>> >>> > >
>>>> >>> > >Best,
>>>> >>> > >Yun
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >------------------------------------------------------------------
>>>> >>> > >From:Mang Zhang <zh...@163.com>
>>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>>>> >>> > >To:dev <de...@flink.apache.org>
>>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>>>> TABLE(CTAS)
>>>> >>> > >
>>>> >>> > >Hi, Martijn
>>>> >>> > >Thanks for your reply!
>>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL standard.
>>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >--
>>>> >>> > >
>>>> >>> > >Best regards,
>>>> >>> > >Mang Zhang
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >
>>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org>
>>>> wrote:
>>>> >>> > >>Hi everyone,
>>>> >>> > >>
>>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>>>> standard?
>>>> >>> > >>
>>>> >>> > >>Best regards,
>>>> >>> > >>
>>>> >>> > >>Martijn Visser
>>>> >>> > >>https://twitter.com/MartijnVisser82
>>>> >>> > >>https://github.com/MartijnVisser
>>>> >>> > >>
>>>> >>> > >>
>>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn>
>>>> wrote:
>>>> >>> > >>
>>>> >>> > >>> Thanks for for driving this work, it's to be a useful feature.
>>>> >>> > >>> About the flip-218, I have some questions.
>>>> >>> > >>>
>>>> >>> > >>> 1: Does our CTAS syntax support specify target table's schema
>>>> including
>>>> >>> > >>> column name and data type? I think it maybe a useful fature in
>>>> case we want
>>>> >>> > >>> to change the data types in target table instead of always copy
>>>> the source
>>>> >>> > >>> table's schema. It'll be more flexible with this feature.
>>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] support this
>>>> feature.
>>>> >>> > >>>
>>>> >>> > >>> 2: Seems it'll requre sink to implement an public interface to
>>>> drop table,
>>>> >>> > >>> so what's the interface will look like?
>>>> >>> > >>>
>>>> >>> > >>> [1]
>>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>>>> >>> > >>>
>>>> >>> > >>> Best regards,
>>>> >>> > >>> Yuxia
>>>> >>> > >>>
>>>> >>> > >>> ----- Original Message -----
>>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>>>> >>> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>>> >>> > >>>
>>>> >>> > >>> Hi, everyone
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>> I would like to open a discussion for support select clause in
>>>> CREATE
>>>> >>> > >>> TABLE(CTAS),
>>>> >>> > >>> With the development of business and the enhancement of flink sql
>>>> >>> > >>> capabilities, queries become more and more complex.
>>>> >>> > >>> Now the user needs to use the Create Table statement to create
>>>> the target
>>>> >>> > >>> table first, and then execute the insert statement.
>>>> >>> > >>> However, the target table may have many columns, which will
>>>> bring a lot of
>>>> >>> > >>> work outside the business logic to the user.
>>>> >>> > >>> At the same time, ensure that the schema of the created target
>>>> table is
>>>> >>> > >>> consistent with the schema of the query result.
>>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the
>>>> user.
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>> You can find more details in FLIP-218[1]. Looking forward to
>>>> your feedback.
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>> [1]
>>>> >>> > >>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>>
>>>> >>> > >>> --
>>>> >>> > >>>
>>>> >>> > >>> Best regards,
>>>> >>> > >>> Mang Zhang
>>>> >>> > >>>
>>>> >>> > >
>>>> >>
>>>> >>
>>>> >>------------------------------
>>>> >>Best,
>>>> >>Ron
>>>>

Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Yuxia,
Thank you very much for your reply.


>1: Also, the mixture of CTAS and RTAS confuses me: the FLIP says nothing about RTAS but suddenly refers to it in the configuration. If we are not going to implement RTAS in this FLIP, it may be better not to refer to it, and `rtas` shouldn't be exposed to users in a configuration option.
RTAS is not supported for now because it is not yet clear how to unify its semantics across streaming and batch mode, and the concrete business scenarios are also unclear. We plan to support it in the future. If we renamed the option only once RTAS is supported, users would have to pay the cost of changing their configuration.
>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook? Could you please explain it? Some pseudocode would be even better, if possible. I'm lost in this part.




This part is mostly an implementation detail, and of course we had to make some changes to achieve it. The FLIP focuses on keeping the semantics consistent between streaming and batch mode, with atomicity as an optional feature.
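
For readers who want something concrete for question 2, here is a minimal sketch, not the FLIP's actual implementation, of how a serializable hook could be registered on a graph object, shipped to the JobMaster side, and invoked there. All names (JobStatusHookSketch, CtasJobStatusHookSketch, StreamGraphSketch, ToyMetastore) are hypothetical stand-ins rather than Flink classes, and the "create the table when the job is created, drop it when the job fails or is canceled" behavior is an assumption based on the drop-table discussion earlier in this thread.

```java
import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the JobStatusHook discussed in this thread. */
interface JobStatusHookSketch extends Serializable {
    void onCreated(String jobId);
    void onFinished(String jobId);
    void onFailed(String jobId, Throwable cause);
    void onCanceled(String jobId);
}

/** Toy external metastore; a static set stands in for a real catalog service. */
final class ToyMetastore {
    static final Set<String> TABLES = ConcurrentHashMap.newKeySet();
    private ToyMetastore() {}
}

/** Hypothetical CTAS hook: create the table at job start, drop it if the job does not finish. */
final class CtasJobStatusHookSketch implements JobStatusHookSketch {
    private static final long serialVersionUID = 1L;
    private final String tableName;

    CtasJobStatusHookSketch(String tableName) { this.tableName = tableName; }

    @Override public void onCreated(String jobId) { ToyMetastore.TABLES.add(tableName); }
    @Override public void onFinished(String jobId) { /* job succeeded: keep the table */ }
    @Override public void onFailed(String jobId, Throwable cause) { ToyMetastore.TABLES.remove(tableName); }
    @Override public void onCanceled(String jobId) { ToyMetastore.TABLES.remove(tableName); }
}

/** Hypothetical graph object that carries hooks from the client to the JobMaster. */
final class StreamGraphSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<JobStatusHookSketch> hooks = new ArrayList<>();

    void registerJobStatusHook(JobStatusHookSketch hook) { hooks.add(hook); }
    List<JobStatusHookSketch> getJobStatusHooks() { return hooks; }
}

final class CtasHookDemo {
    /** Simulates shipping the graph to the JobMaster via Java serialization. */
    static StreamGraphSketch ship(StreamGraphSketch graph) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(graph);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StreamGraphSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("hook could not be shipped", e);
        }
    }

    public static void main(String[] args) {
        // Client side: the planner registers the hook while building the graph.
        StreamGraphSketch graph = new StreamGraphSketch();
        graph.registerJobStatusHook(new CtasJobStatusHookSketch("target_table"));

        // JobMaster side: works on the deserialized copy, so it survives a detached client.
        StreamGraphSketch onJobMaster = ship(graph);
        for (JobStatusHookSketch hook : onJobMaster.getJobStatusHooks()) {
            hook.onCreated("job-1");
        }
        System.out.println(ToyMetastore.TABLES.contains("target_table")); // true
        for (JobStatusHookSketch hook : onJobMaster.getJobStatusHooks()) {
            hook.onFailed("job-1", new RuntimeException("simulated failure"));
        }
        System.out.println(ToyMetastore.TABLES.contains("target_table")); // false
    }
}
```

The point of the sketch is only that the hook has to be serializable, because it is invoked from the deserialized copy on the JobMaster side and not from the client, which may already have exited in detached mode.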


>3: The name `AtomicCatalog` confuses me. It seems the background for the naming is that, to implement atomicity for CTAS, we propose an interface that lets a catalog support serialization, and then we name it `AtomicCatalog`. At least for now, the interface serves the atomicity of CTAS. But if in the future we want to implement other features, such as isolation, which may also require a serializable catalog, should we then introduce a new interface named `IsolateCatalog`? Have you considered other names like `SerializableCatalog`? As it's a public interface, maybe we should be careful about the name.
Regarding the Catalog interface name, we also discussed `SerializableCatalog`, but it is too specific and does not convey the atomicity we want to express. For CTAS/RTAS to support atomicity, the Catalog needs to implement `AtomicCatalog`, which makes the name more straightforward to understand.
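
To make the naming discussion concrete, the following is a rough sketch, under my own assumptions, of an opt-in interface that extends Serializable, so the planner can check for atomicity support with a type check instead of trying to serialize the catalog. CatalogSketch, AtomicCatalogSketch, PlainCatalog, ShippableCatalog, and CtasPlannerSketch are all hypothetical names for illustration and are not the interfaces defined in the FLIP.

```java
import java.io.Serializable;

/** Hypothetical minimal catalog contract, standing in for a real catalog interface. */
interface CatalogSketch {
    String name();
}

/** Hypothetical opt-in interface: the catalog can be serialized and shipped with the job. */
interface AtomicCatalogSketch extends CatalogSketch, Serializable {}

/** An existing catalog that chooses not to opt in; it keeps working unchanged. */
final class PlainCatalog implements CatalogSketch {
    private final String name;
    PlainCatalog(String name) { this.name = name; }
    @Override public String name() { return name; }
}

/** A catalog that opts in to atomic CTAS by implementing the new interface. */
final class ShippableCatalog implements AtomicCatalogSketch {
    private static final long serialVersionUID = 1L;
    private final String name;
    ShippableCatalog(String name) { this.name = name; }
    @Override public String name() { return name; }
}

final class CtasPlannerSketch {
    /** Explicit capability check by type, instead of a trial serialization. */
    static boolean supportsAtomicCtas(CatalogSketch catalog) {
        return catalog instanceof AtomicCatalogSketch;
    }

    /** Picks the execution mode; atomicity is optional and only checked when requested. */
    static String plan(CatalogSketch catalog, boolean atomicityEnabled) {
        if (!atomicityEnabled) {
            return "non-atomic CTAS";
        }
        if (!supportsAtomicCtas(catalog)) {
            throw new UnsupportedOperationException(
                    "Catalog '" + catalog.name() + "' does not support atomic CTAS");
        }
        return "atomic CTAS";
    }
}
```

With this shape, a catalog that does not opt in still works for non-atomic CTAS, and the error for the atomic case names the missing capability directly.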


Hope this answers your question.




--

Best regards,
Mang Zhang





At 2022-06-28 11:36:51, "yuxia" <lu...@alumni.sjtu.edu.cn> wrote:
>Thanks for updating. The FLIP looks generall good to me. I have only minor questions:
>
>1: Also, the mixture of ctas and rtas confuses me as the FLIP talks nothing about rtas but refer it in the configuration suddenly.  And if we're not to implement rtas in this FLIP, it may be better not to refer it and the `rtas` shouldn't exposed to user as a configuration.
>
>2: How will the CTASJobStatusHook be passed to StreamGraph as a hook? Could you please explain about it. Some pseudocode will be much better if it's possible.  I'm lost in this part.
>
>3: The name `AtomicCatalog` confuses me. Seems the backgroud for the naming is to implement atomic for ctas, we propose a interface for catalog to support serializing, then we name it to `AtomicCatalog`. At least, the interface is for the atomic of ctas. But if we want to implement other features like isolate which may also require serializable catalog in the future, should we introduce a new interface naming `IsolateCatalog`? Have you ever considered other names like `SerializableCatalog`.  As it's a public interface, maybe we should be careful about the name. 
>
>
>Best regards,
>Yuxia
>
>----- Original Message -----
>From: "Mang Zhang" <zh...@163.com>
>To: "dev" <de...@flink.apache.org>
>Cc: imjark@gmail.com
>Sent: Monday, June 27, 2022, 5:43:50 PM
>Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>
>Hi Jark,
>First of all, thank you for your very good advice!
>The RTAS point you mentioned is a good one, and we should support it as well. 
>However, by investigating the semantics of RTAS and how RTAS is used within the company, I found that:
>1. The semantics of RTAS says that if the table exists, need to delete the old data and use the new data.
>This semantics is better implemented in Batch mode, for example, if the target table is a Hive table, old data file can be deleted directly.
>But in Streaming mode, the target table is probably a Kafka topic, we can't delete the data.
>So the semantics in streaming and batch scenarios are not well guaranteed to be consistent.
>2. I checked the SQL for big data in the company in the last week and found that RTAS was not used.
>No users in the company have mentioned the need for RTAS yet. So this application scenario is not very clear.
>
>
>It is not clear what kind of semantics RTAS should provide in streaming mode, and the user's business scenarios are not very clear.
>Maybe We don't have to support RTAS soon, but we can leave the possibility of supporting RTAS in the future in the interface definition.
>What do you think? Looking forward to your response!
>
>
>By the way, the other points raised have been updated. thanks.
>
>
>
>
>--
>
>Best regards,
>Mang Zhang
>
>
>
>
>
>At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>>Thanks for the update, Mang and Ron,
>>
>>The new proposal looks good to me in general, especially keeping the
>>behavior
>>consistent between batch and streaming mode by default. This is how we do
>>it
>>in the previous "table.dml-sync" option on ML [1].
>>
>>Besides that, I just have some final minor comments regarding some
>>interfaces.
>>
>>1) table.ctas-or-rtas.atomicity-enabled
>>The "OR" keyword sounds like this configuration can only take effect on one
>>of CTAS and RTAS.
>>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>>
>>2) In the FLIP, you have mentioned RTAS many times, but have no plan to
>>support it.
>>RTAS is another widely used statement similar to CTAS. It seems there is
>>not much difference
>>between CTAS and RTAS. Considering we are introducing RTAS configurations,
>>is it possible
>> to support RTAS in this FLIP as well?
>>
>>3) connector.type
>>"connector.type" has been deprecated since FLIP-95, could you replace them
>>with 'connector'?
>>
>>4) SupportsAtomicCatalog
>>I have some concerns about using "Supports.." prefix which is known as the
>>ability extension for
>>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is enough?
>>
>>Best,
>>Jark
>>
>>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>>
>>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>>
>>> Hi all,
>>> Thank you to all those who participated in the discussion and made
>>> suggestions!
>>> After several rounds of online and offline discussions, the solution in
>>> FLIP has been updated.
>>> Looking forward to more feedback from everyone.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Mang Zhang
>>>
>>>
>>>
>>>
>>>
>>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>>> >Hi godfrey and ron,
>>> >Thank you very much for your replies and suggestions.
>>> >Special thanks to ron for helping to review and improve the FLIP.
>>> >Looking forward to further feedback from others.
>>> >
>>> >
>>> >
>>> >--
>>> >
>>> >Best regards,
>>> >Mang Zhang
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>>> >>Thanks, godfrey, for the further feedback. Your suggestions look very good to me,
>>> >>and the FLIP has been updated according to your feedback. It would be great
>>> >>if you could look at it again.
>>> >>
>>> >>Also looking forward to further feedback from others.
>>> >>
>>> >>
>>> >>> -----Original Message-----
>>> >>> From: "godfrey he" <go...@gmail.com>
>>> >>> Sent: 2022-06-24 17:00:51 (Friday)
>>> >>> To: dev <de...@flink.apache.org>
>>> >>> Cc: "Yun Gao" <yu...@aliyun.com>
>>> >>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> Sorry for the late reply.
>>> >>>
>>> >>> >table.cor-table-as-select.atomicity-enabled
>>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>>> >>>
>>> >>> >Create Table As Select(CTAS) feature depends on the serializability
>>> of the catalog. To quickly see if the catalog supports CTAS, we need to try
>>> to serialize the catalog when compile SQL in planner and if it fails, an
>>> exception will be >thrown to indicate to user that the catalog does not
>>> support CTAS because it cannot be serialized.
>>> >>> This behavior is too cryptic, and will break the current catalog
>>> >>> behavior when using 1.16.
>>> >>> I suggest we introduce a new interface for atomic catalog which
>>> >>> implements Serializable.
>>> >>>  The existent catalogs can choose whether implements the new catalog
>>> interface.
>>> >>>
>>> >>> > Catalog#inferTableOptions
>>> >>> I strongly recommend not introducing this feature now, because the
>>> >>> behavior is unclear.
>>> >>> 1) if the catalog support managed table, the connector option is
>>> >>> empty. but if user forget to
>>> >>> set connector option for CTAS statement, the created table will be
>>> >>> managed table.
>>> >>> 2) the options and its values for catalog and for connector may be
>>> different,
>>> >>> so use the catalog option may cause expected errors.
>>> >>>
>>> >>> > StreamGraph#addJobStatusHook
>>> >>> I prefer `registerJobStatusHook`
>>> >>>
>>> >>> Best,
>>> >>> Godfrey
>>> >>>
>>> >>> On Mon, 13 Jun 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>>> >>> >
>>> >>> > Hi Yun,
>>> >>> > Thanks for your reply!
>>> >>> > Through offline communication with Dalong, I updated the
>>> JobStatusHook part to FLIP, looking forward to your feedback.
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> >
>>> >>> > Best regards,
>>> >>> > Mang Zhang
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID>
>>> wrote:
>>> >>> > >Hi,
>>> >>> > >
>>> >>> > >Regarding the drop operation, with some offline discussion with
>>> Dalong and Zhu,
>>> >>> > >we think that listening in the client side might be problematic
>>> since it would exit
>>> >>> > >after submitting the jobs in detached mode, thus the operation
>>> might need to
>>> >>> > >be in the JobMaster side.
>>> >>> > >
>>> >>> > >For the listener interface, currently JobListener only resides in
>>> the client side
>>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>>> scenario, and
>>> >>> > >the internal JobStatusListener is designed to be used inside JM and
>>> is not
>>> >>> > >serializable, thus we tend to add a new interface JobStatusHook,
>>> >>> > >which could be attached to the JobGraph and executed in the
>>> JobMaster.
>>> >>> > >The interface will also be marked as Internal.
>>> >>> > >
>>> >>> > >Best,
>>> >>> > >Yun
>>> >>> > >
>>> >>> > >
>>> >>> > >------------------------------------------------------------------
>>> >>> > >From:Mang Zhang <zh...@163.com>
>>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>>> >>> > >To:dev <de...@flink.apache.org>
>>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>>> TABLE(CTAS)
>>> >>> > >
>>> >>> > >Hi, Martijn
>>> >>> > >Thanks for your reply!
>>> >>> > >I looked at the SQL standard, CTAS is part of the SQL standard.
>>> >>> > >Feature T172 is "AS subquery clause in table definition".
>>> >>> > >
>>> >>> > >
>>> >>> > >
>>> >>> > >--
>>> >>> > >
>>> >>> > >Best regards,
>>> >>> > >Mang Zhang
>>> >>> > >
>>> >>> > >
>>> >>> > >
>>> >>> > >
>>> >>> > >
>>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org>
>>> wrote:
>>> >>> > >>Hi everyone,
>>> >>> > >>
>>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>>> standard?
>>> >>> > >>
>>> >>> > >>Best regards,
>>> >>> > >>
>>> >>> > >>Martijn Visser
>>> >>> > >>https://twitter.com/MartijnVisser82
>>> >>> > >>https://github.com/MartijnVisser
>>> >>> > >>
>>> >>> > >>
>>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn>
>>> wrote:
>>> >>> > >>
>>> >>> > >>> Thanks for for driving this work, it's to be a useful feature.
>>> >>> > >>> About the flip-218, I have some questions.
>>> >>> > >>>
>>> >>> > >>> 1: Does our CTAS syntax support specify target table's schema
>>> including
>>> >>> > >>> column name and data type? I think it maybe a useful fature in
>>> case we want
>>> >>> > >>> to change the data types in target table instead of always copy
>>> the source
>>> >>> > >>> table's schema. It'll be more flexible with this feature.
>>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] support this
>>> feature.
>>> >>> > >>>
>>> >>> > >>> 2: Seems it'll requre sink to implement an public interface to
>>> drop table,
>>> >>> > >>> so what's the interface will look like?
>>> >>> > >>>
>>> >>> > >>> [1]
>>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>>> >>> > >>>
>>> >>> > >>> Best regards,
>>> >>> > >>> Yuxia
>>> >>> > >>>
>>> >>> > >>> ----- Original Message -----
>>> >>> > >>> From: "Mang Zhang" <zh...@163.com>
>>> >>> > >>> To: "dev" <de...@flink.apache.org>
>>> >>> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>>> >>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>> >>> > >>>
>>> >>> > >>> Hi, everyone
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>> I would like to open a discussion for support select clause in
>>> CREATE
>>> >>> > >>> TABLE(CTAS),
>>> >>> > >>> With the development of business and the enhancement of flink sql
>>> >>> > >>> capabilities, queries become more and more complex.
>>> >>> > >>> Now the user needs to use the Create Table statement to create
>>> the target
>>> >>> > >>> table first, and then execute the insert statement.
>>> >>> > >>> However, the target table may have many columns, which will
>>> bring a lot of
>>> >>> > >>> work outside the business logic to the user.
>>> >>> > >>> At the same time, ensure that the schema of the created target
>>> table is
>>> >>> > >>> consistent with the schema of the query result.
>>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the
>>> user.
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>> You can find more details in FLIP-218[1]. Looking forward to
>>> your feedback.
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>> [1]
>>> >>> > >>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>> --
>>> >>> > >>>
>>> >>> > >>> Best regards,
>>> >>> > >>> Mang Zhang
>>> >>> > >>>
>>> >>> > >
>>> >>
>>> >>
>>> >>------------------------------
>>> >>Best,
>>> >>Ron
>>>

Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Thanks for updating. The FLIP looks generally good to me. I have only minor questions:

1: Also, the mixture of CTAS and RTAS confuses me: the FLIP says nothing about RTAS but suddenly refers to it in the configuration. If we are not going to implement RTAS in this FLIP, it may be better not to refer to it, and `rtas` shouldn't be exposed to users in a configuration option.

2: How will the CTASJobStatusHook be passed to StreamGraph as a hook? Could you please explain it? Some pseudocode would be even better, if possible. I'm lost in this part.

3: The name `AtomicCatalog` confuses me. It seems the background for the naming is that, to implement atomicity for CTAS, we propose an interface that lets a catalog support serialization, and then we name it `AtomicCatalog`. At least for now, the interface serves the atomicity of CTAS. But if in the future we want to implement other features, such as isolation, which may also require a serializable catalog, should we then introduce a new interface named `IsolateCatalog`? Have you considered other names like `SerializableCatalog`? As it's a public interface, maybe we should be careful about the name.


Best regards,
Yuxia

----- Original Message -----
From: "Mang Zhang" <zh...@163.com>
To: "dev" <de...@flink.apache.org>
Cc: imjark@gmail.com
Sent: Monday, June 27, 2022, 5:43:50 PM
Subject: Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Hi Jark,
First of all, thank you for your very good advice!
The RTAS point you mentioned is a good one, and we should support it as well.
However, after investigating the semantics of RTAS and how RTAS is used within our company, I found that:
1. The semantics of RTAS say that if the table exists, the old data must be deleted and replaced with the new data.
These semantics are easier to implement in batch mode; for example, if the target table is a Hive table, the old data files can be deleted directly.
But in streaming mode, the target table is probably a Kafka topic, and we can't delete the data.
So consistent semantics between streaming and batch scenarios cannot be well guaranteed.
2. I checked the big data SQL in the company over the last week and found that RTAS was not used.
No users in the company have mentioned a need for RTAS yet, so the application scenario is not very clear.


In short, it is not clear what semantics RTAS should provide in streaming mode, and the users' business scenarios are not very clear either.
Maybe we don't have to support RTAS soon, but we can leave room for supporting RTAS in the future in the interface definition.
What do you think? Looking forward to your response!


By the way, the other points you raised have been addressed in the FLIP. Thanks.




--

Best regards,
Mang Zhang





At 2022-06-26 11:56:53, "Jark Wu" <im...@gmail.com> wrote:
>Thanks for the update, Mang and Ron,
>
>The new proposal looks good to me in general, especially keeping the
>behavior
>consistent between batch and streaming mode by default. This is how we do
>it
>in the previous "table.dml-sync" option on ML [1].
>
>Besides that, I just have some final minor comments regarding some
>interfaces.
>
>1) table.ctas-or-rtas.atomicity-enabled
>The "OR" keyword sounds like this configuration can only take effect on one
>of CTAS and RTAS.
>What about "table.ctas-and-rtas" or "table.ctas-rtas"?
>
>2) In the FLIP, you have mentioned RTAS many times, but have no plan to
>support it.
>RTAS is another widely used statement similar to CTAS. It seems there is
>not much difference
>between CTAS and RTAS. Considering we are introducing RTAS configurations,
>is it possible
> to support RTAS in this FLIP as well?
>
>3) connector.type
>"connector.type" has been deprecated since FLIP-95, could you replace them
>with 'connector'?
>
>4) SupportsAtomicCatalog
>I have some concerns about using "Supports.." prefix which is known as the
>ability extension for
>DynamicTableSource and DynamicTableSink. Maybe "AtomicCatalog" is enough?
>
>Best,
>Jark
>
>[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2
>
>On Fri, 24 Jun 2022 at 22:51, Mang Zhang <zh...@163.com> wrote:
>
>> Hi all,
>> Thank you to all those who participated in the discussion and made
>> suggestions!
>> After several rounds of online and offline discussions, the solution in
>> FLIP has been updated.
>> Looking forward to more feedback from everyone.
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Best regards,
>> Mang Zhang
>>
>>
>>
>>
>>
>> At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>> >Hi godfrey and ron,
>> >Thank you very much for your replies and suggestions.
>> >Special thanks to ron for helping to review and improve the FLIP.
>> >Looking forward to further feedback from others.
>> >
>> >
>> >
>> >--
>> >
>> >Best regards,
>> >Mang Zhang
>> >
>> >
>> >
>> >
>> >
>> >At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>> >>Thanks for godfrey further feedback, your suggestions are very good to
>> me, the FLIP has updated according to your feedback. It will be very good
>> if you look at it again。
>> >>
>> >>Also looking forward to further feedback from others.
>> >>
>> >>
>> >>> -----原始邮件-----
>> >>> 发件人: "godfrey he" <go...@gmail.com>
>> >>> 发送时间: 2022-06-24 17:00:51 (星期五)
>> >>> 收件人: dev <de...@flink.apache.org>
>> >>> 抄送: "Yun Gao" <yu...@aliyun.com>
>> >>> 主题: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> TABLE(CTAS)
>> >>>
>> >>> Hi all,
>> >>>
>> >>> Sorry for the late reply.
>> >>>
>> >>> >table.cor-table-as-select.atomicity-enabled
>> >>> Regarding `cor`,  this abbreviation is not commonly used.
>> >>>
>> >>> >Create Table As Select(CTAS) feature depends on the serializability
>> of the catalog. To quickly see if the catalog supports CTAS, we need to try
>> to serialize the catalog when compile SQL in planner and if it fails, an
>> exception will be >thrown to indicate to user that the catalog does not
>> support CTAS because it cannot be serialized.
>> >>> This behavior is too cryptic, and will break the current catalog
>> >>> behavior when using 1.16.
>> >>> I suggest we introduce a new interface for atomic catalog which
>> >>> implements Serializable.
>> >>>  The existent catalogs can choose whether implements the new catalog
>> interface.
>> >>>
>> >>> > Catalog#inferTableOptions
>> >>> I strongly recommend not introducing this feature now, because the
>> >>> behavior is unclear.
>> >>> 1) if the catalog support managed table, the connector option is
>> >>> empty. but if user forget to
>> >>> set connector option for CTAS statement, the created table will be
>> >>> managed table.
>> >>> 2) the options and its values for catalog and for connector may be
>> different,
>> >>> so use the catalog option may cause expected errors.
>> >>>
>> >>> > StreamGraph#addJobStatusHook
>> >>> I prefer `registerJobStatusHook`
>> >>>
>> >>> Best,
>> >>> Godfrey
>> >>>
>> >>> Mang Zhang <zh...@163.com> 于2022年6月13日周一 16:43写道:
>> >>> >
>> >>> > Hi Yun,
>> >>> > Thanks for your reply!
>> >>> > Through offline communication with Dalong, I updated the
>> JobStatusHook part to FLIP, looking forward to your feedback.
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> >
>> >>> > Best regards,
>> >>> > Mang Zhang
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID>
>> wrote:
>> >>> > >Hi,
>> >>> > >
>> >>> > >Regarding the drop operation, with some offline discussion with
>> Dalong and Zhu,
>> >>> > >we think that listening in the client side might be problematic
>> since it would exit
>> >>> > >after submitting the jobs in detached mode, thus the operation
>> might need to
>> >>> > >be in the JobMaster side.
>> >>> > >
>> >>> > >For the listener interface, currently JobListener only resides in
>> the client side
>> >>> > >and contains unsuitable methods like onJobSubmitted for this
>> scenario, and
>> >>> > >the internal JobStatusListener is designed to be used inside JM and
>> is not
>> >>> > >serializable, thus we tend to add a new interface JobStatusHook,
>> >>> > >which could be attached to the JobGraph and executed in the
>> JobMaster.
>> >>> > >The interface will also be marked as Internal.
>> >>> > >
>> >>> > >Best,
>> >>> > >Yun
>> >>> > >
>> >>> > >
>> >>> > >------------------------------------------------------------------
>> >>> > >From:Mang Zhang <zh...@163.com>
>> >>> > >Send Time:2022 May 25 (Wed.) 10:24
>> >>> > >To:dev <de...@flink.apache.org>
>> >>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> TABLE(CTAS)
>> >>> > >
>> >>> > >Hi, Martijn
>> >>> > >Thanks for your reply!
>> >>> > >I looked at the SQL standard, CTAS is part of the SQL standard.
>> >>> > >Feature T172 is "AS subquery clause in table definition".
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > >--
>> >>> > >
>> >>> > >Best regards,
>> >>> > >Mang Zhang
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org>
>> wrote:
>> >>> > >>Hi everyone,
>> >>> > >>
>> >>> > >>Can we identify if this proposed syntax is part of the SQL
>> standard?
>> >>> > >>
>> >>> > >>Best regards,
>> >>> > >>
>> >>> > >>Martijn Visser
>> >>> > >>https://twitter.com/MartijnVisser82
>> >>> > >>https://github.com/MartijnVisser
>> >>> > >>
>> >>> > >>
>> >>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn>
>> wrote:
>> >>> > >>
>> >>> > >>> Thanks for for driving this work, it's to be a useful feature.
>> >>> > >>> About the flip-218, I have some questions.
>> >>> > >>>
>> >>> > >>> 1: Does our CTAS syntax support specify target table's schema
>> including
>> >>> > >>> column name and data type? I think it maybe a useful fature in
>> case we want
>> >>> > >>> to change the data types in target table instead of always copy
>> the source
>> >>> > >>> table's schema. It'll be more flexible with this feature.
>> >>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] support this
>> feature.
>> >>> > >>>
>> >>> > >>> 2: Seems it'll requre sink to implement an public interface to
>> drop table,
>> >>> > >>> so what's the interface will look like?
>> >>> > >>>
>> >>> > >>> [1]
>> https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> >>> > >>>
>> >>> > >>> Best regards,
>> >>> > >>> Yuxia
>> >>> > >>>
>> >>> > >>> ----- 原始邮件 -----
>> >>> > >>> 发件人: "Mang Zhang" <zh...@163.com>
>> >>> > >>> 收件人: "dev" <de...@flink.apache.org>
>> >>> > >>> 发送时间: 星期四, 2022年 4 月 28日 下午 4:57:24
>> >>> > >>> 主题: [DISCUSS] FLIP-218: Support SELECT clause in CREATE
>> TABLE(CTAS)
>> >>> > >>>
>> >>> > >>> Hi, everyone
>> >>> > >>>
>> >>> > >>>
>> >>> > >>> I would like to open a discussion for support select clause in
>> CREATE
>> >>> > >>> TABLE(CTAS),
>> >>> > >>> With the development of business and the enhancement of flink sql
>> >>> > >>> capabilities, queries become more and more complex.
>> >>> > >>> Now the user needs to use the Create Table statement to create
>> the target
>> >>> > >>> table first, and then execute the insert statement.
>> >>> > >>> However, the target table may have many columns, which will
>> bring a lot of
>> >>> > >>> work outside the business logic to the user.
>> >>> > >>> At the same time, ensure that the schema of the created target
>> table is
>> >>> > >>> consistent with the schema of the query result.
>> >>> > >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the
>> user.
>> >>> > >>>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>> You can find more details in FLIP-218[1]. Looking forward to
>> your feedback.
>> >>> > >>>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>> [1]
>> >>> > >>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> >>> > >>>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>> --
>> >>> > >>>
>> >>> > >>> Best regards,
>> >>> > >>> Mang Zhang
>> >>> > >>>
>> >>> > >
>> >>
>> >>
>> >>------------------------------
>> >>Best,
>> >>Ron
>>

Re:Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Jark,
First of all, thank you for your very good advice!
The RTAS point you mentioned is a good one, and we should support it as well.
However, after investigating the semantics of RTAS and how RTAS is used within our company, I found the following:
1. The semantics of RTAS say that if the table exists, the old data must be deleted and replaced with the new data.
These semantics are easier to implement in batch mode. For example, if the target table is a Hive table, the old data files can be deleted directly.
But in streaming mode, the target table is probably a Kafka topic, and we can't delete the data.
So the semantics cannot be guaranteed to be consistent between streaming and batch scenarios.
2. I checked the big data SQL jobs in our company over the last week and found that RTAS was not used.
No users in the company have asked for RTAS yet, so the application scenario is not very clear.
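
The batch/streaming gap in point 1 can be sketched in Java. The sink types below are hypothetical models, not Flink connectors: a file-backed table (standing in for a Hive table) can drop its old data before a replace, while an append-only log (standing in for a Kafka topic) cannot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a sink; none of these types exist in Flink.
interface SketchSink {
    boolean canDeleteExistingData();

    void deleteExistingData();

    void write(List<String> rows);

    List<String> contents();
}

// Models a Hive-like table: the old data files can simply be deleted.
final class FileTableSink implements SketchSink {
    private final List<String> rows = new ArrayList<>();

    @Override
    public boolean canDeleteExistingData() {
        return true;
    }

    @Override
    public void deleteExistingData() {
        rows.clear();
    }

    @Override
    public void write(List<String> newRows) {
        rows.addAll(newRows);
    }

    @Override
    public List<String> contents() {
        return new ArrayList<>(rows);
    }
}

// Models a Kafka-like topic: records can only be appended.
final class AppendOnlyLogSink implements SketchSink {
    private final List<String> records = new ArrayList<>();

    @Override
    public boolean canDeleteExistingData() {
        return false;
    }

    @Override
    public void deleteExistingData() {
        throw new UnsupportedOperationException("an append-only log cannot drop old records");
    }

    @Override
    public void write(List<String> newRecords) {
        records.addAll(newRecords);
    }

    @Override
    public List<String> contents() {
        return new ArrayList<>(records);
    }
}

final class RtasSketch {
    // REPLACE TABLE AS SELECT: the old data must be removed before the new data is written.
    static boolean replaceTableAs(SketchSink sink, List<String> queryResult) {
        if (!sink.canDeleteExistingData()) {
            return false; // the replace semantics cannot be honored on this sink
        }
        sink.deleteExistingData();
        sink.write(queryResult);
        return true;
    }
}
```

In this model the same statement succeeds on the file-backed table and is refused on the log, which is the inconsistency described above.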


In short, it is not clear what semantics RTAS should provide in streaming mode, and the users' business scenarios are not clear either.
Maybe we don't have to support RTAS right away, but we can leave room in the interface definition to support it in the future.
What do you think? Looking forward to your response!


By the way, the FLIP has been updated to address the other points you raised. Thanks.




--

Best regards,
Mang Zhang






Re: Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Jark Wu <im...@gmail.com>.
Thanks for the update, Mang and Ron,

The new proposal looks good to me in general, especially keeping the behavior
consistent between batch and streaming mode by default. This is how we handled
the "table.dml-sync" option in an earlier ML discussion [1].

Besides that, I just have some final minor comments regarding some interfaces.

1) table.ctas-or-rtas.atomicity-enabled
The "OR" keyword sounds like this configuration can only take effect on one
of CTAS and RTAS.
What about "table.ctas-and-rtas" or "table.ctas-rtas"?

2) In the FLIP, you mention RTAS many times but have no plan to support it.
RTAS is another widely used statement, similar to CTAS, and there does not
seem to be much difference between the two. Considering that we are already
introducing RTAS configurations, is it possible to support RTAS in this FLIP
as well?

3) connector.type
"connector.type" has been deprecated since FLIP-95. Could you replace it
with 'connector'?

4) SupportsAtomicCatalog
I have some concerns about using the "Supports.." prefix, which is known as
the ability extension for DynamicTableSource and DynamicTableSink. Maybe
"AtomicCatalog" is enough?
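
For point 3, a minimal Java sketch of the rename. The helper class is hypothetical and not part of Flink; it only shows the legacy `connector.type` key being replaced by the `connector` key in a table's options map. Any other legacy keys are out of scope for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Flink class: renames the legacy connector key only.
final class ConnectorOptionSketch {
    static final String LEGACY_KEY = "connector.type";
    static final String CURRENT_KEY = "connector";

    // Returns a copy of the options with the legacy key replaced by the current one.
    static Map<String, String> migrate(Map<String, String> options) {
        Map<String, String> result = new LinkedHashMap<>(options);
        String legacyValue = result.remove(LEGACY_KEY);
        if (legacyValue != null) {
            // Keep an explicitly set 'connector' value if both keys were present.
            result.putIfAbsent(CURRENT_KEY, legacyValue);
        }
        return result;
    }
}
```

The input map is left untouched, so the sketch can be applied to FLIP examples without changing the original option sets.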

Best,
Jark

[1]: https://lists.apache.org/thread/78r8ybh4q3hkxf935vzjkb7782hqzcj2


Re:Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi all,
Thank you to all those who participated in the discussion and made suggestions!
After several rounds of online and offline discussions, the solution in the FLIP has been updated.
Looking forward to more feedback from everyone.







--

Best regards,
Mang Zhang





At 2022-06-24 21:58:01, "Mang Zhang" <zh...@163.com> wrote:
>Hi godfrey and ron,
>Thank you very much for your replies and suggestions.
>Special thanks to ron for helping to review and improve the FLIP.
>Looking forward to further feedback from others. 
>
>
>
>--
>
>Best regards,
>Mang Zhang
>
>
>
>
>
>At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>>Thanks, godfrey, for the further feedback. Your suggestions look very good to me, and the FLIP has been updated accordingly. It would be great if you could take another look.
>>
>>Also looking forward to further feedback from others.
>>
>>
>>> ----- Original Message -----
>>> From: "godfrey he" <go...@gmail.com>
>>> Sent: 2022-06-24 17:00:51 (Friday)
>>> To: dev <de...@flink.apache.org>
>>> Cc: "Yun Gao" <yu...@aliyun.com>
>>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>> 
>>> Hi all,
>>> 
>>> Sorry for the late reply.
>>> 
>>> >table.cor-table-as-select.atomicity-enabled
>>> Regarding `cor`,  this abbreviation is not commonly used.
>>> 
>>> >Create Table As Select(CTAS) feature depends on the serializability of the catalog. To quickly see if the catalog supports CTAS, we need to try to serialize the catalog when compile SQL in planner and if it fails, an exception will be thrown to indicate to user that the catalog does not support CTAS because it cannot be serialized.
>>> This behavior is too cryptic, and will break the current catalog
>>> behavior when using 1.16.
>>> I suggest we introduce a new interface for atomic catalogs that
>>> extends Serializable.
>>> Existing catalogs can choose whether to implement the new catalog interface.
>>> 
>>> > Catalog#inferTableOptions
>>> I strongly recommend not introducing this feature now, because the
>>> behavior is unclear.
>>> 1) If the catalog supports managed tables, the connector option is
>>> empty. But if the user forgets to set the connector option for a
>>> CTAS statement, the created table will be a managed table.
>>> 2) The options and their values may differ between the catalog and
>>> the connector, so using the catalog options may cause unexpected errors.
>>> 
>>> > StreamGraph#addJobStatusHook
>>> I prefer `registerJobStatusHook`
>>> 
>>> Best,
>>> Godfrey
>>> 
>>> On Mon, Jun 13, 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>>> >
>>> > Hi Yun,
>>> > Thanks for your reply!
>>> > After an offline discussion with Dalong, I updated the JobStatusHook part of the FLIP. Looking forward to your feedback.
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Best regards,
>>> > Mang Zhang
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID> wrote:
>>> > >Hi,
>>> > >
>>> > >Regarding the drop operation: after some offline discussion with Dalong and Zhu,
>>> > >we think that listening on the client side might be problematic, since the client exits
>>> > >after submitting the job in detached mode. The operation might therefore need to
>>> > >run on the JobMaster side.
>>> > >
>>> > >As for the listener interface, JobListener currently resides only on the client side
>>> > >and contains methods that are unsuitable for this scenario, such as onJobSubmitted, and
>>> > >the internal JobStatusListener is designed to be used inside the JM and is not
>>> > >serializable. We therefore tend to add a new interface, JobStatusHook,
>>> > >which could be attached to the JobGraph and executed in the JobMaster.
>>> > >The interface will also be marked as Internal.
>>> > >
>>> > >Best,
>>> > >Yun
>>> > >
>>> > >
>>> > >------------------------------------------------------------------
>>> > >From:Mang Zhang <zh...@163.com>
>>> > >Send Time:2022 May 25 (Wed.) 10:24
>>> > >To:dev <de...@flink.apache.org>
>>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>> > >
>>> > >Hi, Martijn
>>> > >Thanks for your reply!
>>> > >I looked at the SQL standard, CTAS is part of the SQL standard.
>>> > >Feature T172 is "AS subquery clause in table definition".
>>> > >
>>> > >
>>> > >
>>> > >--
>>> > >
>>> > >Best regards,
>>> > >Mang Zhang
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
>>> > >>Hi everyone,
>>> > >>
>>> > >>Can we identify if this proposed syntax is part of the SQL standard?
>>> > >>
>>> > >>Best regards,
>>> > >>
>>> > >>Martijn Visser
>>> > >>https://twitter.com/MartijnVisser82
>>> > >>https://github.com/MartijnVisser
>>> > >>
>>> > >>
>>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
>>> > >>
>>> > >>> Thanks for driving this work, it's going to be a useful feature.
>>> > >>> About FLIP-218, I have some questions.
>>> > >>>
>>> > >>> 1: Does our CTAS syntax support specifying the target table's schema, including
>>> > >>> column names and data types? I think it may be a useful feature in case we want
>>> > >>> to change the data types in the target table instead of always copying the source
>>> > >>> table's schema. It'll be more flexible with this feature.
>>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>>> > >>>
>>> > >>> 2: It seems it'll require the sink to implement a public interface to drop the table,
>>> > >>> so what will the interface look like?
>>> > >>>
>>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>>> > >>>
>>> > >>> Best regards,
>>> > >>> Yuxia
>>> > >>>
>>> > >>> ----- Original Message -----
>>> > >>> From: "Mang Zhang" <zh...@163.com>
>>> > >>> To: "dev" <de...@flink.apache.org>
>>> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>> > >>>
>>> > >>> Hi, everyone
>>> > >>>
>>> > >>>
>>> > >>> I would like to open a discussion for support select clause in CREATE
>>> > >>> TABLE(CTAS),
>>> > >>> With the development of business and the enhancement of flink sql
>>> > >>> capabilities, queries become more and more complex.
>>> > >>> Now the user needs to use the Create Table statement to create the target
>>> > >>> table first, and then execute the insert statement.
>>> > >>> However, the target table may have many columns, which will bring a lot of
>>> > >>> work outside the business logic to the user.
>>> > >>> At the same time, ensure that the schema of the created target table is
>>> > >>> consistent with the schema of the query result.
>>> > >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> [1]
>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>>
>>> > >>> Best regards,
>>> > >>> Mang Zhang
>>> > >>>
>>> > >
>>
>>
>>------------------------------
>>Best,
>>Ron

Re:Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Godfrey and Ron,
Thank you very much for your replies and suggestions.
Special thanks to Ron for helping to review and improve the FLIP.
Looking forward to further feedback from others.



--

Best regards,
Mang Zhang





At 2022-06-24 19:52:58, "ron" <ld...@zju.edu.cn> wrote:
>Thanks for godfrey further feedback, your suggestions are very good to me, the FLIP has updated according to your feedback. It will be very good if you look at it again。
>
>Also looking forward to further feedback from others.
>
>
>> -----Original Message-----
>> From: "godfrey he" <go...@gmail.com>
>> Sent: 2022-06-24 17:00:51 (Friday)
>> To: dev <de...@flink.apache.org>
>> Cc: "Yun Gao" <yu...@aliyun.com>
>> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> 
>> Hi all,
>> 
>> Sorry for the late reply.
>> 
>> >table.cor-table-as-select.atomicity-enabled
>> Regarding `cor`,  this abbreviation is not commonly used.
>> 
>> >Create Table As Select(CTAS) feature depends on the serializability of the catalog. To quickly see if the catalog supports CTAS, we need to try to serialize the catalog when compile SQL in planner and if it fails, an exception will be >thrown to indicate to user that the catalog does not support CTAS because it cannot be serialized.
>> This behavior is too cryptic, and will break the current catalog
>> behavior when using 1.16.
>> I suggest we introduce a new interface for atomic catalog which
>> implements Serializable.
>>  The existent catalogs can choose whether implements the new catalog interface.
>> 
>> > Catalog#inferTableOptions
>> I strongly recommend not introducing this feature now, because the
>> behavior is unclear.
>> 1) if the catalog support managed table, the connector option is
>> empty. but if user forget to
>> set connector option for CTAS statement, the created table will be
>> managed table.
>> 2) the options and its values for catalog and for connector may be different,
>> so use the catalog option may cause expected errors.
>> 
>> > StreamGraph#addJobStatusHook
>> I prefer `registerJobStatusHook`
>> 
>> Best,
>> Godfrey
>> 
>> On Mon, Jun 13, 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>> >
>> > Hi Yun,
>> > Thanks for your reply!
>> > Through offline communication with Dalong, I updated the JobStatusHook part to FLIP, looking forward to your feedback.
>> >
>> >
>> >
>> > --
>> >
>> > Best regards,
>> > Mang Zhang
>> >
>> >
>> >
>> >
>> >
>> > At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID> wrote:
>> > >Hi,
>> > >
>> > >Regarding the drop operation, with some offline discussion with Dalong and Zhu,
>> > >we think that listening in the client side might be problematic since it would exit
>> > >after submitting the jobs in detached mode, thus the operation might need to
>> > >be in the JobMaster side.
>> > >
>> > >For the listener interface, currently JobListener only resides in the client side
>> > >and contains unsuitable methods like onJobSubmitted for this scenario, and
>> > >the internal JobStatusListener is designed to be used inside JM and is not
>> > >serializable, thus we tend to add a new interface JobStatusHook,
>> > >which could be attached to the JobGraph and executed in the JobMaster.
>> > >The interface will also be marked as Internal.
>> > >
>> > >Best,
>> > >Yun
>> > >
>> > >
>> > >------------------------------------------------------------------
>> > >From:Mang Zhang <zh...@163.com>
>> > >Send Time:2022 May 25 (Wed.) 10:24
>> > >To:dev <de...@flink.apache.org>
>> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> > >
>> > >Hi, Martijn
>> > >Thanks for your reply!
>> > >I looked at the SQL standard, CTAS is part of the SQL standard.
>> > >Feature T172 is "AS subquery clause in table definition".
>> > >
>> > >
>> > >
>> > >--
>> > >
>> > >Best regards,
>> > >Mang Zhang
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
>> > >>Hi everyone,
>> > >>
>> > >>Can we identify if this proposed syntax is part of the SQL standard?
>> > >>
>> > >>Best regards,
>> > >>
>> > >>Martijn Visser
>> > >>https://twitter.com/MartijnVisser82
>> > >>https://github.com/MartijnVisser
>> > >>
>> > >>
>> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
>> > >>
>> > >>> Thanks for driving this work, it's going to be a useful feature.
>> > >>> About FLIP-218, I have some questions.
>> > >>>
>> > >>> 1: Does our CTAS syntax support specifying the target table's schema, including
>> > >>> column names and data types? I think it may be a useful feature in case we want
>> > >>> to change the data types in the target table instead of always copying the source
>> > >>> table's schema. It'll be more flexible with this feature.
>> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>> > >>>
>> > >>> 2: It seems it'll require the sink to implement a public interface to drop the table,
>> > >>> so what will the interface look like?
>> > >>>
>> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>> > >>>
>> > >>> Best regards,
>> > >>> Yuxia
>> > >>>
>> > >>> ----- Original Message -----
>> > >>> From: "Mang Zhang" <zh...@163.com>
>> > >>> To: "dev" <de...@flink.apache.org>
>> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>> > >>>
>> > >>> Hi, everyone
>> > >>>
>> > >>>
>> > >>> I would like to open a discussion for support select clause in CREATE
>> > >>> TABLE(CTAS),
>> > >>> With the development of business and the enhancement of flink sql
>> > >>> capabilities, queries become more and more complex.
>> > >>> Now the user needs to use the Create Table statement to create the target
>> > >>> table first, and then execute the insert statement.
>> > >>> However, the target table may have many columns, which will bring a lot of
>> > >>> work outside the business logic to the user.
>> > >>> At the same time, ensure that the schema of the created target table is
>> > >>> consistent with the schema of the query result.
>> > >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
>> > >>>
>> > >>>
>> > >>>
>> > >>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>> > >>>
>> > >>>
>> > >>>
>> > >>> [1]
>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Best regards,
>> > >>> Mang Zhang
>> > >>>
>> > >
>
>
>------------------------------
>Best,
>Ron

Re: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by ron <ld...@zju.edu.cn>.
Thanks, Godfrey, for the further feedback. Your suggestions look good to me, and the FLIP has been updated accordingly. It would be great if you could take another look.

Also looking forward to further feedback from others.


> -----Original Message-----
> From: "godfrey he" <go...@gmail.com>
> Sent: 2022-06-24 17:00:51 (Friday)
> To: dev <de...@flink.apache.org>
> Cc: "Yun Gao" <yu...@aliyun.com>
> Subject: Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> 
> Hi all,
> 
> Sorry for the late reply.
> 
> >table.cor-table-as-select.atomicity-enabled
> Regarding `cor`,  this abbreviation is not commonly used.
> 
> >Create Table As Select(CTAS) feature depends on the serializability of the catalog. To quickly see if the catalog supports CTAS, we need to try to serialize the catalog when compile SQL in planner and if it fails, an exception will be >thrown to indicate to user that the catalog does not support CTAS because it cannot be serialized.
> This behavior is too cryptic, and will break the current catalog
> behavior when using 1.16.
> I suggest we introduce a new interface for atomic catalog which
> implements Serializable.
>  The existent catalogs can choose whether implements the new catalog interface.
> 
> > Catalog#inferTableOptions
> I strongly recommend not introducing this feature now, because the
> behavior is unclear.
> 1) if the catalog support managed table, the connector option is
> empty. but if user forget to
> set connector option for CTAS statement, the created table will be
> managed table.
> 2) the options and its values for catalog and for connector may be different,
> so use the catalog option may cause expected errors.
> 
> > StreamGraph#addJobStatusHook
> I prefer `registerJobStatusHook`
> 
> Best,
> Godfrey
> 
> On Mon, Jun 13, 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
> >
> > Hi Yun,
> > Thanks for your reply!
> > Through offline communication with Dalong, I updated the JobStatusHook part to FLIP, looking forward to your feedback.
> >
> >
> >
> > --
> >
> > Best regards,
> > Mang Zhang
> >
> >
> >
> >
> >
> > At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID> wrote:
> > >Hi,
> > >
> > >Regarding the drop operation, with some offline discussion with Dalong and Zhu,
> > >we think that listening in the client side might be problematic since it would exit
> > >after submitting the jobs in detached mode, thus the operation might need to
> > >be in the JobMaster side.
> > >
> > >For the listener interface, currently JobListener only resides in the client side
> > >and contains unsuitable methods like onJobSubmitted for this scenario, and
> > >the internal JobStatusListener is designed to be used inside JM and is not
> > >serializable, thus we tend to add a new interface JobStatusHook,
> > >which could be attached to the JobGraph and executed in the JobMaster.
> > >The interface will also be marked as Internal.
> > >
> > >Best,
> > >Yun
> > >
> > >
> > >------------------------------------------------------------------
> > >From:Mang Zhang <zh...@163.com>
> > >Send Time:2022 May 25 (Wed.) 10:24
> > >To:dev <de...@flink.apache.org>
> > >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> > >
> > >Hi, Martijn
> > >Thanks for your reply!
> > >I looked at the SQL standard, CTAS is part of the SQL standard.
> > >Feature T172 is "AS subquery clause in table definition".
> > >
> > >
> > >
> > >--
> > >
> > >Best regards,
> > >Mang Zhang
> > >
> > >
> > >
> > >
> > >
> > >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
> > >>Hi everyone,
> > >>
> > >>Can we identify if this proposed syntax is part of the SQL standard?
> > >>
> > >>Best regards,
> > >>
> > >>Martijn Visser
> > >>https://twitter.com/MartijnVisser82
> > >>https://github.com/MartijnVisser
> > >>
> > >>
> > >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
> > >>
> > >>> Thanks for driving this work, it's going to be a useful feature.
> > >>> About FLIP-218, I have some questions.
> > >>>
> > >>> 1: Does our CTAS syntax support specifying the target table's schema, including
> > >>> column names and data types? I think it may be a useful feature in case we want
> > >>> to change the data types in the target table instead of always copying the source
> > >>> table's schema. It'll be more flexible with this feature.
> > >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
> > >>>
> > >>> 2: It seems it'll require the sink to implement a public interface to drop the table,
> > >>> so what will the interface look like?
> > >>>
> > >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> > >>>
> > >>> Best regards,
> > >>> Yuxia
> > >>>
> > >>> ----- Original Message -----
> > >>> From: "Mang Zhang" <zh...@163.com>
> > >>> To: "dev" <de...@flink.apache.org>
> > >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
> > >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> > >>>
> > >>> Hi, everyone
> > >>>
> > >>>
> > >>> I would like to open a discussion for support select clause in CREATE
> > >>> TABLE(CTAS),
> > >>> With the development of business and the enhancement of flink sql
> > >>> capabilities, queries become more and more complex.
> > >>> Now the user needs to use the Create Table statement to create the target
> > >>> table first, and then execute the insert statement.
> > >>> However, the target table may have many columns, which will bring a lot of
> > >>> work outside the business logic to the user.
> > >>> At the same time, ensure that the schema of the created target table is
> > >>> consistent with the schema of the query result.
> > >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
> > >>>
> > >>>
> > >>>
> > >>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
> > >>>
> > >>>
> > >>>
> > >>> [1]
> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Best regards,
> > >>> Mang Zhang
> > >>>
> > >


------------------------------
Best,
Ron

Re: Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by godfrey he <go...@gmail.com>.
Hi all,

Sorry for the late reply.

>table.cor-table-as-select.atomicity-enabled
Regarding `cor`, this abbreviation is not commonly used.

>Create Table As Select(CTAS) feature depends on the serializability of the catalog. To quickly see if the catalog supports CTAS, we need to try to serialize the catalog when compile SQL in planner and if it fails, an exception will be thrown to indicate to user that the catalog does not support CTAS because it cannot be serialized.
This behavior is too cryptic, and it will break the current catalog
behavior when using 1.16.
I suggest we introduce a new interface for atomic catalogs that
extends Serializable.
Existing catalogs can then choose whether to implement the new interface.
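
To illustrate the suggestion above, here is a minimal Java sketch of an opt-in interface. It assumes simplified stand-ins rather than Flink's real classes: `SimpleCatalog`, `AtomicCatalog`, `LegacyCatalog`, and `InMemoryAtomicCatalog` are hypothetical names that do not appear in the FLIP. The only point is that the planner could use a type check instead of a trial serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a catalog interface; not Flink's real Catalog API.
interface SimpleCatalog {
    String name();
}

// Hypothetical opt-in interface: a catalog declares atomic CTAS support by
// implementing it, and thereby promises to be serializable.
interface AtomicCatalog extends SimpleCatalog, Serializable {}

// An existing catalog that does not opt in keeps working as before.
final class LegacyCatalog implements SimpleCatalog {
    public String name() {
        return "legacy";
    }
}

// A catalog that opts in to atomic CTAS.
final class InMemoryAtomicCatalog implements AtomicCatalog {
    private static final long serialVersionUID = 1L;

    public String name() {
        return "in-memory";
    }
}

final class AtomicCatalogSketch {
    // A type check replaces the "try to serialize and see if it fails" probe.
    static boolean supportsAtomicCtas(SimpleCatalog catalog) {
        return catalog instanceof AtomicCatalog;
    }

    // Serializes the catalog, as would be needed to ship it with a job.
    static byte[] serialize(AtomicCatalog catalog) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(catalog);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(supportsAtomicCtas(new LegacyCatalog()));
        System.out.println(supportsAtomicCtas(new InMemoryAtomicCatalog()));
        System.out.println(serialize(new InMemoryAtomicCatalog()).length > 0);
    }
}
```

With this shape, a catalog that has not opted in is rejected for atomic CTAS by an explicit check, and nothing changes for catalogs that never use the feature.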

> Catalog#inferTableOptions
I strongly recommend not introducing this feature now, because the
behavior is unclear.
1) If the catalog supports managed tables, the connector option is
empty. But if the user forgets to set the connector option for a CTAS
statement, the created table will be a managed table.
2) The options and their values may differ between the catalog and the
connector, so using the catalog options may cause unexpected errors.
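
The two concerns can be made concrete with a small Java sketch. It assumes a hypothetical inference rule in which catalog options are copied into the table options. `inferOptions`, `resolveKind`, and `unsupportedKeys` are illustrative helpers, not Flink APIs, and all option keys except `connector` are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CtasOptionsSketch {
    enum TableKind { MANAGED, EXTERNAL }

    // Point 1: a table without a 'connector' option is treated as managed.
    static TableKind resolveKind(Map<String, String> tableOptions) {
        return tableOptions.containsKey("connector") ? TableKind.EXTERNAL : TableKind.MANAGED;
    }

    // Hypothetical inference: start from the catalog options and let the
    // options written in the CTAS statement override them.
    static Map<String, String> inferOptions(
            Map<String, String> catalogOptions, Map<String, String> ctasOptions) {
        Map<String, String> merged = new HashMap<>(catalogOptions);
        merged.putAll(ctasOptions);
        return merged;
    }

    // Point 2: keys inherited from the catalog that the connector does not know.
    static Set<String> unsupportedKeys(Map<String, String> options, Set<String> connectorKeys) {
        Set<String> unknown = new TreeSet<>(options.keySet());
        unknown.removeAll(connectorKeys);
        return unknown;
    }

    public static void main(String[] args) {
        Map<String, String> catalogOptions = new HashMap<>();
        catalogOptions.put("type", "example-catalog");   // made-up catalog option
        catalogOptions.put("default-database", "db");    // made-up catalog option

        // The user forgot to set 'connector' in the CTAS statement.
        Map<String, String> ctasOptions = new HashMap<>();

        Map<String, String> merged = inferOptions(catalogOptions, ctasOptions);
        System.out.println(resolveKind(merged));

        Set<String> connectorKeys = new TreeSet<>();
        connectorKeys.add("connector");
        connectorKeys.add("path");
        System.out.println(unsupportedKeys(merged, connectorKeys));
    }
}
```

In this sketch the forgotten `connector` option yields a managed table without any error, and the inherited catalog keys show up as options the connector would not recognize.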

> StreamGraph#addJobStatusHook
I prefer `registerJobStatusHook`.

Best,
Godfrey

On Mon, Jun 13, 2022 at 16:43, Mang Zhang <zh...@163.com> wrote:
>
> Hi Yun,
> Thanks for your reply!
> Through offline communication with Dalong, I updated the JobStatusHook part to FLIP, looking forward to your feedback.
>
>
>
> --
>
> Best regards,
> Mang Zhang
>
>
>
>
>
> At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID> wrote:
> >Hi,
> >
> >Regarding the drop operation, with some offline discussion with Dalong and Zhu,
> >we think that listening in the client side might be problematic since it would exit
> >after submitting the jobs in detached mode, thus the operation might need to
> >be in the JobMaster side.
> >
> >For the listener interface, currently JobListener only resides in the client side
> >and contains unsuitable methods like onJobSubmitted for this scenario, and
> >the internal JobStatusListener is designed to be used inside JM and is not
> >serializable, thus we tend to add a new interface JobStatusHook,
> >which could be attached to the JobGraph and executed in the JobMaster.
> >The interface will also be marked as Internal.
> >
> >Best,
> >Yun
> >
> >
> >------------------------------------------------------------------
> >From:Mang Zhang <zh...@163.com>
> >Send Time:2022 May 25 (Wed.) 10:24
> >To:dev <de...@flink.apache.org>
> >Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >
> >Hi, Martijn
> >Thanks for your reply!
> >I looked at the SQL standard, CTAS is part of the SQL standard.
> >Feature T172 is "AS subquery clause in table definition".
> >
> >
> >
> >--
> >
> >Best regards,
> >Mang Zhang
> >
> >
> >
> >
> >
> >At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
> >>Hi everyone,
> >>
> >>Can we identify if this proposed syntax is part of the SQL standard?
> >>
> >>Best regards,
> >>
> >>Martijn Visser
> >>https://twitter.com/MartijnVisser82
> >>https://github.com/MartijnVisser
> >>
> >>
> >>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
> >>
> >>> Thanks for driving this work, it's going to be a useful feature.
> >>> About FLIP-218, I have some questions.
> >>>
> >>> 1: Does our CTAS syntax support specifying the target table's schema, including
> >>> column names and data types? I think it may be a useful feature in case we want
> >>> to change the data types in the target table instead of always copying the source
> >>> table's schema. It'll be more flexible with this feature.
> >>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
> >>>
> >>> 2: It seems it'll require the sink to implement a public interface to drop the table,
> >>> so what will the interface look like?
> >>>
> >>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
> >>>
> >>> Best regards,
> >>> Yuxia
> >>>
> >>> ----- Original Message -----
> >>> From: "Mang Zhang" <zh...@163.com>
> >>> To: "dev" <de...@flink.apache.org>
> >>> Sent: Thursday, April 28, 2022, 4:57:24 PM
> >>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
> >>>
> >>> Hi, everyone
> >>>
> >>>
> >>> I would like to open a discussion for support select clause in CREATE
> >>> TABLE(CTAS),
> >>> With the development of business and the enhancement of flink sql
> >>> capabilities, queries become more and more complex.
> >>> Now the user needs to use the Create Table statement to create the target
> >>> table first, and then execute the insert statement.
> >>> However, the target table may have many columns, which will bring a lot of
> >>> work outside the business logic to the user.
> >>> At the same time, ensure that the schema of the created target table is
> >>> consistent with the schema of the query result.
> >>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
> >>>
> >>>
> >>>
> >>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
> >>>
> >>>
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Best regards,
> >>> Mang Zhang
> >>>
> >

Re:Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi Yun,
Thanks for your reply!
After an offline discussion with Dalong, I have updated the JobStatusHook part of the FLIP. Looking forward to your feedback.



--

Best regards,
Mang Zhang





At 2022-05-31 14:34:25, "Yun Gao" <yu...@aliyun.com.INVALID> wrote:
>Hi, 
>
>Regarding the drop operation, with some offline discussion with Dalong and Zhu,
>we think that listening in the client side might be problematic since it would exit
>after submitting the jobs in detached mode, thus the operation might need to
>be in the JobMaster side. 
>
>For the listener interface, currently JobListener only resides in the client side
>and contains unsuitable methods like onJobSubmitted for this scenario, and 
>the internal JobStatusListener is designed to be used inside JM and is not 
>serializable, thus we tend to add a new interface JobStatusHook, 
>which could be attached to the JobGraph and executed in the JobMaster. 
>The interface will also be marked as Internal. 
>
>Best,
>Yun
>
>
>------------------------------------------------------------------
>From:Mang Zhang <zh...@163.com>
>Send Time:2022 May 25 (Wed.) 10:24
>To:dev <de...@flink.apache.org>
>Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>
>Hi, Martijn
>Thanks for your reply!
>I looked at the SQL standard, CTAS is part of the SQL standard.
>Feature T172 is "AS subquery clause in table definition".
>
>
>
>--
>
>Best regards,
>Mang Zhang
>
>
>
>
>
>At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
>>Hi everyone,
>>
>>Can we identify if this proposed syntax is part of the SQL standard?
>>
>>Best regards,
>>
>>Martijn Visser
>>https://twitter.com/MartijnVisser82
>>https://github.com/MartijnVisser
>>
>>
>>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
>>
>>> Thanks for driving this work, it's going to be a useful feature.
>>> About FLIP-218, I have some questions.
>>>
>>> 1: Does our CTAS syntax support specifying the target table's schema, including
>>> column names and data types? I think it may be a useful feature in case we want
>>> to change the data types in the target table instead of always copying the source
>>> table's schema. It'll be more flexible with this feature.
>>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>>>
>>> 2: It seems it'll require the sink to implement a public interface to drop the table,
>>> so what will the interface look like?
>>>
>>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>>>
>>> Best regards,
>>> Yuxia
>>>
>>> ----- Original Message -----
>>> From: "Mang Zhang" <zh...@163.com>
>>> To: "dev" <de...@flink.apache.org>
>>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>>
>>> Hi, everyone
>>>
>>>
>>> I would like to open a discussion for support select clause in CREATE
>>> TABLE(CTAS),
>>> With the development of business and the enhancement of flink sql
>>> capabilities, queries become more and more complex.
>>> Now the user needs to use the Create Table statement to create the target
>>> table first, and then execute the insert statement.
>>> However, the target table may have many columns, which will bring a lot of
>>> work outside the business logic to the user.
>>> At the same time, ensure that the schema of the created target table is
>>> consistent with the schema of the query result.
>>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
>>>
>>>
>>>
>>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>>>
>>>
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Best regards,
>>> Mang Zhang
>>>
>

Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Yun Gao <yu...@aliyun.com.INVALID>.
Hi, 

Regarding the drop operation: after some offline discussion with Dalong and Zhu,
we think that listening on the client side might be problematic, since the client exits
after submitting the job in detached mode. The operation therefore probably needs to
run on the JobMaster side.

As for the listener interface, JobListener currently lives only on the client side
and contains methods such as onJobSubmitted that are unsuitable for this scenario, while
the internal JobStatusListener is designed to be used inside the JM and is not
serializable. We therefore lean towards adding a new interface, JobStatusHook,
which could be attached to the JobGraph and executed in the JobMaster.
The interface will also be marked as Internal.
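
As a rough illustration of the idea, the following Java sketch models a serializable hook that is invoked on job status changes. The callback names, the `CtasTableHook` class, and the `runJob` driver are assumptions made for this example and are not taken from the FLIP; `runJob` merely stands in for the JobMaster, and the hook only records the catalog action it would perform.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed hook. It is Serializable so that it could travel with
// the job graph; the callback names are assumptions for this example.
interface JobStatusHook extends Serializable {
    void onCreated(String jobId);
    void onFinished(String jobId);
    void onFailed(String jobId, Throwable cause);
    void onCanceled(String jobId);
}

// Hypothetical CTAS hook: records the catalog action it would perform.
final class CtasTableHook implements JobStatusHook {
    private static final long serialVersionUID = 1L;

    private final String tableName;
    private final List<String> actions = new ArrayList<>();

    CtasTableHook(String tableName) {
        this.tableName = tableName;
    }

    public void onCreated(String jobId) {
        actions.add("CREATE " + tableName);
    }

    public void onFinished(String jobId) {
        // The job succeeded, so the table is kept.
    }

    public void onFailed(String jobId, Throwable cause) {
        actions.add("DROP " + tableName);
    }

    public void onCanceled(String jobId) {
        actions.add("DROP " + tableName);
    }

    List<String> actions() {
        return actions;
    }
}

final class JobStatusHookSketch {
    enum Outcome { FINISHED, FAILED, CANCELED }

    // Stands in for the JobMaster, which keeps running after a detached client exits.
    static void runJob(String jobId, JobStatusHook hook, Outcome outcome) {
        hook.onCreated(jobId);
        switch (outcome) {
            case FINISHED:
                hook.onFinished(jobId);
                break;
            case FAILED:
                hook.onFailed(jobId, new RuntimeException("simulated failure"));
                break;
            case CANCELED:
                hook.onCanceled(jobId);
                break;
        }
    }

    public static void main(String[] args) {
        CtasTableHook hook = new CtasTableHook("target_table");
        runJob("job-1", hook, Outcome.FAILED);
        System.out.println(hook.actions());
    }
}
```

Because the callbacks run where the job status is tracked, the drop on failure or cancellation does not depend on the submitting client still being alive.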

Best,
Yun


------------------------------------------------------------------
From:Mang Zhang <zh...@163.com>
Send Time:2022 May 25 (Wed.) 10:24
To:dev <de...@flink.apache.org>
Subject:Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Hi, Martijn
Thanks for your reply!
I looked at the SQL standard, CTAS is part of the SQL standard.
Feature T172 is "AS subquery clause in table definition".



--

Best regards,
Mang Zhang





At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
>Hi everyone,
>
>Can we identify if this proposed syntax is part of the SQL standard?
>
>Best regards,
>
>Martijn Visser
>https://twitter.com/MartijnVisser82
>https://github.com/MartijnVisser
>
>
>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
>
>> Thanks for driving this work, it's going to be a useful feature.
>> About FLIP-218, I have some questions.
>>
>> 1: Does our CTAS syntax support specifying the target table's schema, including
>> column names and data types? I think it may be a useful feature in case we want
>> to change the data types in the target table instead of always copying the source
>> table's schema. It'll be more flexible with this feature.
>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>>
>> 2: It seems it'll require the sink to implement a public interface to drop the table,
>> so what will the interface look like?
>>
>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>>
>> Best regards,
>> Yuxia
>>
>> ----- Original Message -----
>> From: "Mang Zhang" <zh...@163.com>
>> To: "dev" <de...@flink.apache.org>
>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>
>> Hi, everyone
>>
>>
>> I would like to open a discussion for support select clause in CREATE
>> TABLE(CTAS),
>> With the development of business and the enhancement of flink sql
>> capabilities, queries become more and more complex.
>> Now the user needs to use the Create Table statement to create the target
>> table first, and then execute the insert statement.
>> However, the target table may have many columns, which will bring a lot of
>> work outside the business logic to the user.
>> At the same time, ensure that the schema of the created target table is
>> consistent with the schema of the query result.
>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
>>
>>
>>
>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>>
>>
>>
>>
>> --
>>
>> Best regards,
>> Mang Zhang
>>


Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

Posted by Mang Zhang <zh...@163.com>.
Hi, Martijn
Thanks for your reply!
I checked the SQL standard, and CTAS is part of it:
Feature T172 is "AS subquery clause in table definition".



--

Best regards,
Mang Zhang





At 2022-05-04 21:49:00, "Martijn Visser" <ma...@apache.org> wrote:
>Hi everyone,
>
>Can we identify if this proposed syntax is part of the SQL standard?
>
>Best regards,
>
>Martijn Visser
>https://twitter.com/MartijnVisser82
>https://github.com/MartijnVisser
>
>
>On Fri, 29 Apr 2022 at 11:19, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
>
>> Thanks for driving this work, it's going to be a useful feature.
>> About FLIP-218, I have some questions.
>>
>> 1: Does our CTAS syntax support specifying the target table's schema, including
>> column names and data types? I think it may be a useful feature in case we want
>> to change the data types in the target table instead of always copying the source
>> table's schema. It'll be more flexible with this feature.
>> Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>>
>> 2: It seems it'll require the sink to implement a public interface to drop the table,
>> so what will the interface look like?
>>
>> [1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>>
>> Best regards,
>> Yuxia
>>
>> ----- Original Message -----
>> From: "Mang Zhang" <zh...@163.com>
>> To: "dev" <de...@flink.apache.org>
>> Sent: Thursday, April 28, 2022, 4:57:24 PM
>> Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>>
>> Hi, everyone
>>
>>
>> I would like to open a discussion for support select clause in CREATE
>> TABLE(CTAS),
>> With the development of business and the enhancement of flink sql
>> capabilities, queries become more and more complex.
>> Now the user needs to use the Create Table statement to create the target
>> table first, and then execute the insert statement.
>> However, the target table may have many columns, which will bring a lot of
>> work outside the business logic to the user.
>> At the same time, ensure that the schema of the created target table is
>> consistent with the schema of the query result.
>> Using a CTAS syntax like Hive/Spark can greatly facilitate the user.
>>
>>
>>
>> You can find more details in FLIP-218[1]. Looking forward to your feedback.
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>>
>>
>>
>>
>> --
>>
>> Best regards,
>> Mang Zhang
>>