Posted to dev@flink.apache.org by Timo Walther <tw...@apache.org> on 2019/06/24 07:07:36 UTC

Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Thanks for working on this great design document, Jark. I think
well-defined terminology and semantics around tables, changelogs, table
sources/sinks, and DDL should have been established much earlier. I will take a
closer look at the concepts and give feedback soon. I think having those
concepts defined and implemented should be the goal for Flink 1.10. That
would also allow us to align this work with the FLIP-27 effort.

Introducing a DDL is a step that cannot be evolved easily, as a DDL is
basically just a string that is being parsed. We should aim to involve
as many people as possible to arrive at a future-proof design.

Thanks,
Timo

On 27.05.19 at 10:40, Kurt Young wrote:
> Thanks Jark for bringing up this topic. I think proper concepts are very
> important for users of the Table API & SQL, especially so that
> they have a clear understanding of the behavior of their SQL jobs. This is
> also essential for connector developers, so that they better
> understand why we abstracted the interfaces in this way and have a
> smooth experience when developing connectors for Table & SQL.
>
> Best,
> Kurt
>
>
> On Mon, May 27, 2019 at 3:35 PM Jark Wu <im...@gmail.com> wrote:
>
>> Hi all,
>>
>> We have prepared a design doc [1] about source and sink concepts in Flink
>> SQL. This is actually an extended discussion about SQL DDL [2].
>>
>> In the design doc, we want to settle some conceptual questions. For
>> example:
>>
>> 1. How to define boundedness in DDL?
>> 2. How to define a changelog in DDL, and what is the behavior of a changelog
>> source and a changelog sink?
>> 3. How to define a primary key in DDL, and what are the semantics when we have a
>> primary key on a table and stream?
>>
>> They are mostly related to DDL because DDL is plain text and we need to
>> stay as close to the standard as possible.
>>
>> This is an important step before we start refactoring our
>> TableSource/TableSink/TableFactory interfaces, because we need to know what
>> changes we have to introduce to support these concepts.
>>
>> Please feel free to leave feedback in the thread or the design doc.
>>
>> Regards,
>> Jark
>>
>> [1] https://docs.google.com/document/d/1yrKXEIRATfxHJJ0K3t6wUgXAtZq8D-XgvEnvl2uUcr0/edit#
>> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-SQL-DDL-Design-tt25006.html
>>


Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Jark,

Impressive document!
I have gone over the document quickly and left some comments. I will have a
detailed look later. Below are two main thoughts from my side:

1. In the TableSource interface, can we move the getBoundedness() method
into the underlying Source?
This brings some benefits: we don't have to add `boundedSource()` to
the env in FLIP-27, and it can also be used at the Table API level. We may
also need to target FLIP-27 for Flink 1.10 and coordinate these two big
designs.

2. How are we going to address the compatibility problem?
Are we going to add a totally new TableSource class or come up with a
compatible design? Maybe a new TableSource class is better, since we are
changing the interface quite substantially.
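
To make point 1 concrete, here is a minimal sketch of what moving
getBoundedness() into the underlying Source could look like. All type names
below (Boundedness, Source, TableSource, ListSource, ListTableSource) are
hypothetical stand-ins written for illustration; they are not the interfaces
from FLIP-27 or from the design doc.

```java
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical; they are not the FLIP-27 or Table API types.
enum Boundedness { BOUNDED, UNBOUNDED }

/** Stand-in for the underlying source that would own the boundedness property. */
interface Source<T> {
    Boundedness getBoundedness();
}

/** Table-level source that stores no boundedness flag of its own. */
interface TableSource<T> {
    Source<T> getSource();

    // Derived from the underlying source, so the property is declared only once.
    default Boundedness getBoundedness() {
        return getSource().getBoundedness();
    }
}

/** A finite, in-memory source: bounded by construction. */
final class ListSource<T> implements Source<T> {
    private final List<T> elements;

    ListSource(List<T> elements) { this.elements = elements; }

    List<T> elements() { return elements; }

    @Override
    public Boundedness getBoundedness() { return Boundedness.BOUNDED; }
}

/** Table source that simply wraps a ListSource. */
final class ListTableSource<T> implements TableSource<T> {
    private final ListSource<T> source;

    ListTableSource(List<T> elements) { this.source = new ListSource<>(elements); }

    @Override
    public Source<T> getSource() { return source; }
}

class BoundednessSketch {
    public static void main(String[] args) {
        TableSource<String> table = new ListTableSource<>(Arrays.asList("a", "b"));
        System.out.println(table.getBoundedness()); // prints BOUNDED
    }
}
```

With this shape, the environment would not need a separate `boundedSource()`
entry point, because the source itself reports whether it is bounded, and the
Table API can read the same property through the TableSource.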

What do you think?

Best, Hequn


Re: [DISCUSS] Ground Source and Sink Concepts in Flink SQL

Posted by Jark Wu <im...@gmail.com>.
Thanks Timo,

I think it's fine to target it for Flink 1.10. Looking forward to your
feedback.
