
Posted to dev@flink.apache.org by Timo Walther <tw...@apache.org> on 2018/10/01 08:52:58 UTC

[DISCUSS] Improvements to the Unified SQL Connector API

Hi everyone,

as some of you might have noticed, in the last two releases we aimed to 
unify SQL connectors and make them more modular. The first connectors 
and formats have been implemented and are usable via the SQL Client and 
Java/Scala/SQL APIs.

However, after writing more connectors/example programs and talking to
users, there are still a couple of improvements that should be applied
to the unified SQL connector API.

I wrote a design document [1] that discusses limitations that I have
observed and considers feedback that I have collected over the last
months. I don't know whether we will implement all of these
improvements, but it would be great to get feedback for a satisfactory
API and for future prioritization.

The general goal should be to connect to external systems in a way that is
as convenient and type-safe as possible. Any feedback is highly appreciated.

Thanks,

Timo

[1] 
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Hequn Cheng <ch...@gmail.com>.

Hi,

It is a good question how to avoid writing to a table accidentally.
I think there are other ways to solve the problem; for example, we could
provide users with a view instead of a table, or add a table constraint.

Best,
Hequn

On Fri, Oct 5, 2018 at 1:30 PM Shuyi Chen <su...@gmail.com> wrote:

> In the case of a normal Flink job, I agree we can infer the table type from
> the queries. However, for the SQL Client, the queries are ad hoc and not
> known beforehand. In such a case, we might want to enforce the table open
> mode at startup time, so that users won't accidentally write to a Kafka
> topic that is supposed to be written only by some producer. What do you
> guys think?
>
> Shuyi
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Timo Walther <tw...@apache.org>.

Hi everyone,

thanks for the feedback that we got so far. I will update the document 
in the next couple of hours such that we can continue with the discussion.

Regarding the table type: I simply didn't mention it in the document,
because the table type is a property specific to the SQL Client/external
catalog interfaces that is evaluated before the unified connector API
(depending on the table type, a source and/or sink is discovered). I agree
with Shuyi's comments that it should be possible to restrict read/write
access. The general goal should be that the properties defined in the
design document apply to both sources and sinks, i.e., there should be no
special source-only or sink-only properties.

@Rong: Currently, a user cannot change how a table is used in the
interactive shell. Tables defined in an environment file are immutable.
This will become possible with a SQL DDL in the future.

Regards,
Timo


Am 05.10.18 um 07:30 schrieb Shuyi Chen:
> In the case of a normal Flink job, I agree we can infer the table type from
> the queries. However, for the SQL Client, the queries are ad hoc and not
> known beforehand. In such a case, we might want to enforce the table open
> mode at startup time, so that users won't accidentally write to a Kafka
> topic that is supposed to be written only by some producer. What do you
> guys think?
>
> Shuyi
>

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Shuyi Chen <su...@gmail.com>.

In the case of a normal Flink job, I agree we can infer the table type from
the queries. However, for the SQL Client, the queries are ad hoc and not
known beforehand. In such a case, we might want to enforce the table open
mode at startup time, so that users won't accidentally write to a Kafka
topic that is supposed to be written only by some producer. What do you
guys think?

Shuyi

On Thu, Oct 4, 2018 at 7:31 AM Hequn Cheng <ch...@gmail.com> wrote:

> Hi,
>
> Thanks a lot for the proposal. I like the idea of unifying table definitions.
> I think we can drop the table type since the type can be derived from the
> SQL, i.e., a table that is inserted into can only be a sink table.
>
> I left some minor suggestions in the document, mainly:
> - Maybe we also need to allow defining properties for tables.
> - Support specifying computed columns in a table.
> - Support defining keys for sources.
>
> Best, Hequn
>


-- 
"So you have to trust that the dots will somehow connect in your future."

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Rong Rong <wa...@gmail.com>.

Hi Timo,

Thanks for putting together the proposal!
I really love the idea of combining solutions for historic and recent data
and left some suggestions on that part.

Regarding the table type, e.g., for Kafka streams, I agree with @hequn's
idea that it should be pretty much inferable from the SQL context.
I think there are some questions that need to be addressed when unifying
the definition, for example:
- Should a Kafka table used in an "INSERT INTO" statement be usable again in
a "FROM" statement, and vice versa?
- How do we enforce checks in combo-table use cases?
- Can a user change the way a table is used (e.g., source/sink) in an
interactive environment such as the SQL Client?

Thanks,
Rong

On Thu, Oct 4, 2018 at 7:31 AM Hequn Cheng <ch...@gmail.com> wrote:

> Hi,
>
> Thanks a lot for the proposal. I like the idea of unifying table definitions.
> I think we can drop the table type since the type can be derived from the
> SQL, i.e., a table that is inserted into can only be a sink table.
>
> I left some minor suggestions in the document, mainly:
> - Maybe we also need to allow defining properties for tables.
> - Support specifying computed columns in a table.
> - Support defining keys for sources.
>
> Best, Hequn
>

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Hequn Cheng <ch...@gmail.com>.

Hi,

Thanks a lot for the proposal. I like the idea of unifying table definitions.
I think we can drop the table type since the type can be derived from the
SQL, i.e., a table that is inserted into can only be a sink table.

I left some minor suggestions in the document, mainly:
- Maybe we also need to allow defining properties for tables.
- Support specifying computed columns in a table.
- Support defining keys for sources.
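To illustrate the idea that a table's type can be derived from the SQL, here is a minimal, hypothetical Java sketch. It is not part of Flink's API: the class DerivedTableUsage, the Usage enum, and the regular expressions are illustrative assumptions, and a real implementation would inspect the parsed and validated query rather than match on raw SQL text. A table counts as written when it is the target of INSERT INTO and as read when it follows FROM or JOIN.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Flink API: derives how each table is used from SQL text.
public class DerivedTableUsage {

    enum Usage { READ, WRITE }

    // Toy patterns; a real planner would work on the parsed and validated query.
    private static final Pattern INSERT_TARGET =
        Pattern.compile("\\bINSERT\\s+INTO\\s+(\\w+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern READ_SOURCE =
        Pattern.compile("\\b(?:FROM|JOIN)\\s+(\\w+)", Pattern.CASE_INSENSITIVE);

    /** Maps each referenced table to the set of usages found in the statements. */
    static Map<String, Set<Usage>> derive(List<String> statements) {
        Map<String, Set<Usage>> usages = new HashMap<>();
        for (String sql : statements) {
            collect(INSERT_TARGET.matcher(sql), Usage.WRITE, usages);
            collect(READ_SOURCE.matcher(sql), Usage.READ, usages);
        }
        return usages;
    }

    private static void collect(Matcher matcher, Usage usage, Map<String, Set<Usage>> usages) {
        while (matcher.find()) {
            usages.computeIfAbsent(matcher.group(1), name -> EnumSet.noneOf(Usage.class)).add(usage);
        }
    }

    public static void main(String[] args) {
        Map<String, Set<Usage>> usages = derive(List.of(
            "INSERT INTO alerts SELECT user_id FROM clicks WHERE cnt > 10"));
        System.out.println(usages.get("alerts")); // [WRITE]
        System.out.println(usages.get("clicks")); // [READ]
    }
}
```

Because SQL Client queries are ad hoc, a derivation like this only sees one statement at a time; on its own it tells you how a table is being used, not whether that use is permitted.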

Best, Hequn


On Thu, Oct 4, 2018 at 4:09 PM Shuyi Chen <su...@gmail.com> wrote:

> Thanks a lot for the proposal, Timo. I left a few comments. Also, it seems
> the example in the doc does not have the table type (source, sink, or both)
> property anymore. Are you suggesting dropping it? I think the table type
> property is still useful as it can restrict a certain connector to be
> only a source or a sink; for example, we usually want a Kafka topic to be
> either read-only or write-only, but not both.
>
> Shuyi
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Shuyi Chen <su...@gmail.com>.

Thanks a lot for the proposal, Timo. I left a few comments. Also, it seems
the example in the doc does not have the table type (source, sink, or both)
property anymore. Are you suggesting dropping it? I think the table type
property is still useful as it can restrict a certain connector to be
only a source or a sink; for example, we usually want a Kafka topic to be
either read-only or write-only, but not both.
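A minimal, hypothetical Java sketch of the kind of check a declared table type would enable: a table declared as a source can only be read, a sink can only be written, and "both" allows either. The names TableTypeCheck, TableType, Usage, isAllowed, and validate are illustrative assumptions, not Flink classes. The check runs before a statement is submitted, which is what matters for ad hoc queries in the SQL Client.

```java
import java.util.Map;

// Hypothetical sketch, not Flink API: enforces a declared table type before a query runs.
public class TableTypeCheck {

    /** Declared access mode, mirroring the "source, sink, or both" table type. */
    enum TableType { SOURCE, SINK, BOTH }

    /** How a statement wants to use a table. */
    enum Usage { READ, WRITE }

    static boolean isAllowed(TableType declared, Usage usage) {
        switch (declared) {
            case SOURCE:
                return usage == Usage.READ;
            case SINK:
                return usage == Usage.WRITE;
            default:
                return true; // BOTH permits reading and writing
        }
    }

    /** Throws if the statement would use the table in a way its declaration forbids. */
    static void validate(String table, Map<String, TableType> declaredTypes, Usage usage) {
        TableType declared = declaredTypes.get(table);
        if (declared == null) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        if (!isAllowed(declared, usage)) {
            throw new IllegalStateException("Table '" + table + "' is declared as "
                + declared + " and cannot be used for " + usage);
        }
    }

    public static void main(String[] args) {
        // Hypothetical tables: "clicks" is a Kafka topic written only by another producer.
        Map<String, TableType> declaredTypes =
            Map.of("clicks", TableType.SOURCE, "alerts", TableType.SINK);
        validate("clicks", declaredTypes, Usage.READ); // allowed
        try {
            validate("clicks", declaredTypes, Usage.WRITE); // rejected
        } catch (IllegalStateException e) {
            // Prints: Table 'clicks' is declared as SOURCE and cannot be used for WRITE
            System.out.println(e.getMessage());
        }
    }
}
```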

Shuyi

On Mon, Oct 1, 2018 at 1:53 AM Timo Walther <tw...@apache.org> wrote:

> Hi everyone,
>
> as some of you might have noticed, in the last two releases we aimed to
> unify SQL connectors and make them more modular. The first connectors
> and formats have been implemented and are usable via the SQL Client and
> Java/Scala/SQL APIs.
>
> However, after writing more connectors/example programs and talking to
> users, there are still a couple of improvements that should be applied
> to the unified SQL connector API.
>
> I wrote a design document [1] that discusses limitations that I have
> observed and considers feedback that I have collected over the last
> months. I don't know whether we will implement all of these
> improvements, but it would be great to get feedback for a satisfactory
> API and for future prioritization.
>
> The general goal should be to connect to external systems in a way that is
> as convenient and type-safe as possible. Any feedback is highly appreciated.
>
> Thanks,
>
> Timo
>
> [1]
>
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
>
>

-- 
"So you have to trust that the dots will somehow connect in your future."

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Aljoscha Krettek <al...@apache.org>.

Thanks for the proposal!

I like the proposed changes a lot. Support for reading/writing key data of systems that have a key/value split, in particular, will be very nice to have.
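For systems with a key/value split (a Kafka record, for example, carries a separate key and value), one way to picture reading and writing key data is to merge the decoded key fields and value fields into a single row on read, and to split the row again on write. The following Java sketch assumes both parts have already been decoded into name/value maps; KeyValueRow, read, and write are hypothetical names, and rejecting name clashes is just one possible rule.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink API: maps between a single row and key/value parts.
public class KeyValueRow {

    /** Reading: merges decoded key fields and value fields into one row. */
    static Map<String, Object> read(Map<String, Object> key, Map<String, Object> value) {
        Map<String, Object> row = new LinkedHashMap<>(key);
        for (Map.Entry<String, Object> field : value.entrySet()) {
            if (row.containsKey(field.getKey())) {
                // A real design needs a rule for name clashes; this sketch simply rejects them.
                throw new IllegalArgumentException(
                    "Field '" + field.getKey() + "' exists in both key and value");
            }
            row.put(field.getKey(), field.getValue());
        }
        return row;
    }

    /** Writing: splits a row into [key, value] using the declared key field names. */
    static List<Map<String, Object>> write(Map<String, Object> row, List<String> keyFields) {
        Map<String, Object> key = new LinkedHashMap<>();
        Map<String, Object> value = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            Map<String, Object> target = keyFields.contains(field.getKey()) ? key : value;
            target.put(field.getKey(), field.getValue());
        }
        return List.of(key, value);
    }

    public static void main(String[] args) {
        Map<String, Object> row = read(Map.of("user_id", 42), Map.of("page", "/home"));
        System.out.println(row); // {user_id=42, page=/home}
        List<Map<String, Object>> parts = write(row, List.of("user_id"));
        System.out.println(parts.get(0) + " / " + parts.get(1)); // {user_id=42} / {page=/home}
    }
}
```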

> On 2. Oct 2018, at 11:58, Timo Walther <tw...@apache.org> wrote:
> 
> Thanks for the feedback, Fabian. I updated the document and addressed your comments.
>
> I agree that tables which are stored in different systems need more discussion. I would suggest deprecating the field mapping interfaces in this release and removing them in the next release.
> 
> Regards,
> Timo
>

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Timo Walther <tw...@apache.org>.

Thanks for the feedback, Fabian. I updated the document and addressed
your comments.

I agree that tables which are stored in different systems need more
discussion. I would suggest deprecating the field mapping interfaces in
this release and removing them in the next release.

Regards,
Timo


Am 02.10.18 um 11:06 schrieb Fabian Hueske:
> Thanks for the proposal, Timo!
>
> I've done a pass and added some comments (mostly asking for clarification
> and details).
> Overall, this is heading in a very good direction.
> I think tables that are stored in different systems, and using a format
> definition to define other formats, require some more discussion.
> However, these are also not the features that we would start with.
>
> From a compatibility point of view, an important question to answer would
> be whether we can drop the support for field mapping, i.e., do we have
> users who take advantage of mapping format fields to fields with a
> different name in the schema?
> Besides that, all existing functionality is preserved, although the syntax
> changes a bit.
>
> Best,
> Fabian

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Posted by Fabian Hueske <fh...@gmail.com>.

Thanks for the proposal, Timo!

I've done a pass and added some comments (mostly asking for clarification
and details).
Overall, this is heading in a very good direction.
I think tables that are stored in different systems, and using a format
definition to define other formats, require some more discussion.
However, these are also not the features that we would start with.

From a compatibility point of view, an important question to answer would
be whether we can drop the support for field mapping, i.e., do we have
users who take advantage of mapping format fields to fields with a
different name in the schema?
Besides that, all existing functionality is preserved, although the syntax
changes a bit.
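To make the compatibility question concrete: field mapping lets a table's schema expose a field under a different name than the one used by the underlying format. A minimal, hypothetical Java sketch, assuming the mapping is given as schema field name to format field name and that unmapped fields keep their names; FieldMapping and toSchemaRow are illustrative names, not Flink's field mapping interfaces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink's field mapping interfaces.
public class FieldMapping {

    /**
     * Builds a row in schema order from a decoded format record.
     * mapping: schema field name -> format field name; unmapped fields keep their name.
     */
    static Map<String, Object> toSchemaRow(
            Map<String, Object> formatRecord, List<String> schemaFields, Map<String, String> mapping) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String schemaField : schemaFields) {
            String formatField = mapping.getOrDefault(schemaField, schemaField);
            if (!formatRecord.containsKey(formatField)) {
                throw new IllegalArgumentException("Format has no field '" + formatField
                    + "' for schema field '" + schemaField + "'");
            }
            row.put(schemaField, formatRecord.get(formatField));
        }
        return row;
    }

    public static void main(String[] args) {
        // The format calls the field "uid"; the table schema exposes it as "user_id".
        Map<String, Object> record = Map.of("uid", 42, "ts", 1000L);
        Map<String, Object> row =
            toSchemaRow(record, List.of("user_id", "ts"), Map.of("user_id", "uid"));
        System.out.println(row); // {user_id=42, ts=1000}
    }
}
```

Dropping field mapping would amount to always passing an empty mapping, so schema names would have to match the format names exactly.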

Best,
Fabian

Am Mo., 1. Okt. 2018 um 10:53 Uhr schrieb Timo Walther <tw...@apache.org>:

> Hi everyone,
>
> as some of you might have noticed, in the last two releases we aimed to
> unify SQL connectors and make them more modular. The first connectors
> and formats have been implemented and are usable via the SQL Client and
> Java/Scala/SQL APIs.
>
> However, after writing more connectors/example programs and talking to
> users, there are still a couple of improvements that should be applied
> to the unified SQL connector API.
>
> I wrote a design document [1] that discusses limitations that I have
> observed and considers feedback that I have collected over the last
> months. I don't know whether we will implement all of these
> improvements, but it would be great to get feedback for a satisfactory
> API and for future prioritization.
>
> The general goal should be to connect to external systems in a way that is
> as convenient and type-safe as possible. Any feedback is highly appreciated.
>
> Thanks,
>
> Timo
>
> [1]
>
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
>
>