You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Kurt Young <yk...@gmail.com> on 2020/02/05 06:36:28 UTC

[DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Hi all,

I'd like to bring up a discussion about removing registration of
TableSource and
TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
affected
method would be:

TableEnvironment::registerTableSource
TableEnvironment::fromTableSource
TableEnvironment::registerTableSink
ConnectTableDescriptor::registerTableSource
ConnectTableDescriptor::registerTableSink
ConnectTableDescriptor::registerTableSourceAndSink

(Most of them are already deprecated, except for
TableEnvironment::fromTableSource,
which was intended to deprecate but missed by accident).

FLIP-64 [1] already explained why we want to deprecate TableSource &
TableSink from
user's interface. In a short word, these interfaces should only read &
write the physical
representation of the table, and they are not fitting well after we already
introduced some
logical table fields such as computed column, watermarks.

Another reason is the exposure of registerTableSource in Table Env just
make the whole
SQL protocol opposite. TableSource should be used as a reader of table, it
should rely on
other metadata information held by framework, which eventually comes from
DDL or
ConnectDescriptor. But if we register a TableSource to Table Env, we have
no choice but
have to rely on TableSource::getTableSchema. It will make the design
obscure, sometimes
TableSource should trust the information comes from framework, but
sometimes it should
also generate its own schema information.

Furthermore, if the authority about schema information is not clear, it
will make things much
more complicated if we want to improve the table api usability such as
introducing automatic
schema inference in the near future.

Since this is an API break change, I've also included user mailing list to
gather more feedbacks.

Best,
Kurt

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Danny Chan <yu...@gmail.com>.
Thanks for driving this Kurt.

Because we already have DDL and Descriptor as an alternative of these deprecated methods, removing them would reduce ambiguity and make the near future work more easier.

As we discussed offline, although some of the connectors may still have attributes that cannot be specified though properties, removing them force us to re-think the TableFactory/properties interface.

Best,
Danny Chan
在 2020年2月5日 +0800 PM3:58,Dawid Wysakowicz <dw...@apache.org>,写道:
> Hi Kurt,
>
> I fully agree with the proposal. Yes it was an omission that we did not
> deprecate the TableEnvironment#fromTableSource in the previous version.
>
> I would vote to remove all those methods altogether.
>
> Best,
>
> Dawid
>
> On 05/02/2020 07:36, Kurt Young wrote:
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read &
> > write the physical
> > representation of the table, and they are not fitting well after we already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of table, it
> > should rely on
> > other metadata information held by framework, which eventually comes from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear, it
> > will make things much
> > more complicated if we want to improve the table api usability such as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API break change, I've also included user mailing list to
> > gather more feedbacks.
> >
> > Best,
> > Kurt
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
>

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Jingsong Li <ji...@gmail.com>.
HI Kurt,

+1 to remove these methods.

But one concern is that some of the current TableSource/TableSink may not
be ready, such as the JDBCUpsertTableSink, which accepts a JDBCDialect, but
through the TableFactory, there is no way to pass in the JDBCDialect at
present. But I also believe we have enough time on 1.11 to prepare them.
Then unified to TableFactory.

I think there may be complaints from users because they used to only
implement a TableSource or TableSink, but now they have to implement
TableFactory. But I think it's also good for conceptual clarity to force
them to implement TableFactory.

Another idea is can we provide some utils to help user implement
TableFactory? The current method may be a little bit complex, and it needs
to be added to
the "META_INF/services/org.apache.flink.table.factories.TableFactory" file.

Best,
Jingsong Lee

On Wed, Feb 5, 2020 at 3:58 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi Kurt,
>
> I fully agree with the proposal. Yes it was an omission that we did not
> deprecate the TableEnvironment#fromTableSource in the previous version.
>
> I would vote to remove all those methods altogether.
>
> Best,
>
> Dawid
>
> On 05/02/2020 07:36, Kurt Young wrote:
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read &
> > write the physical
> > representation of the table, and they are not fitting well after we
> already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of table,
> it
> > should rely on
> > other metadata information held by framework, which eventually comes from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear, it
> > will make things much
> > more complicated if we want to improve the table api usability such as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API break change, I've also included user mailing list
> to
> > gather more feedbacks.
> >
> > Best,
> > Kurt
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
>
>

-- 
Best, Jingsong Lee

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Kurt,

I fully agree with the proposal. Yes it was an omission that we did not
deprecate the TableEnvironment#fromTableSource in the previous version.

I would vote to remove all those methods altogether.

Best,

Dawid

On 05/02/2020 07:36, Kurt Young wrote:
> Hi all,
>
> I'd like to bring up a discussion about removing registration of
> TableSource and
> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> affected
> method would be:
>
> TableEnvironment::registerTableSource
> TableEnvironment::fromTableSource
> TableEnvironment::registerTableSink
> ConnectTableDescriptor::registerTableSource
> ConnectTableDescriptor::registerTableSink
> ConnectTableDescriptor::registerTableSourceAndSink
>
> (Most of them are already deprecated, except for
> TableEnvironment::fromTableSource,
> which was intended to deprecate but missed by accident).
>
> FLIP-64 [1] already explained why we want to deprecate TableSource &
> TableSink from
> user's interface. In a short word, these interfaces should only read &
> write the physical
> representation of the table, and they are not fitting well after we already
> introduced some
> logical table fields such as computed column, watermarks.
>
> Another reason is the exposure of registerTableSource in Table Env just
> make the whole
> SQL protocol opposite. TableSource should be used as a reader of table, it
> should rely on
> other metadata information held by framework, which eventually comes from
> DDL or
> ConnectDescriptor. But if we register a TableSource to Table Env, we have
> no choice but
> have to rely on TableSource::getTableSchema. It will make the design
> obscure, sometimes
> TableSource should trust the information comes from framework, but
> sometimes it should
> also generate its own schema information.
>
> Furthermore, if the authority about schema information is not clear, it
> will make things much
> more complicated if we want to improve the table api usability such as
> introducing automatic
> schema inference in the near future.
>
> Since this is an API break change, I've also included user mailing list to
> gather more feedbacks.
>
> Best,
> Kurt
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Timo Walther <tw...@apache.org>.
Hi Kurt,

no there is no JIRA ticket yet. But in any case, I think it is better to 
have good testing infrastructure that abstracts source generation, sink 
generation, testing data etc. If we will introduce tableEnv.values() it 
will also not solve everything because time-based operations might need 
time attributes and so on.

Using DDL in tests should also be avoided because strings are even more 
difficult to maintain.

Regards,
Timo


On 08.02.20 04:29, Kurt Young wrote:
> Hi Timo,
> 
> tableEnv.fromElements/values() sounds good, do we have a jira ticket to
> track the issue?
> 
> Best,
> Kurt
> 
> 
> On Fri, Feb 7, 2020 at 10:56 PM Timo Walther <tw...@apache.org> wrote:
> 
>> Hi Kurt,
>>
>> Dawid is currently working on making a tableEnv.fromElements/values()
>> kind of source possible in the future. We can use this to replace some
>> of the tests. Otherwise I guess we should come up with a better test
>> infrastructure to make defining source not necessary anymore.
>>
>> Regards,
>> Timo
>>
>>
>> On 07.02.20 11:24, Kurt Young wrote:
>>> Thanks all for your feedback, since no objection has been raised, I've
>>> created
>>> https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
>>>
>>> Since this issue would require lots of tests adjustment before it really
>>> happen,
>>> it won't be done in a short time. Feel free to give feedback anytime here
>>> or in jira
>>> if you have other opinions.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <yk...@gmail.com> wrote:
>>>
>>>> Hi Zhenghua,
>>>>
>>>> After removing TableSource::getTableSchema, during optimization, I could
>>>> imagine
>>>> the schema information might come from relational nodes such as
>> TableScan.
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <yk...@gmail.com> wrote:
>>>>
>>>>> Hi Jingsong,
>>>>>
>>>>> Yes current TableFactory is not ideal for users to use either. I think
>> we
>>>>> should
>>>>> also spend some time in 1.11 to improve the usability of
>> TableEnvironment
>>>>> when
>>>>> users trying to read or write something. Automatic scheme inference
>> would
>>>>> be
>>>>> one of them. Other from this, we also support convert a DataStream to
>>>>> Table, which
>>>>> can serve some flexible requirements to read or write data.
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>>
>>>>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <do...@gmail.com> wrote:
>>>>>
>>>>>> +1 to remove these methods.
>>>>>>
>>>>>> One concern about invocations of TableSource::getTableSchema:
>>>>>> By removing such methods, we can stop calling
>> TableSource::getTableSchema
>>>>>> in some place(such
>>>>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>>>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>>>>
>>>>>> But in other place we need field types and names of the table
>> source(such
>>>>>> as
>>>>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>>>>>> PushProjectIntoTableSourceScanRule,
>>>>>> CommonLookupJoin).  So how should we deal with this?
>>>>>>
>>>>>> *Best Regards,*
>>>>>> *Zhenghua Gao*
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'd like to bring up a discussion about removing registration of
>>>>>>> TableSource and
>>>>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor.
>> The
>>>>>>> affected
>>>>>>> method would be:
>>>>>>>
>>>>>>> TableEnvironment::registerTableSource
>>>>>>> TableEnvironment::fromTableSource
>>>>>>> TableEnvironment::registerTableSink
>>>>>>> ConnectTableDescriptor::registerTableSource
>>>>>>> ConnectTableDescriptor::registerTableSink
>>>>>>> ConnectTableDescriptor::registerTableSourceAndSink
>>>>>>>
>>>>>>> (Most of them are already deprecated, except for
>>>>>>> TableEnvironment::fromTableSource,
>>>>>>> which was intended to deprecate but missed by accident).
>>>>>>>
>>>>>>> FLIP-64 [1] already explained why we want to deprecate TableSource &
>>>>>>> TableSink from
>>>>>>> user's interface. In a short word, these interfaces should only read
>> &
>>>>>>> write the physical
>>>>>>> representation of the table, and they are not fitting well after we
>>>>>> already
>>>>>>> introduced some
>>>>>>> logical table fields such as computed column, watermarks.
>>>>>>>
>>>>>>> Another reason is the exposure of registerTableSource in Table Env
>> just
>>>>>>> make the whole
>>>>>>> SQL protocol opposite. TableSource should be used as a reader of
>>>>>> table, it
>>>>>>> should rely on
>>>>>>> other metadata information held by framework, which eventually comes
>>>>>> from
>>>>>>> DDL or
>>>>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we
>>>>>> have
>>>>>>> no choice but
>>>>>>> have to rely on TableSource::getTableSchema. It will make the design
>>>>>>> obscure, sometimes
>>>>>>> TableSource should trust the information comes from framework, but
>>>>>>> sometimes it should
>>>>>>> also generate its own schema information.
>>>>>>>
>>>>>>> Furthermore, if the authority about schema information is not clear,
>> it
>>>>>>> will make things much
>>>>>>> more complicated if we want to improve the table api usability such
>> as
>>>>>>> introducing automatic
>>>>>>> schema inference in the near future.
>>>>>>>
>>>>>>> Since this is an API break change, I've also included user mailing
>>>>>> list to
>>>>>>> gather more feedbacks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kurt
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
> 


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Kurt Young <yk...@gmail.com>.
Hi Timo,

tableEnv.fromElements/values() sounds good, do we have a jira ticket to
track the issue?

Best,
Kurt


On Fri, Feb 7, 2020 at 10:56 PM Timo Walther <tw...@apache.org> wrote:

> Hi Kurt,
>
> Dawid is currently working on making a tableEnv.fromElements/values()
> kind of source possible in the future. We can use this to replace some
> of the tests. Otherwise I guess we should come up with a better test
> infrastructure to make defining source not necessary anymore.
>
> Regards,
> Timo
>
>
> On 07.02.20 11:24, Kurt Young wrote:
> > Thanks all for your feedback, since no objection has been raised, I've
> > created
> > https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
> >
> > Since this issue would require lots of tests adjustment before it really
> > happen,
> > it won't be done in a short time. Feel free to give feedback anytime here
> > or in jira
> > if you have other opinions.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <yk...@gmail.com> wrote:
> >
> >> Hi Zhenghua,
> >>
> >> After removing TableSource::getTableSchema, during optimization, I could
> >> imagine
> >> the schema information might come from relational nodes such as
> TableScan.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <yk...@gmail.com> wrote:
> >>
> >>> Hi Jingsong,
> >>>
> >>> Yes current TableFactory is not ideal for users to use either. I think
> we
> >>> should
> >>> also spend some time in 1.11 to improve the usability of
> TableEnvironment
> >>> when
> >>> users trying to read or write something. Automatic scheme inference
> would
> >>> be
> >>> one of them. Other from this, we also support convert a DataStream to
> >>> Table, which
> >>> can serve some flexible requirements to read or write data.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <do...@gmail.com> wrote:
> >>>
> >>>> +1 to remove these methods.
> >>>>
> >>>> One concern about invocations of TableSource::getTableSchema:
> >>>> By removing such methods, we can stop calling
> TableSource::getTableSchema
> >>>> in some place(such
> >>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
> >>>> ConnectorCatalogTable, TableSourceQueryOperation).
> >>>>
> >>>> But in other place we need field types and names of the table
> source(such
> >>>> as
> >>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
> >>>> PushProjectIntoTableSourceScanRule,
> >>>> CommonLookupJoin).  So how should we deal with this?
> >>>>
> >>>> *Best Regards,*
> >>>> *Zhenghua Gao*
> >>>>
> >>>>
> >>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'd like to bring up a discussion about removing registration of
> >>>>> TableSource and
> >>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor.
> The
> >>>>> affected
> >>>>> method would be:
> >>>>>
> >>>>> TableEnvironment::registerTableSource
> >>>>> TableEnvironment::fromTableSource
> >>>>> TableEnvironment::registerTableSink
> >>>>> ConnectTableDescriptor::registerTableSource
> >>>>> ConnectTableDescriptor::registerTableSink
> >>>>> ConnectTableDescriptor::registerTableSourceAndSink
> >>>>>
> >>>>> (Most of them are already deprecated, except for
> >>>>> TableEnvironment::fromTableSource,
> >>>>> which was intended to deprecate but missed by accident).
> >>>>>
> >>>>> FLIP-64 [1] already explained why we want to deprecate TableSource &
> >>>>> TableSink from
> >>>>> user's interface. In a short word, these interfaces should only read
> &
> >>>>> write the physical
> >>>>> representation of the table, and they are not fitting well after we
> >>>> already
> >>>>> introduced some
> >>>>> logical table fields such as computed column, watermarks.
> >>>>>
> >>>>> Another reason is the exposure of registerTableSource in Table Env
> just
> >>>>> make the whole
> >>>>> SQL protocol opposite. TableSource should be used as a reader of
> >>>> table, it
> >>>>> should rely on
> >>>>> other metadata information held by framework, which eventually comes
> >>>> from
> >>>>> DDL or
> >>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we
> >>>> have
> >>>>> no choice but
> >>>>> have to rely on TableSource::getTableSchema. It will make the design
> >>>>> obscure, sometimes
> >>>>> TableSource should trust the information comes from framework, but
> >>>>> sometimes it should
> >>>>> also generate its own schema information.
> >>>>>
> >>>>> Furthermore, if the authority about schema information is not clear,
> it
> >>>>> will make things much
> >>>>> more complicated if we want to improve the table api usability such
> as
> >>>>> introducing automatic
> >>>>> schema inference in the near future.
> >>>>>
> >>>>> Since this is an API break change, I've also included user mailing
> >>>> list to
> >>>>> gather more feedbacks.
> >>>>>
> >>>>> Best,
> >>>>> Kurt
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>>>>
> >>>>
> >>>
> >
>
>

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Timo Walther <tw...@apache.org>.
Hi Kurt,

Dawid is currently working on making a tableEnv.fromElements/values() 
kind of source possible in the future. We can use this to replace some 
of the tests. Otherwise I guess we should come up with a better test 
infrastructure to make defining source not necessary anymore.

Regards,
Timo


On 07.02.20 11:24, Kurt Young wrote:
> Thanks all for your feedback, since no objection has been raised, I've
> created
> https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
> 
> Since this issue would require lots of tests adjustment before it really
> happen,
> it won't be done in a short time. Feel free to give feedback anytime here
> or in jira
> if you have other opinions.
> 
> Best,
> Kurt
> 
> 
> On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <yk...@gmail.com> wrote:
> 
>> Hi Zhenghua,
>>
>> After removing TableSource::getTableSchema, during optimization, I could
>> imagine
>> the schema information might come from relational nodes such as TableScan.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <yk...@gmail.com> wrote:
>>
>>> Hi Jingsong,
>>>
>>> Yes current TableFactory is not ideal for users to use either. I think we
>>> should
>>> also spend some time in 1.11 to improve the usability of TableEnvironment
>>> when
>>> users trying to read or write something. Automatic scheme inference would
>>> be
>>> one of them. Other from this, we also support convert a DataStream to
>>> Table, which
>>> can serve some flexible requirements to read or write data.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <do...@gmail.com> wrote:
>>>
>>>> +1 to remove these methods.
>>>>
>>>> One concern about invocations of TableSource::getTableSchema:
>>>> By removing such methods, we can stop calling TableSource::getTableSchema
>>>> in some place(such
>>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>>
>>>> But in other place we need field types and names of the table source(such
>>>> as
>>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>>>> PushProjectIntoTableSourceScanRule,
>>>> CommonLookupJoin).  So how should we deal with this?
>>>>
>>>> *Best Regards,*
>>>> *Zhenghua Gao*
>>>>
>>>>
>>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to bring up a discussion about removing registration of
>>>>> TableSource and
>>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>>>>> affected
>>>>> method would be:
>>>>>
>>>>> TableEnvironment::registerTableSource
>>>>> TableEnvironment::fromTableSource
>>>>> TableEnvironment::registerTableSink
>>>>> ConnectTableDescriptor::registerTableSource
>>>>> ConnectTableDescriptor::registerTableSink
>>>>> ConnectTableDescriptor::registerTableSourceAndSink
>>>>>
>>>>> (Most of them are already deprecated, except for
>>>>> TableEnvironment::fromTableSource,
>>>>> which was intended to deprecate but missed by accident).
>>>>>
>>>>> FLIP-64 [1] already explained why we want to deprecate TableSource &
>>>>> TableSink from
>>>>> user's interface. In a short word, these interfaces should only read &
>>>>> write the physical
>>>>> representation of the table, and they are not fitting well after we
>>>> already
>>>>> introduced some
>>>>> logical table fields such as computed column, watermarks.
>>>>>
>>>>> Another reason is the exposure of registerTableSource in Table Env just
>>>>> make the whole
>>>>> SQL protocol opposite. TableSource should be used as a reader of
>>>> table, it
>>>>> should rely on
>>>>> other metadata information held by framework, which eventually comes
>>>> from
>>>>> DDL or
>>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we
>>>> have
>>>>> no choice but
>>>>> have to rely on TableSource::getTableSchema. It will make the design
>>>>> obscure, sometimes
>>>>> TableSource should trust the information comes from framework, but
>>>>> sometimes it should
>>>>> also generate its own schema information.
>>>>>
>>>>> Furthermore, if the authority about schema information is not clear, it
>>>>> will make things much
>>>>> more complicated if we want to improve the table api usability such as
>>>>> introducing automatic
>>>>> schema inference in the near future.
>>>>>
>>>>> Since this is an API break change, I've also included user mailing
>>>> list to
>>>>> gather more feedbacks.
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>>
>>>>
>>>
> 


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Kurt Young <yk...@gmail.com>.
Thanks all for your feedback, since no objection has been raised, I've
created
https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.

Since this issue would require lots of tests adjustment before it really
happen,
it won't be done in a short time. Feel free to give feedback anytime here
or in jira
if you have other opinions.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <yk...@gmail.com> wrote:

> Hi Zhenghua,
>
> After removing TableSource::getTableSchema, during optimization, I could
> imagine
> the schema information might come from relational nodes such as TableScan.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <yk...@gmail.com> wrote:
>
>> Hi Jingsong,
>>
>> Yes current TableFactory is not ideal for users to use either. I think we
>> should
>> also spend some time in 1.11 to improve the usability of TableEnvironment
>> when
>> users trying to read or write something. Automatic scheme inference would
>> be
>> one of them. Other from this, we also support convert a DataStream to
>> Table, which
>> can serve some flexible requirements to read or write data.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <do...@gmail.com> wrote:
>>
>>> +1 to remove these methods.
>>>
>>> One concern about invocations of TableSource::getTableSchema:
>>> By removing such methods, we can stop calling TableSource::getTableSchema
>>> in some place(such
>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>
>>> But in other place we need field types and names of the table source(such
>>> as
>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>>> PushProjectIntoTableSourceScanRule,
>>> CommonLookupJoin).  So how should we deal with this?
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
>>>
>>>
>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I'd like to bring up a discussion about removing registration of
>>> > TableSource and
>>> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>>> > affected
>>> > method would be:
>>> >
>>> > TableEnvironment::registerTableSource
>>> > TableEnvironment::fromTableSource
>>> > TableEnvironment::registerTableSink
>>> > ConnectTableDescriptor::registerTableSource
>>> > ConnectTableDescriptor::registerTableSink
>>> > ConnectTableDescriptor::registerTableSourceAndSink
>>> >
>>> > (Most of them are already deprecated, except for
>>> > TableEnvironment::fromTableSource,
>>> > which was intended to deprecate but missed by accident).
>>> >
>>> > FLIP-64 [1] already explained why we want to deprecate TableSource &
>>> > TableSink from
>>> > user's interface. In a short word, these interfaces should only read &
>>> > write the physical
>>> > representation of the table, and they are not fitting well after we
>>> already
>>> > introduced some
>>> > logical table fields such as computed column, watermarks.
>>> >
>>> > Another reason is the exposure of registerTableSource in Table Env just
>>> > make the whole
>>> > SQL protocol opposite. TableSource should be used as a reader of
>>> table, it
>>> > should rely on
>>> > other metadata information held by framework, which eventually comes
>>> from
>>> > DDL or
>>> > ConnectDescriptor. But if we register a TableSource to Table Env, we
>>> have
>>> > no choice but
>>> > have to rely on TableSource::getTableSchema. It will make the design
>>> > obscure, sometimes
>>> > TableSource should trust the information comes from framework, but
>>> > sometimes it should
>>> > also generate its own schema information.
>>> >
>>> > Furthermore, if the authority about schema information is not clear, it
>>> > will make things much
>>> > more complicated if we want to improve the table api usability such as
>>> > introducing automatic
>>> > schema inference in the near future.
>>> >
>>> > Since this is an API break change, I've also included user mailing
>>> list to
>>> > gather more feedbacks.
>>> >
>>> > Best,
>>> > Kurt
>>> >
>>> > [1]
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>> >
>>>
>>

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Kurt Young <yk...@gmail.com>.
Hi Zhenghua,

After removing TableSource::getTableSchema, during optimization, I could
imagine
the schema information might come from relational nodes such as TableScan.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <yk...@gmail.com> wrote:

> Hi Jingsong,
>
> Yes current TableFactory is not ideal for users to use either. I think we
> should
> also spend some time in 1.11 to improve the usability of TableEnvironment
> when
> users trying to read or write something. Automatic scheme inference would
> be
> one of them. Other from this, we also support convert a DataStream to
> Table, which
> can serve some flexible requirements to read or write data.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <do...@gmail.com> wrote:
>
>> +1 to remove these methods.
>>
>> One concern about invocations of TableSource::getTableSchema:
>> By removing such methods, we can stop calling TableSource::getTableSchema
>> in some place(such
>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>> ConnectorCatalogTable, TableSourceQueryOperation).
>>
>> But in other place we need field types and names of the table source(such
>> as
>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>> PushProjectIntoTableSourceScanRule,
>> CommonLookupJoin).  So how should we deal with this?
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I'd like to bring up a discussion about removing registration of
>> > TableSource and
>> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>> > affected
>> > method would be:
>> >
>> > TableEnvironment::registerTableSource
>> > TableEnvironment::fromTableSource
>> > TableEnvironment::registerTableSink
>> > ConnectTableDescriptor::registerTableSource
>> > ConnectTableDescriptor::registerTableSink
>> > ConnectTableDescriptor::registerTableSourceAndSink
>> >
>> > (Most of them are already deprecated, except for
>> > TableEnvironment::fromTableSource,
>> > which was intended to deprecate but missed by accident).
>> >
>> > FLIP-64 [1] already explained why we want to deprecate TableSource &
>> > TableSink from
>> > user's interface. In a short word, these interfaces should only read &
>> > write the physical
>> > representation of the table, and they are not fitting well after we
>> already
>> > introduced some
>> > logical table fields such as computed column, watermarks.
>> >
>> > Another reason is the exposure of registerTableSource in Table Env just
>> > make the whole
>> > SQL protocol opposite. TableSource should be used as a reader of table,
>> it
>> > should rely on
>> > other metadata information held by framework, which eventually comes
>> from
>> > DDL or
>> > ConnectDescriptor. But if we register a TableSource to Table Env, we
>> have
>> > no choice but
>> > have to rely on TableSource::getTableSchema. It will make the design
>> > obscure, sometimes
>> > TableSource should trust the information comes from framework, but
>> > sometimes it should
>> > also generate its own schema information.
>> >
>> > Furthermore, if the authority about schema information is not clear, it
>> > will make things much
>> > more complicated if we want to improve the table api usability such as
>> > introducing automatic
>> > schema inference in the near future.
>> >
>> > Since this is an API break change, I've also included user mailing list
>> to
>> > gather more feedbacks.
>> >
>> > Best,
>> > Kurt
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>> >
>>
>

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Kurt Young <yk...@gmail.com>.
Hi Jingsong,

Yes current TableFactory is not ideal for users to use either. I think we
should
also spend some time in 1.11 to improve the usability of TableEnvironment
when
users trying to read or write something. Automatic scheme inference would
be
one of them. Other from this, we also support convert a DataStream to
Table, which
can serve some flexible requirements to read or write data.

Best,
Kurt


On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <do...@gmail.com> wrote:

> +1 to remove these methods.
>
> One concern about invocations of TableSource::getTableSchema:
> By removing such methods, we can stop calling TableSource::getTableSchema
> in some place(such
> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
> ConnectorCatalogTable, TableSourceQueryOperation).
>
> But in other place we need field types and names of the table source(such
> as
> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
> PushProjectIntoTableSourceScanRule,
> CommonLookupJoin).  So how should we deal with this?
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read &
> > write the physical
> > representation of the table, and they are not fitting well after we
> already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of table,
> it
> > should rely on
> > other metadata information held by framework, which eventually comes from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear, it
> > will make things much
> > more complicated if we want to improve the table api usability such as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API break change, I've also included user mailing list
> to
> > gather more feedbacks.
> >
> > Best,
> > Kurt
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
>

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Posted by Zhenghua Gao <do...@gmail.com>.
+1 to remove these methods.

One concern about invocations of TableSource::getTableSchema:
By removing such methods, we can stop calling TableSource::getTableSchema
in some place(such
as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
ConnectorCatalogTable, TableSourceQueryOperation).

But in other place we need field types and names of the table source(such
as
BatchExecLookupJoinRule/StreamExecLookupJoinRule,
PushProjectIntoTableSourceScanRule,
CommonLookupJoin).  So how should we deal with this?

*Best Regards,*
*Zhenghua Gao*


On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <yk...@gmail.com> wrote:

> Hi all,
>
> I'd like to bring up a discussion about removing registration of
> TableSource and
> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> affected
> method would be:
>
> TableEnvironment::registerTableSource
> TableEnvironment::fromTableSource
> TableEnvironment::registerTableSink
> ConnectTableDescriptor::registerTableSource
> ConnectTableDescriptor::registerTableSink
> ConnectTableDescriptor::registerTableSourceAndSink
>
> (Most of them are already deprecated, except for
> TableEnvironment::fromTableSource,
> which was intended to deprecate but missed by accident).
>
> FLIP-64 [1] already explained why we want to deprecate TableSource &
> TableSink from
> user's interface. In a short word, these interfaces should only read &
> write the physical
> representation of the table, and they are not fitting well after we already
> introduced some
> logical table fields such as computed column, watermarks.
>
> Another reason is the exposure of registerTableSource in Table Env just
> make the whole
> SQL protocol opposite. TableSource should be used as a reader of table, it
> should rely on
> other metadata information held by framework, which eventually comes from
> DDL or
> ConnectDescriptor. But if we register a TableSource to Table Env, we have
> no choice but
> have to rely on TableSource::getTableSchema. It will make the design
> obscure, sometimes
> TableSource should trust the information comes from framework, but
> sometimes it should
> also generate its own schema information.
>
> Furthermore, if the authority about schema information is not clear, it
> will make things much
> more complicated if we want to improve the table api usability such as
> introducing automatic
> schema inference in the near future.
>
> Since this is an API break change, I've also included user mailing list to
> gather more feedbacks.
>
> Best,
> Kurt
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>