Posted to dev@flink.apache.org by Timo Walther <tw...@apache.org> on 2020/04/03 08:59:51 UTC

[RESULT][VOTE] FLIP-95: New TableSource and TableSink interfaces

Hi all,

The voting time for FLIP-95 has passed. I'm closing the vote now.

There were 6 +1 votes, 4 of which are binding:
- Jark (binding)
- Benchao (non-binding)
- Kurt (binding)
- Jingsong (binding)
- Leonard (non-binding)
- Dawid (binding)

There were no disapproving votes.

Thus, FLIP-95 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

Best,
Timo


On 03.04.20 10:52, Timo Walther wrote:
> Hi Dawid,
> 
> thanks for your response. Let's postpone the discussion around a single
> interface to a later issue. It seems that we reached a consensus on
> the remaining interfaces. Given the size of the FLIP, I think it is fine 
> to conclude the voting now.
> 
> Thanks,
> Timo
> 
> On 03.04.20 10:42, Dawid Wysakowicz wrote:
>> Hi all,
>>
>> Just to make it clear: I don't want to block the whole effort. I'm a big
>> +1 on the whole document and -0 on using the TableSchema for
>> projection pushdown.
>>
>> My personal opinion was that TableSchema is misleading for Projection
>> pushdown and I still think that way. That's why I wanted to bring it up
>> to see if I am the only one. If that's the case, let's just proceed with
>> the TableSchema.
>>
>> Best,
>>
>> Dawid
>>
>> On 02/04/2020 17:19, Jark Wu wrote:
>>> Hi Timo,
>>>
>>> I don't think a source should work with `CatalogTableSchema`. So far, a
>>> table source doesn't need to know the logical information of computed
>>> columns and watermarks.
>>> IMO, we should provide a method in the source factory to convert from
>>> `CatalogTableSchema` into a `TableSchema` without computed columns,
>>> and a source should just hold the `TableSchema`.
>>>
>>> I agree doing the intersection/diff logic is trivial, but maybe we can
>>> provide utilities to do that? So that we can keep the interface clean.
>>>
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Thu, 2 Apr 2020 at 20:17, Timo Walther <tw...@apache.org> wrote:
>>>
>>>> Hi Jark,
>>>>
>>>> If catalogs use `CatalogTableSchema` in the future, the source would
>>>> internally also work with `CatalogTableSchema`. I'm fine with cleaning
>>>> up the `TableSchema` class, but should a source deal with two different
>>>> schema classes then?
>>>>
>>>> Another problem that I see is that connectors usually need to perform
>>>> some index arithmetic. Dealing with TableSchema and additionally, within
>>>> a field, with DataType might be a bit inconvenient. A dedicated class
>>>> with utilities might be helpful such that not every source needs to
>>>> implement the same intersection/diff logic again.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 02.04.20 14:06, Jark Wu wrote:
>>>>> Hi Dawid,
>>>>>
>>>>>> How to express projections with TableSchema?
>>>>> The TableSource holds the original TableSchema (i.e. from DDL) and the
>>>>> pushed TableSchema represents the schema after projection.
>>>>> Thus the table source can compare them to detect a changed field order
>>>>> or mismatched types.
>>>>> Most sources that map physical storage by field name (e.g. jdbc,
>>>>> hbase, json) can simply apply the pushed TableSchema.
>>>>> But sources that map by field index (e.g. csv) need to figure out
>>>>> the projected indexes by comparing the original and projected schema.
>>>>> For example, if the original schema is [a: String, b: Int, c: Timestamp]
>>>>> and b is pruned, then the pushed schema is [a: String, c: Timestamp].
>>>>> So the source can figure out that index=1 is pruned.
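
A minimal sketch of the comparison described above, for an index-based source. It works on plain lists of field names rather than on Flink's `TableSchema`; the class `ProjectionDiff` and both method names are hypothetical and only illustrate the idea of matching the pushed schema against the original one by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: compares field names only.
final class ProjectionDiff {

    /** For each pushed field, returns its index in the original schema. */
    static int[] projectedIndexes(List<String> originalNames, List<String> pushedNames) {
        int[] result = new int[pushedNames.size()];
        for (int i = 0; i < pushedNames.size(); i++) {
            int idx = originalNames.indexOf(pushedNames.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException(
                        "Field not in original schema: " + pushedNames.get(i));
            }
            result[i] = idx;
        }
        return result;
    }

    /** Returns the indexes of original fields that are absent from the pushed schema. */
    static List<Integer> prunedIndexes(List<String> originalNames, List<String> pushedNames) {
        List<Integer> pruned = new ArrayList<>();
        for (int i = 0; i < originalNames.size(); i++) {
            if (!pushedNames.contains(originalNames.get(i))) {
                pruned.add(i);
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        // Original schema [a, b, c]; b is pruned, so the pushed schema is [a, c].
        List<String> original = Arrays.asList("a", "b", "c");
        List<String> pushed = Arrays.asList("a", "c");
        System.out.println(Arrays.toString(projectedIndexes(original, pushed))); // [0, 2]
        System.out.println(prunedIndexes(original, pushed)); // [1]
    }
}
```

Every index-based connector would need some variant of this logic, which is the duplication the later replies suggest moving into shared utilities.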
>>>>>
>>>>>> How do we express projection of a nested field with TableSchema?
>>>>> This is the same as above. For example, the original schema is
>>>>> [rk: String, f1 Row<q1 Int, q2 Double>].
>>>>> If `f1.q1` is pruned, the pushed schema will be
>>>>> [rk: String, f1 Row<q2 Double>].
>>>>>
>>>>>> TableSchema might be used at too many different places for different
>>>>>> responsibilities.
>>>>> Agree. We have recognized that a structure and builder for a pure table
>>>>> schema is required in many places. But we mixed many concepts of the
>>>>> catalog table schema into TableSchema.
>>>>> IIRC, in an offline discussion of FLIP-84, we wanted to introduce a new
>>>>> `CatalogTableSchema` to represent the schema part of a DDL,
>>>>> and remove all the watermark and computed column information from
>>>>> TableSchema?
>>>>> Then `TableSchema` can continue to serve as a pure table schema and it
>>>>> stays in a good package.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <tw...@apache.org> wrote:
>>>>>
>>>>>> Hi Dawid,
>>>>>>
>>>>>> thanks for your feedback. I agree with your concerns. I also observed
>>>>>> that TableSchema might be used at too many different places for
>>>>>> different responsibilities.
>>>>>>
>>>>>> How about we introduce a helper class for `SupportsProjectionPushDown`
>>>>>> and also `LookupTableSource#Context#getKeys()` to represent the nested
>>>>>> structure of names? Data types, constraints, or computed columns are not
>>>>>> necessary at those locations.
>>>>>>
>>>>>> We can also add utility methods for connectors to this helper class
>>>>>> to quickly figure out differences between the original table
>>>>>> schema and the new one.
>>>>>>
>>>>>> SelectedFields {
>>>>>>
>>>>>>           private LogicalType originalRowType; // set by the planner
>>>>>>
>>>>>>           private int[][] indices;
>>>>>>
>>>>>>           getNames(int... at): String[]
>>>>>>
>>>>>>           getNames(String... at): String[]
>>>>>>
>>>>>>           getIndices(int... at): int[]
>>>>>>
>>>>>>           getIndices(String... at): int[]
>>>>>>
>>>>>>           toTableSchema(): TableSchema
>>>>>> }
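
One possible reading of the `int[][] indices` field in the outline above, as a rough sketch: each entry is treated as a path of positions into the (possibly nested) original row, so `{1, 1}` selects the second child of the second top-level field. This interpretation is an assumption; the thread does not define it. The `Field` class is a hypothetical stand-in for the planner's `LogicalType`, and the no-argument `getNames()` is a simplification of the proposed signatures.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the FLIP-95 API.
final class SelectedFieldsSketch {

    /** Stand-in for a named, possibly nested row field (assumption, not a Flink class). */
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    private final List<Field> originalRow; // would be set by the planner
    private final int[][] indices;         // one index path per selected field

    SelectedFieldsSketch(List<Field> originalRow, int[][] indices) {
        this.originalRow = originalRow;
        this.indices = indices;
    }

    /** Resolves every index path to a dotted name, e.g. {1, 1} becomes "f1.q2". */
    String[] getNames() {
        String[] names = new String[indices.length];
        for (int i = 0; i < indices.length; i++) {
            List<Field> level = originalRow;
            StringBuilder name = new StringBuilder();
            for (int pos : indices[i]) {
                Field field = level.get(pos);
                if (name.length() > 0) {
                    name.append('.');
                }
                name.append(field.name);
                level = field.children; // descend into the nested row
            }
            names[i] = name.toString();
        }
        return names;
    }

    public static void main(String[] args) {
        // Original schema [rk, f1 Row<q1, q2>]; f1.q1 is pruned.
        List<Field> row = Arrays.asList(
                new Field("rk"),
                new Field("f1", new Field("q1"), new Field("q2")));
        SelectedFieldsSketch selected =
                new SelectedFieldsSketch(row, new int[][] {{0}, {1, 1}});
        System.out.println(Arrays.toString(selected.getNames())); // [rk, f1.q2]
    }
}
```

Under this reading, a connector gets the selected positions directly and does not have to diff two schemas, and no data types, constraints, or computed columns are carried along.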
>>>>>>
>>>>>> What do others think?
>>>>>>
>>>>>> Thanks,
>>>>>> Timo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 02.04.20 12:28, Dawid Wysakowicz wrote:
>>>>>>> Generally +1
>>>>>>>
>>>>>>> One slight concern I have is about `SupportsProjectionPushDown`. I
>>>>>>> don't necessarily understand how we can express projections with
>>>>>>> TableSchema. It's unclear to me what happens when the type of a field
>>>>>>> changes, fields are in a different order, or types do not match. How
>>>>>>> do we express projection of a nested field with TableSchema?
>>>>>>>
>>>>>>> I don't think this changes the core design presented in the FLIP,
>>>>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my
>>>>>>> concerns, so that maybe we can adjust the passed around structures
>>>>>>> slightly.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Dawid
>>>>>>>
>>>>>>> On 30/03/2020 14:42, Leonard Xu wrote:
>>>>>>>> +1(non-binding)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Leonard Xu
>>>>>>>>
>>>>>>>>> On 30 March 2020, at 16:43, Jingsong Li<ji...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jingsong Lee
>>>>>>>>>
>>>>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<ku...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kurt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<li...@gmail.com> wrote:
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 30, 2020 at 3:57 PM, Jark Wu<im...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 from my side.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Timo for driving this.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jark
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<tw...@apache.org> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which was discussed
>>>>>>>>>>>>> and reached a consensus in the discussion thread [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an
>>>>>>>>>>>>> objection or not enough votes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>>>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>>
>>>>>>>>>>> Benchao Li
>>>>>>>>>>> School of Electronics Engineering and Computer Science, Peking University
>>>>>>>>>>> Tel:+86-15650713730
>>>>>>>>>>> Email:libenchao@gmail.com;libenchao@pku.edu.cn
>>>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Best, Jingsong Lee
>>>>>>
>>>>
>>