Posted to dev@flink.apache.org by Dongwon Kim <ea...@gmail.com> on 2020/08/11 11:12:16 UTC

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

Big +1 for this FLIP.

Recently I've been working with some Kafka topics that carry timestamps
as metadata, not in the message body. I want to declare a table over
these topics with DDL, but "rowtime_column_name" in
<watermark_definition> seems to accept only columns that already exist
in the schema.

> <watermark_definition>:
>   WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
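
For example (a made-up sketch; the topic and column names are
hypothetical), there is no column I can reference in the watermark
definition when the event time exists only as Kafka record metadata:

  CREATE TABLE vehicle_events (
    vehicle_id STRING,
    speed DOUBLE
    -- The event time lives in the Kafka record's timestamp metadata,
    -- not in the message body, so there is no rowtime column to name:
    -- WATERMARK FOR ??? AS ??? - INTERVAL '5' SECOND
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'vehicle-events',
    'format' = 'json'
  );
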
I raised an issue on the user@ list, but committers advised alternative
approaches that call for detailed knowledge of Flink, such as a custom
decoding format or converting between the DataStream API and the
TableEnvironment. That works against the main advantage of Flink SQL:
simplicity and ease of use. IMHO, this FLIP must be implemented so that
users can freely derive tables from any Kafka topic without having to
involve the DataStream API.
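
For illustration, here is a sketch of the kind of DDL this FLIP could
enable (the metadata-column syntax and the 'timestamp' metadata key are
assumptions drawn from the proposal, not a finalized design):

  CREATE TABLE vehicle_events (
    vehicle_id STRING,
    speed DOUBLE,
    -- assumed syntax: expose the Kafka record timestamp as a column
    event_time TIMESTAMP(3) METADATA FROM 'timestamp',
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'vehicle-events',
    'format' = 'json'
  );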

Best,

Dongwon

On 2020/03/01 14:30:31, Dawid Wysakowicz <d....@apache.org> wrote:
> Hi,
>
> I would like to propose an improvement that would enable reading table
> columns from different parts of source records. Besides the main
> payload, the majority (if not all) of the sources expose additional
> information. It can be simply read-only metadata such as offset or
> ingestion time, or read-and-write parts of the record that contain
> data but additionally serve different purposes (partitioning,
> compaction, etc.), e.g. the key or timestamp in Kafka.
>
> We should make it possible to read and write data from all of those
> locations. In this proposal I discuss reading partitioning data; for
> completeness, it also discusses partitioning when writing data out.
>
> I am looking forward to your comments.
>
> You can access the FLIP here:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Reading+table+columns+from+different+parts+of+source+records?src=contextnavpagetreemode
>
> Best,
>
> Dawid

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

Posted by Leonard Xu <xb...@gmail.com>.
+1 for FLIP-107

Reading different parts of source records should be a key feature for Flink SQL, e.g. metadata in CDC data, or the key and timestamp in Kafka records.
The scope of FLIP-107 is too big to finish in one version IMO; maybe we can start part of the work in 1.12.
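
As a concrete example of the metadata part (a sketch only; the column
names, types, and the VIRTUAL keyword for marking read-only columns are
assumptions based on the proposal):

  CREATE TABLE orders (
    order_id STRING,
    amount DECIMAL(10, 2),
    -- assumed syntax: read-only Kafka metadata, excluded on write
    part INT METADATA FROM 'partition' VIRTUAL,
    offs BIGINT METADATA FROM 'offset' VIRTUAL
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'format' = 'json'
  );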

Best
Leonard

> On Aug 11, 2020, at 19:51, Kurt Young <yk...@gmail.com> wrote:
> 
> The content of FLIP-107 is relatively short, but the scope and the
> implications it will have are actually very big.
> From what I can tell now, I think there is a good chance that we can
> deliver part of this FLIP in 1.12, e.g.
> accessing metadata fields just like you mentioned.
> 
> Best,
> Kurt
> 
> 
> On Tue, Aug 11, 2020 at 7:18 PM Dongwon Kim <ea...@gmail.com> wrote:
> 
>> Big +1 for this FLIP.
>>
>> [...]


Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

Posted by Kurt Young <yk...@gmail.com>.
The content of FLIP-107 is relatively short, but the scope and the
implications it will have are actually very big.
From what I can tell now, I think there is a good chance that we can
deliver part of this FLIP in 1.12, e.g.
accessing metadata fields just like you mentioned.

Best,
Kurt


On Tue, Aug 11, 2020 at 7:18 PM Dongwon Kim <ea...@gmail.com> wrote:

> Big +1 for this FLIP.
>
> [...]