You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Averell <lv...@gmail.com> on 2019/05/03 13:22:48 UTC

CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

Hi,

Back to my story about enriching two different streams with data from one
(slow stream) using Flink's low lever functions like CoProcessFunction
(mentioned in this thread:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoFlatMapFunction-with-more-than-two-input-streams-td22320.html)

Now I see that Flink Table also support doing something similar with
Temporal Table [1]. With this, I would only need to convert my enrichment
stream to be a Temporal table, and the two other streams into two unbounded
tables.

*/In term of performance and resource usage/*, would this way of
implementation (using Flink Table) be better than the option no.1 mentioned
in my other thread: creating two different (though similar)
CoProcessFunction's, maintaining two state tables (for the enrichment
stream, one in each function)?

Thanks and best regards,
Averell 

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/temporal_tables.html



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,

Sorry for late response, somehow I wasn’t notified about your e-mail.

> 
> So you meant implementation in DataStreamAPI with cutting corners would,
> generally, shorter than Table Join. I thought that using Tables would be
> more intuitive and shorter, hence my initial question :)

It depends what you are trying to do. Take a look at the class TemporalRowtimeJoin and/or class TemporalProcessTimeJoin classes in Flink to judge the complexity of writing your version of it vs using Table API just for that.

> 
> Regarding all the limitations with Table API that you mentioned, is there
> any summary page in Flink docs for that?

I don’t recall such summary :( Maybe someone else knows?

Piotrek

> On 4 May 2019, at 01:38, Averell <lv...@gmail.com> wrote:
> 
> Thank you Piotr for the thorough answer.
> 
> So you meant implementation in DataStreamAPI with cutting corners would,
> generally, shorter than Table Join. I thought that using Tables would be
> more intuitive and shorter, hence my initial question :)
> 
> Regarding all the limitations with Table API that you mentioned, is there
> any summary page in Flink docs for that?
> 
> Thanks and regards,
> Averell
> 
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

Posted by Averell <lv...@gmail.com>.
Thank you Piotr for the thorough answer.

So you meant implementation in DataStreamAPI with cutting corners would,
generally, shorter than Table Join. I thought that using Tables would be
more intuitive and shorter, hence my initial question :)

Regarding all the limitations with Table API that you mentioned, is there
any summary page in Flink docs for that?

Thanks and regards,
Averell




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CoProcessFunction vs Temporal Table to enrich multiple streams with a slow stream

Posted by Piotr Nowojski <pi...@ververica.com>.
Hi Averell,

I will be referring to your original two options: 1 (duplicating stream_C) and 2 (multiplexing stream_A and stream_B).

Both of them could be expressed using Temporal Table Join. You could multiplex stream_A and stream_B in Table API, temporal table join them with stream_C and then de multiplex them in DataStream API.

Resource usage/consumption would be more or less the same, but it depends to what you are comparing it. Temporal Table Joins in Table API when using processing time have little no overhead. When using event time, there is much more complicated logic how handle out of order data, when to emit the data (on watermark as the Table API’s implementation? Asap?). I could imagine different implementations cutting some corners here and there, but if you would like to implement the same set of features that Temporal Table Join provides in DataStream API, you would end up with roughly the same code (if not, if you end up with something better please contribute it! :) ). Please check the implementation details of org.apache.flink.table.runtime.join.TemporalRowtimeJoin and org.apache.flink.table.runtime.join.TemporalProcessTimeJoin.

Having said that, you have to answer yourself whether it’s better to implement the Temporal Join on your own in DataStream API or wether to go through the hassle of converting your DataStream to Tables and back again. I would guess no - if you are already working in DataStream API environment, using Table API will have some limitations, like possible data conversion or the fact that you are loosing the control over the state of your operator - Table API doesn’t provide support for keeping the state of the job/query during upgrading Flink versions or if you would like to modify your Table API job graph/query. While with DataStream API both of those things are supported.

Piotrek

> On 3 May 2019, at 15:22, Averell <lv...@gmail.com> wrote:
> 
> Hi,
> 
> Back to my story about enriching two different streams with data from one
> (slow stream) using Flink's low lever functions like CoProcessFunction
> (mentioned in this thread:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoFlatMapFunction-with-more-than-two-input-streams-td22320.html)
> 
> Now I see that Flink Table also support doing something similar with
> Temporal Table [1]. With this, I would only need to convert my enrichment
> stream to be a Temporal table, and the two other streams into two unbounded
> tables.
> 
> */In term of performance and resource usage/*, would this way of
> implementation (using Flink Table) be better than the option no.1 mentioned
> in my other thread: creating two different (though similar)
> CoProcessFunction's, maintaining two state tables (for the enrichment
> stream, one in each function)?
> 
> Thanks and best regards,
> Averell 
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/temporal_tables.html
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/