Posted to user@flink.apache.org by Chesnay Schepler <ch...@apache.org> on 2022/01/12 09:03:50 UTC
Re: what is efficient way to write Left join in flink

Your best bet is to try out both approaches with some representative data.
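
To check that both implementations produce the same rows on that data, it can help to have a tiny reference model of the intended semantics. The following is only a plain-Python sketch, not Flink code. It assumes records are (timestamp_ms, key, value) tuples, that the 23-row reference set fits in a dict, and that "left join within a one-minute tumbling window" means a left row only matches right rows with the same key in the same window. The names window_start and windowed_left_join are hypothetical.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling window, as in the question


def window_start(ts_ms, size_ms=WINDOW_MS):
    """Return the start of the tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def windowed_left_join(left, reference, right, size_ms=WINDOW_MS):
    """Left-join `left` with a static `reference` dict and a windowed `right`.

    left, right: iterables of (timestamp_ms, key, value) tuples (assumed shape).
    reference:   dict mapping key -> value; small lookup set, not windowed.
    Returns (window_start, key, left_value, ref_value, right_value) tuples,
    with None in place of ref_value / right_value when nothing matches.
    """
    # Bucket the right side by (window, key) so each left row is a dict lookup.
    right_by_window = defaultdict(list)
    for ts, key, value in right:
        right_by_window[(window_start(ts, size_ms), key)].append(value)

    out = []
    for ts, key, value in left:
        win = window_start(ts, size_ms)
        ref_value = reference.get(key)
        # Left-join semantics: keep the left row even without a right match.
        matches = right_by_window.get((win, key)) or [None]
        for right_value in matches:
            out.append((win, key, value, ref_value, right_value))
    return out


# Made-up sample data: key "a" has two right-side matches in its window (1:2),
# key "b" has no reference row and no right-side row in the same window.
left = [(1_000, "a", "L1"), (61_000, "b", "L2")]
reference = {"a": "refA"}
right = [(2_000, "a", "R1"), (59_999, "a", "R2"), (125_000, "b", "R3")]

for row in windowed_left_join(left, reference, right):
    print(row)
```

Running the DataStream and SQL variants on the same small input and comparing against a model like this separates correctness questions from the performance comparison, which still needs realistic volumes.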

On 12/01/2022 08:11, Ronak Beejawat (rbeejawa) wrote:
> Hi Team,
>
> Can you please help me with the query below? I would like to know which approach is better and more efficient for multiple left joins within a one-minute tumbling window (DataStream API vs. SQL API, with respect to performance and memory management).
>
> Use case :
> 1. We have topic one (testtopic1), which receives half a million records every minute.
> 2. We have topic two (testtopic2), which holds 23 data points of static reference data.
> 3. We have topic three (testtopic3), which receives one million records every minute.
>
>
> So we are doing the join as (select * from testtopic1 left join testtopic2 left join testtopic3, grouped by a tumbling window of 1 min duration).
>
> So the question is: which API (DataStream API or SQL API) will be more efficient and faster for such a join-intensive use case?
>
> Thanks
> Ronak Beejawat
>
>
>
> From: Ronak Beejawat (rbeejawa)
> Sent: Tuesday, January 11, 2022 6:12 PM
> To: 'dev@flink.apache.org' <de...@flink.apache.org>; 'community@flink.apache.org' <co...@flink.apache.org>; 'user@flink.apache.org' <us...@flink.apache.org>
> Cc: 'Hang Ruan' <ru...@gmail.com>; Shrinath Shenoy K (sshenoyk) <ss...@cisco.com>
> Subject: RE: what is efficient way to write Left join in flink
>
> Could someone please help with / reply to the question below?
>
> From: Ronak Beejawat (rbeejawa)
> Sent: Monday, January 10, 2022 7:40 PM
> To: dev@flink.apache.org; community@flink.apache.org; user@flink.apache.org
> Cc: Hang Ruan <ru...@gmail.com>; Shrinath Shenoy K (sshenoyk) <ss...@cisco.com>
> Subject: what is efficient way to write Left join in flink
>
> Hi Team,
>
> We would like clarification on a real-time processing scenario for the use case below.
>
> Use case :
> 1. We have topic one (testtopic1), which receives half a million records every minute.
> 2. We have topic two (testtopic2), which receives one million records every minute.
>
> So we are doing the join as testtopic1 left join testtopic2, where the correlated data has a 1:2 ratio.
>
> So the question is: which API (DataStream API or SQL API) will be more efficient and faster for such a join-intensive use case?
>
> Thanks
> Ronak Beejawat