Posted to user@flink.apache.org by Shamit <ja...@gmail.com> on 2021/02/09 21:46:51 UTC

Join two streams from Kafka

Hello Flink Users,

I am a newbie and have a question about joining two streams (stream1 and
stream2) from Kafka topics on some key.

In my use case I need to join with stream2 data that might be a year old or
more.

If data arrives on stream1 today and I need to join it with stream2 on some
key, please let me know how I can do this efficiently.

stream2 might have a lot of records (in the millions).

Please help.

Regards,
Shamit Jain



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Join two streams from Kafka

Posted by Arvid Heise <ar...@apache.org>.
Hi Shamit,

Unless you have some temporal relationship between the records to be
joined, you have to use a regular join over stream 1 and stream 2.
Since you cannot define any window, all data from both streams will be held
in Flink's state. That is not an issue for a few million records, but it
probably means you have to use the RocksDB state backend [1], or else you may
run out of main memory.
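
To illustrate why a regular join has to keep everything, here is a minimal
plain-Java sketch of an unwindowed inner join. It is not Flink code: the two
maps only stand in for Flink's keyed state, and the class and method names
(RegularJoinSketch, onLeft, onRight, bufferedRecords) are made up for this
example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustration only: an unwindowed (regular) inner join over two streams.
 * The maps play the role of keyed state; nothing is ever evicted.
 */
public class RegularJoinSketch {
    // Every record seen so far on each side, grouped by join key.
    private final Map<String, List<String>> leftState = new HashMap<>();
    private final Map<String, List<String>> rightState = new HashMap<>();

    /** Buffers a stream1 record and joins it with all stream2 records seen so far. */
    public List<String> onLeft(String key, String value) {
        leftState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String right : rightState.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(key + ":" + value + "|" + right);
        }
        return out;
    }

    /** Buffers a stream2 record and joins it with all stream1 records seen so far. */
    public List<String> onRight(String key, String value) {
        rightState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String left : leftState.getOrDefault(key, Collections.<String>emptyList())) {
            out.add(key + ":" + left + "|" + value);
        }
        return out;
    }

    /** Total number of records held on both sides; it only ever grows. */
    public int bufferedRecords() {
        int n = 0;
        for (List<String> values : leftState.values()) n += values.size();
        for (List<String> values : rightState.values()) n += values.size();
        return n;
    }

    public static void main(String[] args) {
        RegularJoinSketch join = new RegularJoinSketch();
        join.onRight("user-1", "record-from-last-year"); // old stream2 data
        System.out.println(join.onLeft("user-1", "record-from-today"));
        System.out.println("buffered records: " + join.bufferedRecords());
    }
}
```

Because a matching record may arrive on either side at any later time, neither
map can be trimmed without a time bound, which is why the state grows with the
input.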

I recommend using Flink SQL or the Table API, which will also prune all
unnecessary columns from your data. If you want to use the DataStream API
instead, I recommend dropping all unrelated columns prior to the join.
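
As a rough sketch of dropping unrelated columns by hand, the plain-Java
example below projects a wide record down to the join key plus the one field
the result needs. The record types and field names (WideRecord, SlimRecord,
amount, payload) are hypothetical; in a DataStream job a projection like this
would typically run as a map step ahead of the join.

```java
/** Illustration only: shrink records before they are buffered for a join. */
public class ProjectBeforeJoin {

    /** Hypothetical wide record as it might arrive from the stream2 topic. */
    static final class WideRecord {
        final String key;
        final long amount;
        final String payload; // large field the join result never uses

        WideRecord(String key, long amount, String payload) {
            this.key = key;
            this.amount = amount;
            this.payload = payload;
        }
    }

    /** Only the join key and the field needed in the join result. */
    static final class SlimRecord {
        final String key;
        final long amount;

        SlimRecord(String key, long amount) {
            this.key = key;
            this.amount = amount;
        }
    }

    /** Projection: copy the needed fields and drop everything else. */
    static SlimRecord project(WideRecord record) {
        return new SlimRecord(record.key, record.amount);
    }

    public static void main(String[] args) {
        WideRecord wide = new WideRecord("user-1", 42L, "large unrelated payload");
        SlimRecord slim = project(wide);
        System.out.println(slim.key + " " + slim.amount);
    }
}
```

Since every buffered record stays in state indefinitely, each dropped field
saves space for every record on that side of the join.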

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend
