You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Shamit <ja...@gmail.com> on 2021/02/09 21:46:51 UTC
Join two streams from Kafka
Hello Flink Users,
I am newbie and have question on join of two streams (stream1 and stream2 )
from Kafka topic based on some key.
In my use case I need to join with stream2 data which might be year old and
more.
Now if on stream1 the data gets arrived today and I need to join with
stream2 based on some key Please let me know how efficiently I can do.
stream2 might have lots of records(in millions).
Please help.
Regards,
Shamit Jain
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Join two streams from Kafka
Posted by Arvid Heise <ar...@apache.org>.
Hi Shamit,
unless you have some temporal relationship between the records to be
joined, you have to use a regular join over stream 1 and stream 2.
Since you cannot define any window, all data will be held in Flink's state,
which is not an issue for a few millions but probably means you have to use
rocksdb statebackend [1] or else you may run out of main memory.
I recommend using Flink SQL or Table API, which will also prune all
unnecessary columns from your data. If you want to use DataStream API
instead, I recommend to drop all unrelated columns prior to the join.
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend
On Tue, Feb 9, 2021 at 10:47 PM Shamit <ja...@gmail.com> wrote:
> Hello Flink Users,
>
> I am newbie and have question on join of two streams (stream1 and stream2 )
> from Kafka topic based on some key.
>
> In my use case I need to join with stream2 data which might be year old and
> more.
>
> Now if on stream1 the data gets arrived today and I need to join with
> stream2 based on some key Please let me know how efficiently I can do.
>
> stream2 might have lots of records(in millions).
>
> Please help.
>
> Regards,
> Shamit Jain
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>