You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Peter Zende <pe...@gmail.com> on 2018/05/09 15:49:35 UTC

MapWithState for two keyed stream

Hi all,

Is it possible to define two DataStream sources - one which reads from
Kafka, the other reads from HDFS -  and apply mapWithState with
CoFlatMapFunction? The idea would be to read historical data from HDFS
along with the live stream from Kafka and based on some business  write the
output to the sink in correct update order?

Or is it easier to just union those two streams? In mapWithState we should
tell from which stream the record originates from to be able to correctly
build up the state store.

Many thanks.
Peter

Re: MapWithState for two keyed stream

Posted by TechnoMage <ml...@technomage.com>.
CoRichFlatMap or union will work.  If you need to know which is historical the flatmap will be better as you can tell which stream it cam from.  But, be careful about reading historical data and trying to process it all before processing the new data.  That can lead to buffering a lot of incoming data while processing the historical data.

Michael

> On May 9, 2018, at 9:49 AM, Peter Zende <pe...@gmail.com> wrote:
> 
> Hi all,
> 
> Is it possible to define two DataStream sources - one which reads from Kafka, the other reads from HDFS -  and apply mapWithState with CoFlatMapFunction? The idea would be to read historical data from HDFS along with the live stream from Kafka and based on some business  write the output to the sink in correct update order?
> 
> Or is it easier to just union those two streams? In mapWithState we should tell from which stream the record originates from to be able to correctly build up the state store.
> 
> Many thanks.
> Peter