Posted to user@flink.apache.org by Yan Zhou [FDS Science] <yz...@coupang.com> on 2017/10/09 21:11:40 UTC

DataStream joining without window

It seems that Flink only supports joining DataStreams within the same time
window. Why is it restricted in this way?

I think I can implement a TwoInputStreamOperator to join two DataStreams
without considering the window. Inside the operator, I would create two
states to cache the records of the two streams and join them in the
processElement1/processElement2 methods. Should I go ahead with this
approach? Are there any performance considerations? If the concern is
that the cache might take a lot of memory, we can introduce a cache
eviction policy to limit its size. Or can we use RocksDB state?

Please advise.

Best
Yan

Re: DataStream joining without window

Posted by Yan Zhou [FDS Science] <yz...@coupang.com>.
Thank you for the reply. It's very helpful.

Best
Yan

On Tue, Oct 10, 2017 at 7:57 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
>
> Yes, using a TwoInputStreamOperator (or even better, a CoProcessFunction,
> because TwoInputStreamOperator is a low-level interface that might change
> in the future) is the recommended way for implementing a stream-stream
> join, currently.
>
> As you already guessed, you need a policy for cleaning up the state that
> you hold. You can do this using the timer features of CoProcessFunction.
>
> Also, if you keep your buffered elements in the Flink state interfaces,
> you can switch to the RocksDB state backend if you are concerned about
> the state growing too big.
>
> Best,
> Aljoscha

Re: DataStream joining without window

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

Yes, using a TwoInputStreamOperator (or even better, a CoProcessFunction, because TwoInputStreamOperator is a low-level interface that might change in the future) is the recommended way for implementing a stream-stream join, currently.

As you already guessed, you need a policy for cleaning up the state that you hold. You can do this using the timer features of CoProcessFunction.

Also, if you keep your buffered elements in the Flink state interfaces, you can switch to the RocksDB state backend if you are concerned about the state growing too big.
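
For illustration, the buffering and cleanup logic described above might look roughly like the sketch below. It is plain Java with no Flink dependency, so the class BufferedJoin, its method names, and the retention parameter are all assumptions made up for this example. In an actual job the two maps would be Flink keyed state inside a CoProcessFunction, and evict would be driven by a timer callback rather than called by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Library-free model of a windowless two-stream join with time-based cleanup.
 * All names are hypothetical; this is not Flink API.
 */
class BufferedJoin {
    /** A buffered record: its payload and the time it arrived. */
    static final class Rec {
        final String value;
        final long timestamp;
        Rec(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // One buffer per input stream, keyed by the join key.
    private final Map<String, List<Rec>> left = new HashMap<>();
    private final Map<String, List<Rec>> right = new HashMap<>();
    private final long retentionMs;

    BufferedJoin(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Plays the role of processElement1: buffer the record, join with the other side. */
    List<String> processLeft(String key, String value, long now) {
        left.computeIfAbsent(key, k -> new ArrayList<>()).add(new Rec(value, now));
        List<String> out = new ArrayList<>();
        for (Rec r : right.getOrDefault(key, Collections.emptyList())) {
            out.add(value + "|" + r.value);
        }
        return out;
    }

    /** Plays the role of processElement2: buffer the record, join with the other side. */
    List<String> processRight(String key, String value, long now) {
        right.computeIfAbsent(key, k -> new ArrayList<>()).add(new Rec(value, now));
        List<String> out = new ArrayList<>();
        for (Rec l : left.getOrDefault(key, Collections.emptyList())) {
            out.add(l.value + "|" + value);
        }
        return out;
    }

    /** Plays the role of a timer callback: drop records older than the retention period. */
    void evict(long now) {
        evict(left, now);
        evict(right, now);
    }

    private void evict(Map<String, List<Rec>> buffer, long now) {
        for (List<Rec> recs : buffer.values()) {
            recs.removeIf(r -> now - r.timestamp > retentionMs);
        }
        // Remove keys whose buffers are now empty so the map itself does not grow.
        buffer.values().removeIf(List::isEmpty);
    }

    /** Total number of records currently buffered on both sides. */
    int bufferedCount() {
        int n = 0;
        for (List<Rec> recs : left.values()) n += recs.size();
        for (List<Rec> recs : right.values()) n += recs.size();
        return n;
    }
}
```

Each arriving record is stored first and then matched against everything buffered for the same key on the other side, so a pair is emitted exactly once, by whichever record arrives second. The retention period is the cleanup policy: without a call to evict, both buffers grow without bound.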

Best,
Aljoscha
