You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by kant kodali <ka...@gmail.com> on 2018/03/06 16:45:56 UTC

Does Flink support stream-stream outer joins in the latest version?

Hi All,

Does Flink support stream-stream outer joins in the latest version?

Thanks!

Re: Does Flink support stream-stream outer joins in the latest version?

Posted by Xingcan Cui <xi...@gmail.com>.
Hi Kant,

the non windowed stream-stream join is not equivalent to the full-history join, though they get the same SQL form. The retention times for records must be set to leverage the storage consumption and completeness of the results.

Best,
Xingcan

> On 7 Mar 2018, at 8:02 PM, kant kodali <ka...@gmail.com> wrote:
> 
> Hi Cheng,
> 
> The docs here <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins> states full outer joins are only available for batch (I am not sure if I am reading that correctly). I am trying to understand how two unbounded streams can be joined like a batch? If we have to do batch join then it must be bounded right? If so, how do we bound? I can think Time Window is one way to bound but other than that if I execute the below join query on the unbounded stream I am not even sure how that works? A row from one table can join with a row from another table and that row can come anytime in future right if it is unbounded. so I am sorry I am failing to understand.
> 
> 
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id <http://o.id/> = s.orderId
> 
> Thanks!
> 
> On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng <chenghequn@gmail.com <ma...@gmail.com>> wrote:
> Hi kant,
> 
> It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported now. You can check more details with the docs given by Xingcan.
> As for the non-window join, it is used to join two unbounded stream and the semantic is very like batch join.
> 
> Time-windowed Join:
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id <http://o.id/> = s.orderId AND
>       o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
>  
> Non-windowed Join:
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id <http://o.id/> = s.orderId
> 
> On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <kanth909@gmail.com <ma...@gmail.com>> wrote:
> Hi! 
> 
> Thanks for all this. and yes I was indeed talking about SQL/Table API so I will keep track of these tickets! BTW, What is non-windowed Join? I thought stream-stream-joins by default is a stateful operation so it has to be within some time window right? Also does the output of stream-stream joins emit every time so we can see the state of the join at any given time or only when the watermark elapses and join result fully materializes? 
> 
> On a side note, Full outer join seems to be the most useful for my use case. so the moment its available in master I can start playing and testing it!
> 
> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <chenghequn@gmail.com <ma...@gmail.com>> wrote:
> Hi Kant,
> 
> The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. 
> 
> Best, Hequn
> 
> [1] https://issues.apache.org/jira/browse/FLINK-5878 <https://issues.apache.org/jira/browse/FLINK-5878>
> 
> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <xingcanc@gmail.com <ma...@gmail.com>> wrote:
> Hi Kant,
> 
> I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.
> 
> There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.
> 
> Hope that helps.
> 
> Best,
> Xingcan
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins>
> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins>
> 
> 
>> On 7 Mar 2018, at 12:45 AM, kant kodali <kanth909@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi All,
>> 
>> Does Flink support stream-stream outer joins in the latest version?
>> 
>> Thanks!
> 
> 
> 
> 
> 


Re: Does Flink support stream-stream outer joins in the latest version?

Posted by Hequn Cheng <ch...@gmail.com>.
Hi kant,

You are right. Batch joins require the inputs are bounded. To join two
unbounded streams without window, all data will be stored in join's states,
so the late right row will join the previous left row when it is input.
As for state retention time, if the input tables of join are both keyed
table and key number of the keyed tables are limited, then you don't have
to set state retention time, otherwise it is suggested to set the state
retention time.

Best, Hequn

On Wed, Mar 7, 2018 at 8:02 PM, kant kodali <ka...@gmail.com> wrote:

> Hi Cheng,
>
> The docs here
> <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins> states
> full outer joins are only available for batch (I am not sure if I am
> reading that correctly). I am trying to understand how two unbounded
> streams can be joined like a batch? If we have to do batch join then it *must
> be* bounded right? If so, how do we bound? I can think Time Window is one
> way to bound but other than that if I execute the below join query on the
> unbounded stream I am not even sure how that works? A row from one table
> can join with a row from another table and that row can come anytime in
> future right if it is unbounded. so I am sorry I am failing to understand.
>
>
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id = s.orderId
>
> Thanks!
>
> On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng <ch...@gmail.com> wrote:
>
>> Hi kant,
>>
>> It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported
>> now. You can check more details with the docs given by Xingcan.
>> As for the non-window join, it is used to join two unbounded stream and
>> the semantic is very like batch join.
>>
>> Time-windowed Join:
>>
>>> SELECT *
>>> FROM Orders o, Shipments s
>>> WHERE o.id = s.orderId AND
>>>       o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
>>
>>
>> Non-windowed Join:
>>
>>> SELECT *
>>> FROM Orders o, Shipments s
>>> WHERE o.id = s.orderId
>>
>>
>> On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <ka...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> Thanks for all this. and yes I was indeed talking about SQL/Table API so
>>> I will keep track of these tickets! BTW, What is non-windowed Join? I
>>> thought stream-stream-joins by default is a stateful operation so it has to
>>> be within some time window right? Also does the output of stream-stream
>>> joins emit every time so we can see the state of the join at any given time
>>> or only when the watermark elapses and join result fully materializes?
>>>
>>> On a side note, Full outer join seems to be the most useful for my use
>>> case. so the moment its available in master I can start playing and testing
>>> it!
>>>
>>> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <ch...@gmail.com>
>>> wrote:
>>>
>>>> Hi Kant,
>>>>
>>>> The stream-stream outer joins are work in progress
>>>> now(left/right/full), and will probably be ready before the end of this
>>>> month. You can check the progress from[1].
>>>>
>>>> Best, Hequn
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-5878
>>>>
>>>> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <xi...@gmail.com> wrote:
>>>>
>>>>> Hi Kant,
>>>>>
>>>>> I suppose you refer to the stream join in SQL/Table API since the
>>>>> outer join for windowed-streams can always be achieved with the
>>>>> `JoinFunction` in DataStream API.
>>>>>
>>>>> There are two kinds of stream joins, namely, the time-windowed join
>>>>> and the non-windowed join in Flink SQL/Table API [1, 2]. The
>>>>> time-windowed outer join has been supported since version 1.5 and the
>>>>> non-windowed outer join is still work in progress.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> Best,
>>>>> Xingcan
>>>>>
>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/d
>>>>> ev/table/tableApi.html#joins
>>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
>>>>> ev/table/sql.html#joins
>>>>>
>>>>>
>>>>> On 7 Mar 2018, at 12:45 AM, kant kodali <ka...@gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>>
>>>>> Does Flink support stream-stream outer joins in the latest version?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Does Flink support stream-stream outer joins in the latest version?

Posted by kant kodali <ka...@gmail.com>.
Hi Cheng,

The docs here
<https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins>
states
full outer joins are only available for batch (I am not sure if I am
reading that correctly). I am trying to understand how two unbounded
streams can be joined like a batch? If we have to do batch join then it *must
be* bounded right? If so, how do we bound? I can think Time Window is one
way to bound but other than that if I execute the below join query on the
unbounded stream I am not even sure how that works? A row from one table
can join with a row from another table and that row can come anytime in
future right if it is unbounded. so I am sorry I am failing to understand.


SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

Thanks!

On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng <ch...@gmail.com> wrote:

> Hi kant,
>
> It seems that you mean the Time-windowed Join. The Time-windowed Joins are supported
> now. You can check more details with the docs given by Xingcan.
> As for the non-window join, it is used to join two unbounded stream and
> the semantic is very like batch join.
>
> Time-windowed Join:
>
>> SELECT *
>> FROM Orders o, Shipments s
>> WHERE o.id = s.orderId AND
>>       o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
>
>
> Non-windowed Join:
>
>> SELECT *
>> FROM Orders o, Shipments s
>> WHERE o.id = s.orderId
>
>
> On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <ka...@gmail.com> wrote:
>
>> Hi!
>>
>> Thanks for all this. and yes I was indeed talking about SQL/Table API so
>> I will keep track of these tickets! BTW, What is non-windowed Join? I
>> thought stream-stream-joins by default is a stateful operation so it has to
>> be within some time window right? Also does the output of stream-stream
>> joins emit every time so we can see the state of the join at any given time
>> or only when the watermark elapses and join result fully materializes?
>>
>> On a side note, Full outer join seems to be the most useful for my use
>> case. so the moment its available in master I can start playing and testing
>> it!
>>
>> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <ch...@gmail.com>
>> wrote:
>>
>>> Hi Kant,
>>>
>>> The stream-stream outer joins are work in progress now(left/right/full),
>>> and will probably be ready before the end of this month. You can check the
>>> progress from[1].
>>>
>>> Best, Hequn
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-5878
>>>
>>> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <xi...@gmail.com> wrote:
>>>
>>>> Hi Kant,
>>>>
>>>> I suppose you refer to the stream join in SQL/Table API since the outer
>>>> join for windowed-streams can always be achieved with the `JoinFunction` in
>>>> DataStream API.
>>>>
>>>> There are two kinds of stream joins, namely, the time-windowed join and
>>>> the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
>>>> outer join has been supported since version 1.5 and the non-windowed outer
>>>> join is still work in progress.
>>>>
>>>> Hope that helps.
>>>>
>>>> Best,
>>>> Xingcan
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/d
>>>> ev/table/tableApi.html#joins
>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
>>>> ev/table/sql.html#joins
>>>>
>>>>
>>>> On 7 Mar 2018, at 12:45 AM, kant kodali <ka...@gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> Does Flink support stream-stream outer joins in the latest version?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>
>>
>

Re: Does Flink support stream-stream outer joins in the latest version?

Posted by Hequn Cheng <ch...@gmail.com>.
Hi kant,

It seems that you mean the Time-windowed Join. The Time-windowed Joins
are supported
now. You can check more details with the docs given by Xingcan.
As for the non-window join, it is used to join two unbounded stream
and the semantic
is very like batch join.

Time-windowed Join:

> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id = s.orderId AND
>       o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime


Non-windowed Join:

> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id = s.orderId


On Wed, Mar 7, 2018 at 7:02 PM, kant kodali <ka...@gmail.com> wrote:

> Hi!
>
> Thanks for all this. and yes I was indeed talking about SQL/Table API so I
> will keep track of these tickets! BTW, What is non-windowed Join? I
> thought stream-stream-joins by default is a stateful operation so it has to
> be within some time window right? Also does the output of stream-stream
> joins emit every time so we can see the state of the join at any given time
> or only when the watermark elapses and join result fully materializes?
>
> On a side note, Full outer join seems to be the most useful for my use
> case. so the moment its available in master I can start playing and testing
> it!
>
> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <ch...@gmail.com> wrote:
>
>> Hi Kant,
>>
>> The stream-stream outer joins are work in progress now(left/right/full),
>> and will probably be ready before the end of this month. You can check the
>> progress from[1].
>>
>> Best, Hequn
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-5878
>>
>> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <xi...@gmail.com> wrote:
>>
>>> Hi Kant,
>>>
>>> I suppose you refer to the stream join in SQL/Table API since the outer
>>> join for windowed-streams can always be achieved with the `JoinFunction` in
>>> DataStream API.
>>>
>>> There are two kinds of stream joins, namely, the time-windowed join and
>>> the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
>>> outer join has been supported since version 1.5 and the non-windowed outer
>>> join is still work in progress.
>>>
>>> Hope that helps.
>>>
>>> Best,
>>> Xingcan
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/d
>>> ev/table/tableApi.html#joins
>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
>>> ev/table/sql.html#joins
>>>
>>>
>>> On 7 Mar 2018, at 12:45 AM, kant kodali <ka...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> Does Flink support stream-stream outer joins in the latest version?
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>

Re: Does Flink support stream-stream outer joins in the latest version?

Posted by kant kodali <ka...@gmail.com>.
Hi!

Thanks for all this. and yes I was indeed talking about SQL/Table API so I
will keep track of these tickets! BTW, What is non-windowed Join? I thought
stream-stream-joins by default is a stateful operation so it has to be
within some time window right? Also does the output of stream-stream joins
emit every time so we can see the state of the join at any given time or
only when the watermark elapses and join result fully materializes?

On a side note, Full outer join seems to be the most useful for my use
case. so the moment its available in master I can start playing and testing
it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng <ch...@gmail.com> wrote:

> Hi Kant,
>
> The stream-stream outer joins are work in progress now(left/right/full),
> and will probably be ready before the end of this month. You can check the
> progress from[1].
>
> Best, Hequn
>
> [1] https://issues.apache.org/jira/browse/FLINK-5878
>
> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <xi...@gmail.com> wrote:
>
>> Hi Kant,
>>
>> I suppose you refer to the stream join in SQL/Table API since the outer
>> join for windowed-streams can always be achieved with the `JoinFunction` in
>> DataStream API.
>>
>> There are two kinds of stream joins, namely, the time-windowed join and
>> the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
>> outer join has been supported since version 1.5 and the non-windowed outer
>> join is still work in progress.
>>
>> Hope that helps.
>>
>> Best,
>> Xingcan
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-master/
>> dev/table/tableApi.html#joins
>> [2] https://ci.apache.org/projects/flink/flink-docs-master/
>> dev/table/sql.html#joins
>>
>>
>> On 7 Mar 2018, at 12:45 AM, kant kodali <ka...@gmail.com> wrote:
>>
>> Hi All,
>>
>> Does Flink support stream-stream outer joins in the latest version?
>>
>> Thanks!
>>
>>
>>
>

Re: Does Flink support stream-stream outer joins in the latest version?

Posted by Hequn Cheng <ch...@gmail.com>.
Hi Kant,

The stream-stream outer joins are work in progress now(left/right/full),
and will probably be ready before the end of this month. You can check the
progress from[1].

Best, Hequn

[1] https://issues.apache.org/jira/browse/FLINK-5878

On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui <xi...@gmail.com> wrote:

> Hi Kant,
>
> I suppose you refer to the stream join in SQL/Table API since the outer
> join for windowed-streams can always be achieved with the `JoinFunction` in
> DataStream API.
>
> There are two kinds of stream joins, namely, the time-windowed join and
> the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
> outer join has been supported since version 1.5 and the non-windowed outer
> join is still work in progress.
>
> Hope that helps.
>
> Best,
> Xingcan
>
> [1] https://ci.apache.org/projects/flink/flink-docs-
> master/dev/table/tableApi.html#joins
> [2] https://ci.apache.org/projects/flink/flink-docs-
> master/dev/table/sql.html#joins
>
>
> On 7 Mar 2018, at 12:45 AM, kant kodali <ka...@gmail.com> wrote:
>
> Hi All,
>
> Does Flink support stream-stream outer joins in the latest version?
>
> Thanks!
>
>
>

Re: Does Flink support stream-stream outer joins in the latest version?

Posted by Xingcan Cui <xi...@gmail.com>.
Hi Kant,

I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API.

There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join has been supported since version 1.5 and the non-windowed outer join is still work in progress.

Hope that helps.

Best,
Xingcan

[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins>
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins>


> On 7 Mar 2018, at 12:45 AM, kant kodali <ka...@gmail.com> wrote:
> 
> Hi All,
> 
> Does Flink support stream-stream outer joins in the latest version?
> 
> Thanks!