Posted to dev@spark.apache.org by real-cheng-su <ch...@fb.com.INVALID> on 2020/11/20 19:15:30 UTC

[SS] full outer stream-stream join

Hi,
 
Stream-stream join in Spark Structured Streaming currently supports the
INNER, LEFT OUTER, RIGHT OUTER and LEFT SEMI join types. It does not support
FULL OUTER join, and we are working on adding it in
https://github.com/apache/spark/pull/30395 .
 
Given that LEFT OUTER and RIGHT OUTER stream-stream joins are already
supported, the code needed for FULL OUTER join is quite straightforward:

* For each left side input row, check whether there is a match in the right
side state store. If there is a match, output the joined row; otherwise
output nothing. Put the row in the left side state store.
* For each right side input row, check whether there is a match in the left
side state store. If there is a match, output the joined row; otherwise
output nothing. Put the row in the right side state store.
* State store eviction: evict rows below the watermark from the left and
right side state stores, and output the evicted rows that were never matched
(a combination of left outer and right outer join).
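
As a rough illustration of the three steps above, here is a minimal in-memory
Python sketch. It is not the code in the PR: the names FullOuterJoinSketch,
StateEntry, on_left, on_right and evict are hypothetical, the join condition
is reduced to plain key equality, and each "state store" is just a dict. The
"matched" flag stands in for whatever bookkeeping is used to tell which rows
were never joined.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical row layout for this sketch: (join key, event time, payload).
Row = Tuple[Any, int, Any]
Joined = Tuple[Optional[Row], Optional[Row]]  # (left row, right row)


@dataclass
class StateEntry:
    row: Row
    matched: bool = False  # set once the row has joined with any other row


@dataclass
class FullOuterJoinSketch:
    # One toy "state store" per side, keyed by join key (assumption).
    left: Dict[Any, List[StateEntry]] = field(default_factory=dict)
    right: Dict[Any, List[StateEntry]] = field(default_factory=dict)

    def _process(self, row: Row, own, other, row_is_left: bool) -> List[Joined]:
        entry = StateEntry(row)
        out: List[Joined] = []
        # Probe the other side's state; emit a joined row for every match.
        for candidate in other.get(row[0], []):
            candidate.matched = True
            entry.matched = True
            out.append((row, candidate.row) if row_is_left else (candidate.row, row))
        # Always buffer the input row in its own side's state.
        own.setdefault(row[0], []).append(entry)
        return out

    def on_left(self, row: Row) -> List[Joined]:
        return self._process(row, self.left, self.right, True)

    def on_right(self, row: Row) -> List[Joined]:
        return self._process(row, self.right, self.left, False)

    def evict(self, watermark: int) -> List[Joined]:
        """Drop rows with event time below the watermark from both sides and
        emit the never-matched ones padded with None (the outer part)."""
        out: List[Joined] = []
        for store, is_left in ((self.left, True), (self.right, False)):
            for key in list(store):
                kept = []
                for e in store[key]:
                    if e.row[1] >= watermark:
                        kept.append(e)
                    elif not e.matched:
                        out.append((e.row, None) if is_left else (None, e.row))
                if kept:
                    store[key] = kept
                else:
                    del store[key]
        return out


if __name__ == "__main__":
    join = FullOuterJoinSketch()
    print(join.on_left(("a", 1, "L1")))   # no match yet, row is only buffered
    print(join.on_right(("a", 2, "R1")))  # matches the buffered left row
    print(join.on_right(("b", 3, "R2")))  # never matched on the left
    print(join.evict(watermark=5))        # emits the unmatched right row
```

In this sketch the only difference from a one-sided outer join is that evict
pads unmatched rows from both sides rather than from a single side.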

FULL OUTER join consumes the same amount of state store space as
INNER/LEFT OUTER/RIGHT OUTER join and is easy to add, so I don't see any
reason from a system perspective why FULL OUTER join should not be added.

Is there any major blocker to adding FULL OUTER stream-stream join? I am
asking on the dev mailing list, in addition to the PR review, in case we are
missing anything. Thanks.

Cheng Su



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [SS] full outer stream-stream join

Posted by Jungtaek Lim <ka...@gmail.com>.
Adding the rationale here: I asked for this thread to be raised on the dev
mailing list to figure out possible reasons why full outer join was left out
when left/right outer join was added.

This is historical knowledge that I don't have. Most likely only a limited
number of folks can answer, and I hope we can get some of that historical
context.

Note that I don't object to the change. I just want to make sure we don't
miss something.
