Posted to issues@spark.apache.org by "Cheng Su (Jira)" <ji...@apache.org> on 2020/09/12 07:46:00 UTC

[jira] [Created] (SPARK-32862) Left semi stream-stream join

Cheng Su created SPARK-32862:
--------------------------------

             Summary: Left semi stream-stream join
                 Key: SPARK-32862
                 URL: https://issues.apache.org/jira/browse/SPARK-32862
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.1.0
            Reporter: Cheng Su


Stream-stream join currently supports inner, left outer, and right outer joins ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]). Internally, we see many users relying on left semi stream-stream join (outside Spark Structured Streaming). For example: I want the ad impressions (left side of the join) that have a click (right side of the join), but I don't care how many clicks each ad has (left semi semantics).
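To make the left semi semantics concrete, here is a minimal batch-style sketch in plain Python (not Spark code). The sample rows and the helper names inner_join and left_semi_join are hypothetical and only serve the illustration: an inner join repeats an impression once per click, while a left semi join returns each impression with a click exactly once.

```python
# Hypothetical sample data: (ad_id, payload) pairs, invented for illustration.
impressions = [("ad1", "imp-a"), ("ad2", "imp-b"), ("ad3", "imp-c")]
clicks = [("ad1", "click-1"), ("ad1", "click-2"), ("ad3", "click-3")]


def inner_join(left, right):
    # One output row per matching (left, right) pair.
    return [(l, r) for l in left for r in right if l[0] == r[0]]


def left_semi_join(left, right):
    # One output row per left row that has at least one match on the right.
    right_keys = {key for key, _ in right}
    return [l for l in left if l[0] in right_keys]


inner_rows = inner_join(impressions, clicks)     # ad1 appears twice here
semi_rows = left_semi_join(impressions, clicks)  # ad1 appears once here
```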

 

Left semi stream-stream join would work as follows:

(1) For each left-side input row, check whether there is a match in the right-side state store.

  (1.1) If there is a match, output the left-side row.

  (1.2) If there is no match, put the row in the left-side state store, with its "matched" field set to false.

(2) For each right-side input row, check whether there is a match in the left-side state store. If there is a match, update the left-side row state by setting its "matched" field to true. Put the right-side row in the right-side state store.

(3) When a left-side row is evicted from the state store, output the row if its "matched" field is true.

(4) When a right-side row is evicted from the state store, do nothing.
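
The steps above can be sketched as a small in-memory simulation. This is plain Python, not Spark code, and it rests on several assumptions that are not part of the proposal: the join condition is a simple equality on a key, the state stores are dictionaries, and a row is evicted once its event time falls below a watermark value passed in by the caller. The names LeftSemiJoinSketch, LeftEntry, on_left, on_right, and evict are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LeftEntry:
    row: Tuple[str, str]
    event_time: int
    matched: bool = False  # the "matched" field from step (1.2)


@dataclass
class LeftSemiJoinSketch:
    # Assumed stand-ins for the two state stores: key -> buffered rows.
    left_state: Dict[str, List[LeftEntry]] = field(default_factory=dict)
    right_state: Dict[str, List[int]] = field(default_factory=dict)

    def on_left(self, key: str, payload: str, event_time: int):
        # Step (1): look for a match in the right-side state.
        if self.right_state.get(key):
            return [(key, payload)]  # step (1.1): output immediately
        # Step (1.2): buffer the row with matched=False.
        entry = LeftEntry((key, payload), event_time)
        self.left_state.setdefault(key, []).append(entry)
        return []

    def on_right(self, key: str, event_time: int) -> None:
        # Step (2): mark buffered left rows as matched, then store the right row.
        for entry in self.left_state.get(key, []):
            entry.matched = True
        self.right_state.setdefault(key, []).append(event_time)

    def evict(self, watermark: int):
        # Assumed eviction rule: drop rows whose event time is below the watermark.
        output = []
        for key in list(self.left_state):
            kept = []
            for entry in self.left_state[key]:
                if entry.event_time >= watermark:
                    kept.append(entry)
                elif entry.matched:
                    output.append(entry.row)  # step (3): output matched rows
            if kept:
                self.left_state[key] = kept
            else:
                del self.left_state[key]
        for key in list(self.right_state):
            kept_times = [t for t in self.right_state[key] if t >= watermark]
            if kept_times:
                self.right_state[key] = kept_times
            else:
                del self.right_state[key]  # step (4): nothing is output
        return output


join = LeftSemiJoinSketch()
out = []
out += join.on_left("ad1", "imp-a", 1)  # no click yet: buffered as unmatched
join.on_right("ad1", 2)                 # click arrives: marks imp-a as matched
out += join.on_left("ad1", "imp-b", 3)  # click already in state: output now
out += join.on_left("ad2", "imp-c", 4)  # never clicked: buffered as unmatched
out += join.evict(watermark=10)         # imp-a is output, imp-c is dropped
```

In this sketch a left-side row that arrives after its match is output right away, while a left-side row that arrives before its match is output only when it is evicted, which follows steps (1.1) and (3) as written.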



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
