Posted to issues@spark.apache.org by "Cheng Su (Jira)" <ji...@apache.org> on 2021/02/24 03:27:00 UTC

[jira] [Created] (SPARK-34514) Push down limit for LEFT SEMI and LEFT ANTI join

Cheng Su created SPARK-34514:
--------------------------------

             Summary: Push down limit for LEFT SEMI and LEFT ANTI join
                 Key: SPARK-34514
                 URL: https://issues.apache.org/jira/browse/SPARK-34514
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Cheng Su


During code review of https://github.com/apache/spark/pull/31567 (see https://github.com/apache/spark/pull/31567#discussion_r577379572), I found that we can push down limit to the left side of LEFT SEMI and LEFT ANTI join if the join condition is empty.

Why it's safe to push down limit:

The semantics of LEFT SEMI join without condition:

(1). if right side is non-empty, output all rows from left side.

(2). if right side is empty, output nothing.

 

The semantics of LEFT ANTI join without condition:

(1). if right side is non-empty, output nothing.

(2). if right side is empty, output all rows from left side.

 

Because the output is either all rows from the left side or nothing (all or nothing), it's safe to push down limit to the left side.
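
The all-or-nothing argument can be checked with a small model. This is only a sketch in plain Python, not Spark code: rows are list elements, limit is modelled as "take the first n rows", and all function names are hypothetical. The sketch also keeps the outer limit after pushing one down to the left side, which is an assumption of this model.

```python
def left_semi_no_condition(left, right):
    """LEFT SEMI join without condition: all left rows if right is non-empty."""
    return list(left) if len(right) > 0 else []


def left_anti_no_condition(left, right):
    """LEFT ANTI join without condition: all left rows if right is empty."""
    return list(left) if len(right) == 0 else []


def limit(rows, n):
    """Model of LIMIT n: keep at most the first n rows."""
    return rows[:n]


def limit_after_join(join, left, right, n):
    """Original plan: join first, then limit."""
    return limit(join(left, right), n)


def limit_pushed_to_left(join, left, right, n):
    """Rewritten plan: limit the left side first, then join.

    The outer limit is kept (assumption of this sketch); it is a no-op
    here because the join cannot add rows.
    """
    return limit(join(limit(left, n), right), n)


# Compare both plans for empty and non-empty right sides and several limits.
left_rows = [1, 2, 3, 4, 5]
for join in (left_semi_no_condition, left_anti_no_condition):
    for right_rows in ([], [10], [10, 20]):
        for n in range(0, 7):
            assert (limit_after_join(join, left_rows, right_rows, n)
                    == limit_pushed_to_left(join, left_rows, right_rows, n))
```

Both plans agree in every case because the join either passes the (already limited) left side through unchanged or drops it entirely.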

NOTE: LEFT SEMI / LEFT ANTI join with non-empty condition is not safe for limit push down, because the output can be a subset of the left side rows, so limiting the left side first may drop rows that would have passed the join.
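
A counterexample for the non-empty condition case, again as a plain Python sketch rather than Spark code. The join helpers, the equality condition, and the sample rows are all made up for illustration, and limit is modelled as "take the first n rows".

```python
def left_semi(left, right, condition):
    """LEFT SEMI join: left rows that match at least one right row."""
    return [l for l in left if any(condition(l, r) for r in right)]


def left_anti(left, right, condition):
    """LEFT ANTI join: left rows that match no right row."""
    return [l for l in left if not any(condition(l, r) for r in right)]


def limit(rows, n):
    """Model of LIMIT n: keep at most the first n rows."""
    return rows[:n]


def same_value(l, r):
    """Hypothetical join condition: left value equals right value."""
    return l == r


left_rows = [1, 2, 3, 4]

# LEFT SEMI: only rows 3 and 4 match, but the pushed-down limit keeps 1 and 2.
semi_after = limit(left_semi(left_rows, [3, 4], same_value), 2)
semi_pushed = limit(left_semi(limit(left_rows, 2), [3, 4], same_value), 2)

# LEFT ANTI: only rows 3 and 4 survive, but the pushed-down limit keeps 1 and 2.
anti_after = limit(left_anti(left_rows, [1, 2], same_value), 2)
anti_pushed = limit(left_anti(limit(left_rows, 2), [1, 2], same_value), 2)

print(semi_after, semi_pushed)
print(anti_after, anti_pushed)
```

In both cases the plan with the limit pushed down returns fewer rows than the original plan, even though enough qualifying rows exist on the left side.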


