Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/02 05:38:30 UTC

[GitHub] [spark] aayushmaanjain opened a new pull request #24271: [SPARK-27342][SQL] Optimize Limit 0 queries

aayushmaanjain opened a new pull request #24271: [SPARK-27342][SQL] Optimize Limit 0 queries
URL: https://github.com/apache/spark/pull/24271
 
 
   ## What changes were proposed in this pull request?
   This change avoids unnecessary file scans for Limit 0 queries.
   
   I added a case (rule) to `PropagateEmptyRelation` that replaces `GlobalLimit 0` and `LocalLimit 0` nodes with an empty `LocalRelation`. This prunes the subtree under the Limit 0 node and lets the other rules in `PropagateEmptyRelation` optimize the logical plan further, while remaining semantically consistent with the Limit 0 query.
   
   For instance:
   **Query:**
   `SELECT * FROM table1 INNER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 ON table1.id = table2.id`
   **Optimized Plan without fix:**
   `Join Inner, (id#79 = id#87)
   :- Filter isnotnull(id#79)
   :  +- Relation[id#79,num1#80] parquet
   +- Filter isnotnull(id#87)
      +- GlobalLimit 0
         +- LocalLimit 0
            +- Relation[id#87,num2#88] parquet`
   **Optimized Plan with fix:**
   `LocalRelation <empty>, [id#75, num1#76, id#77, num2#78]`
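
The rewrite in the example above can be sketched with a toy plan model. This is a minimal illustration, not the patch itself: `Plan`, `Relation`, `Filter`, `Limit`, `InnerJoin`, `EmptyRelation`, and `LimitZeroSketch` are hypothetical stand-ins for Catalyst's classes, and only the inner-join propagation case is modeled.

```scala
// Toy logical plan nodes. These are NOT Spark's Catalyst classes;
// every name below is a hypothetical stand-in used for illustration.
sealed trait Plan { def output: Seq[String] }

case class Relation(name: String, output: Seq[String]) extends Plan

case class Filter(condition: String, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

case class Limit(n: Int, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

case class InnerJoin(left: Plan, right: Plan, condition: String) extends Plan {
  def output: Seq[String] = left.output ++ right.output
}

// Plays the role of an empty LocalRelation: no rows, but the schema is kept.
case class EmptyRelation(output: Seq[String]) extends Plan

object LimitZeroSketch {
  // Bottom-up rewrite: optimize the children first, then the node itself.
  def optimize(plan: Plan): Plan = {
    val withNewChildren = plan match {
      case Filter(c, child)   => Filter(c, optimize(child))
      case Limit(n, child)    => Limit(n, optimize(child))
      case InnerJoin(l, r, c) => InnerJoin(optimize(l), optimize(r), c)
      case leaf               => leaf
    }
    withNewChildren match {
      // The new case: a limit of zero rows never needs its subtree.
      case l @ Limit(0, _) => EmptyRelation(l.output)
      // Propagation: filtering an empty input is still empty.
      case f @ Filter(_, _: EmptyRelation) => EmptyRelation(f.output)
      // Propagation: an inner join with an empty side is empty.
      case j @ InnerJoin(_: EmptyRelation, _, _) => EmptyRelation(j.output)
      case j @ InnerJoin(_, _: EmptyRelation, _) => EmptyRelation(j.output)
      case other => other
    }
  }

  def main(args: Array[String]): Unit = {
    val table1 = Relation("table1", Seq("id", "num1"))
    val table2 = Relation("table2", Seq("id", "num2"))
    // Same shape as the unoptimized plan above; the two nested limits
    // stand in for GlobalLimit 0 over LocalLimit 0.
    val plan = InnerJoin(
      Filter("isnotnull(id)", table1),
      Filter("isnotnull(id)", Limit(0, Limit(0, table2))),
      "table1.id = table2.id")
    println(optimize(plan))
  }
}
```

In this sketch the whole join collapses to a single `EmptyRelation` carrying the four output columns, so neither table would be scanned. Outer joins need different handling because rows from the non-empty side must survive, which is why they are left out here.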
   
   ## How was this patch tested?
   Added unit tests to verify Limit 0 optimization for:
   - Simple query containing Limit 0
   - Inner Join, Left Outer Join, Right Outer Join, Full Outer Join queries containing Limit 0 as one of their children
   - Nested Inner Joins between three tables, one of which has a Limit 0 clause
   - Intersect query in which one of the subqueries is a Limit 0 query
