You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Mingcong Han (JIRA)" <ji...@apache.org> on 2019/04/02 12:52:00 UTC

[jira] [Commented] (SPARK-27342) Optimize limit 0 queries

    [ https://issues.apache.org/jira/browse/SPARK-27342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807734#comment-16807734 ] 

Mingcong Han commented on SPARK-27342:
--------------------------------------

I'm wondering what kind of optimization is expected?
For example:
{code:sql}
SELECT * 
FROM table1 
LEFT OUTER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 
ON table1.id = table2.id
{code}

How do we optimize this query? 



> Optimize limit 0 queries
> ------------------------
>
>                 Key: SPARK-27342
>                 URL: https://issues.apache.org/jira/browse/SPARK-27342
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Aayushmaan Jain
>            Priority: Major
>
> Limit 0 queries do not require (table) files to be scanned. However, currently, for queries containing a Limit 0 clause, the tables are read entirely. This leads to needless file I/O(s) and network transmission(s).
> For instance:
>  * *Select Query*
> +QUERY:+ 
> {code:java}
> SELECT * FROM table1 LIMIT 0
> {code}
> +OPTIMIZED PLAN:+
> {code:java}
> GlobalLimit 0
> +- LocalLimit 0
>    +- Relation[id#79,num1#80] parquet{code}
>  * *Inner Join Query*
> +QUERY:+
> {code:java}
> SELECT * FROM table1 INNER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 ON table1.id = table2.id{code}
> +OPTIMIZED PLAN:+
> {code:java}
> Join Inner, (id#79 = id#87)
> :- Filter isnotnull(id#79)
> :  +- Relation[id#79,num1#80] parquet
> +- Filter isnotnull(id#87)
>    +- GlobalLimit 0
>       +- LocalLimit 0
>          +- Relation[id#87,num2#88] parquet{code}
>  
>  
> Similarly, other operations such as Left Outer Join, Right Outer Join, Intersect, etc. are also affected.
> As a result, this causes the Spark application to waste execution time and network bandwidth.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org