Posted to issues@spark.apache.org by "Aayushmaan Jain (JIRA)" <ji...@apache.org> on 2019/04/02 05:21:00 UTC

[jira] [Created] (SPARK-27342) Optimize limit 0 queries

Aayushmaan Jain created SPARK-27342:
---------------------------------------

             Summary: Optimize limit 0 queries
                 Key: SPARK-27342
                 URL: https://issues.apache.org/jira/browse/SPARK-27342
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Aayushmaan Jain


Queries with a LIMIT 0 clause do not require any (table) files to be scanned. However, Spark currently reads the tables in full for such queries, which causes needless file I/O and network transfer.

For instance:
 * *Select Query*

+QUERY:+ 
{code:java}
SELECT * FROM table1 LIMIT 0
{code}
+OPTIMIZED PLAN:+
{code:java}
GlobalLimit 0
+- LocalLimit 0
   +- Relation[id#79,num1#80] parquet{code}
 * *Inner Join Query*

+QUERY:+
{code:java}
SELECT * FROM table1 INNER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 ON table1.id = table2.id{code}
+OPTIMIZED PLAN:+
{code:java}
Join Inner, (id#79 = id#87)
:- Filter isnotnull(id#79)
:  +- Relation[id#79,num1#80] parquet
+- Filter isnotnull(id#87)
   +- GlobalLimit 0
      +- LocalLimit 0
         +- Relation[id#87,num2#88] parquet{code}

Similarly, other operations such as Left Outer Join, Right Outer Join, and Intersect are affected in the same way.

As a result, Spark applications waste execution time and network bandwidth reading data that is guaranteed to be discarded.
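The requested optimization amounts to replacing any LIMIT 0 subtree with a relation that is known to be empty, and then propagating that emptiness upward (for example, an inner join with an empty side is itself empty). A minimal sketch of that rewrite in Python, using illustrative toy classes rather than Spark's actual Catalyst operators:

```python
from dataclasses import dataclass

# Toy logical-plan nodes for illustration only; these are NOT
# Spark's Catalyst classes.
class Plan:
    pass

@dataclass
class Relation(Plan):
    name: str

@dataclass
class EmptyRelation(Plan):
    # A relation known to produce zero rows; requires no file scan.
    pass

@dataclass
class Limit(Plan):
    n: int
    child: Plan

@dataclass
class Join(Plan):
    join_type: str  # "inner", "left_outer", ...
    left: Plan
    right: Plan

def optimize(plan: Plan) -> Plan:
    """Bottom-up rewrite: LIMIT 0 becomes an empty relation, and an
    inner join with an empty side is itself empty."""
    if isinstance(plan, Limit):
        if plan.n == 0:
            return EmptyRelation()
        return Limit(plan.n, optimize(plan.child))
    if isinstance(plan, Join):
        left, right = optimize(plan.left), optimize(plan.right)
        if plan.join_type == "inner" and (
            isinstance(left, EmptyRelation) or isinstance(right, EmptyRelation)
        ):
            return EmptyRelation()
        return Join(plan.join_type, left, right)
    return plan

# The inner-join example above: table2 is wrapped in LIMIT 0, so the
# whole join collapses and neither table needs to be scanned.
plan = Join("inner", Relation("table1"), Limit(0, Relation("table2")))
print(optimize(plan))  # EmptyRelation()
```

In Catalyst terms, the idea would be to fold a GlobalLimit 0 / LocalLimit 0 pair into an empty local relation so that an emptiness-propagation rule (Catalyst ships a PropagateEmptyRelation rule along these lines) can eliminate the scans; the outer-join and Intersect cases would propagate emptiness according to each operator's semantics.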



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org