Posted to issues@spark.apache.org by "Aayushmaan Jain (JIRA)" <ji...@apache.org> on 2019/04/02 05:21:00 UTC
[jira] [Created] (SPARK-27342) Optimize limit 0 queries
Aayushmaan Jain created SPARK-27342:
---------------------------------------
Summary: Optimize limit 0 queries
Key: SPARK-27342
URL: https://issues.apache.org/jira/browse/SPARK-27342
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.0
Reporter: Aayushmaan Jain
Limit 0 queries do not require scanning any table files. Currently, however, queries containing a LIMIT 0 clause still read the underlying tables in full, incurring needless file I/O and network transfers.
For instance:
* *Select Query*
+QUERY:+
{code:java}
SELECT * FROM table1 LIMIT 0
{code}
+OPTIMIZED PLAN:+
{code:java}
GlobalLimit 0
+- LocalLimit 0
   +- Relation[id#79,num1#80] parquet
{code}
* *Inner Join Query*
+QUERY:+
{code:java}
SELECT * FROM table1 INNER JOIN (SELECT * FROM table2 LIMIT 0) AS table2 ON table1.id = table2.id{code}
+OPTIMIZED PLAN:+
{code:java}
Join Inner, (id#79 = id#87)
:- Filter isnotnull(id#79)
:  +- Relation[id#79,num1#80] parquet
+- Filter isnotnull(id#87)
   +- GlobalLimit 0
      +- LocalLimit 0
         +- Relation[id#87,num2#88] parquet
{code}
Other operations, such as Left Outer Join, Right Outer Join, and Intersect, are affected in the same way. As a result, the Spark application wastes execution time and network bandwidth.
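One way to implement the proposed improvement would be an optimizer rule that replaces any LIMIT 0 subtree with an empty relation, after which empty-relation propagation can collapse joins over it. The following is a rough sketch of that idea using a hypothetical toy plan ADT, not Spark's actual Catalyst classes; the names (Plan, optimize, EmptyRelation, etc.) are illustrative assumptions only:

```scala
// Toy logical-plan ADT (illustrative; not Spark's Catalyst API).
sealed trait Plan
case class Relation(name: String) extends Plan
case class LocalLimit(n: Int, child: Plan) extends Plan
case class GlobalLimit(n: Int, child: Plan) extends Plan
case class Join(left: Plan, right: Plan) extends Plan
case object EmptyRelation extends Plan

// Rewrite: any limit-0 subtree becomes an empty relation, so the
// underlying files never need to be scanned; an inner join with an
// empty side is itself empty, so the other side's scan is skipped too.
def optimize(plan: Plan): Plan = plan match {
  case GlobalLimit(0, _) => EmptyRelation
  case LocalLimit(0, _)  => EmptyRelation
  case GlobalLimit(n, c) => GlobalLimit(n, optimize(c))
  case LocalLimit(n, c)  => LocalLimit(n, optimize(c))
  case Join(l, r) =>
    val (ol, or) = (optimize(l), optimize(r))
    if (ol == EmptyRelation || or == EmptyRelation) EmptyRelation
    else Join(ol, or)
  case other => other
}
```

For example, the inner-join query above, `table1 JOIN (SELECT * FROM table2 LIMIT 0)`, would optimize to an empty relation outright instead of scanning either parquet table.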
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)