Posted to issues@spark.apache.org by "jobit mathew (JIRA)" <ji...@apache.org> on 2019/08/13 06:30:00 UTC

[jira] [Created] (SPARK-28707) Spark SQL select query result size issue

jobit mathew created SPARK-28707:
------------------------------------

             Summary: Spark SQL select query result size issue
                 Key: SPARK-28707
                 URL: https://issues.apache.org/jira/browse/SPARK-28707
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: jobit mathew


Spark SQL "select * from table;" fails the spark.driver.maxResultSize validation,
but "select * from table limit 1000;" succeeds on the same table data.
*Test steps*
spark.driver.maxResultSize=5120 in spark-default.conf
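For reference, the value is given without a unit suffix, which Spark reads here as bytes (hence the "5.0 KB" shown in the error below). The fragment would look like:

```
# spark-defaults.conf fragment (from the report);
# a bare number is interpreted as bytes, so 5120 = 5 KB
spark.driver.maxResultSize    5120
```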

1. Create a table from a text file larger than 5 KB; in this example, a 23 KB file named consecutive2.txt at local path /opt/jobit/consecutive2.txt, containing:
AUS,30.33,CRICKET,1000
AUS,30.33,CRICKET,1001
--
AUS,30.33,CRICKET,1999
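The 1000-line input file described above can be regenerated with a short script (a sketch; the file name and row pattern are taken from the report):

```python
# Generate the ~23 KB test file from the report:
# 1000 lines of "AUS,30.33,CRICKET,<year>" for year = 1000..1999.
lines = [f"AUS,30.33,CRICKET,{year}" for year in range(1000, 2000)]
with open("consecutive2.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Each line is 23 bytes including the newline, so the file is 23,000 bytes, matching the 23 KB figure in the report.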
2. Launch spark-sql --master yarn.
3. create table cons5(country String, avg float, sports String, year int) row format delimited fields terminated by ',';
4. load data local inpath '/opt/jobit/consecutive2.txt' into table cons5;
5. select count(*) from cons5; returns 1000.
6. select * from cons5; fails with the error below.
7. select * from cons5 limit 1000; runs successfully and displays all 1000 rows without any error.

*ERROR*
select * from cons5;
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 2 tasks (7.5 KB) is bigger than spark.driver.maxResultSize (5.0 KB)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
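As the message indicates, the driver keeps a running total of the serialized task-result sizes and aborts the job once that total exceeds spark.driver.maxResultSize. A toy model of that accounting (illustration only, not Spark's actual implementation; the sizes are made up to reproduce the reported 7.5 KB figure):

```python
# Toy model of the driver-side result-size check (not Spark's code).
MAX_RESULT_SIZE = 5 * 1024  # 5120 bytes = 5 KB, as configured above

def check_results(task_result_sizes, max_result_size=MAX_RESULT_SIZE):
    """Accumulate serialized task-result sizes; raise once the cap is hit."""
    total = 0
    for n_tasks, size in enumerate(task_result_sizes, start=1):
        total += size
        if total > max_result_size:
            raise RuntimeError(
                f"Total size of serialized results of {n_tasks} tasks "
                f"({total / 1024:.1f} KB) is bigger than "
                f"spark.driver.maxResultSize ({max_result_size / 1024:.1f} KB)")
    return total

# Two tasks of 3840 bytes each (7.5 KB total, as in the reported error)
# trip the check, while a total under 5 KB passes.
```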

As per my observation, if "select *" is validated against maxResultSize, the limit query should be subject to the same check.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
