Posted to dev@spark.apache.org by Liz Bai <li...@icloud.com> on 2016/10/26 11:25:13 UTC

LIMIT statement on SparkSQL

Hi all,

We tested with Parquet and Spark 2.0. The table below summarizes what we found about the `LIMIT` keyword. Query-2 shows that SparkSQL stops early once it has enough results. But we are curious about Query-1 and Query-2. It seems that either writing the result RDD as Parquet or filtering on a column leads to scanning much more data.
No.  SQL statement                                   Filter  Method of saving result  Runtime (s)  Input data size
1    select ColA from Table limit 1                  no      writeParquet             216          205MB
2    select ColA from Table limit 1                  no      Collect                  22           38.3KB
3    select ColA from Table where ColB = 50 limit 1  yes     Collect                  229          1776.4MB
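
As a rough illustration of the early stop seen in Query-2, and of why a selective filter such as the one in Query-3 can defeat it, here is a toy Python model of a limit that scans partitions in growing batches and stops once it has enough rows. It is only a sketch and not Spark's actual implementation: the function name `take_with_early_stop`, the `scale_up` factor of 4, and the synthetic 8 x 1000-row data are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, int]


def take_with_early_stop(
    partitions: List[List[Row]],
    limit: int,
    predicate: Optional[Callable[[Row], bool]] = None,
    scale_up: int = 4,  # assumed growth factor for the batch size
) -> Tuple[List[Row], int, int]:
    """Toy model: return (rows, partitions_scanned, rows_read)."""
    result: List[Row] = []
    parts_scanned = 0
    rows_read = 0
    batch = 1  # start by trying a single partition
    while len(result) < limit and parts_scanned < len(partitions):
        # The slice is taken once, before parts_scanned changes in the loop.
        for part in partitions[parts_scanned:parts_scanned + batch]:
            for row in part:
                rows_read += 1
                if predicate is None or predicate(row):
                    result.append(row)
                    if len(result) == limit:
                        break  # enough rows: stop reading this partition
            parts_scanned += 1
            if len(result) == limit:
                break  # enough rows: skip the rest of the batch
        batch *= scale_up  # not enough rows yet: try more partitions next
    return result, parts_scanned, rows_read


# Synthetic data (hypothetical): 8 partitions of 1000 rows each, where only
# the very last row satisfies ColB = 50.
partitions = [
    [{"ColA": p * 1000 + i, "ColB": 0} for i in range(1000)]
    for p in range(8)
]
partitions[7][999]["ColB"] = 50

# Like Query-2: no filter, so the first row read is enough.
print(take_with_early_stop(partitions, 1))
# -> ([{'ColA': 0, 'ColB': 0}], 1, 1)

# Like Query-3: the filter rejects rows until the last one.
rows, parts, read = take_with_early_stop(
    partitions, 1, lambda r: r["ColB"] == 50
)
print(rows, parts, read)
# -> [{'ColA': 7999, 'ColB': 50}] 8 8000
```

In this model the unfiltered query touches a single row, while the filtered one reads every row because the only match sits at the very end. Whether this accounts for the numbers in the table would need to be checked against the physical plans of the three queries, for example with `EXPLAIN`.
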
We are wondering whether this is a bug or something else. Could you please help with it?
Thanks.

Best regards,
Liz

Re: LIMIT statement on SparkSQL

Posted by Liz Bai <li...@icloud.com>.
Sorry for the typo in my last mail.
Compared with Query-2, we have questions about Query-1 and Query-3.
Also, may I know the difference between CollectLimit and BaseLimit?
Thanks so much.

Best,
Liz
> On 26 Oct 2016, at 7:25 PM, Liz Bai <li...@icloud.com> wrote:
> 
> Hi all,
> 
> We tested with Parquet and Spark 2.0. The table below summarizes what we found about the `LIMIT` keyword. Query-2 shows that SparkSQL stops early once it has enough results. But we are curious about Query-1 and Query-2.
*But we are curious about Query-1 and Query-3.
> It seems that either writing the result RDD as Parquet or filtering on a column leads to scanning much more data.
> No.  SQL statement                                   Filter  Method of saving result  Runtime (s)  Input data size
> 1    select ColA from Table limit 1                  no      writeParquet             216          205MB
> 2    select ColA from Table limit 1                  no      Collect                  22           38.3KB
> 3    select ColA from Table where ColB = 50 limit 1  yes     Collect                  229          1776.4MB
> We are wondering whether this is a bug or something else. Could you please help with it?
> Thanks.
> 
> Best regards,
> Liz