Posted to user@spark.apache.org by snjv <sn...@gmail.com> on 2018/04/03 05:42:16 UTC

[Spark sql]: Re-execution of same operation takes less time than 1st

Hi,

When we execute the same operation twice, Spark takes about 40% less time on the
second run than on the first.
Our operation is like this:
Read 150M rows (spread across multiple parquet files) into a DF.
Read 10M rows (spread across multiple parquet files) into another DF.
Do an intersect of the two DFs.

Size of the 150M-row files: 587 MB
Size of the 10M-row files: 50 MB
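
Roughly, the operation looks like this (a minimal sketch; the paths and the
SparkSession setup are placeholders, not our actual code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("intersect-test")
  .getOrCreate()

// Placeholder paths; the real data is spread across multiple parquet files.
val bigDF   = spark.read.parquet("/data/big/*.parquet")    // ~150M rows, ~587 MB
val smallDF = spark.read.parquet("/data/small/*.parquet")  // ~10M rows, ~50 MB

// Intersection of the two DataFrames (the schemas must match).
val common = bigDF.intersect(smallDF)
println(common.count())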

If the first execution takes around 20 sec, the next one takes just 10-12 sec.
Is there a specific reason for this? Is there an optimization we can make use of
during the first execution as well?

Regards
Sanjeev



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: [Spark sql]: Re-execution of same operation takes less time than 1st

Posted by Naresh Goud <na...@gmail.com>.
Whenever Spark reads data, it keeps it in executor memory until there is no
room left for newly read or processed data, so a re-run can reuse what is
already in memory instead of reading the files again. This is the beauty of
Spark.
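
If you want to rely on this deliberately rather than on whatever happens to
remain in memory between runs, one option is to cache the inputs yourself and
reuse the same DataFrame references (a rough sketch, not your exact code):

// Cache the inputs so repeated actions reuse the in-memory copy
// instead of re-reading the parquet files.
val bigDF   = spark.read.parquet("/data/big/*.parquet").cache()
val smallDF = spark.read.parquet("/data/small/*.parquet").cache()

// The first action materializes the cache; subsequent actions on the same
// DataFrames (including the intersect) can read from executor memory.
bigDF.count()
smallDF.count()

val common = bigDF.intersect(smallDF)
common.count()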


--
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/