Posted to user@spark.apache.org by Mskh <ma...@yahoo.com> on 2014/02/04 13:44:19 UTC

Spark and disk cache.

Hi,

When I cache a table in memory for the first time in Spark (version 0.8.0),
it usually takes 10 mins. If I quit Spark, restart it, and re-cache the same
table in memory, the operation takes 4 mins. I had assumed that quitting the
Spark session un-caches the table from memory. Is some OS-level caching
taking place, given that re-caching the table takes less than half the
original time?

Thanks
Mskh



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-disk-cache-tp1180.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark and disk cache.

Posted by Woody Christy <wc...@cloudera.com>.
It sounds like your underlying data set is in the OS page cache.  If you
want to run a test that reads purely from disk, run this as root on each
node before you re-cache the same table:

sync; echo 3 > /proc/sys/vm/drop_caches
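
To check on a single node whether the page cache explains the difference, a rough sketch like the one below times two consecutive reads of the same file. The helper name read_ms, the SAMPLE_FILE variable, and the /tmp sample path are made up for illustration; point it at one of the table's data files instead. It assumes Linux with GNU date (for %N), and it only drops caches when run as root, so without root the first read may already be warm.

```shell
#!/bin/sh
# Hypothetical helper: print the milliseconds needed to read a file end to end.
read_ms() {
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date only)
    cat "$1" > /dev/null
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Hypothetical sample file; substitute one of the table's data files.
sample=${SAMPLE_FILE:-/tmp/page_cache_sample.bin}
[ -f "$sample" ] || dd if=/dev/zero of="$sample" bs=1M count=16 2>/dev/null

# Dropping caches needs root. sync first, because only clean pages are dropped.
if [ "$(id -u)" -eq 0 ]; then
    sync
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null \
        || echo "could not drop caches" >&2
else
    echo "not root: skipping drop_caches, first read may be warm" >&2
fi

cold=$(read_ms "$sample")   # first read: from disk if the drop succeeded
warm=$(read_ms "$sample")   # second read: likely served from the page cache
echo "cold=${cold}ms warm=${warm}ms"
```

If the second read is much faster than the first, the same effect is the likely reason the second caching run in Spark finishes sooner.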






-- 

Woody Christy
Solutions Architect | Partner Engineering | Cloudera Inc
@woodychristy