Posted to user@spark.apache.org by ankits <an...@gmail.com> on 2014/12/11 02:34:52 UTC

RDDs being cleaned too fast

I'm using Spark 1.1.0 and am seeing persisted RDDs being cleaned up too
quickly. How can I inspect the size of an RDD in memory and get more
information about why it was cleaned up? There should be more than enough
memory available on the cluster to store them, and by default
spark.cleaner.ttl is infinite, so I want to understand why this is
happening and how to prevent it.

Spark just logs this when removing RDDs:

[2014-12-11 01:19:34,006] INFO  org.apache.spark.storage.BlockManager [] [] - Removing RDD 33
[2014-12-11 01:19:34,010] INFO  org.apache.spark.ContextCleaner [] [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
[2014-12-11 01:19:34,012] INFO  org.apache.spark.storage.BlockManager [] [] - Removing RDD 33
[2014-12-11 01:19:34,016] INFO  org.apache.spark.ContextCleaner [] [akka://JobServer/user/context-supervisor/job-context1] - Cleaned RDD 33
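For reference, this is the kind of inspection I have in mind (a minimal
sketch against the driver; getRDDStorageInfo and getPersistentRDDs are
developer APIs but are present in 1.1.0, and sc is my SparkContext):

// List everything the block manager is currently caching, with sizes.
sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.id} '${info.name}': " +
    s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
}

// RDDs missing from this map are no longer referenced by the driver and
// are therefore candidates for cleanup by the ContextCleaner.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"RDD $id still referenced, storage level: ${rdd.getStorageLevel}")
}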






Re: RDDs being cleaned too fast

Posted by Harihar Nahak <hn...@wynyardgroup.com>.
RDD.persist() can be useful here.
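For example (a minimal sketch; someRdd and the storage level are just
illustrative):

import org.apache.spark.storage.StorageLevel

// Persist explicitly and keep a driver-side reference; the ContextCleaner
// only uncaches RDDs once the driver no longer references them.
val cached = someRdd.persist(StorageLevel.MEMORY_AND_DISK)
cached.count()  // force materialization so the blocks are actually cached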



-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email: hnahak@wynyardgroup.com | Extn: 8019





Re: RDDs being cleaned too fast

Posted by Ranga <sr...@gmail.com>.
I was having similar issues with my persisted RDDs. After some digging
around, I noticed that the partitions were not balanced evenly across the
available nodes. After a repartition, the RDD was spread evenly across all
available memory. Not sure if that would help your use case, though.
You could also increase spark.storage.memoryFraction if that is an
option.
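
Roughly along these lines (a sketch; the partition count and the fraction
are placeholders to tune for your cluster):

// Rebalance before caching so blocks spread across all executors.
val balanced = rdd.repartition(sc.defaultParallelism).persist()
balanced.count()

// Give storage a larger share of the executor heap (default is 0.6).
val conf = new org.apache.spark.SparkConf()
  .set("spark.storage.memoryFraction", "0.8")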


- Ranga


Re: RDDs being cleaned too fast

Posted by Aaron Davidson <il...@gmail.com>.
The ContextCleaner uncaches RDDs that have gone out of scope on the driver.
So it's possible that the given RDD is no longer reachable in your
program's control flow; otherwise it would be a bug in the ContextCleaner.
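
A contrived sketch of what I mean (the file path and names are
illustrative):

def buildAndCount(): Long = {
  val rdd = sc.textFile("data.txt").persist()
  rdd.count()  // `rdd` is the only strong reference; once this method
               // returns, the RDD becomes unreachable and the
               // ContextCleaner may eventually uncache it
}

// By contrast, a reference held for the lifetime of the driver keeps the
// cached blocks alive:
val pinned = sc.textFile("data.txt").persist()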
