Posted to user@spark.apache.org by mi...@barclays.com on 2014/07/09 12:31:04 UTC

FW: memory question

Hi,

Does anyone know if it is possible to call the MetadataCleaner on demand? i.e. rather than setting spark.cleaner.ttl and having it run periodically, I'd like to trigger it myself. The problem with periodic cleaning is that it can remove RDDs that we still require (some calcs are short, others very long).

We're using Spark 0.9.0 with Cloudera distribution.

I have a simple test calculation in a loop as follows:

val test = new TestCalc(sparkContext)
for (i <- 1 to 100000) {
  val x = test.evaluate(rdd)
}

Where TestCalc is defined as:
import org.apache.spark.{SparkContext, SparkEnv}
import org.apache.spark.rdd.RDD

class TestCalc(sparkContext: SparkContext) extends Serializable {

  def aplus(a: Double, b: Double): Double = a + b

  def evaluate(rdd: RDD[Double]): Double = {
    /* do some dummy calc. */
    val x = rdd.groupBy(x => x / 2.0)
    val y = x.fold((0.0, Seq[Double]()))((a, b) => (aplus(a._1, b._1), Seq()))
    val z = y._1
    /* try with/without this... */
    val e: SparkEnv = SparkEnv.getThreadLocal
    e.blockManager.master.removeRdd(x.id, true) // still see memory consumption go up...
    z
  }
}

What I can see on the cluster is that memory usage on the node executing this climbs
continually. I'd expect it to level off rather than jump to over 1G.
I thought that adding the 'removeRdd' line might help, but it doesn't seem to make a difference.


Regards,
Mike


_______________________________________________

This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer.

For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com.

_______________________________________________

Re: FW: memory question

Posted by Aaron Davidson <il...@gmail.com>.
Spark 1.0.0 introduced the ContextCleaner to replace the MetadataCleaner
API for this exact issue. The ContextCleaner automatically cleans up your
RDD metadata once the RDD gets garbage collected on the driver.
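
To illustrate what "cleaned up once the RDD gets garbage collected" means in
practice, here is a minimal sketch of GC-driven cleanup using the JVM's weak
references and a reference queue. It is not Spark's code: Handle, CleanupRef
and MiniCleaner are hypothetical names made up for this example, and the real
ContextCleaner does considerably more. The point is only that cleanup is tied
to reachability on the driver rather than to a timer.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical stand-in for an RDD object held by the driver program.
final class Handle(val id: Int)

// Weak reference that remembers which id to clean up after the handle is collected.
// The id is copied out in the constructor so the handle itself is not retained.
final class CleanupRef(handle: Handle, queue: ReferenceQueue[Handle])
    extends WeakReference[Handle](handle, queue) {
  val id: Int = handle.id
}

// Hypothetical cleaner: tracks handles weakly and records ids whose handles are gone.
final class MiniCleaner {
  private val queue = new ReferenceQueue[Handle]
  // Strong references to the weak-reference objects, so they survive until processed.
  private val pending = mutable.Set.empty[CleanupRef]
  // Ids for which cleanup has run (a real cleaner would free metadata here).
  val cleaned = mutable.Buffer.empty[Int]

  def register(h: Handle): Unit = pending += new CleanupRef(h, queue)

  def pendingCount: Int = pending.size

  // Non-blocking: handle every reference the garbage collector has enqueued so far.
  def drain(): Unit = {
    var ref = queue.poll()
    while (ref != null) {
      val c = ref.asInstanceOf[CleanupRef]
      pending -= c
      cleaned += c.id
      ref = queue.poll()
    }
  }
}
```

With a scheme like this, nothing is cleaned while the program still holds a
reference to a handle, which is why it cannot remove an RDD you still need.
The flip side is that cleanup only happens after a garbage collection has
actually run on the driver, so a driver with a large, mostly idle heap may
hold on to old metadata for a while.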

