Posted to user@spark.apache.org by jluan <ja...@gmail.com> on 2016/01/09 03:50:28 UTC

how garbage collection works on parallelize

Hi,

I am curious about how garbage collection treats an object that gets
parallelized. Say we have a really large array (about 40 GB in RAM) that we
want to parallelize across our machines.

I have the following function:

def doSomething(): RDD[Double] = {
  // Placeholder: the array's actual contents are elided in this post.
  val reallyBigArray = Array[Double](/* some really big value */)
  sc.parallelize(reallyBigArray)
}

Theoretically, will reallyBigArray become eligible for GC once doSomething
returns? Or will it not be GC'd because parallelize somehow keeps a reference
to reallyBigArray?




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-garbage-collection-works-on-parallelize-tp25926.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: how garbage collection works on parallelize

Posted by Josh Rosen <jo...@databricks.com>.
It won't be GC'd as long as the RDD that results from `parallelize()` is
kept around; that RDD keeps strong references to the parallelized
collection's elements in order to enable fault tolerance.
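
To make the lifetime concrete, here is a minimal sketch, not a statement
about Spark internals beyond what is said above. It assumes a local master,
and the object name `ParallelizeGcSketch` and the size parameter `n` are
hypothetical, introduced only for illustration. The array stays reachable
for as long as the returned RDD is referenced; once the last reference to
that RDD (and to any RDD derived from it) is dropped, the array becomes
eligible for collection.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ParallelizeGcSketch {
  // Same shape as the function in the question; `sc` is passed in explicitly
  // and `n` is a hypothetical size parameter.
  def doSomething(sc: SparkContext, n: Int): RDD[Double] = {
    val reallyBigArray = Array.fill[Double](n)(1.0)
    // The returned RDD wraps the driver-side collection, so the array stays
    // reachable through the RDD after this method returns.
    sc.parallelize(reallyBigArray)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is assumed only to keep the sketch self-contained.
    val conf = new SparkConf()
      .setAppName("parallelize-gc-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      var rdd: RDD[Double] = doSomething(sc, 1000000)
      // The array is still strongly reachable via `rdd` here.
      println(rdd.count())

      // Dropping the last reference to the RDD makes the array eligible for
      // collection; when the JVM actually reclaims it is up to the collector.
      rdd = null
    } finally {
      sc.stop()
    }
  }
}
```

If the goal is to free driver memory, the practical consequence is to avoid
holding on to the RDD returned by `parallelize()` (or to anything derived
from it) longer than needed.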
