Posted to user@spark.apache.org by jluan <ja...@gmail.com> on 2016/01/09 03:50:28 UTC
how garbage collection works on parallelize
Hi,
I am curious about how garbage collection treats an object that gets
parallelized. Say we have a really large array (about 40 GB in RAM) that we
want to parallelize across our machines.
I have the following function:
def doSomething(): RDD[Double] = {
  val reallyBigArray = new Array[Double](size) // size: some really big value
  sc.parallelize(reallyBigArray)
}
Theoretically, will reallyBigArray become eligible for GC once doSomething()
returns? Or will it not be GC'd because parallelize somehow holds a reference
to reallyBigArray?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-garbage-collection-works-on-parallelize-tp25926.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: how garbage collection works on parallelize
Posted by Josh Rosen <jo...@databricks.com>.
It won't be GC'd as long as the RDD which results from `parallelize()` is
kept around; that RDD keeps strong references to the parallelized
collection's elements in order to enable fault-tolerance.
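
A minimal sketch of what this means in practice, assuming a local-mode
SparkContext. The object name ParallelizeGc, the explicit sc and size
parameters, and the array size are illustrative assumptions, not part of the
original question. The point is only that the driver-side data stays
reachable through the returned RDD, even though the local variable inside
doSomething has gone out of scope:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ParallelizeGc {
  // Hypothetical variant of the function from the question; `sc` and `size`
  // are passed in explicitly so that the sketch is self-contained.
  def doSomething(sc: SparkContext, size: Int): RDD[Double] = {
    val reallyBigArray = new Array[Double](size)
    // The returned RDD keeps the parallelized data strongly reachable.
    sc.parallelize(reallyBigArray)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parallelize-gc").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      var rdd = doSomething(sc, 1000000)
      // The local variable reallyBigArray is out of scope here, but the data
      // is still alive because `rdd` refers to it.
      println(rdd.count())
      // Once no references to the RDD remain, it no longer pins the data,
      // and the JVM is free to collect it at some later point.
      rdd = null
    } finally {
      sc.stop()
    }
  }
}
```

If the data only needs to live on the driver temporarily, one option is to
keep the RDD's scope as narrow as possible so that the reference is dropped
early.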