Posted to user@spark.apache.org by Corey Nolet <cj...@gmail.com> on 2014/12/30 00:43:54 UTC

How to tell if RDD no longer has any children

Let's say I have an RDD which gets cached and has two children which do
something with it:

val rdd1 = .......cache()

rdd1.saveAsSequenceFile()

rdd1.groupBy()......saveAsSequenceFile()

If I were to submit both calls to saveAsSequenceFile() in separate threads to take
advantage of concurrency (where possible), what's the best way to determine
when rdd1 is no longer being used by anything?

I'm hoping the best way is not to do reference counting in the futures that
are running the saveAsSequenceFile().
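
For illustration, one alternative to reference counting might be to compose the
futures themselves and release the cache once all of them have settled. The
sketch below is only that, a sketch: releaseWhenDone is a hypothetical helper
(not part of Spark), and the Spark calls appear only in the trailing comment,
with the elided parts left exactly as written above. It assumes each save is
wrapped in a scala.concurrent.Future and that rdd1.unpersist() is the desired
cleanup.

```scala
import scala.concurrent.{ExecutionContext, Future}

object CacheRelease {
  // Hypothetical helper (not a Spark API): runs `release` exactly once, after
  // every job in `jobs` has completed, whether it succeeded or failed.
  // The returned future carries the jobs' results, or the first failure.
  def releaseWhenDone[A](jobs: Seq[Future[A]])(release: () => Unit)
                        (implicit ec: ExecutionContext): Future[Seq[A]] = {
    // Map each job to a future that always succeeds, so that a failure in one
    // job does not trigger the release while the other job is still running.
    val settled = jobs.map(job => job.map(_ => ()).recover { case _ => () })

    Future.sequence(settled).flatMap { _ =>
      release()             // all jobs are finished at this point
      Future.sequence(jobs) // surface the original results or failure
    }
  }
}

// Assumed usage with the RDD from the question (elisions kept as posted):
//
//   val saves = Seq(
//     Future { rdd1.saveAsSequenceFile() },
//     Future { rdd1.groupBy()......saveAsSequenceFile() }
//   )
//   CacheRelease.releaseWhenDone(saves)(() => rdd1.unpersist())
```

The helper deliberately waits for every future rather than relying on
Future.sequence alone, because Future.sequence fails as soon as one job fails,
which could unpersist rdd1 while the other save is still reading it.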