Posted to user@spark.apache.org by canan chen <cc...@gmail.com> on 2015/07/09 02:21:45 UTC

What does RDD lineage refer to?

Many places refer to RDD lineage, and I'd like to know exactly what it
refers to.  My understanding is that it means the RDD dependencies and the
intermediate MapOutput info in MapOutputTracker.  Correct me if I am wrong.
Thanks

Re: What does RDD lineage refer to?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Yes. Just to add, consider the following RDD lineage scenario:

RDD1 -> RDD2 -> RDD3 -> RDD4


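Here RDD2 depends on RDD1's output, RDD3 on RDD2's, and so on; the lineage
runs all the way to RDD4.

Now suppose RDD3 is lost for some reason. Spark will recompute it from RDD2
(or from further back in the chain if RDD2 is no longer available either).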
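
To make the recovery step concrete, here is a rough toy model in plain Python. It is not the Spark API: ToyRDD, lose() and compute_count are hypothetical names invented for this sketch, and it assumes every dataset keeps its result in memory once computed (roughly as if each RDD had been persisted). Each object only remembers its parent and the function that derives it, which is the essence of lineage.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's class).

    It stores only its parent and the function used to derive its data,
    i.e. its lineage, plus an in-memory copy of the computed result.
    """

    def __init__(self, name, parent=None, transform=None, source=None):
        self.name = name
        self.parent = parent        # upstream dataset, None for the root
        self.transform = transform  # how to derive our data from the parent's
        self.source = source        # input data, only used by the root
        self.cached = None          # materialized result, None if lost
        self.compute_count = 0      # how many times we had to (re)compute

    def map(self, name, fn):
        # Record the dependency; nothing is computed yet.
        return ToyRDD(name, parent=self,
                      transform=lambda data: [fn(x) for x in data])

    def lineage(self):
        # Walk the parent pointers back to the root.
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))

    def collect(self):
        if self.cached is not None:
            return self.cached
        self.compute_count += 1
        if self.parent is None:
            data = list(self.source)
        else:
            # Missing data is rebuilt from the nearest available ancestor.
            data = self.transform(self.parent.collect())
        self.cached = data
        return data

    def lose(self):
        # Simulate losing the materialized data (e.g. a failed node).
        self.cached = None


rdd1 = ToyRDD("RDD1", source=[1, 2, 3])
rdd2 = rdd1.map("RDD2", lambda x: x + 1)
rdd3 = rdd2.map("RDD3", lambda x: x * 10)
rdd4 = rdd3.map("RDD4", lambda x: x - 1)

print(rdd4.lineage())       # ['RDD1', 'RDD2', 'RDD3', 'RDD4']
first = rdd4.collect()

rdd3.lose()                 # RDD3's data is gone ...
rdd4.lose()                 # ... and so is what was derived from it
second = rdd4.collect()     # rebuilt from RDD2's surviving data

print(first == second)      # True
print(rdd2.compute_count, rdd3.compute_count)  # 1 2
```

In this sketch RDD1 and RDD2 are computed only once, while RDD3 and RDD4 are computed twice: the lost data is rebuilt from the nearest surviving ancestor rather than from scratch.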

Thanks
Best Regards
