Posted to user@spark.apache.org by "bit1129@163.com" <bi...@163.com> on 2015/07/31 04:39:48 UTC

How RDD lineage works

Hi,

I don't have a good understanding of how RDD lineage works, so I would like to ask whether Spark provides a unit test in the code base that illustrates it.
If there is one, what is the class name?
Thanks!



bit1129@163.com

Re: Re: How RDD lineage works

Posted by "bit1129@163.com" <bi...@163.com>.
Thanks TD, I have a better understanding now.



bit1129@163.com
 
From: Tathagata Das
Date: 2015-07-31 13:45
To: bit1129@163.com
CC: yuzhihong; user
Subject: Re: Re: How RDD lineage works
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FailureSuite.scala
This may help.

On Thu, Jul 30, 2015 at 10:42 PM, bit1129@163.com <bi...@163.com> wrote:
The following is copied from the paper and is related to RDD lineage. Is there a unit test that covers this scenario (an RDD partition being lost and then recovered)?
Thanks. 

If a partition of an RDD is lost, the RDD has enough information about how it was derived from other RDDs to recompute 
just that partition. Thus, lost data can be recovered, often quite quickly, without requiring costly replication.



bit1129@163.com
 
From: bit1129@163.com
Date: 2015-07-31 13:11
To: Tathagata Das; yuzhihong
CC: user
Subject: Re: Re: How RDD lineage works
Thanks TD and Zhihong for the guidance. I will check it.




bit1129@163.com
 
From: Tathagata Das
Date: 2015-07-31 12:27
To: Ted Yu
CC: bit1129@163.com; user
Subject: Re: How RDD lineage works
You have to read the original Spark paper to understand how RDD lineage works. 
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

On Thu, Jul 30, 2015 at 9:25 PM, Ted Yu <yu...@gmail.com> wrote:
Please take a look at:
core/src/test/scala/org/apache/spark/CheckpointSuite.scala

Cheers

On Thu, Jul 30, 2015 at 7:39 PM, bit1129@163.com <bi...@163.com> wrote:
Hi,

I don't have a good understanding of how RDD lineage works, so I would like to ask whether Spark provides a unit test in the code base that illustrates it.
If there is one, what is the class name?
Thanks!



bit1129@163.com




Re: Re: How RDD lineage works

Posted by Tathagata Das <ta...@gmail.com>.
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FailureSuite.scala
This may help.


Re: Re: How RDD lineage works

Posted by "bit1129@163.com" <bi...@163.com>.
The following is copied from the paper and is related to RDD lineage. Is there a unit test that covers this scenario (an RDD partition being lost and then recovered)?
Thanks. 

If a partition of an RDD is lost, the RDD has enough information about how it was derived from other RDDs to recompute 
just that partition. Thus, lost data can be recovered, often quite quickly, without requiring costly replication.
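
The recovery idea in the excerpt above can be sketched with a toy model in plain Python. This is not Spark code and not how Spark implements RDDs; ToyRDD, from_partitions, lose_partition and compute_calls are hypothetical names made up for illustration. Each dataset remembers its parent and the function used to derive a partition (its lineage), so a lost partition can be rebuilt on its own. As a simplification, the toy keeps every computed partition in memory, whereas Spark only keeps partitions that were explicitly persisted.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, num_partitions, compute, parent=None):
        self.num_partitions = num_partitions
        self._compute = compute    # lineage: how to derive one partition
        self.parent = parent       # lineage: the dataset this one derives from
        self._cache = {}           # materialized partitions, which may be lost
        self.compute_calls = []    # indexes of partitions actually computed

    @staticmethod
    def from_partitions(partitions):
        # The copied list plays the role of stable storage for the source data.
        data = [list(p) for p in partitions]
        return ToyRDD(len(data), lambda i: list(data[i]))

    def map(self, f):
        # A narrow dependency: partition i of the child needs only
        # partition i of the parent.
        parent = self
        return ToyRDD(
            self.num_partitions,
            lambda i: [f(x) for x in parent.partition(i)],
            parent=parent,
        )

    def partition(self, i):
        # Recompute from lineage only when the partition is not materialized.
        if i not in self._cache:
            self.compute_calls.append(i)
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i):
        # Simulate losing one materialized partition (e.g. a failed node).
        self._cache.pop(i, None)


source = ToyRDD.from_partitions([[1, 2], [3, 4], [5, 6]])
doubled = source.map(lambda x: x * 2)

before = doubled.collect()       # materializes partitions 0, 1 and 2
doubled.lose_partition(1)        # partition 1 is now gone
doubled.compute_calls.clear()    # start counting recomputations from here

after = doubled.collect()        # only partition 1 has to be rebuilt
recomputed = list(doubled.compute_calls)
```

In this toy run, `after` equals `before`, and `recomputed` contains only index 1: the other partitions are served from memory and the lost one is rebuilt from its parent through the recorded map function, with no replica involved.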



bit1129@163.com
 



Re: Re: How RDD lineage works

Posted by "bit1129@163.com" <bi...@163.com>.
Thanks TD and Zhihong for the guidance. I will check it.




bit1129@163.com
 



Re: How RDD lineage works

Posted by Tathagata Das <ta...@gmail.com>.
You have to read the original Spark paper to understand how RDD lineage
works.
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf


Re: How RDD lineage works

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at:
core/src/test/scala/org/apache/spark/CheckpointSuite.scala

Cheers
