
Posted to user@spark.apache.org by David Thomas <dt...@gmail.com> on 2014/04/02 20:27:55 UTC

Resilient nature of RDD

Can someone explain how an RDD is resilient? If one of the partitions is lost,
who is responsible for recreating that partition - is it the driver program?

Re: Resilient nature of RDD

Posted by Andrew Or <an...@databricks.com>.

It all begins with calling rdd.iterator, which calls
rdd.computeOrReadCheckpoint(). This computes the partition if the RDD has
not been checkpointed, or reads the previously checkpointed data if it has.
See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L216
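
A minimal sketch of that call chain, for illustration only. It is a toy
model in plain Python, not Spark's code: the names ToyRDD, RangeRDD,
MappedRDD, do_checkpoint and compute_calls are hypothetical, the checkpoint
is just an in-memory dict, and caching and task contexts are left out.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; not Spark's actual class."""

    def __init__(self, num_partitions, parent=None):
        self.num_partitions = num_partitions
        self.parent = parent
        self.checkpoint = None   # partition index -> list, once checkpointed
        self.compute_calls = 0   # counts computations triggered via iterator()

    def compute(self, split):
        """Subclasses describe how to build one partition."""
        raise NotImplementedError

    def compute_or_read_checkpoint(self, split):
        # Read saved data if a checkpoint exists, otherwise compute.
        if self.checkpoint is not None:
            return iter(self.checkpoint[split])
        self.compute_calls += 1
        return self.compute(split)

    def iterator(self, split):
        # Entry point used to read one partition.
        return self.compute_or_read_checkpoint(split)

    def do_checkpoint(self):
        # Assumption: a dict stands in for reliable storage.
        self.checkpoint = {
            i: list(self.compute(i)) for i in range(self.num_partitions)
        }


class RangeRDD(ToyRDD):
    """Source RDD: partition i holds i, i + k, i + 2k, ... below n."""

    def __init__(self, n, num_partitions):
        super().__init__(num_partitions)
        self.n = n

    def compute(self, split):
        return iter(range(split, self.n, self.num_partitions))


class MappedRDD(ToyRDD):
    """Derived RDD: applies f to each element of the parent's partition."""

    def __init__(self, parent, f):
        super().__init__(parent.num_partitions, parent)
        self.f = f

    def compute(self, split):
        # Pulling from the parent goes through the parent's iterator().
        return (self.f(x) for x in self.parent.iterator(split))


base = RangeRDD(6, 2)
doubled = MappedRDD(base, lambda x: x * 2)
print(list(doubled.iterator(0)))  # prints [0, 4, 8]
```

Each derived RDD only knows how to build a partition from its parent, so
reading a partition walks back up the chain until it reaches a source or a
checkpoint.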

On Thu, Apr 3, 2014 at 8:44 PM, David Thomas <dt...@gmail.com> wrote:

> I'm trying to understand the Spark source code. Could you please point me
> to the code where the compute() function of RDD is called? Is that called
> by the workers?
>
>
> On Wed, Apr 2, 2014 at 5:36 PM, Patrick Wendell <pw...@gmail.com> wrote:
>
>> The driver stores the metadata associated with the partition, but the
>> re-computation will occur on an executor. So if several partitions are
>> lost, e.g. due to a few machines failing, the re-computation can be striped
>> across the cluster, making it fast.
>>
>>
>> On Wed, Apr 2, 2014 at 11:27 AM, David Thomas <dt...@gmail.com> wrote:
>>
>>> Can someone explain how an RDD is resilient? If one of the partitions is
>>> lost, who is responsible for recreating that partition - is it the driver
>>> program?
>>>
>>
>>
>

Re: Resilient nature of RDD

Posted by David Thomas <dt...@gmail.com>.

I'm trying to understand the Spark source code. Could you please point me to
the code where the compute() function of RDD is called? Is that called by
the workers?

On Wed, Apr 2, 2014 at 5:36 PM, Patrick Wendell <pw...@gmail.com> wrote:

> The driver stores the metadata associated with the partition, but the
> re-computation will occur on an executor. So if several partitions are
> lost, e.g. due to a few machines failing, the re-computation can be striped
> across the cluster, making it fast.
>
>
> On Wed, Apr 2, 2014 at 11:27 AM, David Thomas <dt...@gmail.com> wrote:
>
>> Can someone explain how an RDD is resilient? If one of the partitions is
>> lost, who is responsible for recreating that partition - is it the driver
>> program?
>>
>
>

Re: Resilient nature of RDD

Posted by Patrick Wendell <pw...@gmail.com>.

The driver stores the metadata associated with the partition, but the
re-computation will occur on an executor. So if several partitions are
lost, e.g. due to a few machines failing, the re-computation can be striped
across the cluster, making it fast.
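
A rough sketch of that split of responsibilities, as a toy model in plain
Python rather than anything from Spark itself. ToyDriver, build_partition,
executor_lost and the round-robin placement are all hypothetical names and
choices made for the example; real scheduling is far more involved.

```python
class ToyDriver:
    """Hypothetical driver: holds only metadata (how to build each partition
    and where it currently lives), never the partition data itself."""

    def __init__(self, executors, num_partitions, build_partition):
        self.executors = executors              # executor id -> {partition: data}
        self.build_partition = build_partition  # lineage: partition index -> list
        self.location = {}                      # partition index -> executor id
        ids = sorted(executors)
        for p in range(num_partitions):
            # Assumption: simple round-robin placement.
            self._run_on(ids[p % len(ids)], p)

    def _run_on(self, executor_id, p):
        # The "task" runs on the executor: data is built and stored there.
        self.executors[executor_id][p] = self.build_partition(p)
        self.location[p] = executor_id

    def executor_lost(self, executor_id):
        # The executor's data is gone; the driver still has the lineage.
        del self.executors[executor_id]
        lost = sorted(p for p, e in self.location.items() if e == executor_id)
        survivors = sorted(self.executors)
        for i, p in enumerate(lost):
            # Spread the re-computation over the surviving executors.
            self._run_on(survivors[i % len(survivors)], p)
        return lost

    def collect(self):
        return [self.executors[self.location[p]][p]
                for p in sorted(self.location)]


executors = {"e0": {}, "e1": {}, "e2": {}}
driver = ToyDriver(executors, 6, lambda p: [p * 10, p * 10 + 1])
before = driver.collect()
print(driver.executor_lost("e0"))  # prints [0, 3]
print(driver.collect() == before)  # prints True
```

Only the two partitions held by the failed executor are rebuilt, and each
lands on a different surviving executor, which is the "striping" described
above.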

On Wed, Apr 2, 2014 at 11:27 AM, David Thomas <dt...@gmail.com> wrote:

> Can someone explain how an RDD is resilient? If one of the partitions is lost,
> who is responsible for recreating that partition - is it the driver program?
>