Posted to user@spark.apache.org by brettplarson <br...@gmail.com> on 2021/01/06 17:14:54 UTC

Impact of .localCheckpoint() and executor dying

Hello,
I am wondering what the impact would be of using .localCheckpoint() and then
having an executor die.

My understanding is that .localCheckpoint() breaks the lineage of the RDD,
and that this requires the entire RDD to be rebuilt instead of Spark being
able to recompute only the lost partitions.

Does each executor store a copy of the entire RDD?

The benefit of using .checkpoint() over .localCheckpoint() is unclear to me.
(I am aware that .checkpoint() is HDFS-backed, but the implications of this
are unclear to me.)

Please let me know,
Thank you!




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Impact of .localCheckpoint() and executor dying

Posted by Jacek Laskowski <ja...@japila.pl>.

Hi,

> impact of an executor dying after a localCheckpoint is taken.

My memory is a bit vague on this, but I would not be surprised if the
localCheckpoint-ed RDD were "broken" and any action on it simply threw an
exception about missing partitions or similar. There's no way back.
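
To make the scenario above concrete, here is a toy model in plain Python. It
is not Spark code and does not reflect Spark internals: ToyRDD,
LostPartitionError, make_rdd, and the executor names are all hypothetical. It
only illustrates the guess above, namely that once the lineage is cut and the
checkpointed blocks live on executors, a dead executor leaves partitions that
cannot be recomputed.

```python
# Toy model only: plain Python, no Spark. All names here are hypothetical.

class LostPartitionError(Exception):
    """Raised when a partition is gone and there is no lineage to rebuild it."""


class ToyRDD:
    def __init__(self, compute, num_partitions):
        self.compute = compute            # lineage: how to rebuild partition i
        self.num_partitions = num_partitions
        self.blocks = {}                  # partition index -> (executor id, data)

    def local_checkpoint(self, executors):
        # Materialize every partition on some executor, then drop the lineage.
        for i in range(self.num_partitions):
            executor = executors[i % len(executors)]
            self.blocks[i] = (executor, self.compute(i))
        self.compute = None               # lineage is cut

    def kill_executor(self, executor):
        # Every block held by the dead executor disappears with it.
        self.blocks = {i: b for i, b in self.blocks.items() if b[0] != executor}

    def collect(self):
        out = []
        for i in range(self.num_partitions):
            if i in self.blocks:
                out.extend(self.blocks[i][1])     # read the stored block
            elif self.compute is not None:
                out.extend(self.compute(i))       # recompute from lineage
            else:
                raise LostPartitionError(f"partition {i} is missing")
        return out


def make_rdd():
    # Partition i holds the two numbers 2*i and 2*i + 1.
    return ToyRDD(lambda i: [2 * i, 2 * i + 1], num_partitions=4)
```

In this sketch, calling local_checkpoint(["exec-1", "exec-2"]) followed by
kill_executor("exec-2") makes the next collect() raise LostPartitionError. An
RDD that was never locally checkpointed still has its compute function, so
the same collect() simply recomputes the data.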

I too wish that someone with more expertise in this area would chime in...

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski


Re: Impact of .localCheckpoint() and executor dying

Posted by Brett Larson <br...@gmail.com>.

Jacek,
Thanks for your response. I am still trying to understand the impact of an
executor dying after a localCheckpoint is taken.

Would the entire Spark application fail in this case due to the broken
lineage? Or would the jobs associated with that executor need to be
recomputed from scratch?

Thank you!


-- 
Brett Larson
brettpatricklarson@gmail.com / 847321200

Re: Impact of .localCheckpoint() and executor dying

Posted by Jacek Laskowski <ja...@japila.pl>.

Hi,

> My understanding is that .localCheckpoint() breaks the lineage of the RDD

True.

> and this requires the entire RDD to be rebuilt instead of being able
> to recompute lost partitions.

In a sense, it's as if you saved the partitions to the executors and read
them back as source data (for this checkpointed RDD).

> Does each executor store a copy of the entire RDD?

No. An executor holds only the data of the partitions for the tasks it has
executed.

> Checkpoint over .localCheckpoint.

checkpoint is similar to localCheckpoint, but slower and more reliable (as
the data is on a stable HDFS file system, not on an ephemeral executor). In
either case, the effect on the lineage should be the same: it is cut.
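
The difference between the two can be sketched with a small toy model in
plain Python. This is not Spark code: the function names, the executor names,
and the dict standing in for HDFS are all assumptions made for illustration.
It shows only the two points made above: each executor holds just the
partitions it computed, and data in shared storage survives the loss of an
executor while executor-local data does not.

```python
# Toy model only: plain Python, no Spark. All names are hypothetical.

def partition_data(i):
    # Stand-in for the real computation of partition i.
    return [10 * i, 10 * i + 1]


def local_checkpoint(num_partitions, executors):
    # Each executor keeps only the partitions of the tasks it ran.
    store = {e: {} for e in executors}
    for i in range(num_partitions):
        store[executors[i % len(executors)]][i] = partition_data(i)
    return store


def reliable_checkpoint(num_partitions):
    # Every partition goes to one shared store (a dict standing in for HDFS).
    return {i: partition_data(i) for i in range(num_partitions)}


def read_all(num_partitions, lookup):
    # Read the partitions back as source data; None marks a lost partition.
    return [lookup(i) for i in range(num_partitions)]


executors = ["exec-1", "exec-2", "exec-3"]
local = local_checkpoint(6, executors)
shared = reliable_checkpoint(6)

# No executor holds the whole RDD: each one has 2 of the 6 partitions.
partitions_per_executor = {e: sorted(p) for e, p in local.items()}

# Simulate exec-2 dying: its local blocks vanish, the shared store is untouched.
del local["exec-2"]


def local_lookup(i):
    # Search the surviving executors for partition i.
    for blocks in local.values():
        if i in blocks:
            return blocks[i]
    return None


after_local = read_all(6, local_lookup)
after_reliable = read_all(6, shared.get)
```

After the simulated failure, after_local contains None for the partitions
that lived on exec-2, while after_reliable still has every partition. In this
toy model the price of the reliable variant is one extra write of every
partition to the shared store.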

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski
