Posted to user@spark.apache.org by Jörn Franke <jo...@gmail.com> on 2017/03/23 09:55:48 UTC

Re: Persist RDD doubt

What do you mean by clear? What is the use case?

> On 23 Mar 2017, at 10:16, nayan sharma <na...@gmail.com> wrote:
> 
> Does Spark clear the persisted RDD if the task fails?
> 
> Regards,
> 
> Nayan

Re: Persist RDD doubt

Posted by sjayatheertha <sj...@gmail.com>.

Spark’s cache is fault-tolerant – if any partition of an RDD is lost, it will automatically be recomputed using the transformations that originally created it.
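For illustration, here is a minimal sketch of that guarantee (the app name, master, data, and partition count are made up for the example). The map below is the lineage Spark replays for a lost cached partition; the missing partition is recomputed and replaced, not appended:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheRecomputeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup, just for the example.
    val sc = new SparkContext(
      new SparkConf().setAppName("cache-sketch").setMaster("local[*]"))

    val base = sc.parallelize(1 to 1000000, 8)
    // Lineage: parallelize -> map. If a cached partition is lost (e.g. the
    // executor holding it dies), Spark re-runs just this map for the missing
    // partition; nothing is appended or duplicated in the cache.
    val doubled = base.map(_ * 2).persist(StorageLevel.MEMORY_ONLY_SER)

    doubled.count() // first action materializes and caches the partitions
    doubled.count() // served from cache; any lost partition is recomputed

    sc.stop()
  }
}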




> On Mar 23, 2017, at 4:11 AM, nayan sharma <na...@gmail.com> wrote:
> 
> In case of task failures, does Spark clear the persisted RDD (StorageLevel.MEMORY_ONLY_SER) and recompute it when the task is restarted from the beginning? Or will the cached RDD be appended to?
> 
> How does Spark check whether the RDD has been cached and skip the caching step for a particular task?
> 
>> On 23-Mar-2017, at 3:36 PM, Artur R <ar...@gpnxgroup.com> wrote:
>> 
>> I am not entirely sure, but:
>>  - if the RDD is persisted in memory, then on task failure the executor JVM process fails too, so the memory is released
>>  - if the RDD is persisted on disk, then on task failure Spark's shutdown hook just wipes the temp files
>> 
>>> On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke <jo...@gmail.com> wrote:
>>> What do you mean by clear? What is the use case?
>>> 
>>>> On 23 Mar 2017, at 10:16, nayan sharma <na...@gmail.com> wrote:
>>>> 
>>>> Does Spark clear the persisted RDD if the task fails?
>>>> 
>>>> Regards,
>>>> 
>>>> Nayan
>> 
> 

Re: Persist RDD doubt

Posted by nayan sharma <na...@gmail.com>.
In case of task failures, does Spark clear the persisted RDD (StorageLevel.MEMORY_ONLY_SER) and recompute it when the task is restarted from the beginning? Or will the cached RDD be appended to?

How does Spark check whether the RDD has been cached and skip the caching step for a particular task?
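At run time, each executor's block manager checks for the cached partition before falling back to recomputation. From the driver, the requested cache state can at least be inspected with public APIs; a minimal sketch (the input path and local master are placeholders for the example):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("cache-inspect").setMaster("local[*]"))

// Hypothetical input path, just for the example.
val rdd = sc.textFile("/tmp/input.txt").persist(StorageLevel.MEMORY_ONLY_SER)

// getStorageLevel reports the level requested via persist(), whether or not
// the partitions have actually been materialized yet.
println(rdd.getStorageLevel.description)

// getPersistentRDDs lists all RDDs marked persistent on this SparkContext.
sc.getPersistentRDDs.foreach { case (id, r) =>
  println(s"RDD $id -> ${r.getStorageLevel.description}")
}

rdd.unpersist() // explicitly drop the cached blocks
sc.stop()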

> On 23-Mar-2017, at 3:36 PM, Artur R <ar...@gpnxgroup.com> wrote:
> 
> I am not entirely sure, but:
>  - if the RDD is persisted in memory, then on task failure the executor JVM process fails too, so the memory is released
>  - if the RDD is persisted on disk, then on task failure Spark's shutdown hook just wipes the temp files
> 
> On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke <jornfranke@gmail.com> wrote:
> What do you mean by clear? What is the use case?
> 
> On 23 Mar 2017, at 10:16, nayan sharma <nayansharma13@gmail.com> wrote:
> 
>> Does Spark clear the persisted RDD if the task fails?
>> 
>> Regards,
>> 
>> Nayan
> 


Re: Persist RDD doubt

Posted by Artur R <ar...@gpnxgroup.com>.
I am not entirely sure, but:
 - if the RDD is persisted in memory, then on task failure the executor JVM process fails too, so the memory is released
 - if the RDD is persisted on disk, then on task failure Spark's shutdown hook just wipes the temp files
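To make the two storage paths above concrete, a minimal sketch under those assumptions (app name, master, and data are invented for the example); MEMORY_ONLY_SER blocks live on the executor JVM heap, while DISK_ONLY blocks are written as temp files under spark.local.dir:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("storage-levels").setMaster("local[*]"))

val data = sc.parallelize(1 to 100, 4)

// Serialized in-memory cache: blocks are held on the executor JVM heap,
// so they go away if the executor process dies.
val inMem = data.map(_ + 1).persist(StorageLevel.MEMORY_ONLY_SER)

// On-disk cache: blocks are written as files under spark.local.dir on
// each executor; these are cleaned up when the application shuts down.
val onDisk = data.map(_ * 2).persist(StorageLevel.DISK_ONLY)

inMem.count()  // actions materialize the caches
onDisk.count()
sc.stop()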

On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke <jo...@gmail.com> wrote:

> What do you mean by clear? What is the use case?
>
> On 23 Mar 2017, at 10:16, nayan sharma <na...@gmail.com> wrote:
>
> Does Spark clear the persisted RDD if the task fails?
>
> Regards,
> Nayan
>
>