Posted to user@spark.apache.org by Ashwin Sai Shankar <as...@netflix.com.INVALID> on 2017/03/27 19:38:52 UTC

Spark shuffle files

Hi!

In Spark on YARN, when are shuffle files on local disk removed? (Is it when
the app completes, once all the shuffle files are fetched, or at the end of
the stage?)

Thanks,
Ashwin

Re: Spark shuffle files

Posted by Mark Hamstra <ma...@clearstorydata.com>.
When the RDD using them goes out of scope and is garbage-collected on the
driver.

On Mon, Mar 27, 2017 at 3:13 PM, Ashwin Sai Shankar <as...@netflix.com>
wrote:

> Thanks, Mark! Follow-up question: do you know when shuffle files are
> usually un-referenced?
>
> On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> Shuffle files are cleaned when they are no longer referenced. See
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala
>>
>> On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
>> ashankar@netflix.com.invalid> wrote:
>>
>>> Hi!
>>>
>>> In Spark on YARN, when are shuffle files on local disk removed? (Is it
>>> when the app completes, once all the shuffle files are fetched, or at the
>>> end of the stage?)
>>>
>>> Thanks,
>>> Ashwin
>>>
>>
>>
>

Re: Spark shuffle files

Posted by Ashwin Sai Shankar <as...@netflix.com.INVALID>.
Thanks, Mark! Follow-up question: do you know when shuffle files are usually
un-referenced?

On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> Shuffle files are cleaned when they are no longer referenced. See
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala
>
> On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
> ashankar@netflix.com.invalid> wrote:
>
>> Hi!
>>
>> In Spark on YARN, when are shuffle files on local disk removed? (Is it
>> when the app completes, once all the shuffle files are fetched, or at the
>> end of the stage?)
>>
>> Thanks,
>> Ashwin
>>
>
>

Re: Spark shuffle files

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Shuffle files are cleaned when they are no longer referenced. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala
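
For anyone who wants to see the idea without reading the Scala source, here
is a rough sketch in plain Python of reference-tracked cleanup. It is only an
analogy, not Spark code: ToyCleaner, ToyRDD, run_job and the file names are
all made up for illustration, and it assumes nothing about ContextCleaner
beyond "clean up once the owning object is no longer referenced". The cleaner
keeps only a weak reference to the owner, so the cleanup callback fires once
the owner has been collected, not when a stage ends or the files are fetched.

```python
import gc
import weakref


class ToyRDD:
    """Hypothetical stand-in for an object that owns shuffle output."""

    def __init__(self, name):
        self.name = name


class ToyCleaner:
    """Toy reference-tracking cleaner (an illustration, not Spark's API)."""

    def __init__(self):
        self.files = {}    # shuffle id -> fake file names still "on disk"
        self.removed = []  # shuffle ids whose files have been cleaned up

    def register(self, owner, shuffle_id, files):
        self.files[shuffle_id] = list(files)
        # finalize() holds only a weak reference to owner, so registering
        # does not keep it alive; the callback runs after owner is collected.
        weakref.finalize(owner, self._cleanup, shuffle_id)

    def _cleanup(self, shuffle_id):
        self.files.pop(shuffle_id, None)
        self.removed.append(shuffle_id)


def run_job(cleaner):
    rdd = ToyRDD("wordcounts")
    cleaner.register(rdd, 0, ["shuffle_0_0_0.data", "shuffle_0_0_0.index"])
    # While rdd is still referenced, its files are kept.
    return len(cleaner.files)


cleaner = ToyCleaner()
files_during_job = run_job(cleaner)  # rdd goes out of scope on return
gc.collect()                         # make collection explicit
print(files_during_job, cleaner.removed, cleaner.files)
```

The point of the sketch is that nothing is removed while the owner is still
reachable, so holding on to a reference (for example in a long-lived
variable) keeps the files around until that reference is dropped and a
collection has run.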

On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
ashankar@netflix.com.invalid> wrote:

> Hi!
>
> In Spark on YARN, when are shuffle files on local disk removed? (Is it
> when the app completes, once all the shuffle files are fetched, or at the
> end of the stage?)
>
> Thanks,
> Ashwin
>