Posted to user@spark.apache.org by Ashwin Sai Shankar <as...@netflix.com.INVALID> on 2017/03/27 19:38:52 UTC
Spark shuffle files
Hi!
In Spark on YARN, when are shuffle files on local disk removed? (Is it when
the app completes, once all the shuffle files are fetched, or at the end of
the stage?)
Thanks,
Ashwin
Re: Spark shuffle files
Posted by Mark Hamstra <ma...@clearstorydata.com>.
When the RDD using them goes out of scope.
On Mon, Mar 27, 2017 at 3:13 PM, Ashwin Sai Shankar <as...@netflix.com>
wrote:
> Thanks, Mark! Follow-up question: do you know when shuffle files are
> usually un-referenced?
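A note on timing: "out of scope" here means unreachable on the driver, and the cleaner can only act once a driver-side garbage collection has noticed that. On a driver with little memory pressure, that may take a while. The property names in this sketch are real Spark configuration keys, but the values are illustrative assumptions rather than recommendations, and `to_submit_flags` is a hypothetical helper written only for this example.

```python
# Spark cleaner settings that influence when unreferenced shuffle files
# are actually removed. Values are illustrative assumptions.
cleaner_conf = {
    # Enables reference tracking in ContextCleaner (Spark's default is true).
    "spark.cleaner.referenceTracking": "true",
    # How often the driver triggers a GC so that unreachable RDDs are noticed
    # (Spark's documented default is 30min; 15min here is just an example).
    "spark.cleaner.periodicGC.interval": "15min",
}


def to_submit_flags(conf):
    """Hypothetical helper: render a dict of settings as spark-submit arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


print(" ".join(to_submit_flags(cleaner_conf)))
```

Shortening the periodic GC interval trades some driver overhead for earlier cleanup; whether that is worthwhile depends on how quickly executors fill their local disks.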
Re: Spark shuffle files
Posted by Ashwin Sai Shankar <as...@netflix.com.INVALID>.
Thanks, Mark! Follow-up question: do you know when shuffle files are usually
un-referenced?
On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:
> Shuffle files are cleaned when they are no longer referenced. See
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala
Re: Spark shuffle files
Posted by Mark Hamstra <ma...@clearstorydata.com>.
Shuffle files are cleaned when they are no longer referenced. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala
On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
ashankar@netflix.com.invalid> wrote:
> Hi!
>
> In Spark on YARN, when are shuffle files on local disk removed? (Is it
> when the app completes, once all the shuffle files are fetched, or at the
> end of the stage?)
>
> Thanks,
> Ashwin
>
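To make "cleaned when they are no longer referenced" concrete, here is a minimal Python sketch of the general pattern: hold only a weak reference to the owning object and run a cleanup callback once it becomes unreachable. This is a toy analogue and not Spark's code. `ShuffleHandle`, `ToyCleaner`, and the file names are hypothetical, and the "files" are just dictionary entries. The linked ContextCleaner does this on the JVM with weak references and a reference queue, so real cleanup waits for a driver GC instead of happening immediately.

```python
import gc
import weakref


class ShuffleHandle:
    """Hypothetical stand-in for a driver-side object that owns shuffle output."""

    def __init__(self, shuffle_id):
        self.shuffle_id = shuffle_id


class ToyCleaner:
    """Hypothetical, simplified analogue of a reference-tracking cleaner."""

    def __init__(self):
        self.files = {}    # shuffle_id -> simulated shuffle file names
        self.cleaned = []  # shuffle ids whose files have been removed

    def register(self, handle, files):
        self.files[handle.shuffle_id] = list(files)
        # Pass only the id to the callback. Capturing `handle` itself would
        # keep it strongly reachable and cleanup would never run.
        weakref.finalize(handle, self._cleanup, handle.shuffle_id)

    def _cleanup(self, shuffle_id):
        # Simulated deletion of the shuffle files for this id.
        self.files.pop(shuffle_id, None)
        self.cleaned.append(shuffle_id)


cleaner = ToyCleaner()
handle = ShuffleHandle(0)
cleaner.register(handle, ["shuffle_0_0_0.data", "shuffle_0_0_0.index"])

del handle    # drop the last strong reference
gc.collect()  # CPython frees on refcount; other runtimes may need a GC pass

print(cleaner.cleaned, cleaner.files)
```

The point to take away is that cleanup is tied to reachability of a driver-side object, not to stage completion or to the shuffle blocks having been fetched.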