Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2018/01/04 22:17:00 UTC

[jira] [Commented] (TEZ-3877) Delete spill files once merge is done

    [ https://issues.apache.org/jira/browse/TEZ-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312113#comment-16312113 ] 

Jason Lowe commented on TEZ-3877:
---------------------------------

I believe spill files are already being deleted once each merge completes.  The sorters create DiskSegment objects to track input segments that are on disk (i.e., either inputs fetched directly to disk or spill files from prior merges).  The close method for those segments deletes the file unless the segment is explicitly marked as one that should be preserved (e.g., a disk local fetch).  So when the merger finishes consuming a segment and closes it, the corresponding spill file should be deleted at that point.  The last merge is done on the fly, feeding directly into the user code, so there could be up to io.sort.factor spill files still on disk while the task is running the processor stage.
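
To make the delete-on-close behavior concrete, here is a minimal sketch in Java.  It is not taken from the Tez source: the class names SpillSegmentSketch and DiskSegmentSketch and the preserve flag are hypothetical stand-ins for illustration, and the real DiskSegment code may differ in detail.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; this is NOT the actual Tez DiskSegment class.
class SpillSegmentSketch {

    /** Tracks one on-disk input segment and removes its file when closed. */
    static final class DiskSegmentSketch implements Closeable {
        private final Path file;
        // Assumed flag: true for segments that must survive close,
        // e.g. a disk local fetch that reads another task's output in place.
        private final boolean preserve;

        DiskSegmentSketch(Path file, boolean preserve) {
            this.file = file;
            this.preserve = preserve;
        }

        @Override
        public void close() throws IOException {
            // Delete the backing file unless the segment was marked as preserved.
            if (!preserve) {
                Files.deleteIfExists(file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path spill = Files.createTempFile("spill", ".out");
        Path localFetch = Files.createTempFile("fetch", ".out");

        // try-with-resources closes both segments once the "merge" body is done.
        try (DiskSegmentSketch s = new DiskSegmentSketch(spill, false);
             DiskSegmentSketch f = new DiskSegmentSketch(localFetch, true)) {
            // A merger would consume both segments here.
        }

        System.out.println("spill exists: " + Files.exists(spill));
        System.out.println("local fetch exists: " + Files.exists(localFetch));

        Files.deleteIfExists(localFetch); // clean up the demo file
    }
}
```

In this sketch the spill file is gone as soon as its segment is closed, while the preserved file remains.  Only segments that are still open, such as the inputs to the final on-the-fly merge, keep their files on disk.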


> Delete spill files once merge is done
> -------------------------------------
>
>                 Key: TEZ-3877
>                 URL: https://issues.apache.org/jira/browse/TEZ-3877
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
>   I see that spill files are not deleted right after the merge completes. We should delete them at that point, as they take up a lot of space and we can't afford that waste when Tez uses a lot of shuffle space with complex DAGs. [~jlowe] told me they are only cleaned up after the application completes, because they are written to the app directory and not the container directory. They should also be written to the container directory so that the node manager cleans them up on task failures or container crashes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)