Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2016/09/28 20:28:20 UTC

[jira] [Comment Edited] (TEZ-3362) Delete intermediate data at DAG level for Shuffle Handler

    [ https://issues.apache.org/jira/browse/TEZ-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530780#comment-15530780 ] 

Hitesh Shah edited comment on TEZ-3362 at 9/28/16 8:27 PM:
-----------------------------------------------------------

2 main concerns with the current impl:
  
1) The current impl seems to be a blocking call. This can potentially block submission of a new DAG for a long time if any node is unreachable or slow to respond (or if there are too many nodes to cycle through). The cleanup should happen asynchronously.
2) The cleanup code probably should not be in DAGAppMaster. Wouldn't something like the AMNodeTracker or the appropriate service plugin be the right layer to hand off this cleanup action? Also, when deciding where this code should ideally belong, please consider how a potential impl of vertex-level data cleanup would fit.
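
To illustrate concern 1, here is a minimal sketch of a non-blocking cleanup, not the actual patch and not Tez API. The class AsyncDagCleaner, the deleteOnNode callback, and the per-node timeout are all hypothetical names and assumptions; the real delete request to the shuffle handler is abstracted behind the callback. It needs Java 9+ for CompletableFuture.orTimeout.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical helper (not part of Tez): fans out per-node deletes off the caller's thread. */
final class AsyncDagCleaner implements AutoCloseable {
  private final ExecutorService pool;
  // Assumed callback: (node, dagId) -> issue the delete request to that node.
  private final BiConsumer<String, String> deleteOnNode;
  private final long timeoutMillis;

  AsyncDagCleaner(int threads, long timeoutMillis, BiConsumer<String, String> deleteOnNode) {
    // Daemon threads so a hung node cannot keep the JVM alive.
    this.pool = Executors.newFixedThreadPool(threads, r -> {
      Thread t = new Thread(r, "dag-cleanup");
      t.setDaemon(true);
      return t;
    });
    this.timeoutMillis = timeoutMillis;
    this.deleteOnNode = deleteOnNode;
  }

  /**
   * Returns immediately. The returned future completes once every node has
   * succeeded, failed, or timed out; failures are swallowed here because the
   * cleanup is treated as best effort in this sketch.
   */
  CompletableFuture<Void> cleanupAsync(String dagId, List<String> nodes) {
    CompletableFuture<?>[] perNode = nodes.stream()
        .map(node -> CompletableFuture
            .runAsync(() -> deleteOnNode.accept(node, dagId), pool)
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            .exceptionally(e -> null))
        .toArray(CompletableFuture[]::new);
    return CompletableFuture.allOf(perNode);
  }

  @Override
  public void close() {
    pool.shutdownNow(); // interrupts any delete still in flight
  }
}
```

With this shape, the caller can submit the next DAG right after cleanupAsync returns. Note that orTimeout only completes the future; a delete that hangs still occupies a pool thread until it returns or is interrupted, so the real request should carry its own connect/read timeouts.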




was (Author: hitesh):
2 main concerns with the current impl:
  
1) The current impl seems to be a blocking call. This can potentially block submission of a new DAG for a long time if any node is unreachable or slow to respond (or if there are too many nodes to cycle through). The cleanup should happen asynchronously.
2) The cleanup code probably should not be in DAGAppMaster. Wouldn't something like the AMNodeTracker or the appropriate service plugin be the right layer to hand off this cleanup action?



> Delete intermediate data at DAG level for Shuffle Handler
> ---------------------------------------------------------
>
>                 Key: TEZ-3362
>                 URL: https://issues.apache.org/jira/browse/TEZ-3362
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3362.001.patch, TEZ-3362.002.patch, TEZ-3362.003.patch
>
>
> Applications like Hive that use Tez in session mode need the ability to delete intermediate data after a DAG completes, while the application continues to run.


