Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/07/03 01:00:20 UTC
[jira] [Commented] (MAPREDUCE-5317) Stale files left behind for failed jobs

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698366#comment-13698366 ] 

Jason Lowe commented on MAPREDUCE-5317:
---------------------------------------

Thanks for the update, Ravi.  Are we pushing the JOB_WAIT_TIMEOUT to another JIRA?  I didn't see that addressed.  A few more comments:

* Why does FAIL_WAIT ignore the JOB_COMMIT_COMPLETED/JOB_COMMIT_FAILED events?  I don't see how those events could arrive in this state, as it would require the committer to have been invoked sometime before entering this state.  Maybe I'm missing a scenario where that does occur?  KILL_WAIT doesn't do this, for example, so it seems we should either not need this in FAIL_WAIT or KILL_WAIT also needs it.
* The testcase uses AsyncDispatcher yet checks immediately after handling an event that the committer has not been invoked.  This is inherently racy, because AsyncDispatcher processes events on a separate thread.  A couple of options to fix it:
** Use InlineDispatcher or DrainDispatcher and call drain() (the latter is still technically a bit racy but the window is much smaller)
** Rather than checking the committer directly, spy/mock the event handler and verify after the event was handled that we didn't try to dispatch a committer event
* Nit: rather than explicitly waiting a hardcoded duration in the test case, we might be able to use verify with a timeout so we don't have to wait the full duration under normal test conditions.
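
To make the two testcase suggestions above concrete, here is a minimal plain-Java sketch. It does not use the real Hadoop or Mockito classes: DispatcherSketch, RecordingHandler, ToyJob and the event names TASK_KILLED and COMMITTER_ABORT are hypothetical stand-ins chosen for illustration. The first method dispatches inline and checks the recorded events rather than the committer itself; the second dispatches asynchronously and waits with a timeout, so it returns as soon as the event shows up instead of sleeping for a fixed duration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatcherSketch {

    /** Hypothetical handler: records every event it receives; the latch trips on the awaited one. */
    static final class RecordingHandler {
        final List<String> seen = new CopyOnWriteArrayList<>();
        private final String awaited;
        private final CountDownLatch latch = new CountDownLatch(1);

        RecordingHandler(String awaited) {
            this.awaited = awaited;
        }

        void handle(String event) {
            seen.add(event);
            if (event.equals(awaited)) {
                latch.countDown();
            }
        }

        /** Returns true as soon as the awaited event arrives, or false once the timeout expires. */
        boolean await(long timeoutMs) {
            try {
                return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Hypothetical job: emits COMMITTER_ABORT only after the last TASK_KILLED arrives. */
    static final class ToyJob {
        private final RecordingHandler downstream;
        private int outstanding;

        ToyJob(RecordingHandler downstream, int tasks) {
            this.downstream = downstream;
            this.outstanding = tasks;
        }

        synchronized void handle(String event) {
            if ("TASK_KILLED".equals(event) && --outstanding == 0) {
                downstream.handle("COMMITTER_ABORT");
            }
        }
    }

    /** Inline dispatch: the event is fully handled on this thread before the check runs. */
    static boolean committerSentAfterFirstKill() {
        RecordingHandler handler = new RecordingHandler("COMMITTER_ABORT");
        ToyJob job = new ToyJob(handler, 2);
        job.handle("TASK_KILLED"); // one of two tasks is still outstanding
        return handler.seen.contains("COMMITTER_ABORT");
    }

    /** Async dispatch: wait with a timeout instead of a hardcoded sleep. */
    static boolean committerSentEventually(long timeoutMs) {
        RecordingHandler handler = new RecordingHandler("COMMITTER_ABORT");
        ToyJob job = new ToyJob(handler, 2);
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            dispatcher.execute(() -> job.handle("TASK_KILLED"));
            dispatcher.execute(() -> job.handle("TASK_KILLED"));
            return handler.await(timeoutMs);
        } finally {
            dispatcher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("sent early: " + committerSentAfterFirstKill());
        System.out.println("sent eventually: " + committerSentEventually(5000));
    }
}
```

In the real test the recording handler would presumably be a spied or mocked event handler, and the latch-based wait would be replaced by a verify with a timeout; the sketch only shows why the inline check is deterministic and why the timed wait normally finishes well before the timeout.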
                
> Stale files left behind for failed jobs
> ---------------------------------------
>
>                 Key: MAPREDUCE-5317
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5317
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.8
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: MAPREDUCE-5317.branch-0.23.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch, MAPREDUCE-5317.patch
>
>
> Courtesy [~amar_kamat]!
> {quote}
> We are seeing _temporary files left behind in the output folder if the job
> fails.
> The job failed after hitting a quota issue.
> I simply ran the randomwriter (from hadoop examples) with the default setting.
> That failed and left behind some stray files.
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira