Posted to yarn-issues@hadoop.apache.org by "Omkar Vinit Joshi (JIRA)" <ji...@apache.org> on 2013/04/08 23:21:16 UTC

[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

    [ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625829#comment-13625829 ] 

Omkar Vinit Joshi commented on YARN-112:
----------------------------------------

I am rebasing the patch now that YARN-467 has been committed and YARN-99 has been updated.
                
> Race in localization can cause containers to fail
> -------------------------------------------------
>
>                 Key: YARN-112
>                 URL: https://issues.apache.org/jira/browse/YARN-112
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Jason Lowe
>            Assignee: Omkar Vinit Joshi
>         Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112-20130408.patch, yarn-112.20131503.patch
>
>
> On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of an MR job, that were launched almost simultaneously on the same node.  It appears they both tried to localize job.jar and job.xml at the same time.  One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty.  Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.
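
The failure mode described above can be illustrated with a small stand-alone sketch. This is not the NodeManager code and not the fix in the attached patches; the class LocalizeRaceSketch, the helpers localize and deleteTree, and the file name "payload" are all hypothetical. It assumes each localizer stages into its own temporary directory and publishes with a rename, and it shows one possible way for the loser of the race to back off without touching the published directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalizeRaceSketch {

    /**
     * Hypothetical localizer: stages a resource in tmpDir, then tries to
     * publish it by renaming tmpDir to finalDir. Returns true if this caller
     * published the resource, false if another caller had already done so.
     */
    static boolean localize(Path tmpDir, Path finalDir) throws IOException {
        Files.createDirectories(tmpDir);
        Files.write(tmpDir.resolve("payload"), "resource bytes".getBytes());
        try {
            // The rename fails when finalDir already exists and is non-empty.
            Files.move(tmpDir, finalDir);
            return true;
        } catch (IOException e) {
            if (!Files.isDirectory(finalDir)) {
                throw e; // not the race: a genuine failure
            }
            // Lost the race: remove only our own staging directory and
            // leave the published directory for the winner's container.
            deleteTree(tmpDir);
            return false;
        }
    }

    /** Deletes root and everything under it, children before parents. */
    static void deleteTree(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("localize-race");
        Path finalDir = base.resolve("job.jar");
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Two "containers" localize the same resource at the same time.
        Future<Boolean> a = pool.submit(() -> localize(base.resolve("tmp-a"), finalDir));
        Future<Boolean> b = pool.submit(() -> localize(base.resolve("tmp-b"), finalDir));
        System.out.println("a published: " + a.get() + ", b published: " + b.get());
        pool.shutdown();
        deleteTree(base);
    }
}
```

In this sketch the caller that loses the rename treats an existing target as success for the resource and cleans up only its own staging directory, so the other container's files are not removed from under it.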
