You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2012/04/23 23:24:34 UTC

[jira] [Commented] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259968#comment-13259968 ] 

Siddharth Seth commented on MAPREDUCE-3899:
-------------------------------------------

Bikas, I'm not sure that the locking isn't needed. It's broken the way it is rightnow though.
ApplicationMasterService seems to be written to handle one bad request (caused by network issues etc). allocate() does more than a scheduler.allocate - node state changes, newly allocated containers etc. All of that needs to be part of a single response. Without the lock - that may not happen. Locking on the AppAttemptId or the AppId itself is probably a better option, and also needs to cover the retrieval of the last response.
Also, we shouldn't be logging in the allocate call (not at INFO level anyway). That'll flood the RM logs.
                
> Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3899
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3899
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, resourcemanager
>    Affects Versions: 0.23.0
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3899-branch-0.23.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira