Posted to mapreduce-issues@hadoop.apache.org by "Bikas Saha (Created) (JIRA)" <ji...@apache.org> on 2012/02/23 00:19:50 UTC

[jira] [Created] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)
-------------------------------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-3899
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3899
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2, resourcemanager
    Affects Versions: 0.23.0
            Reporter: Bikas Saha
            Assignee: Bikas Saha
             Fix For: 0.23.0


Looks like the lock taken in this method is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get the new lastResponse object, and will be able to take its lock and enter the critical section while the first thread is still inside it. Presumably we want to limit processing to one response per app attempt at a time. So the lock could be taken on the ApplicationAttemptId key of the response map instead.
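
To make the problem concrete, here is a minimal sketch of the pattern described above. It is not the actual ApplicationMasterService code: the class name, the String attempt ids and the plain Object values are hypothetical stand-ins for the real response map. It only shows that once the locked object is replaced in the map, the next caller looks up a different monitor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleLockDemo {
    // Hypothetical stand-in for the per-attempt response map (assumption, not the RM's real types).
    static final Map<String, Object> responseMap = new ConcurrentHashMap<>();

    /** Returns true if a second caller would synchronize on a different object than the first. */
    static boolean secondCallerGetsDifferentMonitor(String attemptId) {
        responseMap.put(attemptId, new Object());

        // The first caller fetches the current lastResponse and locks on it.
        Object lockSeenByFirst = responseMap.get(attemptId);
        synchronized (lockSeenByFirst) {
            // Inside the critical section it stores a new lastResponse in the map.
            responseMap.put(attemptId, new Object());

            // A caller arriving now fetches the new object, whose monitor nobody holds,
            // so it would not be blocked by the first caller.
            Object lockSeenBySecond = responseMap.get(attemptId);
            return lockSeenBySecond != lockSeenByFirst;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCallerGetsDifferentMonitor("attempt_1")); // prints "true"
    }
}
```

The lock only excludes threads that looked up the same object, so it stops protecting anything as soon as the map entry is swapped.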

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Posted by "Siddharth Seth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259968#comment-13259968 ] 

Siddharth Seth commented on MAPREDUCE-3899:
-------------------------------------------

Bikas, I'm not sure that the locking isn't needed. It's broken the way it is right now, though.
ApplicationMasterService seems to be written to handle one bad request (caused by network issues etc.). allocate() does more than a scheduler.allocate: node state changes, newly allocated containers etc. All of that needs to be part of a single response. Without the lock, that may not happen. Locking on the AppAttemptId or the AppId itself is probably a better option, and the lock also needs to cover the retrieval of the last response.
Also, we shouldn't be logging in the allocate call (not at INFO level anyway). That'll flood the RM logs.
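
A rough sketch of the per-attempt locking suggested above, under stated assumptions: the class, the String attempt ids, the integer response ids and the separate lock map are all hypothetical and only illustrate the idea, not the patch or the real RM code. The point is that the lock object stays the same for the lifetime of the attempt, and that reading the last response happens inside the lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerAttemptLockSketch {
    // Hypothetical: one stable lock object per attempt id, never replaced.
    private final Map<String, Object> locks = new ConcurrentHashMap<>();
    // Hypothetical: the last response is reduced to a response id for brevity.
    private final Map<String, Integer> lastResponseId = new ConcurrentHashMap<>();

    int allocate(String attemptId) {
        Object lock = locks.computeIfAbsent(attemptId, k -> new Object());
        synchronized (lock) {
            // Retrieval of the last response is covered by the lock as well.
            int last = lastResponseId.getOrDefault(attemptId, 0);
            int next = last + 1;
            lastResponseId.put(attemptId, next);
            return next;
        }
    }

    /** Calls allocate() from several threads for one attempt and returns the final response id. */
    static int runConcurrently(int threads, int callsPerThread) {
        PerAttemptLockSketch service = new PerAttemptLockSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    service.allocate("attempt_1");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return service.lastResponseId.get("attempt_1");
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000)); // prints 4000, no lost updates
    }
}
```

Different attempts use different lock objects, so they do not block each other; only concurrent calls for the same attempt are serialized.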
                

        

[jira] [Updated] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Posted by "Bikas Saha (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated MAPREDUCE-3899:
----------------------------------

    Attachment: MAPREDUCE-3899-branch-0.23.patch

1) removed the lock as it's not needed
2) added comments for when it might be needed
3) refactored appAttemptId -> applicationAttemptId to make it consistent across the file
4) added a couple of logs to make it consistent across the processing
5) a few Eclipse code style changes
                

        

[jira] [Updated] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated MAPREDUCE-3899:
----------------------------------

    Status: Open  (was: Patch Available)
    

        

[jira] [Updated] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Posted by "Bikas Saha (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated MAPREDUCE-3899:
----------------------------------

    Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260015#comment-13260015 ] 

Bikas Saha commented on MAPREDUCE-3899:
---------------------------------------

Yeah. Those complications have left this jira dead for some time. I should have cancelled the patch long ago. Sorry about that. If you have time, could you look at MAPREDUCE-3921 instead :P
                

        

[jira] [Commented] (MAPREDUCE-3899) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215867#comment-13215867 ] 

Hadoop QA commented on MAPREDUCE-3899:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515864/MAPREDUCE-3899-branch-0.23.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1925//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1925//console

This message is automatically generated.
                