You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Ravi Gummadi (Created) (JIRA)" <ji...@apache.org> on 2011/12/08 06:03:40 UTC

[jira] [Created] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Deadlock in LocalDirsHandlerService and ShuffleHandler
------------------------------------------------------

                 Key: MAPREDUCE-3519
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 0.23.1, 0.24.0
            Reporter: Ravi Gummadi
            Assignee: Ravi Gummadi
             Fix For: 0.23.1


MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166896#comment-13166896 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #923 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/923/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212680
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166671#comment-13166671 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #266 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/266/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.
svn merge -c 1212680 --ignore-ancestry ../../trunk

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212681
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Robert Joseph Evans (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3519:
-------------------------------------------

            Priority: Blocker  (was: Major)
    Target Version/s: 0.23.1, 0.24.0
    
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165258#comment-13165258 ] 

Robert Joseph Evans commented on MAPREDUCE-3519:
------------------------------------------------

We have seen one other deadlock in the past when trying to lock the Configuration object.  The other one I am thinking of was specifically withing Configuration itself when trying to write out the configuration to a stream and the reader of the stream in the same process was trying to read in a configuration value.  That was also fixed by making sure that the configuration objects were separate instances.

In yarn with the message passing framework and how initialization happens automatically we are moving much more towards having a single global Configuration object shared by all parts of the system.  I think it is time for us to look at moving to read/write locks on the Configuration class.  This will not solve all cases of deadlocks, because we could still have a writer holding the lock and someone else trying to read or write holding a different lock. But both of the situations I have seen so far are two readers deadlocked on reads. No writes at all. With read write locks the reads would both be able to finish successfully without blocking.  This would also have the advantage of a potential performance boost for anyone using Configuration.  It would break backwards compatibility for anyone who is doing what we see here and synchronizing on configuration so it would have to be part of the release notes and we would have to inform downstream projects.  But I think that the benefits outweigh the drawbacks.  Especially for YARN where we are already changing a lot. 
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166324#comment-13166324 ] 

Ravi Gummadi commented on MAPREDUCE-3519:
-----------------------------------------

OK. I created HADOOP-7900 for the issue of 2 calls to conf.get() in confChanged(). Uploaded a patch there with the suggested code change.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166218#comment-13166218 ] 

Robert Joseph Evans commented on MAPREDUCE-3519:
------------------------------------------------

I like the patch.  I also traced down from where the synchronized call is happening looking for a potential race condition between reads and writes.  The only place I found that uses conf from LocalDirsHandlerService.getLocalPathForWrite is LocalDirAllocator.confChanged.  Inside that code we access conf three times, once to get the local file system, so I don't see that as being an issue.  But we call it twice to get the value of the contextConfigItemName.  The first to see if the value has changed, and the second to turn that value into an array.  It would be better to change the code so that it only calls conf.get once for that value, and then to turn it into a trimmed array of Strings call StringUtils.getTrimmedStrings, which is what the second method does internally. 
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166859#comment-13166859 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #121 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/121/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.
svn merge -c 1212680 --ignore-ancestry ../../trunk

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212681
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164994#comment-13164994 ] 

Ravi Gummadi commented on MAPREDUCE-3519:
-----------------------------------------

Here is one deadlock that is observed:


Found one Java-level deadlock:
=============================
"ContainersLauncher #32":
  waiting to lock monitor 0x08e84674 (object 0xb5bec810, a org.apache.hadoop.conf.Configuration),
  which is held by "ContainersLauncher #27"
"ContainersLauncher #27":
  waiting to lock monitor 0x08ea65b0 (object 0xb5cd2898, a org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext),
  which is held by "New I/O server worker #1-3"
"New I/O server worker #1-3":
  waiting to lock monitor 0x08e84674 (object 0xb5bec810, a org.apache.hadoop.conf.Configuration),
  which is held by "ContainersLauncher #27"

Java stack information for the threads listed above:
===================================================
"ContainersLauncher #32":
        at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLogPathForWrite(LocalDirsHandlerService.java:283)
        - waiting to lock <0xb5bec810> (a org.apache.hadoop.conf.Configuration)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:126)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
"ContainersLauncher #27":
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:331)
        - waiting to lock <0xb5cd2898> (a org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
        at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:262)
        - locked <0xb5bec810> (a org.apache.hadoop.conf.Configuration)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:150)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
"New I/O server worker #1-3":
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1373)
        - waiting to lock <0xb5bec810> (a org.apache.hadoop.conf.Configuration)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:569)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:586)
        at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:730)
        at org.apache.hadoop.fs.FileSystem.getDefaultBlockSize(FileSystem.java:1805)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:429)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1069)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:425)
        - locked <0xb5cd2898> (a org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
        at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:464)
        at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
        at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
        at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
        at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545)
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Found 1 deadlock.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166437#comment-13166437 ] 

Robert Joseph Evans commented on MAPREDUCE-3519:
------------------------------------------------

I am +1 on this Thanks for the quick fix.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-3519:
------------------------------------

    Attachment: 3519.patch

Attaching patch with the fix.

With this patch, nobody else other than LocalDirsHandlerService can access the configuration-which-gets-updated-based-on-disks-health.

Please review and provide your comments.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166701#comment-13166701 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #276 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/276/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.
svn merge -c 1212680 --ignore-ancestry ../../trunk

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212681
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165293#comment-13165293 ] 

Ravi Gummadi commented on MAPREDUCE-3519:
-----------------------------------------

As reads of Configuration object happen often and writes to Configuration object don't happen often, having separate locks for read and write should improve things a lot in terms of performance.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3519:
-----------------------------------------------

          Resolution: Fixed
    Target Version/s: 0.23.1, 0.24.0  (was: 0.24.0, 0.23.1)
        Release Note: Fixed a deadlock in NodeManager LocalDirectories's handling service.
        Hadoop Flags: Reviewed
              Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch-0.23. Thanks Ravi!
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166065#comment-13166065 ] 

Ravi Gummadi commented on MAPREDUCE-3519:
-----------------------------------------

Changing Configuration.java to have a ReadWriteLock and affecting everybody (not just hadoop-mapreduce-project) seems to be complex and needs more thinking in terms of dependencies because almost every class in hadoop uses Configuration. We can move to this model of ReadWriteLock in  the long run (if possible).

 In the short term, I don't see a simple way to handle this deadlock without removing the synchronized(conf) blocks in the methods in LocalDirsHandlerService.java.

Further, I don't see an issue/problem in removing these synchronized(conf) blocks in LocalDirsHandlerService.java. Everywhere else in hadoop source code there are no such synchronization blocks when reading from OR writing to configuration object(s). As the methods in Configuration.java like getProps() are already synchronized and the actual setting of a property using Configuration.properties.setProperty() and getting the value of a property using Configuration.properties.getProperty() seem to be atomic, I don't see an issue in removing these synchronized(conf) blocks from LocalDirsHandlerService.java.

Please provide your comments/suggestions.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166722#comment-13166722 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1416 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1416/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212680
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165016#comment-13165016 ] 

Ravi Gummadi commented on MAPREDUCE-3519:
-----------------------------------------

The patch attached may not solve the issue as the ShuffleHandler seems to be getting access to updated conf because of FileSystem object (as FileSystem object is saving the config and reusing it). So whoever creates the local FileSystem object first, that config will be reused for other accesses to local FileSystem and thus ShuffleHandler is able to "try to get lock for the cloned conf".

Investigating further...
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166322#comment-13166322 ] 

Robert Joseph Evans commented on MAPREDUCE-3519:
------------------------------------------------

I moved this issue to blocker because we have seen this issue show up in our tests and it is blocking us from running tests because it stops some of the tests from completing.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166649#comment-13166649 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3519:
----------------------------------------------------

Configuration seems to have a host of other synchronization problems, we are either lucky to not hit them or plain ignorant.

Looked at the latest patch. This is fine especially in the light of [Ravi's comment|https://issues.apache.org/jira/browse/MAPREDUCE-3519?focusedCommentId=13166065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13166065] above.

+1.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166346#comment-13166346 ] 

Hadoop QA commented on MAPREDUCE-3519:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12506731/3519.v1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1415//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1415//console

This message is automatically generated.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-3519:
------------------------------------

    Attachment: deadlock.txt

Attaching the deadlock details with better formatting as earlier comment seems a little complex to understand.
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166675#comment-13166675 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1394 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1394/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212680
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166850#comment-13166850 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #103 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/103/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.
svn merge -c 1212680 --ignore-ancestry ../../trunk

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212681
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166884#comment-13166884 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #890 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/890/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212680
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166666#comment-13166666 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1468 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1468/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212680
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166664#comment-13166664 ] 

Hudson commented on MAPREDUCE-3519:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #256 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/256/])
    MAPREDUCE-3519. Fixed a deadlock in NodeManager LocalDirectories's handling service. Contributed by Ravi Gummadi.
svn merge -c 1212680 --ignore-ancestry ../../trunk

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212681
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java

                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-3519:
------------------------------------

    Attachment: 3519.v1.patch

Attaching patch by removing the synchronized(conf) blocks from LocalDirsHandlerService.java
                
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-3519:
------------------------------------

    Status: Patch Available  (was: Open)
    
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3519:
-----------------------------------------------

    Issue Type: Sub-task  (was: Bug)
        Parent: MAPREDUCE-3121
    
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, 3519.v1.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Ravi Gummadi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-3519:
------------------------------------

    Description: MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.  (was: MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.)
    
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is used in local FileSystem object creation in LocalDirAllocator.AllocatorPerContext and the same FileSystem object is used in ShuffleHandler.Shuffle.localDirAllocator, this is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3519) Deadlock in LocalDirsHandlerService and ShuffleHandler

Posted by "Mahadev konar (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-3519:
-------------------------------------

    Component/s: mrv2
    
> Deadlock in LocalDirsHandlerService and ShuffleHandler
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3519
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3519
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Ravi Gummadi
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3519.patch, deadlock.txt
>
>
> MAPREDUCE-3121 cloned Configuration object in LocalDirsHandlerService.init() to avoid others to access that configuration object. But since it is set in the base class of LocalDirsHandlerService using super.init(conf), it is reflected and is accessible to some other services. This is causing a deadlock when accessing this configuration object from LocalDirsHandlerService and ShuffleHandler along with AllocatorPerContext object.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira