Posted to common-issues@hadoop.apache.org by "Kihwal Lee (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 21:30:57 UTC

[jira] [Created] (HADOOP-8050) Deadlock in metrics

Deadlock in metrics
-------------------

                 Key: HADOOP-8050
                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
             Project: Hadoop Common
          Issue Type: Bug
          Components: metrics
    Affects Versions: 1.0.0, 0.20.205.0, 0.20.204.0
            Reporter: Kihwal Lee
             Fix For: 1.1.0, 1.0.1


The metrics serving thread and the periodic snapshot thread can deadlock.
It happened a few times on one of namenode we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206704#comment-13206704 ] 

Luke Lu commented on HADOOP-8050:
---------------------------------

Ah, those pesky tests :) Kihwal's latest patch looks reasonable to me, apart from the redundant "needUpdate" variable.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050-trunk.patch.txt
                hadoop-8050-branch-1.patch.txt

Thanks for the review, Luke. I removed the unused variable in the new patches. Attaching patches for branch-1 and trunk.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Lu updated HADOOP-8050:
----------------------------

    Affects Version/s: 0.23.0
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206966#comment-13206966 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

No test was added since there is no functional change.
I wish static analysis tools caught this kind of bug.
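
Where static analysis misses a lock-order problem like this one, a running JVM can at least report it after the fact. The following is a minimal sketch, not part of the patch, using the standard ThreadMXBean API; the class name DeadlockProbe and its method name are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper; not part of Hadoop or of the HADOOP-8050 patch. */
final class DeadlockProbe {
    /** Returns how many threads are currently deadlocked on object monitors (0 if none). */
    static int monitorDeadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no monitor deadlock exists in this JVM.
        long[] ids = threads.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }
}
```

A watchdog thread could call this periodically and log a thread dump when the count is non-zero. It only detects a deadlock once it has happened; it does not prevent one.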
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206693#comment-13206693 ] 

Hadoop QA commented on HADOOP-8050:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514317/hadoop-8050-trunk.patch.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/588//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/588//console

This message is automatically generated.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211813#comment-13211813 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Hdfs-trunk #961 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/961/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291084)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291084
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Status: Patch Available  (was: Open)
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.0.0, 0.20.205.0, 0.20.204.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Description: 
The metrics serving thread and the periodic snapshot thread can deadlock.
It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.

  was:
The metrics serving thread and the periodic snapshot thread can deadlock.
It happened a few times on one of namenode we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.

    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211582#comment-13211582 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #565 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/565/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291081)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291081
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204828#comment-13204828 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

"1822485214@qtp-1598533502-1267":
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:141)
        - waiting to lock <0x00002aabae0cdb18> (a org.apache.hadoop.metrics2.impl.MetricsSourceAdapter)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1375)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:880)
        at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:183)
        at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:159)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at com.yahoo.hadoop.HadoopBouncerFilter.doFilter(HadoopBouncerFilter.java:60)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:818)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
"902074768@qtp-1598533502-432":
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$5.getMetrics(MetricsSystemImpl.java:477)
        - waiting to lock <0x00002aabae06f408> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:169)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:149)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:141)
        - locked <0x00002aabae0cdb18> (a org.apache.hadoop.metrics2.impl.MetricsSourceAdapter)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1375)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:880)
        at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:183)
        at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:159)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at com.yahoo.hadoop.HadoopBouncerFilter.doFilter(HadoopBouncerFilter.java:60)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:818)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
"Timer for 'NameNode' metrics system":
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:164)
        - waiting to lock <0x00002aabae0cdb18> (a org.apache.hadoop.metrics2.impl.MetricsSourceAdapter)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:336)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:327)
        - locked <0x00002aabae06f408> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:309)
        - locked <0x00002aabae06f408> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:296)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)

Found 1 deadlock.

--------------------------------
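Normal metrics sources are not a problem: only the source adapter object is locked while getMetrics() runs. The system source, however, locks MetricsSystemImpl in getMetrics(). As the dump shows, the JMX serving thread therefore takes the adapter lock and then waits for MetricsSystemImpl, while the timer thread holds MetricsSystemImpl and waits for the adapter lock.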
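
The opposite lock orders in the dump can be reproduced in isolation. This is a minimal sketch, assuming two explicit locks as stand-ins for the MetricsSourceAdapter and MetricsSystemImpl monitors; the class LockOrderInversionDemo and all names in it are hypothetical, and it uses timed tryLock so that the demo reports the conflict instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo of the lock-order inversion; not Hadoop code. */
final class LockOrderInversionDemo {
    // Stand-ins for the two monitors seen in the thread dump.
    static final ReentrantLock ADAPTER = new ReentrantLock(); // MetricsSourceAdapter
    static final ReentrantLock SYSTEM = new ReentrantLock();  // MetricsSystemImpl

    /** Returns true if neither path could take its second lock, i.e. real monitors would deadlock. */
    static boolean wouldDeadlock() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];
        // JMX path: adapter first, then system. Timer path: system first, then adapter.
        Thread jmx = new Thread(() -> attempt(ADAPTER, SYSTEM, bothHoldFirst, bothTried, gotSecond, 0));
        Thread timer = new Thread(() -> attempt(SYSTEM, ADAPTER, bothHoldFirst, bothTried, gotSecond, 1));
        jmx.start();
        timer.start();
        try {
            jmx.join();
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    private static void attempt(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                boolean[] gotSecond, int slot) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until the other thread holds its first lock
            boolean ok = second.tryLock(100, TimeUnit.MILLISECONDS);
            gotSecond[slot] = ok;
            if (ok) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep the first lock until the other thread has tried too
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

With synchronized blocks in place of the timed tryLock, both threads would block forever, which matches the "waiting to lock" entries above.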

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenode we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050-trunk.patch.txt
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206342#comment-13206342 ] 

Luke Lu commented on HADOOP-8050:
---------------------------------

The deadlock occurs because the JMX serving thread acquires locks in a different order (source adapter, then source, which can be the metrics system) than the snapshot thread (metrics system, then source adapter). The correct fix (short of moving JMX to a sink) is not to remove the lock on the metrics system in the snapshot thread, but to fix the lock order in MetricsSourceAdapter so that source.getMetrics() is called without holding the adapter lock.
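
A rough illustration of that ordering, as a minimal sketch only: it is not the attached patch, and SketchSource and SketchAdapter are hypothetical stand-ins rather than the Hadoop classes. The source is called with no adapter lock held, and the adapter lock is taken only to publish the result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a metrics source; not the Hadoop interface. */
interface SketchSource {
    List<String> getMetrics();
}

/** Hypothetical adapter showing the intended lock order; not the committed fix. */
final class SketchAdapter {
    private final SketchSource source;
    private List<String> jmxCache = new ArrayList<>(); // guarded by "this"

    SketchAdapter(SketchSource source) {
        this.source = source;
    }

    List<String> getMBeanInfo() {
        // 1. Call into the source first, holding no adapter lock. The source may
        //    take its own lock (for the system source, the metrics system lock).
        List<String> fresh = source.getMetrics();
        // 2. Only then take the adapter lock, briefly, to update and read the cache.
        synchronized (this) {
            jmxCache = new ArrayList<>(fresh);
            return new ArrayList<>(jmxCache);
        }
    }
}
```

Because the adapter lock is never held while waiting for the source, the JMX path can no longer hold it while the snapshot thread, which already holds the metrics system lock, waits for it.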
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Reopened] (HADOOP-8050) Deadlock in metrics

Posted by "Matt Foley (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley reopened HADOOP-8050:
--------------------------------


The Commit integration to both trunk and v0.23 succeeded with common and hdfs, but aborted in mapreduce.  The log records:
{code}
ivy-resolve-mapred:
Build timed out (after 45 minutes). Marking the build as aborted.
Build was aborted
Recording test results
None of the test reports contained any result
Updating HADOOP-8050
No emails were triggered.
Finished: ABORTED
{code}

This looks more like a connectivity issue with ivy than a problem with the patch, but re-opening pending investigation.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of namenodes we have. When it happens RPC works but the web ui and hftp stop working. I haven't look at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205756#comment-13205756 ] 

Hadoop QA commented on HADOOP-8050:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514152/hadoop-8050.patch.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/586//console

This message is automatically generated.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211587#comment-13211587 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1831 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1831/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291084)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291084
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206564#comment-13206564 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

The new patch sets publishSelfMetrics to false, which causes the snapshot to skip the system source.
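
For readers following along, here is a minimal sketch of what "skip the system source" could look like. It is an illustration only: the class, the SYSTEM_SOURCE name, the map of sources and the demoSources helper are all hypothetical and are not the MetricsSystemImpl code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Hadoop implementation.
class SelfMetricsFlagSketch {
  static final String SYSTEM_SOURCE = "MetricsSystem"; // assumed name for the self-metrics source

  /** Collects one record per source, skipping the system source when the flag is off. */
  static List<String> snapshot(Map<String, Long> sources, boolean publishSelfMetrics) {
    List<String> records = new ArrayList<>();
    for (Map.Entry<String, Long> e : sources.entrySet()) {
      if (!publishSelfMetrics && e.getKey().equals(SYSTEM_SOURCE)) {
        continue; // the snapshot never calls back into the metrics system itself
      }
      records.add(e.getKey() + "=" + e.getValue());
    }
    return records;
  }

  /** Small fixed input used by the example: one ordinary source plus the system source. */
  static Map<String, Long> demoSources() {
    Map<String, Long> sources = new LinkedHashMap<>();
    sources.put("namenode", 42L);
    sources.put(SYSTEM_SOURCE, 7L);
    return sources;
  }

  public static void main(String[] args) {
    System.out.println(snapshot(demoSources(), false));
    System.out.println(snapshot(demoSources(), true));
  }
}
```

With the flag off, the snapshot loop has no reason to re-enter the metrics system while it is already holding that lock, which is the property the patch relies on.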
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206336#comment-13206336 ] 

Luke Lu commented on HADOOP-8050:
---------------------------------

I always wanted to move the JMX stuff to a sink to untangle the mess and never got around to doing it (maybe I'll do it with HADOOP-8061). The main reason for locking the metrics system during snapshots is that people can trigger a metrics system restart (stop/start) via JMX in another thread.
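
To make the locking concern concrete, here is a rough sketch of one way to keep a system-wide lock for snapshot/restart without a lock-order inversion between the timer thread and the JMX serving thread. All names (SnapshotLockSketch, SystemSketch, AdapterSketch) are made up for illustration, and this is an assumption about the general technique (never holding both monitors on the serving path), not a description of the attached patch.

```java
// Hypothetical stand-ins for illustration; these are not the Hadoop classes.
class SnapshotLockSketch {

  /** Plays the role of the metrics system: one monitor guards snapshots and restarts. */
  static class SystemSketch {
    private long selfMetric = 0;

    synchronized void restart() { selfMetric = 0; }            // e.g. triggered over JMX
    synchronized long readSelfMetric() { return ++selfMetric; }

    // Timer thread path: holds the system monitor, then takes the adapter monitor.
    synchronized long snapshot(AdapterSketch adapter) {
      return adapter.cachedValue();
    }
  }

  /** Plays the role of the adapter that serves JMX reads from a cache. */
  static class AdapterSketch {
    private final SystemSketch system;
    private long cache = -1;

    AdapterSketch(SystemSketch system) { this.system = system; }

    synchronized long cachedValue() { return cache; }

    // Serving thread path: never holds both monitors at once.
    long getAttribute() {
      long fresh = system.readSelfMetric();   // system monitor only, released on return
      synchronized (this) {                   // adapter monitor only
        cache = fresh;
        return cache;
      }
    }
  }

  /** Runs both paths concurrently; returns true if both threads finish in time. */
  static boolean runBothThreads(int iterations) {
    SystemSketch system = new SystemSketch();
    AdapterSketch adapter = new AdapterSketch(system);
    Thread timer = new Thread(() -> {
      for (int i = 0; i < iterations; i++) system.snapshot(adapter);
    });
    Thread serving = new Thread(() -> {
      for (int i = 0; i < iterations; i++) adapter.getAttribute();
    });
    timer.setDaemon(true);
    serving.setDaemon(true);
    timer.start();
    serving.start();
    try {
      timer.join(10_000);
      serving.join(10_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !timer.isAlive() && !serving.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("both threads finished: " + runBothThreads(100_000));
  }
}
```

If getAttribute were declared synchronized instead, the serving thread would hold the adapter monitor while waiting for the system monitor, and the timer thread would hold them in the opposite order, which is the shape of deadlock this issue describes.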
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050-branch-1.patch.txt
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211596#comment-13211596 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #580 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/580/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291081)

     Result = ABORTED
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291081
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208085#comment-13208085 ] 

Luke Lu commented on HADOOP-8050:
---------------------------------

The latest patch lgtm, +1. Thanks, Kihwal. It'd be great if you could add a test case for the JMX metrics serving (the purpose is not to reproduce the deadlock, but to let people run something like jcarder with the unit tests and detect potential deadlocks). I'm not holding my +1 for it, though.
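
As a lighter-weight complement to a tool like jcarder, a test could also ask the JVM whether any threads are currently deadlocked on monitors. The sketch below uses the standard java.lang.management API; the DeadlockProbe class name is hypothetical. Note the difference in scope: this only reports a deadlock that has actually occurred, whereas jcarder looks for potential ones, so it is a substitute technique and not what the comment above proposes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; only the java.lang.management calls are real APIs.
class DeadlockProbe {

  /** Returns how many threads the JVM currently reports as deadlocked on object monitors. */
  static int deadlockedThreadCount() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    long[] ids = threads.findMonitorDeadlockedThreads(); // null when no deadlock is found
    return ids == null ? 0 : ids.length;
  }

  public static void main(String[] args) {
    System.out.println("deadlocked threads: " + deadlockedThreadCount());
  }
}
```

A unit test that exercises JMX attribute reads while snapshots are running could call deadlockedThreadCount() at the end and fail on a non-zero result.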
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211584#comment-13211584 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Common-0.23-Commit #578 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/578/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291081)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291081
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050-branch-1.patch.txt
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208108#comment-13208108 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

Filed HADOOP-8073 per Luke's comment. Thanks for the review.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206522#comment-13206522 ] 

Luke Lu commented on HADOOP-8050:
---------------------------------

@Matt, I'd have already attached a patch if not for my employer's patch review/approval process. For a quick fix for 1.0.1, the least risky approach would be commenting out the registerSystemSource line in MetricsSystemImpl#configureSources. The metrics system's own metrics were mostly for the original dev testing/debugging and are not required for production. I'll review the patch :)


                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206558#comment-13206558 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

What if we set publishSelfMetrics to false?
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213733#comment-13213733 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

So this time, the MR trunk build went through okay, but the 0.23 build did not. I just filed MAPREDUCE-3894 to investigate the build issue. Since the issue is independent of this jira, I think we can close it.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211814#comment-13211814 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #174 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/174/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291081)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291081
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Fix Version/s: 0.23.2
                   0.24.0
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 1.1.0, 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211599#comment-13211599 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1768 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1768/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291084)

     Result = ABORTED
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291084
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Matt Foley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HADOOP-8050:
-------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.24.0)
                       (was: 1.1.0)
           Status: Resolved  (was: Patch Available)

Committed to branch-1.0, branch-1, branch-0.23, and trunk.
Thanks, Kihwal and Luke!
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211846#comment-13211846 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #202 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/202/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291081)

     Result = FAILURE
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291081
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Robert Joseph Evans (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212747#comment-13212747 ] 

Robert Joseph Evans commented on HADOOP-8050:
---------------------------------------------

I have seen a number of other builds fail with this result, or end as aborted, lately. The failures look like they started around Feb 10th, but I don't know for sure. The first instance I found when looking through JIRA is MAPREDUCE-3852. JQL cannot search for "RESULT = ABORTED" in a comment, though; it just returns everything that contains "result", "=", or "ABORTED".
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Assigned] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee reassigned HADOOP-8050:
----------------------------------

    Assignee: Kihwal Lee
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211845#comment-13211845 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Mapreduce-trunk #996 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/996/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291084)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291084
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Resolved] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee resolved HADOOP-8050.
--------------------------------

    Resolution: Fixed
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Matt Foley (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HADOOP-8050:
-------------------------------

    Fix Version/s:     (was: 1.0.1)
                   1.0.2

Was committed to 1.0.2, not 1.0.1.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.23.2, 1.0.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050-trunk.patch.txt
                hadoop-8050-branch-1.patch.txt
    
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Updated] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HADOOP-8050:
-------------------------------

    Attachment: hadoop-8050.patch.txt

If many methods are synchronized and the two classes containing them depend on each other, deadlock is likely.
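
To make that failure mode concrete, here is a minimal sketch, not the Hadoop code itself. The two lock objects are hypothetical stand-ins for the monitors of two interdependent classes, and all names are invented for illustration. Each thread takes one monitor and then asks for the other, which is the classic lock-order inversion; the JDK's ThreadMXBean is then used to confirm that the JVM sees the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderSketch {

    // Starts a daemon thread that takes `first`, waits until both threads
    // hold their first lock, then tries to take `second`.
    static void grab(Object first, Object second, CountDownLatch bothHeld) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHeld.countDown();
                try {
                    bothHeld.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds `second`.
                }
            }
        });
        t.setDaemon(true); // lets the JVM exit even though the thread is stuck
        t.start();
    }

    // Returns true once the JVM reports a monitor deadlock (polls for up to ~5 s).
    static boolean deadlockObserved() throws InterruptedException {
        Object systemLock = new Object();   // hypothetical: metrics system monitor
        Object adapterLock = new Object();  // hypothetical: source adapter monitor
        CountDownLatch bothHeld = new CountDownLatch(2);
        grab(systemLock, adapterLock, bothHeld); // e.g. the snapshot thread
        grab(adapterLock, systemLock, bothHeld); // e.g. the serving thread

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            if (mx.findMonitorDeadlockedThreads() != null) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock observed: " + deadlockObserved());
    }
}
```

The latch only forces the unlucky interleaving every time; in production the same cycle shows up only when the two threads happen to overlap, which would be consistent with the problem appearing just a few times.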

The current locking in metrics is a little excessive. I do not believe strict global consistency is required when processing metrics. For one, sources do not coordinate with each other (they are mostly independent), so locking the whole subsystem to take a snapshot adds little to the quality of the data.

This patch removes some locks around accessing the source adapter map within MetricsSystemImpl. This makes the metric snapshot only lock on each individual source adapter, one at a time, instead of the entire metrics impl.  This is safe because:

* Once sources are registered, they are not removed until shutdown(). Even shutdown() or stop() is called rarely.

* During snapshot, the source adapter hashmap is the only data structure that needs protection.

* snapshot() is only called from the timer event handler. startTimer() makes sure that there is only one timer.

I wrapped the LinkedHashMap used for the source adapter map with Collections.synchronizedMap. This makes accessing the data structure safe without holding a big coarse lock. No further synchronization between sources seems to be needed.
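
As a rough illustration of the approach described above, and not the contents of the attached patch, the sketch below wraps a LinkedHashMap with Collections.synchronizedMap and takes a snapshot by locking one adapter at a time. The class SourceMapSketch, the nested Adapter class, and all method names are hypothetical. One detail the JDK documentation requires: iterating over a synchronized map's views must be done while holding the map itself, so the sketch copies the values under that short lock and visits the adapters afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SourceMapSketch {

    // Hypothetical stand-in for a metrics source adapter with its own lock.
    static final class Adapter {
        private final String name;
        private long value; // guarded by this

        Adapter(String name) {
            this.name = name;
        }

        synchronized void update(long v) {
            value = v;
        }

        synchronized String snapshot() {
            return name + "=" + value;
        }
    }

    // Individual get/put calls are thread-safe without a subsystem-wide lock.
    private final Map<String, Adapter> sources =
        Collections.synchronizedMap(new LinkedHashMap<String, Adapter>());

    void register(String name) {
        sources.put(name, new Adapter(name));
    }

    Adapter get(String name) {
        return sources.get(name);
    }

    List<String> snapshotAll() {
        List<Adapter> copy;
        // Iteration over a synchronizedMap view needs the map's own lock;
        // hold it only long enough to copy the references.
        synchronized (sources) {
            copy = new ArrayList<Adapter>(sources.values());
        }
        List<String> out = new ArrayList<String>();
        for (Adapter a : copy) {
            out.add(a.snapshot()); // locks one adapter at a time
        }
        return out;
    }

    public static void main(String[] args) {
        SourceMapSketch system = new SourceMapSketch();
        system.register("rpc");
        system.register("jvm");
        system.get("rpc").update(7);
        System.out.println(system.snapshotAll());
    }
}
```

The trade-off is the one argued above: the snapshot is no longer globally consistent across sources, but no thread ever holds the map lock and an adapter lock at the same time.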

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Kihwal Lee (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206689#comment-13206689 ] 

Kihwal Lee commented on HADOOP-8050:
------------------------------------

bq. The correct fix (sans moving jmx to a sink) is not removing the lock on metrics system in the snapshot thread but fixing the lock order in MetricsSourceAdapter (so that source.getMetrics is called without holding the adapter lock).

I tried to do this in the new patch. Since updateJmxCache() no longer blocks while getMetrics() is being called, some callers may not get the latest metric data if updateJmxCache() is already being executed by another thread.
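
The shape of that change can be sketched roughly as follows. This is an assumption-laden illustration rather than the patch: the class JmxCacheSketch, the Source interface, and the field names are hypothetical, and only the names updateJmxCache() and getMetrics() come from the discussion. The adapter lock is held just to check and update the cache state; the call into the source happens with no adapter lock held, so a source that takes other locks cannot complete a lock cycle through the adapter.

```java
public class JmxCacheSketch {

    /** Hypothetical stand-in for a metrics source; it may take its own locks. */
    interface Source {
        long getMetrics();
    }

    private final Source source;
    private final long ttlMillis;
    private long cached;      // guarded by this
    private long expiresAt;   // guarded by this
    private boolean updating; // guarded by this

    JmxCacheSketch(Source source, long ttlMillis) {
        this.source = source;
        this.ttlMillis = ttlMillis;
    }

    long getAttribute() {
        updateJmxCache();
        synchronized (this) {
            return cached;
        }
    }

    private void updateJmxCache() {
        synchronized (this) {
            // If the cache is fresh, or another thread is already refreshing
            // it, return and let the caller read the current (possibly
            // stale) value instead of blocking.
            if (updating || System.currentTimeMillis() < expiresAt) {
                return;
            }
            updating = true;
        }
        try {
            long fresh = source.getMetrics(); // called WITHOUT the adapter lock
            synchronized (this) {
                cached = fresh;
                expiresAt = System.currentTimeMillis() + ttlMillis;
            }
        } finally {
            synchronized (this) {
                updating = false;
            }
        }
    }

    public static void main(String[] args) {
        final long[] calls = {0};
        JmxCacheSketch adapter = new JmxCacheSketch(() -> ++calls[0], 60_000L);
        System.out.println(adapter.getAttribute());
        System.out.println(adapter.getAttribute());
        System.out.println("source calls: " + calls[0]);
    }
}
```

In this sketch a reader that arrives while a refresh is in flight gets the previous cached value, which is the staleness mentioned above.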

The patch passes all metrics related tests. 
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Matt Foley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206374#comment-13206374 ] 

Matt Foley commented on HADOOP-8050:
------------------------------------

Hi Luke, would you be able to submit an alternate patch per your proposal (quick fix for lock order)?  I'm trying to get a 1.0.1 build done, and it would be great to get this in.  Thanks.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206878#comment-13206878 ] 

Hadoop QA commented on HADOOP-8050:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514353/hadoop-8050-trunk.patch.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/590//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/590//console

This message is automatically generated.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206649#comment-13206649 ] 

Hadoop QA commented on HADOOP-8050:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514294/hadoop-8050-trunk.patch.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/587//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/587//console

This message is automatically generated.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Luke Lu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207239#comment-13207239 ] 

Luke Lu commented on HADOOP-8050:
---------------------------------

Static analysis is hopeless for this case, as the compiler would need to know all the possible dynamic bindings of metrics sources beforehand. I recall that Todd ran jcarder (a dynamic deadlock finder) on metrics2 in trunk and didn't find the issue. That is probably due to a lack of test coverage in the jmx metrics serving path (there are some tests, but we need to make sure the snapshot thread runs in those tests as well), or to the fact that metrics sources can be created via annotations.
                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.


        

[jira] [Commented] (HADOOP-8050) Deadlock in metrics

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211588#comment-13211588 ] 

Hudson commented on HADOOP-8050:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #1757 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1757/])
    HADOOP-8050. Deadlock in metrics. Contributed by Kihwal Lee. (Revision 1291084)

     Result = SUCCESS
mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1291084
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java

                
> Deadlock in metrics
> -------------------
>
>                 Key: HADOOP-8050
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8050
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 0.20.204.0, 0.20.205.0, 0.23.0, 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 1.0.1, 0.23.2
>
>         Attachments: hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-branch-1.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050-trunk.patch.txt, hadoop-8050.patch.txt
>
>
> The metrics serving thread and the periodic snapshot thread can deadlock.
> It happened a few times on one of our namenodes. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at trunk too closely, but it might happen there too.
