Posted to common-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/11/24 18:13:10 UTC

[jira] [Commented] (HADOOP-12594) Deadlock in metrics subsystem

    [ https://issues.apache.org/jira/browse/HADOOP-12594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024875#comment-15024875 ] 

Jason Lowe commented on HADOOP-12594:
-------------------------------------

The deadlock occurred because a Jetty thread was trying to handle a JMX metrics request just as the metrics timer fired and was gathering a snapshot.
{noformat}
"324490955@qtp-119819655-4445":
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.getMetrics(MetricsSystemImpl.java:564)
        - waiting to lock <0x00000003097c02c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:178)
        - locked <0x000000030ab29680> (a org.apache.hadoop.metrics2.impl.MetricsSourceAdapter)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:155)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:920)
        at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:248)
        at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:142)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
[...]
"Timer for 'ResourceManager' metrics system":
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:194)
        - waiting to lock <0x000000030ab29680> (a org.apache.hadoop.metrics2.impl.MetricsSourceAdapter)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:410)
        - locked <0x00000003097c02c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
        - locked <0x00000003097c02c8> (a org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

Found 1 deadlock.
{noformat}

The timer thread holds the MetricsSystemImpl lock and is trying to acquire the MetricsSourceAdapter lock. Meanwhile, the JMX thread holds the MetricsSourceAdapter lock and is trying to acquire the MetricsSystemImpl lock. The two threads take the same pair of locks in opposite orders, so we deadlocked.
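
For illustration only, here is a minimal sketch of the same lock-order inversion outside of Hadoop. Everything in it is hypothetical: the class LockOrderDemo, the plain Object monitors standing in for MetricsSystemImpl and MetricsSourceAdapter, and the "timer" and "jmx" thread names. It is not the actual metrics code. It forces the interleaving with a latch and then asks the JVM's ThreadMXBean for monitor-deadlocked threads, which is roughly the check that produces a "Found 1 deadlock." line in a thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    // Starts a daemon thread that locks `first`, waits until both threads
    // hold their first lock, and then tries to lock `second`.
    static Thread start(String name, Object first, Object second,
                        CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Not reached when the other thread uses the opposite order.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit despite the stuck threads
        t.start();
        return t;
    }

    // Returns how many of the two demo threads the JVM reports as deadlocked.
    static int countDeadlockedThreads() {
        // Hypothetical stand-ins for the two monitors in the thread dump.
        Object systemLock = new Object();   // plays MetricsSystemImpl
        Object adapterLock = new Object();  // plays MetricsSourceAdapter
        CountDownLatch latch = new CountDownLatch(2);

        // "timer" takes system then adapter; "jmx" takes adapter then system.
        Thread timer = start("timer", systemLock, adapterLock, latch);
        Thread jmx = start("jmx", adapterLock, systemLock, latch);

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = bean.findMonitorDeadlockedThreads(); // null if none
            if (ids != null) {
                int count = 0;
                for (long id : ids) {
                    if (id == timer.getId() || id == jmx.getId()) {
                        count++;
                    }
                }
                if (count > 0) {
                    return count;
                }
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In this sketch both threads should be reported as deadlocked. If the "jmx" thread were started with the locks in the same order as "timer" (systemLock first, then adapterLock), the cycle could not form. That is the general remedy for this class of bug, not a description of any particular patch for this issue.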


> Deadlock in metrics subsystem
> -----------------------------
>
>                 Key: HADOOP-12594
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12594
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.7.1
>            Reporter: Jason Lowe
>            Priority: Critical
>
> Saw a YARN ResourceManager process encounter a deadlock which appears to be caused by the metrics subsystem.  Stack trace to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)