Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/09/30 17:45:04 UTC

[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

    [ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14937022#comment-14937022 ] 

Jason Lowe commented on YARN-3619:
----------------------------------

My apologies for the long delay; this fell off my radar.  The approach seems reasonable.

The patch needs to be upmerged to trunk.  In addition, I'm wondering about the Timer handling.  I think the Timer should run as a daemon thread, since we don't want it to prolong NM shutdown.  It also seems wasteful to dedicate a separate timer thread to every container that finishes.  It would be more efficient to share a single Timer that handles multiple timer tasks rather than spawn a thread for each task.
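
To illustrate the suggestion, here is a minimal sketch of one shared daemon Timer servicing many delayed tasks. It is not the patch's code: the class name, the scheduleUnregister helper, and the 10 ms delay are hypothetical and chosen only for the example. The only real API it relies on is java.util.Timer's (String name, boolean isDaemon) constructor and schedule(TimerTask, long).

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SharedUnregisterTimer {
    // One timer thread for all containers. The second argument marks the
    // thread as a daemon, so it cannot keep the JVM alive at shutdown.
    // (Hypothetical name; not taken from the patch.)
    private static final Timer SHARED_TIMER =
        new Timer("Metrics unregister timer", true);

    /** Queues a delayed action on the shared timer; no thread is created per call. */
    static void scheduleUnregister(final Runnable unregister, long delayMs) {
        SHARED_TIMER.schedule(new TimerTask() {
            @Override
            public void run() {
                unregister.run();
            }
        }, delayMs);
    }

    /**
     * Schedules the given number of tasks and reports whether they all ran
     * on a single daemon thread within five seconds.
     */
    static boolean allTasksShareOneDaemonThread(int tasks) {
        final CountDownLatch done = new CountDownLatch(tasks);
        final Set<Thread> threads = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            scheduleUnregister(() -> {
                threads.add(Thread.currentThread());
                done.countDown();
            }, 10L);
        }
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return threads.size() == 1 && threads.iterator().next().isDaemon();
    }

    public static void main(String[] args) {
        System.out.println(allTasksShareOneDaemonThread(3));
    }
}
```

One trade-off of sharing the thread: a slow or throwing TimerTask delays or cancels the others, so the tasks should stay short and catch their own exceptions.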


> ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-3619
>                 URL: https://issues.apache.org/jira/browse/YARN-3619
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Jason Lowe
>            Assignee: zhihai xu
>         Attachments: YARN-3619.000.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources.  This leads to a ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}
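
The failure mode in the description can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not the Hadoop code: a plain LinkedHashMap plays the role of the metrics system's source collection, and the Source interface and container names are hypothetical. It only shows why a source that removes itself from the collection while that collection is being iterated trips the fail-fast iterator.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

class UnregisterDuringIteration {
    /** Hypothetical stand-in for a metrics source. */
    interface Source {
        void getMetrics();
    }

    /** Iterates every source, as a sampler would; returns true if a CME was thrown. */
    static boolean sampleThrowsCme(Map<String, Source> sources) {
        try {
            for (Source source : sources.values()) {
                source.getMetrics();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Builds a registry in which the first source unregisters itself when sampled. */
    static boolean selfUnregisteringSourceThrowsCme() {
        final Map<String, Source> sources = new LinkedHashMap<>();
        // Removing an entry mid-iteration is a structural modification, so
        // the iterator's next step fails fast.
        sources.put("container_1", () -> sources.remove("container_1"));
        sources.put("container_2", () -> { });
        return sampleThrowsCme(sources);
    }

    public static void main(String[] args) {
        System.out.println(selfUnregisteringSourceThrowsCme());
    }
}
```

Note that no second thread is needed: the modification happens on the iterating thread itself, which matches the description of getMetrics being called from within the sampling loop.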


