Posted to yarn-issues@hadoop.apache.org by "Tao Yang (JIRA)" <ji...@apache.org> on 2019/07/31 02:22:00 UTC

[jira] [Comment Edited] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

    [ https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896704#comment-16896704 ] 

Tao Yang edited comment on YARN-9714 at 7/31/19 2:21 AM:
---------------------------------------------------------

Hi, [~bibinchundatt].
{quote}
IIUC the zookeeper StateStore is not an active service and zookeeper connection is common for leader election too.
Do we really need to close the connection ??
{quote}
RMStateStore is an active service: a new one is created for every RMActiveServices instance. As for zkManager in ZKRMStateStore, it reuses the HA zkManager when RM uses the Curator-based elector for leader election; otherwise a separate one is created just for ZKRMStateStore. So in ZKRMStateStore#serviceStop we should close it only when it is not the one used for HA. Does that make sense?
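
A minimal sketch of that ownership rule, for illustration only. FakeZkManager and SketchStateStore are hypothetical stand-ins, not the real ZKCuratorManager or ZKRMStateStore, and this is not the attached patch. The store remembers whether it created the manager itself and closes it on stop only in that case:

```java
/** Hypothetical stand-in for a ZooKeeper client wrapper (not a Hadoop class). */
class FakeZkManager {
    private boolean closed = false;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

/** Hypothetical state store that closes its manager only if it created it. */
class SketchStateStore {
    private final FakeZkManager zkManager;
    // True only when this store created the manager itself (assumed flag name).
    private final boolean ownsZkManager;

    /**
     * @param sharedFromElector manager already used for leader election,
     *                          or null when no Curator-based elector is in use
     */
    SketchStateStore(FakeZkManager sharedFromElector) {
        if (sharedFromElector != null) {
            // Reuse the HA manager; the elector stays responsible for closing it.
            this.zkManager = sharedFromElector;
            this.ownsZkManager = false;
        } else {
            // No shared manager: create a private one that must be closed on stop.
            this.zkManager = new FakeZkManager();
            this.ownsZkManager = true;
        }
    }

    FakeZkManager getZkManager() {
        return zkManager;
    }

    /** Called when the store stops, e.g. after a transition to standby. */
    void serviceStop() {
        if (ownsZkManager) {
            zkManager.close();
        }
    }
}
```

With this shape, stopping a store that shares the elector's manager leaves the election connection open, while stopping a store with its own manager releases that connection.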


> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9714
>                 URL: https://issues.apache.org/jira/browse/YARN-9714
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Blocker
>              Labels: memory-leak
>         Attachments: YARN-9714.001.patch, YARN-9714.002.patch
>
>
> Recently a full GC occurred in the RM of one of our clusters. After investigating the memory dump and jstack output, I found two places in RM that may cause memory leaks after RM transitions to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should close the connection and cancel the timer when the services are stopping.


