Posted to user@hadoop.apache.org by Nikhil <mn...@gmail.com> on 2015/02/24 22:53:44 UTC

Restarting a resource manager kills the other in HA

Hi,

In our YARN ResourceManager HA setup, HA worked fine at first. After some
time, however, I noticed that restarting one resource manager gets the
other resource manager stopped/killed. Below is what I see in the logs on
the killed resource manager instance. I am using Hadoop version 2.5.1, if
that helps.

Has anyone seen this before? Any ideas on how I should go about this one?

thanks,
Nikhil

-----

2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Yielding from election
2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Deleting bread-crumb of active node...
2015-02-24 16:47:37,555 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2015-02-24 16:47:37,580 INFO org.apache.zookeeper.ZooKeeper: Session:
0x14b997543fd001e closed
2015-02-24 16:47:37,580 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Ignoring stale result from old client with sessionId 0x14b997543fd001e
2015-02-24 16:47:37,580 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2015-02-24 16:47:37,580 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
Transitioning to standby state
2015-02-24 16:47:37,581 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager
metrics system...
2015-02-24 16:47:37,587 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
system stopped.
2015-02-24 16:47:37,588 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
system shutdown complete.
2015-02-24 16:47:37,588 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread
thread interrupted! Exiting!
2015-02-24 16:47:37,616 INFO org.apache.zookeeper.ZooKeeper: Session:
0x24b13ab5b4c069a closed
2015-02-24 16:47:37,616 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2015-02-24 16:47:37,616 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
AsyncDispatcher is draining to stop, igonring any new events.
2015-02-24 16:47:37,617 WARN
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
interrupted. Returning.
2015-02-24 16:47:37,618 INFO org.apache.hadoop.ipc.Server: Stopping server
on 8032
2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8032
2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping server
on 8030
2015-02-24 16:47:37,623 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2015-02-24 16:47:37,627 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8030
2015-02-24 16:47:37,627 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2015-02-24 16:47:37,629 INFO org.apache.hadoop.ipc.Server: Stopping server
on 8031
2015-02-24 16:47:37,633 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8031
2015-02-24 16:47:37,633 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2015-02-24 16:47:37,634 INFO
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor
thread interrupted

-----

Re: Restarting a resource manager kills the other in HA

Posted by daemeon reiydelle <da...@gmail.com>.

Only one RM will be active at a time; the other is in standby. When you
started the new RM, the configuration files directed it to come up and take
over, and the old primary goes to standby (or should!). This is working as
designed, except that you will see a slowdown in scheduling. I suspect what
you want is for the new RM to come up in standby rather than take over, no?
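
One way to confirm which RM is active after a restart is to ask each one
for its HA state. The sketch below wraps the `yarn rmadmin -getServiceState`
command. The ids "rm1" and "rm2" are assumptions; use whatever is listed in
yarn.resourcemanager.ha.rm-ids in your yarn-site.xml. Treating a failed
command as "unreachable" is also an assumption about how the CLI behaves
when an RM is down.

```python
import subprocess


def rm_state(rm_id, run=subprocess.run):
    """Return the HA state the yarn CLI reports for one RM id."""
    try:
        result = run(
            ["yarn", "rmadmin", "-getServiceState", rm_id],
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError:
        # Assumption: a non-zero exit means the RM could not be queried.
        return "unreachable"
    return result.stdout.strip()


def check_ha(rm_ids, get_state=rm_state):
    """Collect the state of every RM and report whether exactly one is active."""
    states = {rm_id: get_state(rm_id) for rm_id in rm_ids}
    active = [rm_id for rm_id, state in states.items() if state == "active"]
    return states, len(active) == 1


if __name__ == "__main__":
    # "rm1" and "rm2" are placeholder ids, not taken from the original post.
    states, healthy = check_ha(["rm1", "rm2"])
    print(states, "exactly one active:", healthy)
```

If both ids come back as standby or unreachable after the restart, that
points at something other than a normal switchover.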

So ... I see normal messages for a switchover. However, you should still
see the standby RM receiving status from the new active RM if HA is
configured.
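
To check whether a given RM log shows a switchover like the one above, a
small script can pull out the relevant records. This is only a sketch: the
marker strings are copied from the log excerpt in this thread, the function
names are made up for illustration, and the re-joining step assumes the
lines were wrapped by the mail client the way they appear here.

```python
import re

# Marker strings copied from the log excerpt posted in this thread.
STANDBY_MARKERS = (
    "Yielding from election",
    "Transitioning to standby state",
)

# Matches the leading timestamp of a log record, e.g. 2015-02-24 16:47:37,555
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def split_records(log_text):
    """Rejoin mail-wrapped lines so each log record is a single string."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous record.
            records[-1] += " " + line
    return records


def standby_events(log_text, markers=STANDBY_MARKERS):
    """Return (timestamp, marker) pairs for records that contain a marker."""
    events = []
    for record in split_records(log_text):
        match = TIMESTAMP.match(record)
        if match is None:
            continue
        for marker in markers:
            if marker in record:
                events.append((match.group(0), marker))
    return events
```

Running standby_events() over the text of the RM log file lists when that
RM gave up the election and when it started moving to standby.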

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
