You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-dev@hadoop.apache.org by "Zack Marsh (JIRA)" <ji...@apache.org> on 2015/06/30 23:24:04 UTC
[jira] [Created] (YARN-3871) ResourceManager down after Blueprint
install
Zack Marsh created YARN-3871:
--------------------------------
Summary: ResourceManager down after Blueprint install
Key: YARN-3871
URL: https://issues.apache.org/jira/browse/YARN-3871
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.1
Environment: ambari-2.1.0-1295, hdp-2.3.0.0-2497, sles11sp3
Reporter: Zack Marsh
Priority: Critical
Attachments: yarn-yarn-resourcemanager-piripiri3.log, yarn-yarn-resourcemanager-piripiri3.out
On a 3-Master HDP 2.3 cluster installed with HDP-2.3.0.0-2482 and Ambari-2.1.0-1266, the YARN ResourceManager was down following the Blueprint install.
It's important to note that nothing failed during the Blueprint install. The ResourceManager shutdown because of an inability to connect to Zookeeper.
Excerpt from the ResourceManager log:
{code}
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.library.path=:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native:/usr/hdp/2.3.0.0-2482/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2482/hadoop/lib/native
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=3.0.101-0.50.TDC.1.R.0-default
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=yarn
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/yarn
2015-06-26 03:35:47,188 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.3.0.0-2482/hadoop-yarn
2015-06-26 03:35:47,190 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=piripiri2.labs.teradata.com:2181,piripiri1.labs.teradata.com:2181,piripiri3.labs.teradata.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@59d2103b
2015-06-26 03:35:47,209 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,276 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:47,380 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:47,381 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:47,452 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:48,067 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:48,378 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:49,914 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:49,915 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:50,028 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,028 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:50,030 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:50,133 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:50,134 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,064 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,065 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:52,901 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:52,901 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:52,902 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:53,570 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:53,571 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:55,541 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri2.labs.teradata.com/39.0.40.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:55,542 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:56,513 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri3.labs.teradata.com/39.0.40.3:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,514 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to piripiri3.labs.teradata.com/39.0.40.3:2181, initiating session
2015-06-26 03:35:56,515 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-06-26 03:35:56,821 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server piripiri1.labs.teradata.com/39.0.40.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-26 03:35:56,822 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1102)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2015-06-26 03:35:57,205 ERROR ha.ActiveStandbyElector (ActiveStandbyElector.java:waitForZKConnectionEvent(1044)) - Connection timed out: couldn't connect to ZooKeeper in 10000 milliseconds
2015-06-26 03:35:57,396 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2015-06-26 03:35:57,397 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2015-06-26 03:35:57,403 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 03:35:57,404 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 7 more
2015-06-26 03:35:57,404 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 7 more
2015-06-26 03:35:57,405 INFO resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1068)) - Transitioning to standby state
2015-06-26 03:35:57,405 INFO resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1075)) - Transitioned to standby state
2015-06-26 03:35:57,405 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1230)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:149)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:261)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1047)
at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1018)
at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:633)
at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:767)
at org.apache.hadoop.ha.ActiveStandbyElector.<init>(ActiveStandbyElector.java:227)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:92)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 7 more
2015-06-26 03:35:57,407 INFO resourcemanager.ResourceManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at piripiri3/39.0.40.3
************************************************************/
{code}
This issue was observed again on a 3-Master cluster installed with HDP-2.3.0.0-2497 and Ambari-2.1.0-1295.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)