Posted to user@hbase.apache.org by bijieshan <bi...@huawei.com> on 2011/04/14 09:51:58 UTC

Zookeeper exception leading to the shutdown of HBase

Hi,
   I ran into this problem while the HBase cluster was running. Here are the logs:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2011-03-21 13:26:39,697 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server t1/157.5.111.11:2181
2011-03-21 13:26:39,698 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t1/157.5.111.11:2181, initiating session
2011-03-21 13:26:53,035 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 13336ms for sessionid 0x22e8e6ee15f0046, closing socket connection and attempting reconnect
2011-03-21 13:26:53,135 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22e8e6ee15f0046 Unable to get data of znode /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
         at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
         at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
2011-03-21 13:26:53,137 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22e8e6ee15f0046 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
         at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
         at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
2011-03-21 13:26:53,138 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception reading unassigned node data
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/59ba25120921011b7d9ed4025d30c105
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:932)
         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
         at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:739)
         at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:501)
2011-03-21 13:26:53,138 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
When I restart the cluster, the problem still exists (due to the ZooKeeper process behaving abnormally):
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2011-03-21 14:43:27,565 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=t2:2181,t1:2181,t0:2181 sessionTimeout=180000 watcher=master:60000
2011-03-21 14:43:27,573 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server t1/157.5.111.11:2181
2011-03-21 14:43:27,582 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to t1/157.5.111.11:2181, initiating session
2011-03-21 14:44:27,586 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60003ms for sessionid 0x0, closing socket connection and attempting reconnect
2011-03-21 14:44:27,699 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
         at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1071)
         at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
         at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
         at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
         at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1085)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:648)
         at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
         at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:219)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1066)
         ... 5 more
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This problem looks very similar to the one described in:
https://issues.apache.org/jira/browse/HBASE-3062
That bug has been fixed in HBase 0.90.1.
Please help me analyze the problem. Thank you.
I look forward to your response.

Regards,
Jieshan


Re: Zookeeper exception leading to the shutdown of HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
In the first case there is clearly a pause of about 13 seconds, and in
the second case the log reports a 60-second stretch during which the
master's ZooKeeper client wasn't able to talk to the ZooKeeper server.
As far as I can tell there's something weird going on in your
environment (network issues maybe?).

J-D
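
The pauses mentioned above can be read straight off the ClientCnxn "Client session timed out" lines. As a minimal sketch, assuming the log format matches the excerpts in this thread, a short Python script can list them. The names TIMEOUT_RE, find_session_timeouts and SAMPLE are illustrative only and are not part of HBase or ZooKeeper.

```python
import re

# Pattern for the ZooKeeper client timeout message as it appears in the
# log excerpts in this thread. Other ZooKeeper versions may word it
# differently, so treat the pattern as an assumption.
TIMEOUT_RE = re.compile(
    r"(?P<ts>\S+ \S+) INFO org\.apache\.zookeeper\.ClientCnxn: "
    r"Client session timed out, have not heard from server in "
    r"(?P<ms>\d+)ms for sessionid (?P<sid>0x[0-9a-fA-F]+)"
)


def find_session_timeouts(lines):
    """Return (timestamp, silence_ms, session_id) for each timeout line."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hits.append((m.group("ts"), int(m.group("ms")), m.group("sid")))
    return hits


# Three lines copied from the master logs quoted in this thread.
SAMPLE = [
    "2011-03-21 13:26:39,698 INFO org.apache.zookeeper.ClientCnxn: "
    "Socket connection established to t1/157.5.111.11:2181, initiating session",
    "2011-03-21 13:26:53,035 INFO org.apache.zookeeper.ClientCnxn: "
    "Client session timed out, have not heard from server in 13336ms "
    "for sessionid 0x22e8e6ee15f0046, closing socket connection and attempting reconnect",
    "2011-03-21 14:44:27,586 INFO org.apache.zookeeper.ClientCnxn: "
    "Client session timed out, have not heard from server in 60003ms "
    "for sessionid 0x0, closing socket connection and attempting reconnect",
]

if __name__ == "__main__":
    for ts, ms, sid in find_session_timeouts(SAMPLE):
        print(f"{ts}  session {sid}: no response for {ms / 1000:.1f} s")
```

Running the same function over the full master log (for example, over the lines of an open file) would show whether the silences cluster around particular times or a particular ZooKeeper server, which is a reasonable first check before looking at the network.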
