Posted to user@hbase.apache.org by "Taylor, Ronald C" <ro...@pnnl.gov> on 2011/11/30 07:51:22 UTC

getting HBase up after an unexpected power failure - need some advice

Hello folks,

We have a small Hadoop/HBase cluster that lost power before HBase and Hadoop could be shut down.

So – I am trying to bring the cluster back up. Hadoop comes back up fine, and the “hadoop fsck” says that the HDFS file system is healthy.

However, when I then try to bring up HBase, I get errors in the log file
     hbase-hbase-master-h01.emsl.pnl.gov.log

and the HBase monitoring web page does not come up at
  http://h01.emsl.pnl.gov:60010/master.jsp

The log file says “Failed to create /hbase”. It also says “Unable to read additional data from server”, “likely server has closed socket”, and “check quorum servers”, referring to the three nodes that I selected for the ZooKeeper quorum that manages our HBase instance, at
      h09:2182, h06:2182, h05:2182

I rebooted the entire cluster again, after shutting down Hadoop using stop-all.sh. I then brought Hadoop back up, and tried the HBase start command again:

    /home/hbase/hbase/bin/start-hbase.sh

I see the same errors. The tail end of the log is at the bottom.

We are running the Apache distribution, using Hadoop 0.20.2 and HBase 0.89.20100726. (Yep, I know we should upgrade and probably switch to the Cloudera stack – hope to do so soon – but, right now, could use some more immediate help).

Can anybody give me some guidance as to what is going wrong?

- Ron


Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568
email: ronald.taylor@pnnl.gov

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

HBase log output:

2011-11-29 22:14:01,345 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Trying to read /hbase/master
2011-11-29 22:14:01,393 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server h06/192.168.200.26:2182
2011-11-29 22:14:01,393 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to h06/192.168.200.26:2182, initiating session
2011-11-29 22:14:01,393 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2011-11-29 22:14:01,495 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Failed to read org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2011-11-29 22:14:01,495 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Writing master address 192.168.200.21:60000 to znode /hbase/master
2011-11-29 22:14:01,894 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server h05/192.168.200.25:2182
2011-11-29 22:14:01,894 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to h05/192.168.200.25:2182, initiating session
2011-11-29 22:14:01,894 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2011-11-29 22:14:01,997 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Failed to create /hbase -- check quorum servers, currently=h09:2182,h06:2182,h05:2182
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:500)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:527)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeMasterAddress(ZooKeeperWrapper.java:650)
        at org.apache.hadoop.hbase.master.ZKMasterAddressWatcher.writeAddressToZooKeeper(ZKMasterAddressWatcher.java:111)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:234)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1227)
        at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1329)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1380)
2011-11-29 22:14:01,997 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Trying to read /hbase/master
2011-11-29 22:14:02,626 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server h09/192.168.200.29:2182
2011-11-29 22:14:02,626 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to h09/192.168.200.29:2182, initiating session
2011-11-29 22:14:02,626 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2011-11-29 22:14:02,729 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Failed to read org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2011-11-29 22:14:02,729 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Writing master address 192.168.200.21:60000 to znode /hbase/master
2011-11-29 22:14:03,064 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server h06/192.168.200.26:2182
2011-11-29 22:14:03,064 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to h06/192.168.200.26:2182, initiating session
2011-11-29 22:14:03,064 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2011-11-29 22:14:03,167 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Failed to create /hbase -- check quorum servers, currently=h09:2182,h06:2182,h05:2182
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:500)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:527)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeMasterAddress(ZooKeeperWrapper.java:650)
        at org.apache.hadoop.hbase.master.ZKMasterAddressWatcher.writeAddressToZooKeeper(ZKMasterAddressWatcher.java:111)
        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:234)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1227)
        at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1329)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1380)
2011-11-29 22:14:03,167 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Trying to read /hbase/master
2011-11-29 22:14:03,920 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server h05/192.168.200.25:2182
2011-11-29 22:14:03,920 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to h05/192.168.200.25:2182, initiating session
2011-11-29 22:14:03,920 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2011-11-29 22:14:04,022 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Failed to read org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2011-11-29 22:14:04,022 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <h05,h06,h09:/hbase,org.apache.hadoop.hbase.master.HMaster>Writing master address 192.168.200.21:60000 to znode /hbase/master
[rtaylor@h01 logs]$
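
The "check quorum servers" hint in the log can be followed up by asking each ZooKeeper server directly whether it is alive. The sketch below is not part of the original post: it sends ZooKeeper's four-letter command "ruok" to the three host:port pairs quoted above and collects the replies. The function names and the timeout are illustrative assumptions, and on newer ZooKeeper releases four-letter commands may need to be whitelisted before the server answers them.

```python
import socket

# Quorum members and client port as listed in the post.
QUORUM = [("h09", 2182), ("h06", 2182), ("h05", 2182)]


def four_letter_word(host, port, word=b"ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        sock.sendall(word)
        chunks = []
        # The server closes the connection after replying, which ends the loop.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def check_quorum(servers=QUORUM):
    """Return {"host:port": reply or error text} for each quorum member."""
    results = {}
    for host, port in servers:
        key = f"{host}:{port}"
        try:
            results[key] = four_letter_word(host, port)
        except OSError as exc:
            # Covers refused connections, timeouts and name lookup failures.
            results[key] = f"ERROR: {exc}"
    return results
```

Calling check_quorum() from a node that can reach the quorum hosts returns one entry per server. A reply of "imok" only says that the server process is running; passing word=b"stat" to four_letter_word() asks for more detail, including whether the server is currently serving requests. An empty reply or an error would be consistent with the "likely server has closed socket" messages above.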


Re: getting HBase up after an unexpected power failure - need some advice

Posted by Lars George <la...@gmail.com>.
Hey,

Looks like your ZooKeeper (ZK) data is corrupted. Stop ZK (after stopping HBase, of course) and restart it. If that also fails, wipe the data dir that ZK uses (check the config for its location, for example zoo.cfg for standalone ZK nodes). ZK will recreate the data files and should then be able to move forward.

Cheers,
Lars
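
A more cautious variant of the "wipe the data dir" step is to move the ZooKeeper data aside instead of deleting it, so it can be restored if needed. The sketch below is an illustration, not something from the thread: it reads dataDir (and dataLogDir, if set) from a zoo.cfg-style file and renames only the version-2 subdirectory, where ZooKeeper keeps its snapshots and transaction logs, leaving the myid file in place. The function names, the backup naming scheme and the dry_run flag are assumptions. If HBase manages ZooKeeper itself, the data dir is normally set through hbase.zookeeper.property.dataDir in hbase-site.xml rather than zoo.cfg, and this parser would not apply as written.

```python
import shutil
import time
from pathlib import Path


def read_zoo_cfg(cfg_path):
    """Parse key=value lines from a zoo.cfg-style file, skipping comments."""
    props = {}
    for line in Path(cfg_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def set_aside_zk_data(cfg_path, dry_run=True):
    """Rename each version-2 directory to a timestamped backup name.

    Returns a list of (source, destination) pairs. With dry_run=True
    nothing is moved; the pairs only show what would happen.
    """
    props = read_zoo_cfg(cfg_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    planned = []
    seen = set()
    for key in ("dataDir", "dataLogDir"):
        if key not in props:
            continue
        src = Path(props[key]) / "version-2"
        if src in seen or not src.is_dir():
            continue
        seen.add(src)
        dst = src.with_name(f"version-2.bak-{stamp}")
        if not dry_run:
            shutil.move(str(src), str(dst))
        planned.append((src, dst))
    return planned
```

This would be run on each ZK node with both HBase and ZK stopped, first with the default dry_run=True to see which directories would be renamed, then with dry_run=False. On the next start ZK is expected to recreate the data files, as described in the reply above.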

