Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/07/24 12:52:03 UTC

Master down log.

Hi,

My cluster ran into trouble last night and, in the end, all the
servers went down. Hadoop is still running, but HBase is not.

I have no clue what the root cause is. I looked at the logs on the
master side, and the first line logged when it started to go down was:
2012-07-24 01:20:13,227 INFO
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
ephemeral node deleted, processing expiration
[phenom,60020,1342789574088]

After that, everything started to die.

At the end, on the master side, I have this in the out file:

hbase@node3:~/hbase-0.94.0$ cat logs/hbase-hbase-master-node3.out
Exception in thread "master-node3,60000,1342789522486"
java.lang.NullPointerException
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:749)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:726)
        at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:276)
        at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:240)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:370)
        at java.lang.Thread.run(Thread.java:722)

I think this one needs to be addressed.

I looked at my ZooKeeper logs and I have one entry every 2 seconds, so
I think something is misconfigured and I will look into it. The goal
of this post is just to report the error above and see whether it
should be fixed by adding a null check in the related code.
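
To illustrate the kind of null check I mean, here is a minimal,
self-contained sketch. It does not use the real HBase classes:
NullSafeMasterStop, ZNodeReader and znodeBelongsTo are hypothetical
names, and the payload layout (a 2-byte version followed by a UTF-8
server name) is only an assumption for the example. The point is simply
that the bytes read from the znode can be null once the znode is gone,
so they should be checked before being parsed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the actual HBase code.
class NullSafeMasterStop {

    /** Stand-in for a ZooKeeper read; returns null when the znode no longer exists. */
    interface ZNodeReader {
        byte[] getData(String path);
    }

    /** Decodes the first two bytes as a big-endian short; a null array throws NullPointerException. */
    static short toShort(byte[] bytes) {
        return (short) (((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF));
    }

    /**
     * Returns true only if the znode still holds data and that data names this master.
     * Assumed layout: 2-byte version prefix, then the server name in UTF-8.
     */
    static boolean znodeBelongsTo(ZNodeReader reader, String path, String self) {
        byte[] data = reader.getData(path);
        // The guard: a deleted znode yields null, so bail out instead of parsing.
        if (data == null || data.length < 2) {
            return false;
        }
        short version = toShort(data); // safe now, data is non-null and long enough
        String name = new String(data, 2, data.length - 2, StandardCharsets.UTF_8);
        return version == 0 && name.equals(self);
    }
}
```

With a guard like this, a stop path that finds the znode already gone
would just skip the cleanup instead of dying with a
NullPointerException.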

JM