Posted to user@hbase.apache.org by Ron van der Vegt <ro...@openindex.io> on 2014/08/25 13:51:15 UTC

Could not get distributed Hbase stable, stopping every ~24 hours.

Hi all!

I have set up one master and five region servers to collect log data. However, roughly every 24 hours, at random times, the region servers report a fatal error and stop one by one. Eventually the master stops as well. I also see some strange characters before the server names in the logs, which looks like an encoding issue.
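Before treating the "strange characters" as an encoding problem, it may help to look at their shape. Each one is `-0x` followed by hex digits, which looks like a hexadecimal ZooKeeper session ID, and the same token often repeats inside one identifier. The sketch below assumes that interpretation. It extracts the tokens from an identifier copied from the master log and, for each distinct one, reports the top byte, which in stock ZooKeeper 3.4 session IDs holds the `myid` of the server that created the session. `ZkIdentifierTokens` is a hypothetical helper, not part of HBase.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls "-0x<hex>" tokens out of an HBase ZooKeeper watcher identifier. */
final class ZkIdentifierTokens {
    private static final Pattern TOKEN = Pattern.compile("-0x([0-9a-fA-F]+)");

    private ZkIdentifierTokens() {}

    /** All tokens in order of appearance, parsed as (assumed) session IDs. */
    static List<Long> sessionIds(String identifier) {
        List<Long> ids = new ArrayList<>();
        Matcher m = TOKEN.matcher(identifier);
        while (m.find()) {
            ids.add(Long.parseUnsignedLong(m.group(1), 16));
        }
        return ids;
    }

    /** Distinct tokens, first-seen order preserved. */
    static Set<Long> distinctSessionIds(String identifier) {
        return new LinkedHashSet<>(sessionIds(identifier));
    }

    /** Top byte of a ZooKeeper 3.4 session ID: the myid of the server that created the session. */
    static int creatingServerId(long sessionId) {
        return (int) (sessionId >>> 56);
    }

    public static void main(String[] args) {
        // Identifier copied from the master log in this thread.
        String id = "master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba"
                + "-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001";
        List<Long> all = sessionIds(id);
        Set<Long> distinct = distinctSessionIds(id);
        System.out.println(all.size() + " tokens, " + distinct.size() + " distinct");
        for (long sid : distinct) {
            System.out.printf("0x%x -> created by ZooKeeper server %d%n", sid, creatingServerId(sid));
        }
    }
}
```

For that master identifier, the sketch finds 8 tokens but only 3 distinct values, with top bytes 1, 3 and 2. This is consistent with the same identifier being extended with a session ID more than once, rather than with corrupted bytes.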

I have read in the documentation that if garbage collection takes too long, you will also get the session expired message. However, I have logged GC on the master and it looks OK. Could someone help me figure out why this is happening?
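A GC log only records collector pauses. A common complementary check, and the idea behind the `JvmPauseMonitor` found in later Hadoop releases, is to sleep for a fixed interval in a loop and report whenever the sleep took much longer than requested. The sketch below is an illustrative re-implementation of that idea, not the Hadoop class. The 500 ms interval, the 1 s reporting threshold and the 40 s session timeout are assumptions. Run as a daemon thread inside the process being watched, it sees that process's stalls from any cause. Run as a separate small JVM on the same host, it can only reveal host-level stalls such as swapping or CPU starvation.

```java
import java.util.concurrent.TimeUnit;

/** Sleep-based pause detector (illustrative sketch, not Hadoop's JvmPauseMonitor). */
final class PauseDetector {
    private PauseDetector() {}

    /** Time by which a sleep overran its requested duration; never negative. */
    static long extraMillis(long requestedMillis, long observedMillis) {
        return Math.max(0L, observedMillis - requestedMillis);
    }

    /** True if the stall used up at least the given fraction of the session timeout. */
    static boolean exceedsFraction(long extraMillis, long sessionTimeoutMillis, double fraction) {
        return extraMillis >= (long) (sessionTimeoutMillis * fraction);
    }

    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 500;           // sampling interval (assumed)
        final long reportAboveMs = 1_000;      // ignore ordinary scheduling jitter (assumed)
        final long sessionTimeoutMs = 40_000;  // hypothetical negotiated ZooKeeper session timeout
        final int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10;

        long prev = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Thread.sleep(intervalMs);
            long now = System.nanoTime();
            long extra = extraMillis(intervalMs, TimeUnit.NANOSECONDS.toMillis(now - prev));
            if (extra > reportAboveMs) {
                System.err.printf("Pause of ~%d ms detected%s%n", extra,
                        exceedsFraction(extra, sessionTimeoutMs, 0.5) ? " (over half the session timeout)" : "");
            }
            prev = now;
        }
    }
}
```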

Furthermore, I am monitoring the master's memory usage with JMX, and I notice that the heap size is slowly growing. Could there be a memory leak?
-Xmx is set to 1 GB.
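A slowly growing heap is expected between collections, so the raw "used" figure says little about a leak. A more telling signal is the memory still in use right after each collection: if that keeps rising across many GC cycles, something is being retained. The sketch below reads that value through `MemoryPoolMXBean.getCollectionUsage()` for the JVM it runs in and fits a least-squares slope to the samples. For a remote process such as the master, the same value is exposed over JMX as the `CollectionUsage` attribute of the `java.lang:type=MemoryPool,name=<pool name>` MBeans. The sample count and interval in `main` are arbitrary choices for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Tracks heap usage measured immediately after garbage collections (illustrative sketch). */
final class PostGcHeapTrend {
    private PostGcHeapTrend() {}

    /** Bytes still used in heap pools right after their most recent collection. */
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (pool.getType() == MemoryType.HEAP && afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /** Least-squares slope of samples taken at equal intervals (units of y per sample). */
    static double slope(double[] y) {
        int n = y.length;
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= n;
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (y[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 1_000;
        double[] used = new double[samples];
        for (int i = 0; i < samples; i++) {
            used[i] = usedAfterLastGc();
            Thread.sleep(intervalMs);
        }
        System.out.printf("post-GC heap trend: %.0f bytes per sample%n", slope(used));
    }
}
```

In practice the samples would be spread over hours, so that many old-generation collections fall inside the window.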

Setup:
HBase 0.94.20
Hadoop 1.2.1
Debian Wheezy

Thanks in advance,

Ron

Logs of master:
===============

2014-08-23 07:00:20,104 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/70236052
2014-08-23 07:00:20,406 ERROR org.apache.hadoop.hbase.master.HMaster: Region server vps2060.directvps.nl,60020,1408691165501 reported a fatal error:
ABORTING region server vps2060.directvps.nl,60020,1408691165501: regionserver:60020-0x347fc15265a00eb-0x347fc15265a00eb-0x347fc15265a00eb regionserver:60020-0x347fc15265a00eb-0x347fc15265a00eb-0x347fc15265a00eb received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-08-23 07:00:20,911 ERROR org.apache.hadoop.hbase.master.HMaster: Region server vps2057.directvps.nl,60020,1408691165499 reported a fatal error:
ABORTING region server vps2057.directvps.nl,60020,1408691165499: regionserver:60020-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea regionserver:60020-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-08-23 07:00:21,001 ERROR org.apache.hadoop.hbase.master.HMaster: Region server vps2059.directvps.nl,60020,1408691165851 reported a fatal error:
ABORTING region server vps2059.directvps.nl,60020,1408691165851: regionserver:60020-0x147fc1616d200bb-0x147fc1616d200bb-0x147fc1616d200bb regionserver:60020-0x147fc1616d200bb-0x147fc1616d200bb-0x147fc1616d200bb received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-08-23 07:00:21,056 ERROR org.apache.hadoop.hbase.master.HMaster: Region server vps2058.directvps.nl,60020,1408691165675 reported a fatal error:
ABORTING region server vps2058.directvps.nl,60020,1408691165675: regionserver:60020-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec regionserver:60020-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-08-23 07:00:22,140 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
2014-08-23 07:00:26,141 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
2014-08-23 07:00:34,114 ERROR org.apache.hadoop.hbase.master.HMaster: Region server vps2056.directvps.nl,60020,1408691165439 reported a fatal error:
ABORTING region server vps2056.directvps.nl,60020,1408691165439: Unexpected exception handling nodeDeleted event
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:420)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-08-23 07:00:34,118 ERROR org.apache.hadoop.hbase.master.HMaster: Region server vps2056.directvps.nl,60020,1408691165439 reported a fatal error:
ABORTING region server vps2056.directvps.nl,60020,1408691165439: regionserver:60020-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2 regionserver:60020-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-08-23 07:00:34,141 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
2014-08-23 07:00:34,142 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 3 retries
2014-08-23 07:00:34,152 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001 Unable to get data of znode /hbase/unassigned/70236052
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852)
    at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274)
    at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255)
    at org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382)
    at org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-08-23 07:00:34,152 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852)
    at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274)
    at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255)
    at org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382)
    at org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-08-23 07:00:34,163 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2014-08-23 07:00:34,215 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/vps2008.directvps.nl,60000,1408691163492 already deleted, and this is not a retry
2014-08-23 07:05:34,165 WARN org.apache.hadoop.hbase.master.SplitLogManager: Interrupted while waiting for log splits to be completed
2014-08-23 07:05:34,179 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception reading unassigned node for region=70236052
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852)
    at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274)
    at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255)
    at org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382)
    at org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-08-23 07:05:34,179 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://namenode.openindex.io:8020/hbase/.logs/vps2058.directvps.nl,60020,1408691165675-splitting] installed = 2 but only 0 done
2014-08-23 07:05:34,184 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
2014-08-23 07:05:34,185 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.io.IOException: Giving up after tries=1
    at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:210)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:188)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:82)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:67)
    at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:126)
    at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:137)
    at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:93)
    at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:207)
    ... 8 more
2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2014-08-23 07:05:34,214 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2014-08-23 07:05:34,212 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
2014-08-23 07:05:34,256 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2014-08-23 07:05:34,259 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2014-08-23 07:05:34,260 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001-0x347fcb5a0130001 master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001-0x347fcb5a0130001 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-08-23 07:05:34,414 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2129)
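To see how closely the aborts are clustered in time, the "reported a fatal error" lines can be pulled out of the master log. The sketch below assumes the line layout shown above: a timestamp, a level, a logger, then `Region server <name> reported a fatal error:`. `AbortTimeline` and the log path passed on the command line are hypothetical. Applied to the excerpt above, it finds six reports from five distinct region servers, the first at 07:00:20,406 and the last at 07:00:34,118, about 14 seconds apart.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts "Region server X reported a fatal error" events from an HBase master log (sketch). */
final class AbortTimeline {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final Pattern FATAL = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) ERROR \\S+: Region server (\\S+) reported a fatal error:");

    static final class Event {
        final LocalDateTime at;
        final String server;

        Event(LocalDateTime at, String server) {
            this.at = at;
            this.server = server;
        }
    }

    private AbortTimeline() {}

    static List<Event> parse(List<String> lines) {
        List<Event> events = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FATAL.matcher(line);
            if (m.find()) {
                events.add(new Event(LocalDateTime.parse(m.group(1), TS), m.group(2)));
            }
        }
        return events;
    }

    /** Time between the earliest and the latest event. */
    static Duration span(List<Event> events) {
        if (events.isEmpty()) {
            return Duration.ZERO;
        }
        LocalDateTime first = events.get(0).at;
        LocalDateTime last = first;
        for (Event e : events) {
            if (e.at.isBefore(first)) first = e.at;
            if (e.at.isAfter(last)) last = e.at;
        }
        return Duration.between(first, last);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: AbortTimeline <master log file>");
            return;
        }
        // ISO-8859-1 accepts any byte sequence, so stray characters cannot break the read.
        List<Event> events = parse(Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        Set<String> servers = new LinkedHashSet<>();
        for (Event e : events) {
            System.out.println(e.at + "  " + e.server);
            servers.add(e.server);
        }
        System.out.printf("%d reports from %d servers within %d ms%n",
                events.size(), servers.size(), span(events).toMillis());
    }
}
```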

GC log:
=======

0.185: Application time: 0.1304320 seconds
0.185: [GC0.185: [ParNew: 4288K->511K(4800K), 0.0120520 secs] 4288K->1008K(15424K), 0.0121600 secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
0.197: Total time for which application threads were stopped: 0.0126240 seconds
Heap
 par new generation   total 4800K, used 3580K [0x00000000b7200000, 0x00000000b7730000, 0x00000000c1860000)
  eden space 4288K,  71% used [0x00000000b7200000, 0x00000000b74ff328, 0x00000000b7630000)
  from space 512K,  99% used [0x00000000b76b0000, 0x00000000b772fff8, 0x00000000b7730000)
  to   space 512K,   0% used [0x00000000b7630000, 0x00000000b7630000, 0x00000000b76b0000)
 concurrent mark-sweep generation total 10624K, used 496K [0x00000000c1860000, 0x00000000c22c0000, 0x00000000f5a00000)
 concurrent-mark-sweep perm gen total 21248K, used 6688K [0x00000000f5a00000, 0x00000000f6ec0000, 0x0000000100000000)
0.370: Application time: 0.1728650 seconds
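GC log timestamps count seconds since the JVM started, so the excerpt above covers only the first 0.37 seconds of the process and cannot show what happened around 07:00. With `-XX:+PrintGCApplicationStoppedTime` enabled, as the "Total time for which application threads were stopped" line above suggests, every stop-the-world pause is logged in that form. Scanning the whole file for the largest values is a quick way to rule GC in or out. A minimal sketch, assuming that line format; the 1-second threshold and the file path argument are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Scans a HotSpot GC log for application stop times (sketch). */
final class SafepointPauses {
    // Also accept ',' as the decimal separator, in case the log was written with one.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: (\\d+[.,]\\d+) seconds");

    private SafepointPauses() {}

    private static double stoppedSeconds(Matcher m) {
        return Double.parseDouble(m.group(1).replace(',', '.'));
    }

    /** Largest stop time found, in seconds (0.0 if none). */
    static double maxStoppedSeconds(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                max = Math.max(max, stoppedSeconds(m));
            }
        }
        return max;
    }

    /** Number of stops longer than the given threshold. */
    static int countAbove(List<String> lines, double thresholdSeconds) {
        int count = 0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find() && stoppedSeconds(m) > thresholdSeconds) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: SafepointPauses <gc log file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.printf("max stop: %.3f s, stops over 1 s: %d%n",
                maxStoppedSeconds(lines), countAbove(lines, 1.0));
    }
}
```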


Re: Could not get distributed Hbase stable, stopping every ~24 hours.

Posted by Ted Yu <yu...@gmail.com>.
Have you checked the ZooKeeper logs?
Which ZooKeeper release are you using?
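Alongside the ZooKeeper logs, the session timeout actually in effect may be worth checking. The value a client requests (for HBase, `zookeeper.session.timeout`) is only a suggestion: the ZooKeeper server clamps it to the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times `tickTime` when not set. With `tickTime=2000` (the value in ZooKeeper's sample configuration), a requested 180000 ms therefore becomes 40000 ms. The sketch below mirrors that rule. It is a re-implementation for illustration, not ZooKeeper code, and uses -1 for "not set" as the ZooKeeper server does internally. The granted value is typically logged by the client as "negotiated timeout = ..." when the session is established.

```java
/** Re-implementation (for illustration) of ZooKeeper's server-side session timeout clamping. */
final class ZkSessionTimeout {
    private ZkSessionTimeout() {}

    /**
     * @param requestedMs timeout requested by the client, e.g. HBase's zookeeper.session.timeout
     * @param tickTimeMs  server tickTime
     * @param minMs       server minSessionTimeout, or -1 if unset (then 2 * tickTime)
     * @param maxMs       server maxSessionTimeout, or -1 if unset (then 20 * tickTime)
     */
    static int negotiated(int requestedMs, int tickTimeMs, int minMs, int maxMs) {
        int lower = minMs == -1 ? 2 * tickTimeMs : minMs;
        int upper = maxMs == -1 ? 20 * tickTimeMs : maxMs;
        int timeout = requestedMs;
        if (timeout < lower) timeout = lower; // raise to the minimum first
        if (timeout > upper) timeout = upper; // then cap at the maximum
        return timeout;
    }

    public static void main(String[] args) {
        System.out.println(negotiated(180_000, 2_000, -1, -1));      // 40000
        System.out.println(negotiated(180_000, 2_000, -1, 180_000)); // 180000
    }
}
```

So if GC or host stalls are suspected, the stall that matters is the one measured against the negotiated value, not the one configured in hbase-site.xml.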

bq. Could there be a memory leakage?

You can use jmap to capture heap memory details:

http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
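Comparing two class histograms taken some time apart is a common way to follow up on a suspected leak. `jmap -histo:live <pid>` prints one row per class with its instance count and total bytes; the `:live` variant forces a full GC first, so only reachable objects are counted. The sketch below parses two saved histograms and lists the classes whose byte totals grew most. It assumes the usual row layout `rank: instances bytes class-name`; `HistoDiff` and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Diffs two jmap -histo outputs by total bytes per class (sketch). */
final class HistoDiff {
    // Matches rows such as "   1:          5427        5412904  [I"
    private static final Pattern ROW = Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    private HistoDiff() {}

    /** Class name -> total bytes; header, separator and "Total" lines do not match and are skipped. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                bytesByClass.merge(m.group(3), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return bytesByClass;
    }

    /** Classes whose byte total grew, largest growth first, at most {@code limit} entries. */
    static List<Map.Entry<String, Long>> topGrowth(Map<String, Long> before, Map<String, Long> after, int limit) {
        List<Map.Entry<String, Long>> growth = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > 0) {
                growth.add(new AbstractMap.SimpleEntry<String, Long>(e.getKey(), delta));
            }
        }
        growth.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(growth.subList(0, Math.min(limit, growth.size())));
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: HistoDiff <before.txt> <after.txt>");
            return;
        }
        Map<String, Long> before = parse(Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1));
        Map<String, Long> after = parse(Files.readAllLines(Paths.get(args[1]), StandardCharsets.ISO_8859_1));
        for (Map.Entry<String, Long> e : topGrowth(before, after, 20)) {
            System.out.printf("%,15d  %s%n", e.getValue(), e.getKey());
        }
    }
}
```

For example, save `jmap -histo:live <pid> > before.txt`, wait a few hours, save `after.txt` the same way, then run `java HistoDiff before.txt after.txt`.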

Cheers


On Mon, Aug 25, 2014 at 4:51 AM, Ron van der Vegt <
ron.van.der.vegt@openindex.io> wrote:

> Hi all!
>
> I have setup one master and 5 regionservers to collect log data. But every
> ~24 hours, at random times, the regionservers generating a fatal error and
> all stopping one by one. Eventually the master will stop. I also see some
> weird characters before the server names in the logs. Seems like some
> encoding issue.
>
> I have read in the documentation, that if the garbage collection is taking
> to long, you will also get the session expired message. But I have logged
> the GC on the master, and it seems oke. Could someone help me figure out
> why this is happening?
>
> Furthermore, I am currently monitoring the memory usage of the master with
> JMX. I notice that the heap size is slowly growing. Could there be a memory
> leakage?
> xmx is set to 1gb.
>
> Setup:
> hbase 0.94.20
> hadoop 1.2.1
> debian wheezy
>
> Thanks in advice,
>
> Ron
>
> Logs of master:
> ===============
>
> 2014-08-23 07:00:20,104 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper exception:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/unassigned/70236052
> 2014-08-23 07:00:20,406 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server vps2060.directvps.nl,60020,1408691165501 reported a fatal
> error:
> ABORTING region server vps2060.directvps.nl,60020,1408691165501:
> regionserver:60020-0x347fc15265a00eb-0x347fc15265a00eb-0x347fc15265a00eb
> regionserver:60020-0x347fc15265a00eb-0x347fc15265a00eb-0x347fc15265a00eb
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2014-08-23 07:00:20,911 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server vps2057.directvps.nl,60020,1408691165499 reported a fatal
> error:
> ABORTING region server vps2057.directvps.nl,60020,1408691165499:
> regionserver:60020-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea
> regionserver:60020-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea-0x347fc15265a00ea
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2014-08-23 07:00:21,001 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server vps2059.directvps.nl,60020,1408691165851 reported a fatal
> error:
> ABORTING region server vps2059.directvps.nl,60020,1408691165851:
> regionserver:60020-0x147fc1616d200bb-0x147fc1616d200bb-0x147fc1616d200bb
> regionserver:60020-0x147fc1616d200bb-0x147fc1616d200bb-0x147fc1616d200bb
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2014-08-23 07:00:21,056 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server vps2058.directvps.nl,60020,1408691165675 reported a fatal
> error:
> ABORTING region server vps2058.directvps.nl,60020,1408691165675:
> regionserver:60020-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec
> regionserver:60020-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec-0x347fc15265a00ec
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2014-08-23 07:00:22,140 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/unassigned/70236052
> 2014-08-23 07:00:26,141 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/unassigned/70236052
> 2014-08-23 07:00:34,114 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server vps2056.directvps.nl,60020,1408691165439 reported a fatal
> error:
> ABORTING region server vps2056.directvps.nl,60020,1408691165439:
> Unexpected exception handling nodeDeleted event
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:420)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.nodeDeleted(ZooKeeperNodeTracker.java:182)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2014-08-23 07:00:34,118 ERROR org.apache.hadoop.hbase.master.HMaster:
> Region server vps2056.directvps.nl,60020,1408691165439 reported a fatal
> error:
> ABORTING region server vps2056.directvps.nl,60020,1408691165439:
> regionserver:60020-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2
> regionserver:60020-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2-0x247fc16c80500d2
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> 2014-08-23 07:00:34,141 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
> ZooKeeper exception:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/unassigned/70236052
> 2014-08-23 07:00:34,142 ERROR
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData
> failed after 3 retries
> 2014-08-23 07:00:34,152 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001
> Unable to get data of znode /hbase/unassigned/70236052
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/unassigned/70236052
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>     at
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852)
>     at
> org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274)
>     at
> org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255)
>     at
> org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382)
>     at
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-08-23 07:00:34,152 ERROR
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
> master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001
> Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/unassigned/70236052
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>     at
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
>     at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852)
>     at
> org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274)
>     at
> org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255)
>     at
> org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382)
>     at
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122)
>     at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-08-23 07:00:34,163 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> 2014-08-23 07:00:34,215 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/vps2008.directvps.nl,60000,1408691163492 already deleted, and this is not a retry
> 2014-08-23 07:05:34,165 WARN org.apache.hadoop.hbase.master.SplitLogManager: Interrupted while waiting for log splits to be completed
> 2014-08-23 07:05:34,179 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception reading unassigned node for region=70236052
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/unassigned/70236052
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:290)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:709)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:685)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:852)
>     at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRegion(AssignmentManager.java:3274)
>     at org.apache.hadoop.hbase.master.AssignmentManager.isCarryingRoot(AssignmentManager.java:3255)
>     at org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:382)
>     at org.apache.hadoop.hbase.zookeeper.RegionServerTracker.nodeDeleted(RegionServerTracker.java:122)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:318)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-08-23 07:05:34,179 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://namenode.openindex.io:8020/hbase/.logs/vps2058.directvps.nl,60020,1408691165675-splitting] installed = 2 but only 0 done
> 2014-08-23 07:05:34,184 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
> 2014-08-23 07:05:34,185 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
> java.io.IOException: Giving up after tries=1
>     at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:210)
>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:188)
>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:82)
>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:67)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:126)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:137)
>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:93)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:207)
>     ... 8 more
> 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
> 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
> 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
> 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
> 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
> 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
> 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
> 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
> 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
> 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
> 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
> 2014-08-23 07:05:34,213 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> 2014-08-23 07:05:34,214 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> 2014-08-23 07:05:34,212 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
> 2014-08-23 07:05:34,186 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
> 2014-08-23 07:05:34,185 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
> 2014-08-23 07:05:34,256 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
> 2014-08-23 07:05:34,259 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> 2014-08-23 07:05:34,260 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001-0x347fcb5a0130001 master:60000-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x147fc1616d200ba-0x347fcb5a0130000-0x247ffe833880001-0x247ffe833880001-0x247ffe833880001-0x347fcb5a0130001 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2014-08-23 07:05:34,414 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
> java.lang.RuntimeException: HMaster Aborted
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
>     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2129)
>
> GC log:
> =======
>
> 0.185: Application time: 0.1304320 seconds
> 0.185: [GC0.185: [ParNew: 4288K->511K(4800K), 0.0120520 secs] 4288K->1008K(15424K), 0.0121600 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
> 0.197: Total time for which application threads were stopped: 0.0126240 seconds
> Heap
>  par new generation   total 4800K, used 3580K [0x00000000b7200000, 0x00000000b7730000, 0x00000000c1860000)
>   eden space 4288K,  71% used [0x00000000b7200000, 0x00000000b74ff328, 0x00000000b7630000)
>   from space 512K,  99% used [0x00000000b76b0000, 0x00000000b772fff8, 0x00000000b7730000)
>   to   space 512K,   0% used [0x00000000b7630000, 0x00000000b7630000, 0x00000000b76b0000)
>  concurrent mark-sweep generation total 10624K, used 496K [0x00000000c1860000, 0x00000000c22c0000, 0x00000000f5a00000)
>  concurrent-mark-sweep perm gen total 21248K, used 6688K [0x00000000f5a00000, 0x00000000f6ec0000, 0x0000000100000000)
> 0.370: Application time: 0.1728650 seconds
>
>
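The GC excerpt above covers only the first ~0.4 seconds of JVM uptime (timestamps 0.185 to 0.370), so by itself it cannot show what happened around 07:00. One rough way to check the full log is to look for stop-the-world pauses that approach the ZooKeeper session timeout, since a pause longer than that timeout is one known way to end up with the `Session expired` errors shown. The sketch below assumes the JVM runs with `-XX:+PrintGCApplicationStoppedTime` (the flag that produces the "Total time for which application threads were stopped" line seen above) and that the threshold you pass is your effective session timeout in seconds (HBase's `zookeeper.session.timeout`, which the ZooKeeper server may cap with its own `maxSessionTimeout`). `find_long_pauses` is a hypothetical helper, not part of HBase or the JDK.

```python
import re

# Matches the HotSpot line printed with -XX:+PrintGCApplicationStoppedTime, e.g.
# "0.197: Total time for which application threads were stopped: 0.0126240 seconds"
STOPPED_RE = re.compile(
    r"(?P<uptime>\d+\.\d+): Total time for which application threads were stopped:"
    r"\s*(?P<secs>\d+\.\d+)\s*seconds"
)


def find_long_pauses(gc_log_text, threshold_secs):
    """Hypothetical helper: return (long_pauses, longest_pause).

    long_pauses is a list of (jvm_uptime_secs, pause_secs) tuples for every
    stop-the-world pause >= threshold_secs; longest_pause is the largest pause
    found (0.0 if no matching lines were found).
    """
    # Flatten the log so a line wrapped by a mail client still matches, and
    # drop leading "> " quote markers that a mailing-list copy may carry.
    flat = " ".join(line.lstrip("> ").strip() for line in gc_log_text.splitlines())

    pauses = [
        (float(m.group("uptime")), float(m.group("secs")))
        for m in STOPPED_RE.finditer(flat)
    ]
    longest = max((secs for _, secs in pauses), default=0.0)
    long_pauses = [(up, secs) for up, secs in pauses if secs >= threshold_secs]
    return long_pauses, longest
```

For example, `find_long_pauses(open("gc.log").read(), 40.0)` would list every pause of 40 seconds or more together with the longest pause seen; the file name and the 40-second threshold are placeholders. Since the regionservers are the first processes to abort, running the same scan over their GC logs (not just the master's) is likely more informative.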