Posted to user@hbase.apache.org by Mark <st...@gmail.com> on 2011/08/25 02:19:06 UTC

Dead Servers

I noticed that after running some hefty jobs on our cluster, 3 out 
of 5 of our HBase region servers were killed. First off, when this 
happens and there are only 2 servers left, is there a possibility of data 
corruption and/or loss? Secondly, and more importantly, why does this 
happen and how can I resolve it?

Thanks!

Here is the relevant part of my log:

2011-08-24 15:08:34,989 DEBUG 
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.66 
MB, free=790.02 MB, max=796.67 MB, blocks=22, accesses=84215, 
hits=84188, hitRatio=99.96%%, cachingAccesses=84189, cachingHits=84167, 
cachingHitsRatio=99.97%%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-24 15:12:03,348 DEBUG 
org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 
3600000ms elapsed
2011-08-24 15:13:34,989 DEBUG 
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.66 
MB, free=790.02 MB, max=796.67 MB, blocks=22, accesses=84215, 
hits=84188, hitRatio=99.96%%, cachingAccesses=84189, cachingHits=84167, 
cachingHitsRatio=99.97%%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-24 15:18:34,990 DEBUG 
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.66 
MB, free=790.02 MB, max=796.67 MB, blocks=22, accesses=84215, 
hits=84188, hitRatio=99.96%%, cachingAccesses=84189, cachingHits=84167, 
cachingHitsRatio=99.97%%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client 
session timed out, have not heard from server in 26666ms for sessionid 
0x131ec6ce0b00004, closing socket connection and attempting reconnect
2011-08-24 15:20:48,929 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:20:57,463 INFO org.apache.zookeeper.ClientCnxn: Client 
session timed out, have not heard from server in 26666ms for sessionid 
0x131ec6ce0b00003, closing socket connection and attempting reconnect
2011-08-24 15:20:59,156 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:09,961 WARN org.apache.zookeeper.ClientCnxn: Session 
0x131ec6ce0b00004 for server null, unexpected error, closing socket 
connection and attempting reconnect
java.net.ConnectException: Connection timed out
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2011-08-24 15:21:11,415 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:11,416 INFO org.apache.zookeeper.ClientCnxn: Socket 
connection established to hadoop-master.ioffer.com/10.101.101.0:2181, 
initiating session
2011-08-24 15:21:11,445 INFO org.apache.zookeeper.ClientCnxn: Unable to 
reconnect to ZooKeeper service, session 0x131ec6ce0b00004 has expired, 
closing socket connection
2011-08-24 15:21:11,452 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region 
server serverName=hadoop1.ioffer.com,60020,1313931812841, 
load=(requests=246, regions=2, usedHeap=43, maxHeap=3983): 
regionserver:60020-0x131ec6ce0b00004 
regionserver:60020-0x131ec6ce0b00004 received expired from ZooKeeper, 
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired
     at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
     at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
     at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-08-24 15:21:11,466 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
requests=82, regions=2, stores=2, storefiles=1, storefileIndexSize=0, 
memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=42, 
maxHeap=3983, blockCacheSize=6980720, blockCacheFree=828393552, 
blockCacheCount=22, blockCacheHitCount=84188, blockCacheMissCount=27, 
blockCacheEvictedCount=0, blockCacheHitRatio=99, 
blockCacheHitCachingRatio=99
2011-08-24 15:21:11,467 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: 
regionserver:60020-0x131ec6ce0b00004 
regionserver:60020-0x131ec6ce0b00004 received expired from ZooKeeper, 
aborting
2011-08-24 15:21:11,467 INFO org.apache.zookeeper.ClientCnxn: 
EventThread shut down
2011-08-24 15:21:11,570 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: hadoop-master.ioffer.com/10.101.101.0:9000. Already 
tried 0 time(s).
2011-08-24 15:21:13,516 INFO 
org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2011-08-24 15:21:17,193 INFO 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
regionserver60020.compactor exiting
2011-08-24 15:21:18,727 INFO 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: 
regionserver60020.cacheFlusher exiting
2011-08-24 15:21:20,157 WARN org.apache.zookeeper.ClientCnxn: Session 
0x131ec6ce0b00003 for server null, unexpected error, closing socket 
connection and attempting reconnect
java.net.ConnectException: Connection timed out
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2011-08-24 15:21:21,919 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:21,920 INFO org.apache.zookeeper.ClientCnxn: Socket 
connection established to hadoop-master.ioffer.com/10.101.101.0:2181, 
initiating session
2011-08-24 15:21:21,921 INFO org.apache.zookeeper.ClientCnxn: Unable to 
reconnect to ZooKeeper service, session 0x131ec6ce0b00003 has expired, 
closing socket connection
2011-08-24 15:21:21,921 INFO 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
This client just lost it's session with ZooKeeper, trying to reconnect.
2011-08-24 15:21:21,921 INFO 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Trying to reconnect to zookeeper
2011-08-24 15:21:21,923 INFO org.apache.zookeeper.ZooKeeper: Initiating 
client connection, connectString=hadoop-master.ioffer.com:2181 
sessionTimeout=180000 watcher=hconnection
2011-08-24 15:21:21,923 INFO org.apache.zookeeper.ClientCnxn: Opening 
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:21,926 INFO org.apache.zookeeper.ClientCnxn: Socket 
connection established to hadoop-master.ioffer.com/10.101.101.0:2181, 
initiating session
2011-08-24 15:21:21,935 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server 
hadoop-master.ioffer.com/10.101.101.0:2181, sessionid = 
0x131ec6ce0b000cd, negotiated timeout = 40000
2011-08-24 15:21:21,939 INFO 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
Reconnected successfully. This disconnect could have been caused by a 
network partition or a long-running GC pause, either way it's 
recommended that you verify your environment.
2011-08-24 15:21:21,939 INFO org.apache.zookeeper.ClientCnxn: 
EventThread shut down
2011-08-24 15:21:27,210 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region 
server serverName=hadoop1.ioffer.com,60020,1313931812841, 
load=(requests=246, regions=2, usedHeap=43, maxHeap=3983): Unhandled 
exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT 
rejected; currently processing hadoop1.ioffer.com,60020,1313931812841 as 
dead server
org.apache.hadoop.hbase.YouAreDeadException: 
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
currently processing hadoop1.ioffer.com,60020,1313931812841 as dead server
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
     at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
     at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
     at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
     at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
     at 
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:733)
     at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:594)
     at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; 
currently processing hadoop1.ioffer.com,60020,1313931812841 as dead server
     at 
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:201)
     at 
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:259)
     at 
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:641)
     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
     at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
     at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
     at $Proxy5.regionServerReport(Unknown Source)
     at 
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:727)
     ... 2 more
2011-08-24 15:21:27,211 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
requests=82, regions=2, stores=2, storefiles=1, storefileIndexSize=0, 
memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=40, 
maxHeap=3983, blockCacheSize=6980720, blockCacheFree=828393552, 
blockCacheCount=22, blockCacheHitCount=84188, blockCacheMissCount=27, 
blockCacheEvictedCount=0, blockCacheHitRatio=99, 
blockCacheHitCachingRatio=99
2011-08-24 15:21:27,211 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT 
rejected; currently processing hadoop1.ioffer.com,60020,1313931812841 as 
dead server
2011-08-24 15:21:27,211 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
server on 60020
2011-08-24 15:21:27,211 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
IPC Server listener on 60020
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: IPC 
Server handler 3 on 60020: exiting
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: IPC 
Server handler 5 on 60020: exiting
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC 
Server handler 9 on 60020: exiting
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC 
Server handler 8 on 60020: exiting


Re: Dead Servers

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> First off, thanks for your response. 26 seconds seems a bit short to time
> out, so what are some more reasonable timeouts I should set?

Agreed, but then again it really depends on your configuration.
Setting a higher timeout would only hide the issue; fixing it is much
better.
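For reference, the session timeout is configured in hbase-site.xml; a minimal sketch (the 60-second value is illustrative, not a recommendation, and the ZooKeeper server's own maxSessionTimeout must also permit it):

```xml
<!-- hbase-site.xml: illustrative value only -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; ZooKeeper's maxSessionTimeout must allow this -->
  <value>60000</value>
</property>
```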

>
> This is probably the root cause since my job was pretty hefty.

Do you have metrics installed on that cluster? Debugging these issues
blind isn't something I fancy doing, and I imagine it's the same
for others.

>
> Question about swapping...
> Make sure you don't swap, the JVM never behaves well under swapping
> Is this as simple as setting
>
> sysctl -w vm.swappiness=5

Setting swappiness low is good, but if you overcommit your memory it
will still swap!
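Mostly, yes, though that sysctl alone doesn't survive a reboot, and it's worth confirming whether the box is actually swapping at all. A rough Linux sketch (paths and values are the usual ones, adjust for your distro):

```shell
# Show the current swappiness (the default is often 60)
cat /proc/sys/vm/swappiness

# Watch the si/so (swap-in/swap-out) columns; non-zero means you are swapping
vmstat 1 3 || true   # vmstat may not be installed on every box

# To persist the setting across reboots, run as root:
#   echo 'vm.swappiness=5' >> /etc/sysctl.conf
#   sysctl -p
```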

>
> I know it's extremely situation dependent, but what would be a recommended
> memory allocation for HBase? Currently I have it set to 4G.

It depends on the available RAM. Check the other thread currently running
on this mailing list about calculating memory assignment.

J-D

Re: Dead Servers

Posted by Mark <st...@gmail.com>.
First off, thanks for your response. 26 seconds seems a bit short to time 
out, so what are some more reasonable timeouts I should set?

This is probably the root cause since my job was pretty hefty.

Make sure you are not CPU starving the RegionServer thread. For example,
if you are running a MapReduce job using 6 CPU-intensive tasks on a machine
with 4 cores, you are probably starving the RegionServer enough to create
longer garbage collection pauses.
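A quick sanity check along those lines (a rough heuristic, not something from the HBase docs): compare the node's core count against the number of concurrent MapReduce task slots, leaving headroom for the RegionServer, DataNode, and TaskTracker daemons.

```shell
# Cores available on this node
CORES=$(nproc)

# Heuristic: keep concurrent MR tasks a couple below the core count so the
# RegionServer and DataNode daemons are not starved for CPU
SUGGESTED_SLOTS=$(( CORES > 2 ? CORES - 2 : 1 ))
echo "cores=${CORES} suggested_task_slots=${SUGGESTED_SLOTS}"
```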



Question about swapping...

Make sure you don't swap, the JVM never behaves well under swapping



Is this as simple as setting

sysctl -w vm.swappiness=5


I know it's extremely situation dependent, but what would be a recommended 
memory allocation for HBase? Currently I have it set to 4G.

Thanks again for your help.


On 8/24/11 5:41 PM, Jean-Daniel Cryans wrote:
>> Are there performance hits for running at
>> INFO or DEBUG? What do most people suggest?
> DEBUG until you get your HBase config under control
>
>>> 5 of our HBase region servers were killed. First off, when this happens and
>>> there are only 2 servers is there a possibility of data corruption and/or
>>> loss?
> No, unless you hit some sort of bug.
>
>>> Secondly and more importantly, why does this happen and how can I resolve it?
> The important line is:
>
>>> 2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client
>>> session timed out, have not heard from server in 26666ms for sessionid
> This indicates that either your ZK server was GCing for 26 seconds or
> your region server was. Either way it ended up in:
>
>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> Which is 13.6.2.7 here:
> http://hbase.apache.org/book/trouble.rs.html#trouble.rs.runtime
>
> J-D

Re: Dead Servers

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> Are there performance hits for running at
> INFO or DEBUG? What do most people suggest?

DEBUG until you get your HBase config under control
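For reference, the level lives in HBase's conf/log4j.properties; a minimal sketch (logger name as shipped with HBase at the time):

```properties
# conf/log4j.properties: keep HBase at DEBUG while tuning, drop to INFO later
log4j.logger.org.apache.hadoop.hbase=DEBUG
```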

>> 5 of our HBase region servers were killed. First off, when this happens and
>> there are only 2 servers is there a possibility of data corruption and/or
>> loss?

No, unless you hit some sort of bug.

>> Secondly and more importantly, why does this happen and how can I resolve it?

The important line is:

>> 2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client
>> session timed out, have not heard from server in 26666ms for sessionid

This indicates that either your ZK server was GCing for 26 seconds or
your region server was. Either way it ended up in:

>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired

Which is 13.6.2.7 here:
http://hbase.apache.org/book/trouble.rs.html#trouble.rs.runtime
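If a long GC pause is the suspect, one way to confirm it is to enable GC logging for the region servers and look for pauses near the session timeout. A sketch for conf/hbase-env.sh (HotSpot flags of that era; the log path is illustrative):

```shell
# conf/hbase-env.sh: append GC logging to the existing server options
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```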

J-D

Re: Dead Servers

Posted by Mark <st...@gmail.com>.
As a side note, I obviously never changed the logger level from the 
default Cloudera installation. Are there performance hits for running at 
INFO or DEBUG? What do most people suggest?

Thanks

On 8/24/11 5:19 PM, Mark wrote:
> I noticed that after running some hefty jobs on our cluster, 3 out 
> of 5 of our HBase region servers were killed. First off, when this 
> happens and there are only 2 servers left, is there a possibility of data 
> corruption and/or loss? Secondly, and more importantly, why does this 
> happen and how can I resolve it?
>
> Thanks!
>
> Here is the relevant part of my log:
>
> [snip: full region server log quoted in the original message above]