Posted to user@hbase.apache.org by Mark <st...@gmail.com> on 2011/08/25 02:19:06 UTC
Dead Servers
I noticed that after running some hefty jobs on our cluster that 3 out
of 5 of our HBase region servers were killed. First off, when this
happens and there are only 2 servers is there a possibility of data
corruption and/or loss? Secondly and more importantly, why does this
happen and how can I resolve it?
Thanks!
Here is the relevant part of my log:
2011-08-24 15:08:34,989 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.66
MB, free=790.02 MB, max=796.67 MB, blocks=22, accesses=84215,
hits=84188, hitRatio=99.96%, cachingAccesses=84189, cachingHits=84167,
cachingHitsRatio=99.97%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-24 15:12:03,348 DEBUG
org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period
3600000ms elapsed
2011-08-24 15:13:34,989 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.66
MB, free=790.02 MB, max=796.67 MB, blocks=22, accesses=84215,
hits=84188, hitRatio=99.96%, cachingAccesses=84189, cachingHits=84167,
cachingHitsRatio=99.97%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-24 15:18:34,990 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.66
MB, free=790.02 MB, max=796.67 MB, blocks=22, accesses=84215,
hits=84188, hitRatio=99.96%, cachingAccesses=84189, cachingHits=84167,
cachingHitsRatio=99.97%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 26666ms for sessionid
0x131ec6ce0b00004, closing socket connection and attempting reconnect
2011-08-24 15:20:48,929 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:20:57,463 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 26666ms for sessionid
0x131ec6ce0b00003, closing socket connection and attempting reconnect
2011-08-24 15:20:59,156 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:09,961 WARN org.apache.zookeeper.ClientCnxn: Session
0x131ec6ce0b00004 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2011-08-24 15:21:11,415 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:11,416 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hadoop-master.ioffer.com/10.101.101.0:2181,
initiating session
2011-08-24 15:21:11,445 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x131ec6ce0b00004 has expired,
closing socket connection
2011-08-24 15:21:11,452 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
server serverName=hadoop1.ioffer.com,60020,1313931812841,
load=(requests=246, regions=2, usedHeap=43, maxHeap=3983):
regionserver:60020-0x131ec6ce0b00004
regionserver:60020-0x131ec6ce0b00004 received expired from ZooKeeper,
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
at
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-08-24 15:21:11,466 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
requests=82, regions=2, stores=2, storefiles=1, storefileIndexSize=0,
memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=42,
maxHeap=3983, blockCacheSize=6980720, blockCacheFree=828393552,
blockCacheCount=22, blockCacheHitCount=84188, blockCacheMissCount=27,
blockCacheEvictedCount=0, blockCacheHitRatio=99,
blockCacheHitCachingRatio=99
2011-08-24 15:21:11,467 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED:
regionserver:60020-0x131ec6ce0b00004
regionserver:60020-0x131ec6ce0b00004 received expired from ZooKeeper,
aborting
2011-08-24 15:21:11,467 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-08-24 15:21:11,570 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop-master.ioffer.com/10.101.101.0:9000. Already
tried 0 time(s).
2011-08-24 15:21:13,516 INFO
org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2011-08-24 15:21:17,193 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver60020.compactor exiting
2011-08-24 15:21:18,727 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
regionserver60020.cacheFlusher exiting
2011-08-24 15:21:20,157 WARN org.apache.zookeeper.ClientCnxn: Session
0x131ec6ce0b00003 for server null, unexpected error, closing socket
connection and attempting reconnect
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2011-08-24 15:21:21,919 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:21,920 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hadoop-master.ioffer.com/10.101.101.0:2181,
initiating session
2011-08-24 15:21:21,921 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x131ec6ce0b00003 has expired,
closing socket connection
2011-08-24 15:21:21,921 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
This client just lost it's session with ZooKeeper, trying to reconnect.
2011-08-24 15:21:21,921 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Trying to reconnect to zookeeper
2011-08-24 15:21:21,923 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hadoop-master.ioffer.com:2181
sessionTimeout=180000 watcher=hconnection
2011-08-24 15:21:21,923 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hadoop-master.ioffer.com/10.101.101.0:2181
2011-08-24 15:21:21,926 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hadoop-master.ioffer.com/10.101.101.0:2181,
initiating session
2011-08-24 15:21:21,935 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server
hadoop-master.ioffer.com/10.101.101.0:2181, sessionid =
0x131ec6ce0b000cd, negotiated timeout = 40000
2011-08-24 15:21:21,939 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Reconnected successfully. This disconnect could have been caused by a
network partition or a long-running GC pause, either way it's
recommended that you verify your environment.
2011-08-24 15:21:21,939 INFO org.apache.zookeeper.ClientCnxn:
EventThread shut down
2011-08-24 15:21:27,210 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
server serverName=hadoop1.ioffer.com,60020,1313931812841,
load=(requests=246, regions=2, usedHeap=43, maxHeap=3983): Unhandled
exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
rejected; currently processing hadoop1.ioffer.com,60020,1313931812841 as
dead server
org.apache.hadoop.hbase.YouAreDeadException:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
currently processing hadoop1.ioffer.com,60020,1313931812841 as dead server
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:733)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:594)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
currently processing hadoop1.ioffer.com,60020,1313931812841 as dead server
at
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:201)
at
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:259)
at
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:641)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy5.regionServerReport(Unknown Source)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:727)
... 2 more
2011-08-24 15:21:27,211 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
requests=82, regions=2, stores=2, storefiles=1, storefileIndexSize=0,
memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=40,
maxHeap=3983, blockCacheSize=6980720, blockCacheFree=828393552,
blockCacheCount=22, blockCacheHitCount=84188, blockCacheMissCount=27,
blockCacheEvictedCount=0, blockCacheHitRatio=99,
blockCacheHitCachingRatio=99
2011-08-24 15:21:27,211 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled
exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
rejected; currently processing hadoop1.ioffer.com,60020,1313931812841 as
dead server
2011-08-24 15:21:27,211 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020
2011-08-24 15:21:27,211 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server listener on 60020
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 3 on 60020: exiting
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 5 on 60020: exiting
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 9 on 60020: exiting
2011-08-24 15:21:27,212 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 8 on 60020: exiting
Re: Dead Servers
Posted by Jean-Daniel Cryans <jd...@apache.org>.
> First off, thanks for your response. 26 seconds seems a bit short to time
> out, so what are some more reasonable timeouts I should set?
Agreed, but then again it really depends on your configuration.
Setting a higher timeout would only hide the issue; fixing it is much
better.
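For reference, the timeout being discussed is requested in hbase-site.xml; a minimal sketch (the 90000 value is only illustrative, and whatever you request is still capped by the ZooKeeper server's own session-timeout bounds, which is why the log shows "negotiated timeout = 40000"):

```xml
<!-- hbase-site.xml: requested ZooKeeper session timeout, in milliseconds.
     The ZK server's minSessionTimeout/maxSessionTimeout (derived from
     tickTime unless set explicitly) bound the value actually negotiated. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value>
</property>
```

If the negotiated timeout stays at 40000 after raising this, the ZooKeeper server's maxSessionTimeout is the limiting factor, not HBase.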
>
> This is probably the root cause since my job was pretty hefty.
Do you have metrics installed on that cluster? Debugging these issues
while blind ain't something I fancy doing, and I guess it's the same
for others.
>
> Question about swapping...
> Make sure you don't swap, the JVM never behaves well under swapping
> Is this as simple as setting
>
> sysctl -w vm.swappiness=5
Setting swappiness low is good, but if you overcommit your memory it
will still swap!
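As an aside, `sysctl -w` alone does not survive a reboot; a common way to persist the setting (assuming a stock Linux box with the standard config path) is:

```shell
# Apply now, without rebooting
sysctl -w vm.swappiness=5

# Persist across reboots (run once; appends to the standard sysctl config)
echo "vm.swappiness = 5" >> /etc/sysctl.conf
```

Even with this set, as noted above, an overcommitted machine can still push the JVM heap into swap.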
>
> I know it's extremely situation-dependent but what would be a recommended
> memory allocation to HBase... currently I have it set to 4G?
Depends on the available RAM? Check the other thread currently running
on this mailing list about calculating memory assignment.
J-D
Re: Dead Servers
Posted by Mark <st...@gmail.com>.
First off, thanks for your response. 26 seconds seems a bit short to time
out, so what are some more reasonable timeouts I should set?
This is probably the root cause since my job was pretty hefty.
Make
sure you are not CPU starving the RegionServer thread. For example, if
you are running a MapReduce job using 6 CPU-intensive tasks on a machine
with 4 cores, you are probably starving the RegionServer enough to
create longer garbage collection pauses.
Question about swapping...
Make sure you don't swap, the JVM never behaves well under swapping
Is this as simple as setting
sysctl -w vm.swappiness=5
I know it's extremely situation-dependent but what would be a recommended
memory allocation to HBase... currently I have it set to 4G?
Thanks again for your help.
Re: Dead Servers
Posted by Jean-Daniel Cryans <jd...@apache.org>.
> Are there performance hits for running at
> INFO/DEBUG? What do most people suggest?
DEBUG until you get your HBase config under control
>> 5 of our HBase region servers were killed. First off, when this happens and
>> there are only 2 servers is there a possibility of data corruption and/or
>> loss?
No, unless you hit some sort of bug.
>> Secondly and more importantly, why does this happen and how can I resolve it?
The important line is:
>> 2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client
>> session timed out, have not heard from server in 26666ms for sessionid
This indicates that either your ZK server was GCing for 26 seconds or
your region server was. Either way it ended up in:
>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
Which is 13.6.2.7 here:
http://hbase.apache.org/book/trouble.rs.html#trouble.rs.runtime
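A way to verify the GC-pause theory (a sketch; the flags are for the Sun JDK 6 of that era, and the log path is just an example) is to turn on GC logging via HBASE_OPTS in conf/hbase-env.sh:

```shell
# conf/hbase-env.sh: write timestamped GC activity to a log so that
# multi-second pauses can be matched against the ZK session timeout
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/tmp/hbase-gc.log"
```

After a restart, a pause in that log lining up with the 15:20:47 timeout would confirm the region server side was the one stalled.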
J-D
Re: Dead Servers
Posted by Mark <st...@gmail.com>.
As a side note, I obviously never changed the logger level from the
default Cloudera installation. Are there performance hits for running at
INFO/DEBUG? What do most people suggest?
Thanks