Posted to user@hbase.apache.org by 陈加俊 <cj...@gmail.com> on 2011/04/01 18:01:32 UTC

Regionserver is crashed frequently these days

The regionserver has been crashing frequently these days, although it worked
fine for many months before that.
Some excerpts from one RS's log follow:

2011-04-01 19:13:40,413 WARN org.apache.hadoop.hbase.regionserver.Store:
Failed open of
hdfs://master.uc.uuwatch.com:9000/hbase/cjjHTML/1494733632/page/5173469199902346167.1864097884;
presumption is that file was corrupted at flush and lost edits picked up by
commit log replay. Verify!
java.io.IOException: Cannot open filename
/hbase/cjjHTML/1864097884/page/5173469199902346167
......

2011-04-01 19:17:22,716 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
cjjHTML,http://news.ifeng.com/gundong/detail_2011_03/15/5154913_0.shtml,1300245193111:
Overloaded
2011-04-01 19:17:22,716 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
MSG_REGION_CLOSE:
cjjHTML,http://news.ifeng.com/gundong/detail_2011_03/15/5154913_0.shtml,1300245193111:
Overloaded
2011-04-01 19:17:22,716 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed
cjjHTML,http://news.ifeng.com/gundong/detail_2011_03/15/5154913_0.shtml,1300245193111
2011-04-01 19:17:22,716 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
cjjHTML,http://news.39.net/kyfx/2010124/1561005.html,1298035330808:
Overloaded
2011-04-01 19:17:22,716 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
MSG_REGION_CLOSE:
cjjHTML,http://news.39.net/kyfx/2010124/1561005.html,1298035330808: Overloaded
......

2011-04-01 22:01:49,212 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x942f0f7ae13d0000 to sun.nio.ch.SelectionKeyImpl@34819c89
java.io.IOException: TIMED OUT
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2011-04-01 22:01:49,213 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x982f0f7ae0960001 to sun.nio.ch.SelectionKeyImpl@7e9ca589
java.io.IOException: TIMED OUT
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2011-04-01 22:01:49,310 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:549)
        at java.lang.Thread.run(Thread.java:619)
2011-04-01 22:01:49,313 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
2011-04-01 22:01:49,589 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server /192.168.5.155:2181
2011-04-01 22:01:49,589 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/192.168.5.149:33651 remote=/192.168.5.155:2181]
2011-04-01 22:01:49,590 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2011-04-01 22:01:49,592 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x982f0f7ae0960001 to sun.nio.ch.SelectionKeyImpl@a5858e9
java.io.IOException: Session Expired
        at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
        at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2011-04-01 22:01:49,592 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Expired, type: None, path: null
2011-04-01 22:01:49,592 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
expired
2011-04-01 22:01:49,602 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=941, stores=1721, storefiles=1439,
storefileIndexSize=228, memstoreSize=558, compactionQueueSize=12,
usedHeap=2896, maxHeap=3995, blockCacheSize=640319368,
blockCacheFree=197676344, blockCacheCount=8309, blockCacheHitRatio=93,
fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
2011-04-01 22:01:49,853 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server /192.168.5.147:2181
2011-04-01 22:01:49,854 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/192.168.5.149:50014 remote=/192.168.5.147:2181]
2011-04-01 22:01:49,854 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2011-04-01 22:01:49,856 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x942f0f7ae13d0000 to sun.nio.ch.SelectionKeyImpl@35c591b7
java.io.IOException: Session Expired
        at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
        at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
        at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2011-04-01 22:01:49,856 INFO org.apache.zookeeper.ZooKeeper: Closing
session: 0x942f0f7ae13d0000
2011-04-01 22:01:49,857 INFO org.apache.zookeeper.ClientCnxn: Closing
ClientCnxn for session: 0x942f0f7ae13d0000
2011-04-01 22:01:49,857 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
ClientCnxn for session: 0x942f0f7ae13d0000
2011-04-01 22:01:49,857 INFO org.apache.zookeeper.ZooKeeper: Session:
0x942f0f7ae13d0000 closed
2011-04-01 22:01:49,857 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2011-04-01 22:01:50,310 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020
2011-04-01 22:01:50,311 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting
2011-04-01 22:01:50,311 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: exiting
......



-- 
Thanks & Best regards
jiajun

Re: Regionserver is crashed frequently these days

Posted by 陈加俊 <cj...@gmail.com>.
I will get the GC times from gc-hbase.log at the next opportunity; the GC log
from around 22:00 has been lost.

On Sat, Apr 2, 2011 at 12:17 AM, Stack <st...@duboce.net> wrote:

> On Fri, Apr 1, 2011 at 9:01 AM, 陈加俊 <cj...@gmail.com> wrote:
> > 2011-04-01 19:13:40,413 WARN org.apache.hadoop.hbase.regionserver.Store:
> > Failed open of
> > hdfs://master.uc.uuwatch.com:9000/hbase/cjjHTML/1494733632/page/5173469199902346167.1864097884;
> > presumption is that file was corrupted at flush and lost edits picked up by
> > commit log replay. Verify!
> > java.io.IOException: Cannot open filename
> > /hbase/cjjHTML/1864097884/page/5173469199902346167
> > ......
> >
>
> This is a case where a daughter region is unable to open its parent
> region's storefile (the daughter refers to the parent's storefiles for a
> period of time after its initial open).  Look at what happened to the
> parent region.  Was it prematurely removed?
>
> > 2011-04-01 19:17:22,716 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> > cjjHTML,http://news.ifeng.com/gundong/detail_2011_03/15/5154913_0.shtml,1300245193111:
> > Overloaded
> > 2011-04-01 19:17:22,716 INFO
>
> This we've discussed.
>
> > 2011-04-01 22:01:49,212 WARN org.apache.zookeeper.ClientCnxn: Exception
> > closing session 0x942f0f7ae13d0000 to sun.nio.ch.SelectionKeyImpl@34819c89
> > java.io.IOException: TIMED OUT
> >        at
>
>
> This looks like a straight session timeout against ZK.  Long GC pause?
>
> St.Ack
>



-- 
Thanks & Best regards
jiajun

Re: Regionserver is crashed frequently these days

Posted by Stack <st...@duboce.net>.
On Fri, Apr 1, 2011 at 9:01 AM, 陈加俊 <cj...@gmail.com> wrote:
> 2011-04-01 19:13:40,413 WARN org.apache.hadoop.hbase.regionserver.Store:
> Failed open of
> hdfs://master.uc.uuwatch.com:9000/hbase/cjjHTML/1494733632/page/5173469199902346167.1864097884;
> presumption is that file was corrupted at flush and lost edits picked up by
> commit log replay. Verify!
> java.io.IOException: Cannot open filename
> /hbase/cjjHTML/1864097884/page/5173469199902346167
> ......
>

This is a case where a daughter region is unable to open its parent
region's storefile (the daughter refers to the parent's storefiles for a
period of time after its initial open).  Look at what happened to the
parent region.  Was it prematurely removed?
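
To check what happened to the parent, it helps to know which path the
daughter is looking for. In the log above, the file that failed to open is
named 5173469199902346167.1864097884, and the IOException names the same
file ID under region directory 1864097884. The following is a minimal
sketch, assuming the suffix after the dot is the parent region's directory
name (this is inferred from the two paths in the log, not checked against
the HBase source); the function name is hypothetical.

```python
from pathlib import PurePosixPath


def parent_storefile_path(reference_path: str) -> str:
    """Map a daughter's reference file path to the parent storefile it points at.

    Assumed layout (inferred from the log excerpt, not from HBase source):
    /hbase/<table>/<daughter region>/<family>/<file id>.<parent region>
    """
    path = PurePosixPath(reference_path)
    file_id, sep, parent_region = path.name.partition(".")
    if not sep or not file_id or not parent_region:
        raise ValueError(f"not a reference file name: {path.name}")
    family_dir = path.parent              # .../<daughter region>/<family>
    table_dir = family_dir.parent.parent  # .../<table>
    return str(table_dir / parent_region / family_dir.name / file_id)


if __name__ == "__main__":
    ref = "/hbase/cjjHTML/1494733632/page/5173469199902346167.1864097884"
    print(parent_storefile_path(ref))
```

The resulting path can then be looked up with `hadoop fs -ls` to see
whether the parent's storefile still exists.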

> 2011-04-01 19:17:22,716 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
> cjjHTML,http://news.ifeng.com/gundong/detail_2011_03/15/5154913_0.shtml,1300245193111:
> Overloaded
> 2011-04-01 19:17:22,716 INFO

This we've discussed.

> 2011-04-01 22:01:49,212 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x942f0f7ae13d0000 to sun.nio.ch.SelectionKeyImpl@34819c89
> java.io.IOException: TIMED OUT
>        at


This looks like a straight session timeout against ZK.  Long GC pause?
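
One way to test the GC-pause theory is to scan the GC log for pauses that
approach or exceed the ZooKeeper session timeout. The following is a
minimal sketch, assuming the JVM writes HotSpot-style -XX:+PrintGCDetails
entries ending in "real=N secs", and assuming a 60-second session timeout
purely for illustration (the real value is whatever
zookeeper.session.timeout is set to on the cluster). The function name and
sample lines are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Wall-clock time in HotSpot "[Times: user=... sys=..., real=... secs]" entries.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines: Iterable[str], threshold_secs: float) -> List[Tuple[int, float]]:
    """Return (line number, pause seconds) for GC entries at or above the threshold."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = REAL_TIME.search(line)
        if match and float(match.group(1)) >= threshold_secs:
            hits.append((lineno, float(match.group(1))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the poster's cluster.
    sample = [
        "[GC [ParNew: ...] [Times: user=0.10 sys=0.01, real=0.03 secs]",
        "[Full GC ... [Times: user=70.2 sys=0.5, real=71.40 secs]",
    ]
    session_timeout_secs = 60.0  # assumed for illustration only
    for lineno, secs in long_pauses(sample, session_timeout_secs):
        print(f"line {lineno}: paused {secs:.2f}s")
```

Against a real log, the open file object can be passed directly as
`lines`, for example `long_pauses(open("gc-hbase.log"), 60.0)`.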

St.Ack