You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2010/06/11 07:46:00 UTC

HBase went down

Hi,
We use hbase 0.20.3
HBase went down when we ran map-reduce job.

Your comment / help is appreciated.

I didn't find such WARN in master or region server log:
We slept 117947ms, ten times longer than scheduled: 10000

Here is master log snippet:

2010-06-11 04:32:50,077 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
T3_HBASE_NO_INCR_ORCL_SQLLDR-THREEGPPSPEECHCALLS-1276229903001-0,BCBF1C0DB6FCE3695709CC6175CF512D,1276230743333
in region .META.,,1 with startcode=1276204602423, server=10.201.8.208:60020
2010-06-11 04:33:16,002 INFO org.apache.hadoop.hbase.master.ServerManager:
sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 znode expired
2010-06-11 04:33:16,003 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2010-06-11 04:33:16,003 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit: false,
rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2010-06-11 04:33:16,005 INFO org.apache.hadoop.hbase.regionserver.HLog:
Splitting 3 hlog(s) in hdfs://
sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
2010-06-11 04:33:16,098 WARN org.apache.hadoop.hbase.regionserver.HLog:
Empty hlog, continuing: org.apache.hadoop.fs.FileStatus@bc89a24a count=0
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
    at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:962)
    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:872)
    at
org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:286)
    at
org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429)
2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.regionserver.HLog: hlog
file splitting completed in 1020 millis for hdfs://
sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
2010-06-11 04:33:17,024 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
meta reassignment and scanning:
2010-06-11 04:33:17,024 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: ProcessServerShutdown
reassigning ROOT region
2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.master.RegionManager:
ROOT inserted into regionsInTransition
2010-06-11 04:33:17,064 INFO org.apache.hadoop.hbase.master.RegionManager:
Assigning region -ROOT-,,0 to sjc1-hadoop6.sjc1.carrieriq.com
,60020,1276204602430
2010-06-11 04:33:20,070 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from
sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430; 1 of 2
2010-06-11 04:33:20,070 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: -ROOT-,,0 from
sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430;
2 of 2
2010-06-11 04:33:25,891 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scanning meta region {server: 10.201.8.208:60020,
regionname: .META.,,1, startKey: <>}
2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 161 row(s) of meta region {server:
10.201.8.208:60020, regionname: .META.,,1, startKey: <>} complete
2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2010-06-11 04:33:29,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60000, call regionServerReport(address: 10.201.8.205:60020,
startcode: 1276204602195, load: (requests=19880, regions=24, usedHeap=533,
maxHeap=3983), [Lorg.apache.hadoop.hbase.HMsg;@6c9a819d,
[Lorg.apache.hadoop.hbase.HRegionInfo;@13b3a21f) from 10.201.8.205:60755:
error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
    at
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:252)
    at
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:724)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 04:33:40,350 INFO org.apache.hadoop.hbase.master.ServerManager: 5
region servers, 1 dead, average load 27.0[sjc1-hadoop1.sjc1.carrieriq.com
,60020,1276204602195]
2010-06-11 04:33:45,892 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server: 10.201.8.210:60020,
regionname: -ROOT-,,0, startKey: <>}
2010-06-11 04:34:17,044 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit: true,
rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2010-06-11 04:34:17,044 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
meta reassignment and scanning:
2010-06-11 04:34:24,195 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of 1 row(s) of meta region {server:
10.201.8.210:60020, regionname: -ROOT-,,0, startKey: <>} complete

10.201.8.205 is one of the region servers which went down.
Here is region server log from 10.201.8.205

2010-06-11 04:32:42,734 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
2010-06-11 04:32:42,734 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
2010-06-11 04:32:43,441 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561/1645281535
available; sequence id is 342755
2010-06-11 04:32:43,515 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
2010-06-11 04:33:10,793 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@69267649
java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2010-06-11 04:33:10,895 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181
2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected local=/
10.201.8.205:31261 remote=sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181]
2010-06-11 04:33:29,677 INFO org.apache.zookeeper.ClientCnxn: Server
connection successful
2010-06-11 04:33:29,678 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Expired, type: None, path: null
2010-06-11 04:33:29,678 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@32bc4106
java.io.IOException: Session Expired
    at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-06-11 04:33:29,678 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
expired
2010-06-11 04:33:29,678 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020, call put([B@2ba504e0,
[Lorg.apache.hadoop.hbase.client.Put;@7b420664) from 10.201.8.205:24720:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020, call put([B@1d014ece,
[Lorg.apache.hadoop.hbase.client.Put;@40dcc542) from 10.201.8.210:3754:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020, call put([B@2d4ed3d0,
[Lorg.apache.hadoop.hbase.client.Put;@123413ed) from 10.201.8.210:3755:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Later, I restarted region server but I saw:

2010-06-11 05:05:34,087 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed TED-THREEGPPSPEECHCALLS-1276213362068-17,,1276213463847
2010-06-11 05:05:34,087 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
MSG_REGION_CLOSE: MetaTable-15,,1276213160085: Overloaded
2010-06-11 05:05:34,088 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed MetaTable-15,,1276213160085
2010-06-11 05:06:07,818 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2010-06-11 05:06:07,820 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020, call getClosestRowBefore([B@51f88fbd, [B@15a4eb56,
[B@7e566633) from 10.201.8.212:2687: error:
org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
    at
org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Re: HBase went down

Posted by Ted Yu <yu...@gmail.com>.
Sujee:
Can you explain why zookeeper.session.timeout should be increased ?

I don't see the following in server logs:
We slept 117947ms, ten times longer than scheduled: 10000

Our HBase cluster has 7 region servers.
Each one is 8 core Intel(R) Xeon(R) with 16 GB memory.

On Fri, Jun 11, 2010 at 8:51 AM, Sujee Maniyam <su...@sujee.net> wrote:

> Can you post some details about your cluster : number of machines, type of
> machines,   how much data you are going through ..etc
>
> try increasing the zookeeper timeout
>
> in Hbase/conf/hbase-site.xml
> ----
>  <property>
>        <name>zookeeper.session.timeout</name>
>        <value>60000</value>
>    </property>
>
>     <property>
>       <name>hbase.regionserver.handler.count</name>
>       <value>100</value>
>      </property>
>  <property>
> -----
>
> The second property increases handlers on region servers to handle data
> requests.  might worth a shot.
>
>
>
> http://sujee.net
>
>
> On Thu, Jun 10, 2010 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Hi,
> > We use hbase 0.20.3
> > HBase went down when we ran map-reduce job.
> >
> > Your comment / help is appreciated.
> >
> > I didn't find such WARN in master or region server log:
> > We slept 117947ms, ten times longer than scheduled: 10000
> >
> > Here is master log snippet:
> >
> > 2010-06-11 04:32:50,077 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> >
> >
> T3_HBASE_NO_INCR_ORCL_SQLLDR-THREEGPPSPEECHCALLS-1276229903001-0,BCBF1C0DB6FCE3695709CC6175CF512D,1276230743333
> > in region .META.,,1 with startcode=1276204602423, server=
> > 10.201.8.208:60020
> > 2010-06-11 04:33:16,002 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> > sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 znode expired
> > 2010-06-11 04:33:16,003 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> > -ROOT- region unset (but not set to be reassigned)
> > 2010-06-11 04:33:16,003 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> > server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit:
> > false,
> > rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> > 2010-06-11 04:33:16,005 INFO org.apache.hadoop.hbase.regionserver.HLog:
> > Splitting 3 hlog(s) in hdfs://
> >
> >
> sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
> > 2010-06-11 04:33:16,098 WARN org.apache.hadoop.hbase.regionserver.HLog:
> > Empty hlog, continuing: org.apache.hadoop.fs.FileStatus@bc89a24a count=0
> > java.io.EOFException
> >    at java.io.DataInputStream.readFully(DataInputStream.java:180)
> >    at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >    at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
> >    at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
> >    at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> >    at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> >    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:962)
> >    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:872)
> >    at
> >
> >
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:286)
> >    at
> > org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494)
> >    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429)
> > 2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.regionserver.HLog:
> > hlog
> > file splitting completed in 1020 millis for hdfs://
> >
> >
> sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
> > 2010-06-11 04:33:17,024 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
> > meta reassignment and scanning:
> > 2010-06-11 04:33:17,024 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation:
> ProcessServerShutdown
> > reassigning ROOT region
> > 2010-06-11 04:33:17,024 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> > -ROOT- region unset (but not set to be reassigned)
> > 2010-06-11 04:33:17,024 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> > ROOT inserted into regionsInTransition
> > 2010-06-11 04:33:17,064 INFO
> org.apache.hadoop.hbase.master.RegionManager:
> > Assigning region -ROOT-,,0 to sjc1-hadoop6.sjc1.carrieriq.com
> > ,60020,1276204602430
> > 2010-06-11 04:33:20,070 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> > Processing MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from
> > sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430; 1 of 2
> > 2010-06-11 04:33:20,070 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> > Processing MSG_REPORT_OPEN: -ROOT-,,0 from
> > sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430;
> > 2 of 2
> > 2010-06-11 04:33:25,891 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > RegionManager.metaScanner scanning meta region {server:
> 10.201.8.208:60020
> > ,
> > regionname: .META.,,1, startKey: <>}
> > 2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > RegionManager.metaScanner scan of 161 row(s) of meta region {server:
> > 10.201.8.208:60020, regionname: .META.,,1, startKey: <>} complete
> > 2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > All
> > 1 .META. region(s) scanned
> > 2010-06-11 04:33:29,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 7 on 60000, call regionServerReport(address: 10.201.8.205:60020,
> > startcode: 1276204602195, load: (requests=19880, regions=24,
> usedHeap=533,
> > maxHeap=3983), [Lorg.apache.hadoop.hbase.HMsg;@6c9a819d,
> > [Lorg.apache.hadoop.hbase.HRegionInfo;@13b3a21f) from 10.201.8.205:60755
> :
> > error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> > org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> >    at
> >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:252)
> >    at
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:724)
> >    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> >    at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >    at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > 2010-06-11 04:33:40,350 INFO
> org.apache.hadoop.hbase.master.ServerManager:
> > 5
> > region servers, 1 dead, average load 27.0[
> sjc1-hadoop1.sjc1.carrieriq.com
> > ,60020,1276204602195]
> > 2010-06-11 04:33:45,892 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > RegionManager.rootScanner scanning meta region {server:
> 10.201.8.210:60020
> > ,
> > regionname: -ROOT-,,0, startKey: <>}
> > 2010-06-11 04:34:17,044 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> > server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit:
> > true,
> > rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> > 2010-06-11 04:34:17,044 INFO
> > org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
> > meta reassignment and scanning:
> > 2010-06-11 04:34:24,195 INFO org.apache.hadoop.hbase.master.BaseScanner:
> > RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> > 10.201.8.210:60020, regionname: -ROOT-,,0, startKey: <>} complete
> >
> > 10.201.8.205 is one of the region servers which went down.
> > Here is region server log from 10.201.8.205
> >
> > 2010-06-11 04:32:42,734 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> >
> >
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
> > 2010-06-11 04:32:42,734 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> > MSG_REGION_OPEN:
> >
> >
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
> > 2010-06-11 04:32:43,441 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > region
> >
> >
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561/1645281535
> > available; sequence id is 342755
> > 2010-06-11 04:32:43,515 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Starting compaction on region
> >
> >
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
> > 2010-06-11 04:33:10,793 WARN org.apache.zookeeper.ClientCnxn: Exception
> > closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@69267649
> > java.io.IOException: TIMED OUT
> >    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> > 2010-06-11 04:33:10,895 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> > state: Disconnected, type: None, path: null
> > 2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Attempting
> > connection to server sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181
> > 2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Priming
> > connection to java.nio.channels.SocketChannel[connected local=/
> > 10.201.8.205:31261 remote=
> > sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181]
> > 2010-06-11 04:33:29,677 INFO org.apache.zookeeper.ClientCnxn: Server
> > connection successful
> > 2010-06-11 04:33:29,678 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> > state: Expired, type: None, path: null
> > 2010-06-11 04:33:29,678 WARN org.apache.zookeeper.ClientCnxn: Exception
> > closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@32bc4106
> > java.io.IOException: Session Expired
> >    at
> >
> >
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
> >    at
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
> >    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
> > 2010-06-11 04:33:29,678 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
> > expired
> > 2010-06-11 04:33:29,678 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 0 on 60020, call put([B@2ba504e0,
> > [Lorg.apache.hadoop.hbase.client.Put;@7b420664) from 10.201.8.205:24720:
> > error: java.io.IOException: Server not running, aborting
> > java.io.IOException: Server not running, aborting
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
> >    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
> >    at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >    at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > 2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 7 on 60020, call put([B@1d014ece,
> > [Lorg.apache.hadoop.hbase.client.Put;@40dcc542) from 10.201.8.210:3754:
> > error: java.io.IOException: Server not running, aborting
> > java.io.IOException: Server not running, aborting
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
> >    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
> >    at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >    at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > 2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 4 on 60020, call put([B@2d4ed3d0,
> > [Lorg.apache.hadoop.hbase.client.Put;@123413ed) from 10.201.8.210:3755:
> > error: java.io.IOException: Server not running, aborting
> > java.io.IOException: Server not running, aborting
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
> >    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
> >    at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >    at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> >
> > Later, I restarted region server but I saw:
> >
> > 2010-06-11 05:05:34,087 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed TED-THREEGPPSPEECHCALLS-1276213362068-17,,1276213463847
> > 2010-06-11 05:05:34,087 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> > MSG_REGION_CLOSE: MetaTable-15,,1276213160085: Overloaded
> > 2010-06-11 05:05:34,088 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed MetaTable-15,,1276213160085
> > 2010-06-11 05:06:07,818 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >    at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > 2010-06-11 05:06:07,820 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server
> > handler 0 on 60020, call getClosestRowBefore([B@51f88fbd, [B@15a4eb56,
> > [B@7e566633) from 10.201.8.212:2687: error:
> > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
> > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
> >    at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >    at
> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> >
>

Re: HBase went down

Posted by Sujee Maniyam <su...@sujee.net>.
Can you post some details about your cluster : number of machines, type of
machines,   how much data you are going through ..etc

try increasing the zookeeper timeout

in Hbase/conf/hbase-site.xml
----
  <property>
        <name>zookeeper.session.timeout</name>
        <value>60000</value>
    </property>

     <property>
       <name>hbase.regionserver.handler.count</name>
       <value>100</value>
      </property>
  <property>
-----

The second property increases handlers on region servers to handle data
requests.  might worth a shot.



http://sujee.net


On Thu, Jun 10, 2010 at 10:46 PM, Ted Yu <yu...@gmail.com> wrote:

> Hi,
> We use hbase 0.20.3
> HBase went down when we ran map-reduce job.
>
> Your comment / help is appreciated.
>
> I didn't find such WARN in master or region server log:
> We slept 117947ms, ten times longer than scheduled: 10000
>
> Here is master log snippet:
>
> 2010-06-11 04:32:50,077 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
>
> T3_HBASE_NO_INCR_ORCL_SQLLDR-THREEGPPSPEECHCALLS-1276229903001-0,BCBF1C0DB6FCE3695709CC6175CF512D,1276230743333
> in region .META.,,1 with startcode=1276204602423, server=
> 10.201.8.208:60020
> 2010-06-11 04:33:16,002 INFO org.apache.hadoop.hbase.master.ServerManager:
> sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 znode expired
> 2010-06-11 04:33:16,003 INFO org.apache.hadoop.hbase.master.RegionManager:
> -ROOT- region unset (but not set to be reassigned)
> 2010-06-11 04:33:16,003 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit:
> false,
> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2010-06-11 04:33:16,005 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Splitting 3 hlog(s) in hdfs://
>
> sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
> 2010-06-11 04:33:16,098 WARN org.apache.hadoop.hbase.regionserver.HLog:
> Empty hlog, continuing: org.apache.hadoop.fs.FileStatus@bc89a24a count=0
> java.io.EOFException
>    at java.io.DataInputStream.readFully(DataInputStream.java:180)
>    at java.io.DataInputStream.readFully(DataInputStream.java:152)
>    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>    at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>    at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>    at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:962)
>    at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:872)
>    at
>
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:286)
>    at
> org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494)
>    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429)
> 2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.regionserver.HLog:
> hlog
> file splitting completed in 1020 millis for hdfs://
>
> sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195
> 2010-06-11 04:33:17,024 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
> meta reassignment and scanning:
> 2010-06-11 04:33:17,024 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: ProcessServerShutdown
> reassigning ROOT region
> 2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.master.RegionManager:
> -ROOT- region unset (but not set to be reassigned)
> 2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.master.RegionManager:
> ROOT inserted into regionsInTransition
> 2010-06-11 04:33:17,064 INFO org.apache.hadoop.hbase.master.RegionManager:
> Assigning region -ROOT-,,0 to sjc1-hadoop6.sjc1.carrieriq.com
> ,60020,1276204602430
> 2010-06-11 04:33:20,070 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from
> sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430; 1 of 2
> 2010-06-11 04:33:20,070 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_OPEN: -ROOT-,,0 from
> sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430;
> 2 of 2
> 2010-06-11 04:33:25,891 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scanning meta region {server: 10.201.8.208:60020
> ,
> regionname: .META.,,1, startKey: <>}
> 2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.metaScanner scan of 161 row(s) of meta region {server:
> 10.201.8.208:60020, regionname: .META.,,1, startKey: <>} complete
> 2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner:
> All
> 1 .META. region(s) scanned
> 2010-06-11 04:33:29,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60000, call regionServerReport(address: 10.201.8.205:60020,
> startcode: 1276204602195, load: (requests=19880, regions=24, usedHeap=533,
> maxHeap=3983), [Lorg.apache.hadoop.hbase.HMsg;@6c9a819d,
> [Lorg.apache.hadoop.hbase.HRegionInfo;@13b3a21f) from 10.201.8.205:60755:
> error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
> org.apache.hadoop.hbase.Leases$LeaseStillHeldException
>    at
>
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:252)
>    at
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:724)
>    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>    at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2010-06-11 04:33:40,350 INFO org.apache.hadoop.hbase.master.ServerManager:
> 5
> region servers, 1 dead, average load 27.0[sjc1-hadoop1.sjc1.carrieriq.com
> ,60020,1276204602195]
> 2010-06-11 04:33:45,892 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scanning meta region {server: 10.201.8.210:60020
> ,
> regionname: -ROOT-,,0, startKey: <>}
> 2010-06-11 04:34:17,044 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of
> server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit:
> true,
> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2010-06-11 04:34:17,044 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete,
> meta reassignment and scanning:
> 2010-06-11 04:34:24,195 INFO org.apache.hadoop.hbase.master.BaseScanner:
> RegionManager.rootScanner scan of 1 row(s) of meta region {server:
> 10.201.8.210:60020, regionname: -ROOT-,,0, startKey: <>} complete
>
> 10.201.8.205 is one of the region servers which went down.
> Here is region server log from 10.201.8.205
>
> 2010-06-11 04:32:42,734 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
> 2010-06-11 04:32:42,734 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN:
>
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
> 2010-06-11 04:32:43,441 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region
>
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561/1645281535
> available; sequence id is 342755
> 2010-06-11 04:32:43,515 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on region
>
> T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561
> 2010-06-11 04:33:10,793 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@69267649
> java.io.IOException: TIMED OUT
>    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
> 2010-06-11 04:33:10,895 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Disconnected, type: None, path: null
> 2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Attempting
> connection to server sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181
> 2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Priming
> connection to java.nio.channels.SocketChannel[connected local=/
> 10.201.8.205:31261 remote=
> sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181]
> 2010-06-11 04:33:29,677 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2010-06-11 04:33:29,678 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
> state: Expired, type: None, path: null
> 2010-06-11 04:33:29,678 WARN org.apache.zookeeper.ClientCnxn: Exception
> closing session 0x12923b8c6b50001 to sun.nio.ch.SelectionKeyImpl@32bc4106
> java.io.IOException: Session Expired
>    at
>
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
>    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
>    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
> 2010-06-11 04:33:29,678 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session
> expired
> 2010-06-11 04:33:29,678 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60020, call put([B@2ba504e0,
> [Lorg.apache.hadoop.hbase.client.Put;@7b420664) from 10.201.8.205:24720:
> error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
>    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>    at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020, call put([B@1d014ece,
> [Lorg.apache.hadoop.hbase.client.Put;@40dcc542) from 10.201.8.210:3754:
> error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
>    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>    at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020, call put([B@2d4ed3d0,
> [Lorg.apache.hadoop.hbase.client.Put;@123413ed) from 10.201.8.210:3755:
> error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771)
>    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>    at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
> Later, I restarted region server but I saw:
>
> 2010-06-11 05:05:34,087 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed TED-THREEGPPSPEECHCALLS-1276213362068-17,,1276213463847
> 2010-06-11 05:05:34,087 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_CLOSE: MetaTable-15,,1276213160085: Overloaded
> 2010-06-11 05:05:34,088 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed MetaTable-15,,1276213160085
> 2010-06-11 05:06:07,818 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> 2010-06-11 05:06:07,820 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60020, call getClosestRowBefore([B@51f88fbd, [B@15a4eb56,
> [B@7e566633) from 10.201.8.212:2687: error:
> org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
> org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
>    at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>    at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>