Posted to user@hbase.apache.org by Vidhyashankar Venkataraman <vi...@yahoo-inc.com> on 2010/08/21 01:52:48 UTC

Regions offlined..

I am seeing a couple of regions offlined by the master because of an exception (attached below) at the RS to which the master tried to assign them.

The following JIRA says the issue has been resolved, but the change is in 0.90 and I am using 0.89 right now. Can you let me know which changes went into 0.89 and which did not?

https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806

Thank you
Vidhya


2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
        at java.lang.Thread.run(Thread.java:619)
2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
        ... 5 more
2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
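
For readers unfamiliar with the BadVersion error in the trace above: ZooKeeper's setData takes an expected version, and the write is rejected when that version no longer matches the znode's current version, for example because another writer updated the node first. The following is a minimal in-memory sketch of that check in Python. It does not talk to ZooKeeper; the `ZNodeStore` class, the `BadVersionError` name and the payload strings are hypothetical and exist only for illustration.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match (hypothetical name)."""


class ZNodeStore:
    """Toy in-memory stand-in for versioned znodes; not a ZooKeeper client."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        # A freshly created node starts at data version 0.
        self._nodes[path] = (data, 0)

    def set_data(self, path, data, expected_version):
        _, current = self._nodes[path]
        # -1 means "skip the version check", as in ZooKeeper's setData.
        if expected_version != -1 and expected_version != current:
            raise BadVersionError(
                f"BadVersion for {path}: expected {expected_version}, current {current}"
            )
        self._nodes[path] = (data, current + 1)
        return current + 1


store = ZNodeStore()
path = "/hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d"
store.create(path, "payload-a")

seen_version = 0                        # version the first writer read
store.set_data(path, "payload-b", 0)    # a second writer updates first -> version 1

try:
    store.set_data(path, "payload-c", seen_version)  # stale expected version
    outcome = "written"
except BadVersionError:
    outcome = "rejected"

print(outcome)
```

The sketch only shows why a stale expected version fails; it says nothing about which component held the stale version in the log above.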




Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
What Stack said, and try setting a bigger split size on your tables
in order to limit the number of regions. Bigger files = fewer small
files = fewer xcievers occupied answering requests for those files. See
the help in the shell for "alter", look for the MAX_FILESIZE value
(which is in bytes and defaults to 256MB; try 1GB).
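
To put numbers on this: MAX_FILESIZE is given in bytes, so 256MB and 1GB have to be converted first. Below is a small Python sketch of the arithmetic. The 500 GB table size is a made-up figure, and the region count is only a crude estimate (table size divided by the split threshold); real counts depend on how splits and compactions play out.

```python
MB = 1024 * 1024
GB = 1024 * MB

default_max_filesize = 256 * MB    # 268435456 bytes
suggested_max_filesize = 1 * GB    # 1073741824 bytes


def rough_region_count(table_bytes, max_filesize):
    """Crude estimate: table size divided by the split threshold, rounded up."""
    return -(-table_bytes // max_filesize)


table_bytes = 500 * GB  # hypothetical table size, for illustration only

before = rough_region_count(table_bytes, default_max_filesize)
after = rough_region_count(table_bytes, suggested_max_filesize)

print(suggested_max_filesize, before, after)
```

Under these assumptions the estimate drops from 2000 to 500, i.e. a 4x larger threshold gives roughly 4x fewer regions and correspondingly fewer store files to keep open.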

J-D

On Wed, Sep 1, 2010 at 10:25 PM, Stack <st...@duboce.net> wrote:
> Sounds like 2047 is not enough.  Up it again.  4k?
> St.Ack
>
> 2010/9/1 xiujin yang <xi...@hotmail.com>:
>>
>> Thank you J-D.
>>
>>
>> I've checked two datanode log and found the same error.  "exceeds the limit of concurrent xcievers 2047"
>>
>>
>> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>>    at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> 172:
>> http://pastebin.com/cdw2svHT
>>
>> 177:
>> http://pastebin.com/iA3jxfuq
>>
>> Our cluster 4CPU & 6G RAM.  is as following:
>> HBase region server node don't have HDFS. Is this related with the error?  Do I need to increase the xcievers from 2047 --> XX?
>>
>> Hadoop
>> 192.168.158.176 Master
>>
>>
>> 192.168.158.171 Slave
>> 192.168.158.172 Slave
>>
>> 192.168.158.174 Slave
>>
>> 192.168.158.177 Slave & SNN
>>
>> 192.168.158.180 Slave
>>
>> 192.168.158.186 Slave
>>
>>
>> HBase Only
>> 192.168.158.179  HMaster & RS & ZK
>>
>>
>> 192.168.158.187  RS & ZK
>>
>>
>> 192.168.158.188  RS & ZK
>>
>>
>>
>>
>> Thank you in advance.
>>
>> -- Xiujin Yang.
>> -----------------------------------------------------------------
>> My linkedin: http://cn.linkedin.com/in/xiujinyang
>>
>>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>>> Subject: Re: Region servers down...
>>> From: jdcryans@apache.org
>>> To: user@hbase.apache.org
>>>
>>> This is errors coming from HDFS, I would start looking at the datanode
>>> log on the same machine for any exceptions thrown at the same time.
>>> Also make sure your cluster is properly configured according to the
>>> last bullet point in the requirements
>>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>>
>>> J-D
>>>
>>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>>> >
>>> >
>>> > HBase: 0.20.6
>>> > Hadoop: 0.20.2
>>> >
>>> > After I upgrage to 0.20.6,
>>> > It run no more than one week and one Region server down again.
>>> >
>>> > Please check HBase log:
>>> >
>>> >
>>> > http://pastebin.com/J9LugZ17
>>> >
>>> >
>>> >
>>> > HBase out :
>>> > http://pastebin.com/QKbpSMwq
>>> >
>>> >
>>> > Thank you in advance.
>>> >
>>> > Best,
>>> >
>>> > -- Xiujin Yang.
>>> > -----------------------------------------------------------------
>>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>>> >
>>> >
>>> >
>>> >
>>
>

Re: Region servers down...

Posted by Stack <st...@duboce.net>.
Sounds like 2047 is not enough.  Up it again.  4k?
St.Ack
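
For reference, in Hadoop 0.20 this limit is normally set on the datanodes through the `dfs.datanode.max.xcievers` property in hdfs-site.xml (the misspelling is part of the property name). Treat the property name as something to verify against your Hadoop version. A minimal Python sketch that renders the property block for the 4k value suggested above; `render_property` is a hypothetical helper, not part of any Hadoop tooling.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Build a Hadoop-style <property> element and return it as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# Property name assumed from Hadoop 0.20; 4096 is the "4k" suggested above.
snippet = render_property("dfs.datanode.max.xcievers", 4096)
print(snippet)
```

The rendered element would go inside the `<configuration>` element of hdfs-site.xml on each datanode, and the datanodes would need a restart to pick it up.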

2010/9/1 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D.
>
>
> I've checked two datanode log and found the same error.  "exceeds the limit of concurrent xcievers 2047"
>
>
> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>    at java.lang.Thread.run(Thread.java:619)
>
>
>
> 172:
> http://pastebin.com/cdw2svHT
>
> 177:
> http://pastebin.com/iA3jxfuq
>
> Our cluster 4CPU & 6G RAM.  is as following:
> HBase region server node don't have HDFS. Is this related with the error?  Do I need to increase the xcievers from 2047 --> XX?
>
> Hadoop
> 192.168.158.176 Master
>
>
> 192.168.158.171 Slave
> 192.168.158.172 Slave
>
> 192.168.158.174 Slave
>
> 192.168.158.177 Slave & SNN
>
> 192.168.158.180 Slave
>
> 192.168.158.186 Slave
>
>
> HBase Only
> 192.168.158.179  HMaster & RS & ZK
>
>
> 192.168.158.187  RS & ZK
>
>
> 192.168.158.188  RS & ZK
>
>
>
>
> Thank you in advance.
>
> -- Xiujin Yang.
> -----------------------------------------------------------------
> My linkedin: http://cn.linkedin.com/in/xiujinyang
>
>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> This is errors coming from HDFS, I would start looking at the datanode
>> log on the same machine for any exceptions thrown at the same time.
>> Also make sure your cluster is properly configured according to the
>> last bullet point in the requirements
>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>
>> J-D
>>
>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>> >
>> >
>> > HBase: 0.20.6
>> > Hadoop: 0.20.2
>> >
>> > After I upgrage to 0.20.6,
>> > It run no more than one week and one Region server down again.
>> >
>> > Please check HBase log:
>> >
>> >
>> > http://pastebin.com/J9LugZ17
>> >
>> >
>> >
>> > HBase out :
>> > http://pastebin.com/QKbpSMwq
>> >
>> >
>> > Thank you in advance.
>> >
>> > Best,
>> >
>> > -- Xiujin Yang.
>> > -----------------------------------------------------------------
>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>> >
>> >
>> >
>> >
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D.


I checked two datanode logs and found the same error in both: "exceeds the limit of concurrent xcievers 2047"


[2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
    at java.lang.Thread.run(Thread.java:619)
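
To check how often a datanode hits this limit, the exception lines can be pulled out of the log with a small script. This is a sketch in Python that matches only the message text shown above; the function names are hypothetical.

```python
import re

# Matches the message text from the IOException shown above.
XCEIVER_RE = re.compile(
    r"xceiverCount (\d+) exceeds the limit of concurrent xcievers (\d+)"
)


def parse_xceiver_error(line):
    """Return (count, limit) if the line carries the xceiver error, else None."""
    m = XCEIVER_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def count_xceiver_errors(lines):
    """Count how many lines report the xceiver limit being exceeded."""
    return sum(1 for line in lines if parse_xceiver_error(line) is not None)


line = "java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047"
result = parse_xceiver_error(line)
print(result)
```

Running `count_xceiver_errors` over each datanode's log gives a quick view of which nodes are saturated.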



172:
http://pastebin.com/cdw2svHT

177:
http://pastebin.com/iA3jxfuq

Our cluster (machines with 4 CPUs and 6 GB RAM) is laid out as follows.
The HBase region server nodes don't run HDFS. Is this related to the error?  Do I need to increase the xcievers limit from 2047 to something higher?

Hadoop
192.168.158.176  Master
192.168.158.171  Slave
192.168.158.172  Slave
192.168.158.174  Slave
192.168.158.177  Slave & SNN
192.168.158.180  Slave
192.168.158.186  Slave

HBase Only
192.168.158.179  HMaster & RS & ZK
192.168.158.187  RS & ZK
192.168.158.188  RS & ZK

Thank you in advance. 

-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang

> Date: Wed, 1 Sep 2010 10:30:44 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> This is errors coming from HDFS, I would start looking at the datanode
> log on the same machine for any exceptions thrown at the same time.
> Also make sure your cluster is properly configured according to the
> last bullet point in the requirements
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> 
> J-D
> 
> 2010/8/31 xiujin yang <xi...@hotmail.com>:
> >
> >
> > HBase: 0.20.6
> > Hadoop: 0.20.2
> >
> > After I upgrage to 0.20.6,
> > It run no more than one week and one Region server down again.
> >
> > Please check HBase log:
> >
> >
> > http://pastebin.com/J9LugZ17
> >
> >
> >
> > HBase out :
> > http://pastebin.com/QKbpSMwq
> >
> >
> > Thank you in advance.
> >
> > Best,
> >
> > -- Xiujin Yang.
> > -----------------------------------------------------------------
> > My linkedin: http://cn.linkedin.com/in/xiujinyang
> >
> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
These are errors coming from HDFS. I would start by looking at the datanode
log on the same machine for any exceptions thrown at the same time.
Also make sure your cluster is properly configured according to the
last bullet point in the requirements:
http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
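
One way to line up the datanode log with the region server failure is to filter ERROR lines within a short window around the time of the crash. A rough Python sketch, assuming the log lines start with `[timestamp][LEVEL]` as in the datanode excerpts elsewhere in this thread; other log4j layouts need a different pattern, and the sample lines are made up.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "[2010-08-31 10:43:26][ERROR]..." at the start of each line.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\[(\w+)\]")


def errors_near(lines, when, window_seconds=60):
    """Return ERROR lines whose timestamp is within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines such as stack frames
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if m.group(2) == "ERROR" and abs(ts - when) <= window:
            hits.append(line)
    return hits


sample = [
    "[2010-08-31 10:40:00][INFO][example] hypothetical info line",
    "[2010-08-31 10:43:26][ERROR][example] hypothetical error line",
    "[2010-08-31 12:00:00][ERROR][example] hypothetical later error",
]
hits = errors_near(sample, datetime(2010, 8, 31, 10, 43, 0))
print(hits)
```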

J-D

2010/8/31 xiujin yang <xi...@hotmail.com>:
>
>
> HBase: 0.20.6
> Hadoop: 0.20.2
>
> After I upgrage to 0.20.6,
> It run no more than one week and one Region server down again.
>
> Please check HBase log:
>
>
> http://pastebin.com/J9LugZ17
>
>
>
> HBase out :
> http://pastebin.com/QKbpSMwq
>
>
> Thank you in advance.
>
> Best,
>
> -- Xiujin Yang.
> -----------------------------------------------------------------
> My linkedin: http://cn.linkedin.com/in/xiujinyang
>
>
>
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.

HBase: 0.20.6
Hadoop: 0.20.2

After I upgraded to 0.20.6,
it ran for no more than one week before one region server went down again.

Please check the HBase log:
http://pastebin.com/J9LugZ17

HBase .out file:
http://pastebin.com/QKbpSMwq


Thank you in advance. 

Best,

-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you very much, J-D.

I was stuck on this problem for a long time.

Thank you again. 

I will upgrade to 0.20.6.

Best regards,

Xiujin Yang. 

> Date: Wed, 25 Aug 2010 09:30:55 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> That's https://issues.apache.org/jira/browse/HBASE-2797, please
> upgrade to 0.20.6 (no migration needed, just copy over the configs).
> 
> J-D
> 
> 2010/8/24 xiujin yang <xi...@hotmail.com>:
> >
> > Thank you J-D.
> >
> > The out file is like this. It has an "NullPointerException" error.
> >
> > 2010-08-24 02:30:14.187::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
> > 2010-08-24 02:30:14.187::INFO:  jetty-6.1.14
> > 2010-08-24 02:30:14.122::INFO:  Started SelectChannelConnector@0.0.0.0:60030
> > Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
> >    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
> >    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
> >    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
> >    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
> >    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
> >    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
> >    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
> >    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
> >    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
> >    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
> >    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
> >    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
> >
> >
> >> Date: Tue, 24 Aug 2010 11:16:34 -0700
> >> Subject: Re: Region servers down...
> >> From: jdcryans@apache.org
> >> To: user@hbase.apache.org
> >>
> >> The last log to look at would be the .out file.
> >>
> >> J-D
> >>
> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >
> >> > Thank you J-D,
> >> >
> >> > I posted today's whole RS log:
> >> > http://pastebin.com/djGnNJxk
> >> >
> >> > GC log:
> >> > http://pastebin.com/AQH5kUCE
> >> >
> >> > I don't see the messages started with "We slept".
> >> >
> >> >
> >> >
> >> >
> >> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
> >> >> Subject: Re: Region servers down...
> >> >> From: jdcryans@apache.org
> >> >> To: user@hbase.apache.org
> >> >>
> >> >> I don't really see the cause of the shutdown in there, it seems it was
> >> >> already under way. Do you see messages starting with "We slept" and
> >> >> then telling how long it slept? It should be not very far from that in
> >> >> the log.
> >> >>
> >> >> J-D
> >> >>
> >> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > RS of HBase was frequently down when running. And job will failed after the region server down.
> >> >> >
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> >> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> >> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> >> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> >> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >> >> >
> >> >> >
> >> >> > Could anyone help me?
> >> >> >
> >> >> > Here is snippet from the region server log:
> >> >> > http://pastebin.com/YCUDLqc3
> >> >> >
> >> >> > Version:
> >> >> > HBase: 0.20.5
> >> >> > Hadoop: 0.20.2
> >> >> > Zookeeper: 3.3.0
> >> >> >
> >> >> >
> >> >> >
> >> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's https://issues.apache.org/jira/browse/HBASE-2797, please
upgrade to 0.20.6 (no migration needed, just copy over the configs).

J-D

2010/8/24 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D.
>
> The out file is like this. It has an "NullPointerException" error.
>
> 2010-08-24 02:30:14.187::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
> 2010-08-24 02:30:14.187::INFO:  jetty-6.1.14
> 2010-08-24 02:30:14.122::INFO:  Started SelectChannelConnector@0.0.0.0:60030
> Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
>    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
>    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
>    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
>    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
>    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
>    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
>    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
>    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
>    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
>    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
>    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
>    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
>    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
>    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
>    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
>
>
>> Date: Tue, 24 Aug 2010 11:16:34 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> The last log to look at would be the .out file.
>>
>> J-D
>>
>> 2010/8/23 xiujin yang <xi...@hotmail.com>:
>> >
>> > Thank you J-D,
>> >
>> > I posted today's whole RS log:
>> > http://pastebin.com/djGnNJxk
>> >
>> > GC log:
>> > http://pastebin.com/AQH5kUCE
>> >
>> > I don't see the messages started with "We slept".
>> >
>> >
>> >
>> >
>> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
>> >> Subject: Re: Region servers down...
>> >> From: jdcryans@apache.org
>> >> To: user@hbase.apache.org
>> >>
>> >> I don't really see the cause of the shutdown in there, it seems it was
>> >> already under way. Do you see messages starting with "We slept" and
>> >> then telling how long it slept? It should be not very far from that in
>> >> the log.
>> >>
>> >> J-D
>> >>
>> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
>> >> >
>> >> > Hi,
>> >> >
>> >> > RS of HBase was frequently down when running. And job will failed after the region server down.
>> >> >
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
>> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
>> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>> >> >
>> >> >
>> >> > Could anyone help me?
>> >> >
>> >> > Here is snippet from the region server log:
>> >> > http://pastebin.com/YCUDLqc3
>> >> >
>> >> > Version:
>> >> > HBase: 0.20.5
>> >> > Hadoop: 0.20.2
>> >> > Zookeeper: 3.3.0
>> >> >
>> >> >
>> >> >
>> >
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D.

The .out file looks like this. It has a NullPointerException:

2010-08-24 02:30:14.187::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-08-24 02:30:14.187::INFO:  jetty-6.1.14
2010-08-24 02:30:14.122::INFO:  Started SelectChannelConnector@0.0.0.0:60030
Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)


> Date: Tue, 24 Aug 2010 11:16:34 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> The last log to look at would be the .out file.
> 
> J-D
> 
> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >
> > Thank you J-D,
> >
> > I posted today's whole RS log:
> > http://pastebin.com/djGnNJxk
> >
> > GC log:
> > http://pastebin.com/AQH5kUCE
> >
> > I don't see the messages started with "We slept".
> >
> >
> >
> >
> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
> >> Subject: Re: Region servers down...
> >> From: jdcryans@apache.org
> >> To: user@hbase.apache.org
> >>
> >> I don't really see the cause of the shutdown in there, it seems it was
> >> already under way. Do you see messages starting with "We slept" and
> >> then telling how long it slept? It should be not very far from that in
> >> the log.
> >>
> >> J-D
> >>
> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >
> >> > Hi,
> >> >
> >> > RS of HBase was frequently down when running. And job will failed after the region server down.
> >> >
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >> >
> >> >
> >> > Could anyone help me?
> >> >
> >> > Here is snippet from the region server log:
> >> > http://pastebin.com/YCUDLqc3
> >> >
> >> > Version:
> >> > HBase: 0.20.5
> >> > Hadoop: 0.20.2
> >> > Zookeeper: 3.3.0
> >> >
> >> >
> >> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The last log to look at would be the .out file.

J-D

2010/8/23 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D,
>
> I posted today's whole RS log:
> http://pastebin.com/djGnNJxk
>
> GC log:
> http://pastebin.com/AQH5kUCE
>
> I don't see the messages started with "We slept".
>
>
>
>
>> Date: Mon, 23 Aug 2010 23:00:32 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> I don't really see the cause of the shutdown in there, it seems it was
>> already under way. Do you see messages starting with "We slept" and
>> then telling how long it slept? It should be not very far from that in
>> the log.
>>
>> J-D
>>
>> 2010/8/23 xiujin yang <xi...@hotmail.com>:
>> >
>> > Hi,
>> >
>> > RS of HBase was frequently down when running. And job will failed after the region server down.
>> >
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
>> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
>> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>> >
>> >
>> > Could anyone help me?
>> >
>> > Here is snippet from the region server log:
>> > http://pastebin.com/YCUDLqc3
>> >
>> > Version:
>> > HBase: 0.20.5
>> > Hadoop: 0.20.2
>> > Zookeeper: 3.3.0
>> >
>> >
>> >
>

Re: Region servers down...

Posted by Ted Yu <yu...@gmail.com>.
It would be beneficial to move the RS on 192.168.158.179 (which also runs
the HMaster) onto another machine.

2010/8/23 xiujin yang <xi...@hotmail.com>

>
> Hi
>
> My cluster is in this way.
> Hadoop & HBase are deployed on different machine.
> HBase use the hdfs of Hadoop.
>
> Machine
>
> 4 CPU & 6 G RAM
>
>
>
> Hadoop
>
> 192.168.158.171
>
> 192.168.158.172
>
> 192.168.158.174
>
> 192.168.158.177
>
> 192.168.158.176
>
> 192.168.158.180
>
>
> 192.168.158.186
>
>
>
>
>
>
>
> HBase
>
> 192.168.158.179  HMaster & RS
>
>
>
>
> 192.168.158.187  RS
>
>
>
>
>
> 192.168.158.188  RS
>
>
>
>
>
>
>
> At first we deployed Hadoop, HBase and MapReduce on all machines, but we
> found that the RS went down easily because of memory problems.
> Map/Reduce tasks ate too much memory, and HBase had to use swap.
> So we separated them.
>
> Is this the reason? Or is the memory too small?
>
>
>
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Hi 

My cluster is in this way.  
Hadoop & HBase are deployed on different machine. 
HBase use the hdfs of Hadoop.  

Machine: 4 CPU & 6 G RAM

Hadoop
192.168.158.171
192.168.158.172
192.168.158.174
192.168.158.177
192.168.158.176
192.168.158.180
192.168.158.186

HBase
192.168.158.179  HMaster & RS
192.168.158.187  RS
192.168.158.188  RS

At first we deployed Hadoop, HBase, and MapReduce together on every machine, and we found that region servers went down easily because of memory problems.
Map/Reduce tasks consumed too much memory, and HBase had to use swap. So we separated them.

Is this the reason? Or is the memory too small?

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D,

I posted today's whole RS log:
http://pastebin.com/djGnNJxk

GC log:
http://pastebin.com/AQH5kUCE

I don't see any messages starting with "We slept".




> Date: Mon, 23 Aug 2010 23:00:32 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> I don't really see the cause of the shutdown in there, it seems it was
> already under way. Do you see messages starting with "We slept" and
> then telling how long it slept? It should be not very far from that in
> the log.
> 
> J-D
> 
> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >
> > Hi,
> >
> > RS of HBase was frequently down when running. And job will failed after the region server down.
> >
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >
> >
> > Could anyone help me?
> >
> > Here is snippet from the region server log:
> > http://pastebin.com/YCUDLqc3
> >
> > Version:
> > HBase: 0.20.5
> > Hadoop: 0.20.2
> > Zookeeper: 3.3.0
> >
> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't really see the cause of the shutdown in there; it seems it was
already under way. Do you see messages starting with "We slept",
followed by how long it slept? They should be not very far from that
point in the log.

J-D
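A minimal sketch of the check suggested above, assuming the region server log is available as a list of text lines. It only looks for the phrase "We slept" quoted in this message; the function name and the first sample line are made up for illustration and are not taken from a real log.

```python
def find_slept_messages(log_lines):
    """Return (line_number, line) for every line containing "We slept"."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if "We slept" in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample: the first line is invented, the second is from this thread.
sample = [
    "2010-08-24 04:14:50,100 WARN (hypothetical sample line) We slept much longer than scheduled",
    "[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO "
    "org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020",
]

for number, line in find_slept_messages(sample):
    print(number, line)
```

For a real log, the lines could be read with open(path) and passed to the same function; the lines just before the shutdown messages are the ones of interest.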

2010/8/23 xiujin yang <xi...@hotmail.com>:
>
> Hi,
>
> RS of HBase was frequently down when running. And job will failed after the region server down.
>
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>
>
> Could anyone help me?
>
> Here is snippet from the region server log:
> http://pastebin.com/YCUDLqc3
>
> Version:
> HBase: 0.20.5
> Hadoop: 0.20.2
> Zookeeper: 3.3.0
>
>
>

Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Hi,

The HBase region servers frequently go down while running, and jobs fail after a region server goes down.

[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
[regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
[Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
[Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
[HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed


Could anyone help me? 

Here is a snippet from the region server log:
http://pastebin.com/YCUDLqc3

Version: 
HBase: 0.20.5
Hadoop: 0.20.2
Zookeeper: 3.3.0

RE: Regions offlined..

Posted by Jonathan Gray <jg...@facebook.com>.
A new version of HBCK should be available soon which will detect and repair this situation.

Hopefully we can have a patch up tomorrow.

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Tuesday, August 24, 2010 10:09 AM
> To: user@hbase.apache.org
> Subject: Re: Regions offlined..
> 
> On Tue, Aug 24, 2010 at 7:40 AM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
> > I keep getting 1 or 2 out of 80000 regions offlined (and this is on a
> version where the offline-region bug was fixed: see below for the
> link). Can you guys let me know a likely cause?
> >
> 
> If same stacktrace as pasted in previous message, please provide more
> from the log file.  I'd like to see how the scenario came about.  We
> want to cut another 0.89 in next day or so.  Would be good to get fix
> in for your issue if it not fixed already.
> 
> > I was restarting the db as a way to sidestep for now, but it takes a
> long time to enable the db contents..
> 
>   <property>
>     <name>hbase.regions.percheckin</name>
>     <value>10</value>
>     <description>Maximum number of regions that can be assigned in a
> single go
>     to a region server.
>     </description>
>   </property>
> 
> Make the above setting 100 for your case.
> 
> 
> Other ways I can think of are 1) deleting those entries from the META
> table and reinsert
> >       2) Is it possible to manually override the state in zk?
> 
> 
> Yes, you can manually edit zk. Its messy but its no different than
> updating a row in a table.
> 
> Are the regions offlined or is there a hole in the table?
> 
> If you do:
> 
> echo "scan '.META.'" | ./bin/hbase shell --format-width=300  &>
> /tmp/meta.txt
> 
> .. can you find the rows that are giving you issue and search their
> location in meta.txt and see if offlined or missing regions?
> 
> (I can take a look if you want me to send me meta.txt and the problem
> rows on back channel?)
> 
> St.Ack
> 
> >  Can you let me know what can be done to get around this problem for
> now?
> >
> > Thank you
> > Vidhya
> >
> >
> > On 8/20/10 5:01 PM, "Vidhyashankar Venkataraman" <vidhyash@yahoo-
> inc.com> wrote:
> >
> > Changes.txt says that this particular issue was fixed..
> >
> > Could there be another reason why I see this problem?
> >
> > I know that restarting might resolve this issue but I just wanted to
> check with you guys the potential cause for the problem..
> >
> > Thank you
> > Vidhya
> >
> > On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <jd...@apache.org> wrote:
> >
> > 0.89 are snapshots of trunk, so you may or may not have it in your
> > version. Check you CHANGES.txt file to be sure.
> >
> > J-D
> >
> > On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
> > <vi...@yahoo-inc.com> wrote:
> >> I am seeing a couple of regions offlined by the master because of an
> exception (attached below) at the RS to which the master tried to
> assign...
> >>
> >>  The following jira says the issue has been resolved: But the change
> is in 0.90.. I am using 0.89 right now: Can you guys let me know of
> what changes went into  0.89 and what did not?
> >>
> >> https://issues.apache.org/jira/browse/HBASE-
> 2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12891806#action_12891806
> >>
> >> Thank you
> >> Vidhya
> >>
> >>
> >> 2010-08-20 19:18:27,333 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper
> event, state: SyncConnected, type: NodeDataChanged, path:
> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >> 2010-08-20 19:18:27,335 WARN
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.
> net,60020,1282326954084>Failed to write data to ZooKeeper
> >> org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>        at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> >>        at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>        at
> org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> >>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeper
> Wrapper.java:1062)
> >>        at
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEve
> ntData(RSZookeeperUpdater.java:161)
> >>        at
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpen
> Event(RSZookeeperUpdater.java:115)
> >>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionSe
> rver.java:1441)
> >>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionSe
> rver.java:1350)
> >>        at java.lang.Thread.run(Thread.java:619)
> >> 2010-08-20 19:18:27,335 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
> DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
> >> java.io.IOException:
> org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeper
> Wrapper.java:1072)
> >>        at
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEve
> ntData(RSZookeeperUpdater.java:161)
> >>        at
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpen
> Event(RSZookeeperUpdater.java:115)
> >>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionSe
> rver.java:1441)
> >>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionSe
> rver.java:1350)
> >>        at java.lang.Thread.run(Thread.java:619)
> >> Caused by: org.apache.zookeeper.KeeperException$BadVersionException:
> KeeperErrorCode = BadVersion for
> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> >>        at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
> >>        at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>        at
> org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
> >>        at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeper
> Wrapper.java:1062)
> >>        ... 5 more
> >> 2010-08-20 19:18:27,336 ERROR
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open
> of region 5da7abbffde229aaab56382c3812363d
> >> 2010-08-20 19:18:27,337 DEBUG
> org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode
> /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with
> [RS2ZK_REGION_CLOSED] expected version = 2
> >>
> >>
> >>
> >>
> >
> >
> >

Re: Regions offlined..

Posted by Stack <st...@duboce.net>.
On Tue, Aug 24, 2010 at 7:40 AM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> I keep getting 1 or 2 out of 80000 regions offlined (and this is on a version where the offline-region bug was fixed: see below for the link). Can you guys let me know a likely cause?
>

If it is the same stack trace as pasted in the previous message, please
provide more of the log file.  I'd like to see how the scenario came
about.  We want to cut another 0.89 in the next day or so.  It would be
good to get a fix in for your issue if it is not fixed already.

> I was restarting the db as a way to sidestep for now, but it takes a long time to enable the db contents..

  <property>
    <name>hbase.regions.percheckin</name>
    <value>10</value>
    <description>Maximum number of regions that can be assigned in a single go
    to a region server.
    </description>
  </property>

Set the above to 100 for your case.


> Other ways I can think of are 1) deleting those entries from the META table and reinsert
>       2) Is it possible to manually override the state in zk?


Yes, you can manually edit zk. It's messy, but it's no different than
updating a row in a table.

Are the regions offlined or is there a hole in the table?

If you do:

echo "scan '.META.'" | ./bin/hbase shell --format-width=300  &> /tmp/meta.txt

.. can you find the rows that are giving you trouble, search for them
in meta.txt, and see whether the regions are offlined or missing?
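The search through meta.txt could be sketched roughly as below. This assumes that the shell output puts a region's name on each of its rows and that an offlined region's info:regioninfo value contains the text "OFFLINE => true"; both are assumptions about the scan output, so check them against your own meta.txt. The function name and sample lines are hypothetical.

```python
def classify_regions(meta_lines, region_names):
    """Map each region name to 'missing', 'offline', or 'present'."""
    status = {}
    for name in region_names:
        rows = [line for line in meta_lines if name in line]
        if not rows:
            # The name does not appear anywhere in the scan output.
            status[name] = "missing"
        elif any("OFFLINE => true" in line for line in rows):
            # Assumed marker for an offlined region in info:regioninfo.
            status[name] = "offline"
        else:
            status[name] = "present"
    return status


# Hypothetical scan output lines, for illustration only.
sample_meta = [
    "regionA column=info:regioninfo, value=REGION => {NAME => 'regionA', OFFLINE => true}",
    "regionB column=info:regioninfo, value=REGION => {NAME => 'regionB'}",
]

print(classify_regions(sample_meta, ["regionA", "regionB", "regionC"]))
```

With the real file, meta_lines would come from reading /tmp/meta.txt line by line. A name reported as 'missing' only means it was not found in the output; confirming a hole in the table would still need a look at the start and end keys of the neighbouring regions.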

(I can take a look if you want to send me meta.txt and the problem
rows on a back channel.)

St.Ack

>  Can you let me know what can be done to get around this problem for now?
>
> Thank you
> Vidhya
>
>
> On 8/20/10 5:01 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com> wrote:
>
> Changes.txt says that this particular issue was fixed..
>
> Could there be another reason why I see this problem?
>
> I know that restarting might resolve this issue but I just wanted to check with you guys the potential cause for the problem..
>
> Thank you
> Vidhya
>
> On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <jd...@apache.org> wrote:
>
> 0.89 are snapshots of trunk, so you may or may not have it in your
> version. Check you CHANGES.txt file to be sure.
>
> J-D
>
> On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
> <vi...@yahoo-inc.com> wrote:
>> I am seeing a couple of regions offlined by the master because of an exception (attached below) at the RS to which the master tried to assign...
>>
>>  The following jira says the issue has been resolved: But the change is in 0.90.. I am using 0.89 right now: Can you guys let me know of what changes went into  0.89 and what did not?
>>
>> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
>>
>> Thank you
>> Vidhya
>>
>>
>> 2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>> 2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
>> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>>        at java.lang.Thread.run(Thread.java:619)
>> 2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
>> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
>>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>>        at java.lang.Thread.run(Thread.java:619)
>> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>>        ... 5 more
>> 2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
>> 2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
>>
>>
>>
>>
>
>
>

Re: Regions offlined..

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
I keep getting 1 or 2 out of 80000 regions offlined (and this is on a version where the offline-region bug was fixed: see below for the link). Can you guys let me know a likely cause?

I was restarting the db as a way to sidestep this for now, but it takes a long time to enable the db contents. Other ways I can think of are:
       1) deleting those entries from the META table and reinserting them
       2) manually overriding the state in zk (is that possible?)
Can you let me know what can be done to get around this problem for now?

Thank you
Vidhya


On 8/20/10 5:01 PM, "Vidhyashankar Venkataraman" <vi...@yahoo-inc.com> wrote:

Changes.txt says that this particular issue was fixed..

Could there be another reason why I see this problem?

I know that restarting might resolve this issue but I just wanted to check with you guys the potential cause for the problem..

Thank you
Vidhya

On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <jd...@apache.org> wrote:

0.89 are snapshots of trunk, so you may or may not have it in your
version. Check you CHANGES.txt file to be sure.

J-D

On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> I am seeing a couple of regions offlined by the master because of an exception (attached below) at the RS to which the master tried to assign...
>
>  The following jira says the issue has been resolved: But the change is in 0.90.. I am using 0.89 right now: Can you guys let me know of what changes went into  0.89 and what did not?
>
> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
>
> Thank you
> Vidhya
>
>
> 2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> 2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>        at java.lang.Thread.run(Thread.java:619)
> 2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>        ... 5 more
> 2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
> 2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
>
>
>
>



Re: Regions offlined..

Posted by Vidhyashankar Venkataraman <vi...@yahoo-inc.com>.
Changes.txt says that this particular issue was fixed..

Could there be another reason why I see this problem?

I know that restarting might resolve this issue but I just wanted to check with you guys the potential cause for the problem..

Thank you
Vidhya

On 8/20/10 4:56 PM, "Jean-Daniel Cryans" <jd...@apache.org> wrote:

0.89 are snapshots of trunk, so you may or may not have it in your
version. Check you CHANGES.txt file to be sure.

J-D

On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> I am seeing a couple of regions offlined by the master because of an exception (attached below) at the RS to which the master tried to assign...
>
>  The following jira says the issue has been resolved: But the change is in 0.90.. I am using 0.89 right now: Can you guys let me know of what changes went into  0.89 and what did not?
>
> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
>
> Thank you
> Vidhya
>
>
> 2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> 2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>        at java.lang.Thread.run(Thread.java:619)
> 2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>        ... 5 more
> 2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
> 2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
>
>
>
>


Re: Regions offlined..

Posted by Jean-Daniel Cryans <jd...@apache.org>.
0.89 releases are snapshots of trunk, so you may or may not have it in
your version. Check your CHANGES.txt file to be sure.

J-D

On Fri, Aug 20, 2010 at 4:52 PM, Vidhyashankar Venkataraman
<vi...@yahoo-inc.com> wrote:
> I am seeing a couple of regions offlined by the master because of an exception (attached below) at the RS to which the master tried to assign...
>
>  The following jira says the issue has been resolved: But the change is in 0.90.. I am using 0.89 right now: Can you guys let me know of what changes went into  0.89 and what did not?
>
> https://issues.apache.org/jira/browse/HBASE-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891806#action_12891806
>
> Thank you
> Vidhya
>
>
> 2010-08-20 19:18:27,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeDataChanged, path: /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
> 2010-08-20 19:18:27,335 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,b3130247.yst.yahoo.net,60020,1282326954084>Failed to write data to ZooKeeper
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>        at java.lang.Thread.run(Thread.java:619)
> 2010-08-20 19:18:27,335 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening DocDB,0000010644000000,1282331147892.5da7abbffde229aaab56382c3812363d.
> java.io.IOException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1072)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.updateZKWithEventData(RSZookeeperUpdater.java:161)
>        at org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater.startRegionOpenEvent(RSZookeeperUpdater.java:115)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1441)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1350)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeZNode(ZooKeeperWrapper.java:1062)
>        ... 5 more
> 2010-08-20 19:18:27,336 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 5da7abbffde229aaab56382c3812363d
> 2010-08-20 19:18:27,337 DEBUG org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Updating ZNode /hbase/UNASSIGNED/5da7abbffde229aaab56382c3812363d with [RS2ZK_REGION_CLOSED] expected version = 2
>
>
>
>
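For readers unfamiliar with the BadVersion error in the trace above: ZooKeeper's setData takes an expected version and rejects the write if the znode's current version differs (passing -1 skips the check). The sketch below is an in-memory stand-in for that rule, not the ZooKeeper client; the class names are hypothetical, and it says nothing about which process updated the znode first in this particular case.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedNode:
    """In-memory stand-in for a znode with a data version counter."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # -1 means "write regardless of the current version".
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                "expected version %d, current version %d"
                % (expected_version, self.version)
            )
        self.data = data
        self.version += 1
        return self.version


node = VersionedNode()
seen_by_a = node.version          # writer A reads the version
node.set_data(b"from B", 0)       # writer B updates first; version is now 1
try:
    node.set_data(b"from A", seen_by_a)  # A's expected version is stale
except BadVersionError as err:
    print("write rejected:", err)
```

The point of the sketch is only that a second writer holding an old version number gets rejected, which is the situation the exception in the log reports.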