Posted to user@hbase.apache.org by xiujin yang <xi...@hotmail.com> on 2010/08/24 07:49:37 UTC

Region servers down...

Hi,

Our HBase region servers (RS) frequently go down while running, and jobs fail after a region server goes down.

[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
[regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
[Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
[Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
[HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed


Could anyone help me? 

Here is a snippet from the region server log:
http://pastebin.com/YCUDLqc3

Version: 
HBase: 0.20.5
Hadoop: 0.20.2
Zookeeper: 3.3.0

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
What Stack said, and try setting a bigger split size on your tables
in order to limit the number of regions. Bigger files = fewer small
files = fewer xcievers occupied answering requests for those files. See
the help in the shell for "alter" and look for the MAX_FILESIZE value
(which is in bytes and defaults to 256MB; try 1GB).

J-D
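
For illustration, the suggested 1GB value can be worked out in bytes and dropped into a shell statement. This is only a sketch: the table name `whitetable` is taken from the region server log earlier in the thread, the `alter` form with `METHOD => 'table_att'` is an assumption based on the 0.20-era shell help (check "help" in your own shell), and the helper name `alter_statement` is made up for this example.

```python
# Sketch: compute MAX_FILESIZE in bytes and build an HBase shell "alter" line.
# The shell syntax produced here is an assumption; confirm it in your shell's help.

MB = 1024 * 1024
GB = 1024 * MB

DEFAULT_MAX_FILESIZE = 256 * MB   # default mentioned in the thread
SUGGESTED_MAX_FILESIZE = 1 * GB   # value suggested in the thread


def alter_statement(table: str, max_filesize_bytes: int) -> str:
    """Return a shell statement that sets MAX_FILESIZE (in bytes) on a table."""
    return (
        f"alter '{table}', "
        f"{{METHOD => 'table_att', MAX_FILESIZE => '{max_filesize_bytes}'}}"
    )


if __name__ == "__main__":
    # How much larger the suggested split size is than the default.
    print(SUGGESTED_MAX_FILESIZE // DEFAULT_MAX_FILESIZE, "x the default")
    print(alter_statement("whitetable", SUGGESTED_MAX_FILESIZE))
```

On releases of this era the table probably has to be disabled before the alter and enabled again afterwards; the shell help should say so.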

On Wed, Sep 1, 2010 at 10:25 PM, Stack <st...@duboce.net> wrote:
> Sounds like 2047 is not enough.  Up it again.  4k?
> St.Ack
>
> 2010/9/1 xiujin yang <xi...@hotmail.com>:
>>
>> Thank you J-D.
>>
>>
>> I've checked two datanode log and found the same error.  "exceeds the limit of concurrent xcievers 2047"
>>
>>
>> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>>    at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> 172:
>> http://pastebin.com/cdw2svHT
>>
>> 177:
>> http://pastebin.com/iA3jxfuq
>>
>> Our cluster 4CPU & 6G RAM.  is as following:
>> HBase region server node don't have HDFS. Is this related with the error?  Do I need to increase the xcievers from 2047 --> XX?
>>
>> Hadoop
>> 192.168.158.176 Master
>>
>>
>> 192.168.158.171 Slave
>> 192.168.158.172 Slave
>>
>> 192.168.158.174 Slave
>>
>> 192.168.158.177 Slave & SNN
>>
>> 192.168.158.180 Slave
>>
>> 192.168.158.186 Slave
>>
>>
>> HBase Only
>> 192.168.158.179  HMaster & RS & ZK
>>
>>
>> 192.168.158.187  RS & ZK
>>
>>
>> 192.168.158.188  RS & ZK
>>
>>
>>
>>
>> Thank you in advance.
>>
>> -- Xiujin Yang.
>> -----------------------------------------------------------------
>> My linkedin: http://cn.linkedin.com/in/xiujinyang
>>
>>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>>> Subject: Re: Region servers down...
>>> From: jdcryans@apache.org
>>> To: user@hbase.apache.org
>>>
>>> This is errors coming from HDFS, I would start looking at the datanode
>>> log on the same machine for any exceptions thrown at the same time.
>>> Also make sure your cluster is properly configured according to the
>>> last bullet point in the requirements
>>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>>
>>> J-D
>>>
>>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>>> >
>>> >
>>> > HBase: 0.20.6
>>> > Hadoop: 0.20.2
>>> >
>>> > After I upgrage to 0.20.6,
>>> > It run no more than one week and one Region server down again.
>>> >
>>> > Please check HBase log:
>>> >
>>> >
>>> > http://pastebin.com/J9LugZ17
>>> >
>>> >
>>> >
>>> > HBase out :
>>> > http://pastebin.com/QKbpSMwq
>>> >
>>> >
>>> > Thank you in advance.
>>> >
>>> > Best,
>>> >
>>> > -- Xiujin Yang.
>>> > -----------------------------------------------------------------
>>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>>> >
>>> >
>>> >
>>> >
>>
>

Re: Region servers down...

Posted by Stack <st...@duboce.net>.
Sounds like 2047 is not enough.  Up it again.  4k?
St.Ack
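
As a rough sketch of how one might double-check what a datanode is actually configured with, the snippet below reads Hadoop-style configuration XML and looks up `dfs.datanode.max.xcievers` (the misspelling is part of the real property name in Hadoop 0.20). The sample XML, the value 4096 (one reading of "4k"), and the helper name `configured_value` are assumptions for illustration; on a real cluster the text would come from the datanode's hdfs-site.xml.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative config text only; a real check would read hdfs-site.xml from disk.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>
"""


def configured_value(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property>, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        prop_name = (prop.findtext("name") or "").strip()
        if prop_name == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    print(configured_value(SAMPLE_HDFS_SITE, "dfs.datanode.max.xcievers"))
```

A changed limit only takes effect after the datanodes are restarted.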

2010/9/1 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D.
>
>
> I've checked two datanode log and found the same error.  "exceeds the limit of concurrent xcievers 2047"
>
>
> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>    at java.lang.Thread.run(Thread.java:619)
>
>
>
> 172:
> http://pastebin.com/cdw2svHT
>
> 177:
> http://pastebin.com/iA3jxfuq
>
> Our cluster 4CPU & 6G RAM.  is as following:
> HBase region server node don't have HDFS. Is this related with the error?  Do I need to increase the xcievers from 2047 --> XX?
>
> Hadoop
> 192.168.158.176 Master
>
>
> 192.168.158.171 Slave
> 192.168.158.172 Slave
>
> 192.168.158.174 Slave
>
> 192.168.158.177 Slave & SNN
>
> 192.168.158.180 Slave
>
> 192.168.158.186 Slave
>
>
> HBase Only
> 192.168.158.179  HMaster & RS & ZK
>
>
> 192.168.158.187  RS & ZK
>
>
> 192.168.158.188  RS & ZK
>
>
>
>
> Thank you in advance.
>
> -- Xiujin Yang.
> -----------------------------------------------------------------
> My linkedin: http://cn.linkedin.com/in/xiujinyang
>
>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> This is errors coming from HDFS, I would start looking at the datanode
>> log on the same machine for any exceptions thrown at the same time.
>> Also make sure your cluster is properly configured according to the
>> last bullet point in the requirements
>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>
>> J-D
>>
>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>> >
>> >
>> > HBase: 0.20.6
>> > Hadoop: 0.20.2
>> >
>> > After I upgrage to 0.20.6,
>> > It run no more than one week and one Region server down again.
>> >
>> > Please check HBase log:
>> >
>> >
>> > http://pastebin.com/J9LugZ17
>> >
>> >
>> >
>> > HBase out :
>> > http://pastebin.com/QKbpSMwq
>> >
>> >
>> > Thank you in advance.
>> >
>> > Best,
>> >
>> > -- Xiujin Yang.
>> > -----------------------------------------------------------------
>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>> >
>> >
>> >
>> >
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D.


I checked the logs of two datanodes and found the same error in both: "exceeds the limit of concurrent xcievers 2047"


[2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
    at java.lang.Thread.run(Thread.java:619)



172:
http://pastebin.com/cdw2svHT

177:
http://pastebin.com/iA3jxfuq

Our cluster (4 CPUs & 6 GB RAM per machine) is laid out as follows.
The HBase region server nodes don't run HDFS. Is this related to the error? Do I need to increase the xcievers limit from 2047 to something higher?

Hadoop
192.168.158.176  Master
192.168.158.171  Slave
192.168.158.172  Slave
192.168.158.174  Slave
192.168.158.177  Slave & SNN
192.168.158.180  Slave
192.168.158.186  Slave

HBase Only
192.168.158.179  HMaster & RS & ZK
192.168.158.187  RS & ZK
192.168.158.188  RS & ZK

Thank you in advance. 

-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang

> Date: Wed, 1 Sep 2010 10:30:44 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> This is errors coming from HDFS, I would start looking at the datanode
> log on the same machine for any exceptions thrown at the same time.
> Also make sure your cluster is properly configured according to the
> last bullet point in the requirements
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> 
> J-D
> 
> 2010/8/31 xiujin yang <xi...@hotmail.com>:
> >
> >
> > HBase: 0.20.6
> > Hadoop: 0.20.2
> >
> > After I upgrage to 0.20.6,
> > It run no more than one week and one Region server down again.
> >
> > Please check HBase log:
> >
> >
> > http://pastebin.com/J9LugZ17
> >
> >
> >
> > HBase out :
> > http://pastebin.com/QKbpSMwq
> >
> >
> > Thank you in advance.
> >
> > Best,
> >
> > -- Xiujin Yang.
> > -----------------------------------------------------------------
> > My linkedin: http://cn.linkedin.com/in/xiujinyang
> >
> >
> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
These are errors coming from HDFS. I would start by looking at the datanode
log on the same machine for any exceptions thrown at the same time.
Also make sure your cluster is properly configured according to the
last bullet point in the requirements
http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements

J-D

2010/8/31 xiujin yang <xi...@hotmail.com>:
>
>
> HBase: 0.20.6
> Hadoop: 0.20.2
>
> After I upgrage to 0.20.6,
> It run no more than one week and one Region server down again.
>
> Please check HBase log:
>
>
> http://pastebin.com/J9LugZ17
>
>
>
> HBase out :
> http://pastebin.com/QKbpSMwq
>
>
> Thank you in advance.
>
> Best,
>
> -- Xiujin Yang.
> -----------------------------------------------------------------
> My linkedin: http://cn.linkedin.com/in/xiujinyang
>
>
>
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.

HBase: 0.20.6
Hadoop: 0.20.2

After I upgraded to 0.20.6,
it ran for no more than a week before one region server went down again.

Please check HBase log:


http://pastebin.com/J9LugZ17



HBase out :
http://pastebin.com/QKbpSMwq


Thank you in advance. 

Best,

-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you very much, J-D.

I was stuck on this problem for a long time.

Thank you again. 

I will upgrade to 0.20.6.

Best regards,

Xiujin Yang. 

> Date: Wed, 25 Aug 2010 09:30:55 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> That's https://issues.apache.org/jira/browse/HBASE-2797, please
> upgrade to 0.20.6 (no migration needed, just copy over the configs).
> 
> J-D
> 
> 2010/8/24 xiujin yang <xi...@hotmail.com>:
> >
> > Thank you J-D.
> >
> > The out file is like this. It has an "NullPointerException" error.
> >
> > 2010-08-24 02:30:14.187::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
> > 2010-08-24 02:30:14.187::INFO:  jetty-6.1.14
> > 2010-08-24 02:30:14.122::INFO:  Started SelectChannelConnector@0.0.0.0:60030
> > Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
> >    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
> >    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
> >    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
> >    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
> >    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
> >    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
> >    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
> >    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
> >    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
> >    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
> >    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
> >    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
> >
> >
> >> Date: Tue, 24 Aug 2010 11:16:34 -0700
> >> Subject: Re: Region servers down...
> >> From: jdcryans@apache.org
> >> To: user@hbase.apache.org
> >>
> >> The last log to look at would be the .out file.
> >>
> >> J-D
> >>
> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >
> >> > Thank you J-D,
> >> >
> >> > I posted today's whole RS log:
> >> > http://pastebin.com/djGnNJxk
> >> >
> >> > GC log:
> >> > http://pastebin.com/AQH5kUCE
> >> >
> >> > I don't see the messages started with "We slept".
> >> >
> >> >
> >> >
> >> >
> >> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
> >> >> Subject: Re: Region servers down...
> >> >> From: jdcryans@apache.org
> >> >> To: user@hbase.apache.org
> >> >>
> >> >> I don't really see the cause of the shutdown in there, it seems it was
> >> >> already under way. Do you see messages starting with "We slept" and
> >> >> then telling how long it slept? It should be not very far from that in
> >> >> the log.
> >> >>
> >> >> J-D
> >> >>
> >> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > RS of HBase was frequently down when running. And job will failed after the region server down.
> >> >> >
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> >> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> >> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> >> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> >> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >> >> >
> >> >> >
> >> >> > Could anyone help me?
> >> >> >
> >> >> > Here is snippet from the region server log:
> >> >> > http://pastebin.com/YCUDLqc3
> >> >> >
> >> >> > Version:
> >> >> > HBase: 0.20.5
> >> >> > Hadoop: 0.20.2
> >> >> > Zookeeper: 3.3.0
> >> >> >
> >> >> >
> >> >> >
> >> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's https://issues.apache.org/jira/browse/HBASE-2797, please
upgrade to 0.20.6 (no migration needed, just copy over the configs).

J-D

2010/8/24 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D.
>
> The out file is like this. It has an "NullPointerException" error.
>
> 2010-08-24 02:30:14.187::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
> 2010-08-24 02:30:14.187::INFO:  jetty-6.1.14
> 2010-08-24 02:30:14.122::INFO:  Started SelectChannelConnector@0.0.0.0:60030
> Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
>    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
>    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
>    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
>    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
>    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
>    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
>    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
>    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
>    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
>    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
>    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
>    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
>    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
>    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
>    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
>
>
>> Date: Tue, 24 Aug 2010 11:16:34 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> The last log to look at would be the .out file.
>>
>> J-D
>>
>> 2010/8/23 xiujin yang <xi...@hotmail.com>:
>> >
>> > Thank you J-D,
>> >
>> > I posted today's whole RS log:
>> > http://pastebin.com/djGnNJxk
>> >
>> > GC log:
>> > http://pastebin.com/AQH5kUCE
>> >
>> > I don't see the messages started with "We slept".
>> >
>> >
>> >
>> >
>> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
>> >> Subject: Re: Region servers down...
>> >> From: jdcryans@apache.org
>> >> To: user@hbase.apache.org
>> >>
>> >> I don't really see the cause of the shutdown in there, it seems it was
>> >> already under way. Do you see messages starting with "We slept" and
>> >> then telling how long it slept? It should be not very far from that in
>> >> the log.
>> >>
>> >> J-D
>> >>
>> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
>> >> >
>> >> > Hi,
>> >> >
>> >> > RS of HBase was frequently down when running. And job will failed after the region server down.
>> >> >
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
>> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
>> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
>> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>> >> >
>> >> >
>> >> > Could anyone help me?
>> >> >
>> >> > Here is snippet from the region server log:
>> >> > http://pastebin.com/YCUDLqc3
>> >> >
>> >> > Version:
>> >> > HBase: 0.20.5
>> >> > Hadoop: 0.20.2
>> >> > Zookeeper: 3.3.0
>> >> >
>> >> >
>> >> >
>> >
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D. 

The .out file looks like this. It has a "NullPointerException" error.

2010-08-24 02:30:14.187::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-08-24 02:30:14.187::INFO:  jetty-6.1.14
2010-08-24 02:30:14.122::INFO:  Started SelectChannelConnector@0.0.0.0:60030
Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
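
A quick way to pull such uncaught exceptions out of a long .out file is to search for the "Exception in thread" header that the JVM prints. The sketch below assumes that header format and the usual "at ..." stack frame lines, as seen in the trace above; the function name `uncaught_exceptions` and the inline sample are illustrative only.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Header the JVM prints for an uncaught exception, as in the trace above.
HEADER = re.compile(r'Exception in thread "([^"]+)" ([\w.$]+)')
# A stack frame line: leading whitespace, "at", then the method before "(".
FRAME = re.compile(r"\s*at ([^\s(]+)\(")


def uncaught_exceptions(
    lines: Iterable[str],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (thread_name, exception_class, top_frame) per uncaught exception."""
    all_lines = list(lines)
    results = []
    for index, line in enumerate(all_lines):
        header = HEADER.search(line)
        if not header:
            continue
        top_frame = None
        if index + 1 < len(all_lines):
            frame = FRAME.match(all_lines[index + 1])
            if frame:
                top_frame = frame.group(1)
        results.append((header.group(1), header.group(2), top_frame))
    return results


# Lines copied from the .out excerpt in this post.
SAMPLE_OUT = [
    "2010-08-24 02:30:14.187::INFO:  jetty-6.1.14",
    'Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException',
    "    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)",
]

if __name__ == "__main__":
    for thread, exc, frame in uncaught_exceptions(SAMPLE_OUT):
        print(f"{exc} in thread {thread} at {frame}")
```

The thread name (here the leaseChecker) and the top frame are usually enough to search the issue tracker for a matching bug.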


> Date: Tue, 24 Aug 2010 11:16:34 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> The last log to look at would be the .out file.
> 
> J-D
> 
> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >
> > Thank you J-D,
> >
> > I posted today's whole RS log:
> > http://pastebin.com/djGnNJxk
> >
> > GC log:
> > http://pastebin.com/AQH5kUCE
> >
> > I don't see the messages started with "We slept".
> >
> >
> >
> >
> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
> >> Subject: Re: Region servers down...
> >> From: jdcryans@apache.org
> >> To: user@hbase.apache.org
> >>
> >> I don't really see the cause of the shutdown in there, it seems it was
> >> already under way. Do you see messages starting with "We slept" and
> >> then telling how long it slept? It should be not very far from that in
> >> the log.
> >>
> >> J-D
> >>
> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >
> >> > Hi,
> >> >
> >> > RS of HBase was frequently down when running. And job will failed after the region server down.
> >> >
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >> >
> >> >
> >> > Could anyone help me?
> >> >
> >> > Here is snippet from the region server log:
> >> > http://pastebin.com/YCUDLqc3
> >> >
> >> > Version:
> >> > HBase: 0.20.5
> >> > Hadoop: 0.20.2
> >> > Zookeeper: 3.3.0
> >> >
> >> >
> >> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The last log to look at would be the .out file.

J-D

2010/8/23 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D,
>
> I posted today's whole RS log:
> http://pastebin.com/djGnNJxk
>
> GC log:
> http://pastebin.com/AQH5kUCE
>
> I don't see the messages started with "We slept".
>
>
>
>
>> Date: Mon, 23 Aug 2010 23:00:32 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> I don't really see the cause of the shutdown in there, it seems it was
>> already under way. Do you see messages starting with "We slept" and
>> then telling how long it slept? It should be not very far from that in
>> the log.
>>
>> J-D
>>
>> 2010/8/23 xiujin yang <xi...@hotmail.com>:
>> >
>> > Hi,
>> >
>> > RS of HBase was frequently down when running. And job will failed after the region server down.
>> >
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
>> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
>> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
>> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>> >
>> >
>> > Could anyone help me?
>> >
>> > Here is snippet from the region server log:
>> > http://pastebin.com/YCUDLqc3
>> >
>> > Version:
>> > HBase: 0.20.5
>> > Hadoop: 0.20.2
>> > Zookeeper: 3.3.0
>> >
>> >
>> >
>

Re: Region servers down...

Posted by Ted Yu <yu...@gmail.com>.
It would be beneficial to separate the RS on 192.168.158.179 onto another
machine.

2010/8/23 xiujin yang <xi...@hotmail.com>

>
> Hi
>
> My cluster is in this way.
> Hadoop & HBase are deployed on different machine.
> HBase use the hdfs of Hadoop.
>
> Machine
>
> 4 CPU & 6 G RAM
>
>
>
> Hadoop
>
> 192.168.158.171
>
> 192.168.158.172
>
> 192.168.158.174Send
>
> 192.168.158.177
>
> 192.168.158.176
>
> 192.168.158.180
>
>
> 192.168.158.186
>
>
>
>
>
>
>
> HBase
>
> 192.168.158.179  HMaster & RS
>
>
>
>
> 192.168.158.187  RS
>
>
>
>
>
> 192.168.158.188  RS
>
>
>
>
>
>
>
> At first,we deployed all machine Hadoop & HBase & Mapreduce, we found it
> was easy to make RS down because of memory problem.
> Task of Map/Reduce will eat too much memeory. And Hbase need to use swap.
> So we divided them.
>
> Is this the reason? Or memory is tow small?
>
>
>
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Hi 

My cluster is set up as follows.
Hadoop and HBase are deployed on different machines.
HBase uses the HDFS of the Hadoop cluster.

Machine: 4 CPU & 6 GB RAM

Hadoop
192.168.158.171
192.168.158.172
192.168.158.174
192.168.158.177
192.168.158.176
192.168.158.180
192.168.158.186

HBase
192.168.158.179  HMaster & RS
192.168.158.187  RS
192.168.158.188  RS

At first we deployed Hadoop, HBase, and MapReduce together on every machine, but we found that region servers went down easily because of memory problems.
MapReduce tasks ate too much memory, and HBase had to use swap. So we separated them.

Is this the reason? Or is the memory too small?

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D,

I posted today's whole RS log:
http://pastebin.com/djGnNJxk

GC log:
http://pastebin.com/AQH5kUCE

I don't see any messages starting with "We slept".




> Date: Mon, 23 Aug 2010 23:00:32 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> I don't really see the cause of the shutdown in there, it seems it was
> already under way. Do you see messages starting with "We slept" and
> then telling how long it slept? It should be not very far from that in
> the log.
> 
> J-D
> 
> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >
> > Hi,
> >
> > RS of HBase was frequently down when running. And job will failed after the region server down.
> >
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >
> >
> > Could anyone help me?
> >
> > Here is snippet from the region server log:
> > http://pastebin.com/YCUDLqc3
> >
> > Version:
> > HBase: 0.20.5
> > Hadoop: 0.20.2
> > Zookeeper: 3.3.0
> >
> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't really see the cause of the shutdown in there; it seems it was
already under way. Do you see messages starting with "We slept" that
say how long it slept? They should not be very far from that point in
the log.

J-D

2010/8/23 xiujin yang <xi...@hotmail.com>:
>
> Hi,
>
> RS of HBase was frequently down when running. And job will failed after the region server down.
>
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>
>
> Could anyone help me?
>
> Here is snippet from the region server log:
> http://pastebin.com/YCUDLqc3
>
> Version:
> HBase: 0.20.5
> Hadoop: 0.20.2
> Zookeeper: 3.3.0
>
>
>