Posted to user@hbase.apache.org by xiujin yang <xi...@hotmail.com> on 2010/08/24 07:49:37 UTC
Region servers down...
Hi,
HBase region servers (RS) frequently go down while running, and jobs fail after a region server goes down.
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
[regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
[regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
[Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
[Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
[HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
Could anyone help me?
Here is a snippet from the region server log:
http://pastebin.com/YCUDLqc3
Version:
HBase: 0.20.5
Hadoop: 0.20.2
Zookeeper: 3.3.0
Re: Region servers down...
Posted by Jean-Daniel Cryans <jd...@apache.org>.
What Stack said, and try setting the split size bigger on your tables
in order to limit the number of regions. Bigger files = fewer small
files = fewer xcievers occupied answering requests for those files. See
the help in the shell for "alter" and look for the MAX_FILESIZE value
(which is in bytes and defaults to 256MB; try 1GB).
J-D
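The byte values behind that suggestion can be checked with a small Python sketch. It only does the arithmetic for the 256MB default and the suggested 1GB. The region estimate is a deliberately crude assumption (one region per MAX_FILESIZE of data), and the 100 GB table size is hypothetical, chosen only for illustration.

```python
# MAX_FILESIZE is given in bytes, so convert the sizes mentioned above.
MB = 1024 * 1024
GB = 1024 * MB

DEFAULT_MAX_FILESIZE = 256 * MB    # default mentioned in the reply above
SUGGESTED_MAX_FILESIZE = 1 * GB    # value suggested in the reply above


def rough_region_count(table_bytes: int, max_filesize: int) -> int:
    """Crude estimate: one region per max_filesize of data, rounded up.

    This is an assumption for illustration; real region counts depend on
    split timing, compactions and the number of column families.
    """
    return max(1, -(-table_bytes // max_filesize))  # ceiling division


if __name__ == "__main__":
    table_bytes = 100 * GB  # hypothetical table size
    for label, size in (("256MB", DEFAULT_MAX_FILESIZE),
                        ("1GB", SUGGESTED_MAX_FILESIZE)):
        print(label, size, rough_region_count(table_bytes, size))
```

Under these assumptions, going from 256MB to 1GB cuts the estimated region count by a factor of four, which is the effect the reply is after.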
On Wed, Sep 1, 2010 at 10:25 PM, Stack <st...@duboce.net> wrote:
> Sounds like 2047 is not enough. Up it again. 4k?
> St.Ack
>
> 2010/9/1 xiujin yang <xi...@hotmail.com>:
>>
>> Thank you J-D.
>>
>>
>> I checked two datanode logs and found the same error: "exceeds the limit of concurrent xcievers 2047"
>>
>>
>> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>> at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> 172:
>> http://pastebin.com/cdw2svHT
>>
>> 177:
>> http://pastebin.com/iA3jxfuq
>>
>> Our cluster (each machine has 4 CPUs & 6 GB RAM) is laid out as follows.
>> The HBase region server nodes don't run HDFS. Is this related to the error? Do I need to increase the xcievers from 2047 --> XX?
>>
>> Hadoop
>> 192.168.158.176 Master
>>
>>
>> 192.168.158.171 Slave
>> 192.168.158.172 Slave
>>
>> 192.168.158.174 Slave
>>
>> 192.168.158.177 Slave & SNN
>>
>> 192.168.158.180 Slave
>>
>> 192.168.158.186 Slave
>>
>>
>> HBase Only
>> 192.168.158.179 HMaster & RS & ZK
>>
>>
>> 192.168.158.187 RS & ZK
>>
>>
>> 192.168.158.188 RS & ZK
>>
>>
>>
>>
>> Thank you in advance.
>>
>> -- Xiujin Yang.
>> -----------------------------------------------------------------
>> My linkedin: http://cn.linkedin.com/in/xiujinyang
>>
>>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>>> Subject: Re: Region servers down...
>>> From: jdcryans@apache.org
>>> To: user@hbase.apache.org
>>>
>>> These are errors coming from HDFS. I would start by looking at the datanode
>>> log on the same machine for any exceptions thrown at the same time.
>>> Also make sure your cluster is properly configured according to the
>>> last bullet point in the requirements
>>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>>
>>> J-D
>>>
>>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>>> >
>>> >
>>> > HBase: 0.20.6
>>> > Hadoop: 0.20.2
>>> >
>>> > After I upgraded to 0.20.6,
>>> > it ran for less than a week before one region server went down again.
>>> >
>>> > Please check HBase log:
>>> >
>>> >
>>> > http://pastebin.com/J9LugZ17
>>> >
>>> >
>>> >
>>> > HBase .out file:
>>> > http://pastebin.com/QKbpSMwq
>>> >
>>> >
>>> > Thank you in advance.
>>> >
>>> > Best,
>>> >
>>> > -- Xiujin Yang.
>>> > -----------------------------------------------------------------
>>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>>> >
>>> >
>>> >
>>> >
>>
>
Re: Region servers down...
Posted by Stack <st...@duboce.net>.
Sounds like 2047 is not enough. Up it again. 4k?
St.Ack
RE: Region servers down...
Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D.
I checked two datanode logs and found the same error: "exceeds the limit of concurrent xcievers 2047"
[2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
    at java.lang.Thread.run(Thread.java:619)
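To see how often a datanode hits this limit, its log can be scanned for the message above. The Python sketch below is only an illustration: the function name is made up, and the pattern is taken from the single error line quoted here, so other Hadoop versions may word it differently. In Hadoop 0.20 the limit itself is set on the datanodes with the dfs.datanode.max.xcievers property in hdfs-site.xml.

```python
import re

# Pattern copied from the error line quoted in this message, e.g.
# "xceiverCount 2048 exceeds the limit of concurrent xcievers 2047"
XCEIVER_RE = re.compile(
    r"xceiverCount (\d+) exceeds the limit of concurrent xcievers (\d+)"
)


def find_xceiver_errors(lines):
    """Return (line_number, count, limit) for every matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = XCEIVER_RE.search(line)
        if match:
            hits.append((number, int(match.group(1)), int(match.group(2))))
    return hits


if __name__ == "__main__":
    # Sample input: the exception line from the datanode log above.
    sample = [
        "java.io.IOException: xceiverCount 2048 exceeds the limit "
        "of concurrent xcievers 2047",
    ]
    for number, count, limit in find_xceiver_errors(sample):
        print(f"line {number}: xceiverCount={count} limit={limit}")
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.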
172:
http://pastebin.com/cdw2svHT
177:
http://pastebin.com/iA3jxfuq
Our cluster (each machine has 4 CPUs & 6 GB RAM) is laid out as follows.
The HBase region server nodes don't run HDFS. Is this related to the error? Do I need to increase the xcievers from 2047 --> XX?
Hadoop
192.168.158.176 Master
192.168.158.171 Slave
192.168.158.172 Slave
192.168.158.174 Slave
192.168.158.177 Slave & SNN
192.168.158.180 Slave
192.168.158.186 Slave
HBase Only
192.168.158.179 HMaster & RS & ZK
192.168.158.187 RS & ZK
192.168.158.188 RS & ZK
Thank you in advance.
-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang
Re: Region servers down...
Posted by Jean-Daniel Cryans <jd...@apache.org>.
These are errors coming from HDFS. I would start by looking at the datanode
log on the same machine for any exceptions thrown at the same time.
Also make sure your cluster is properly configured according to the
last bullet point in the requirements
http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
J-D
RE: Region servers down...
Posted by xiujin yang <xi...@hotmail.com>.
HBase: 0.20.6
Hadoop: 0.20.2
After I upgraded to 0.20.6,
it ran for less than a week before one region server went down again.
Please check HBase log:
http://pastebin.com/J9LugZ17
HBase .out file:
http://pastebin.com/QKbpSMwq
Thank you in advance.
Best,
-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang
RE: Region servers down...
Posted by xiujin yang <xi...@hotmail.com>.
Thank you very much, J-D.
I was stuck on this problem for a long time.
Thank you again.
I will upgrade to 0.20.6.
Best regards,
Xiujin Yang.
> Date: Wed, 25 Aug 2010 09:30:55 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
>
> That's https://issues.apache.org/jira/browse/HBASE-2797, please
> upgrade to 0.20.6 (no migration needed, just copy over the configs).
>
> J-D
>
> 2010/8/24 xiujin yang <xi...@hotmail.com>:
> >
> > Thank you J-D.
> >
> > The .out file looks like this. It has a "NullPointerException" error.
> >
> > 2010-08-24 02:30:14.187::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
> > 2010-08-24 02:30:14.187::INFO: jetty-6.1.14
> > 2010-08-24 02:30:14.122::INFO: Started SelectChannelConnector@0.0.0.0:60030
> > Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
> > at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
> > at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
> > at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
> > at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
> > at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
> > at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
> > at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
> > at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
> > at java.util.PriorityQueue.poll(PriorityQueue.java:523)
> > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
> > at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
> > at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
> >
> >
> >> Date: Tue, 24 Aug 2010 11:16:34 -0700
> >> Subject: Re: Region servers down...
> >> From: jdcryans@apache.org
> >> To: user@hbase.apache.org
> >>
> >> The last log to look at would be the .out file.
> >>
> >> J-D
> >>
> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >
> >> > Thank you J-D,
> >> >
> >> > I posted today's whole RS log:
> >> > http://pastebin.com/djGnNJxk
> >> >
> >> > GC log:
> >> > http://pastebin.com/AQH5kUCE
> >> >
> >> > I don't see any messages starting with "We slept".
> >> >
> >> >
> >> >
> >> >
> >> >> Date: Mon, 23 Aug 2010 23:00:32 -0700
> >> >> Subject: Re: Region servers down...
> >> >> From: jdcryans@apache.org
> >> >> To: user@hbase.apache.org
> >> >>
> >> >> I don't really see the cause of the shutdown in there; it seems it was
> >> >> already under way. Do you see messages starting with "We slept" and
> >> >> then telling how long it slept? It should not be very far from the
> >> >> shutdown in the log.
> >> >>
> >> >> J-D
> >> >>
> >> >> 2010/8/23 xiujin yang <xi...@hotmail.com>:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > HBase region servers (RS) frequently go down while running, and jobs fail after a region server goes down.
> >> >> >
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> >> >> > [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> >> >> > [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> >> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> >> >> > [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> >> >> > [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
> >> >> >
> >> >> >
> >> >> > Could anyone help me?
> >> >> >
> >> >> > Here is a snippet from the region server log:
> >> >> > http://pastebin.com/YCUDLqc3
> >> >> >
> >> >> > Version:
> >> >> > HBase: 0.20.5
> >> >> > Hadoop: 0.20.2
> >> >> > Zookeeper: 3.3.0
> >> >> >
> >> >> >
> >> >> >
> >> >
> >
Re: Region servers down...
Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's https://issues.apache.org/jira/browse/HBASE-2797, please
upgrade to 0.20.6 (no migration needed, just copy over the configs).
J-D
RE: Region servers down...
Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D.
The .out file looks like this. It has a "NullPointerException" error.
2010-08-24 02:30:14.187::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
2010-08-24 02:30:14.187::INFO: jetty-6.1.14
2010-08-24 02:30:14.122::INFO: Started SelectChannelConnector@0.0.0.0:60030
Exception in thread "regionserver/192.168.158.187:60020.leaseChecker" java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
    at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
    at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:641)
    at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
    at java.util.PriorityQueue.poll(PriorityQueue.java:523)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
    at org.apache.hadoop.hbase.regionserver.HRegionServer$ScannerListener.leaseExpired(HRegionServer.java:1962)
    at org.apache.hadoop.hbase.Leases.run(Leases.java:98)
Re: Region servers down...
Posted by Jean-Daniel Cryans <jd...@apache.org>.
The last log to look at would be the .out file.
J-D
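The .out file holds what the JVM writes to stdout/stderr, which is where an uncaught exception in a thread ends up. A rough Python sketch for pulling those lines out follows. The function name is hypothetical, and the pattern assumes the standard JVM wording 'Exception in thread "<name>" <class>'.

```python
import re

# Standard JVM message for an uncaught exception in a thread, e.g.
# Exception in thread "some-thread" java.lang.NullPointerException
UNCAUGHT_RE = re.compile(r'Exception in thread "([^"]+)" ([\w.$]+)')


def find_uncaught_exceptions(lines):
    """Return (thread_name, exception_class) for each uncaught exception."""
    found = []
    for line in lines:
        match = UNCAUGHT_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "some ordinary startup output",
        'Exception in thread "example-thread" java.lang.IllegalStateException: boom',
    ]
    for thread, exc in find_uncaught_exceptions(sample):
        print(f"{thread}: {exc}")
```

Any iterable of lines works as input, so an open .out file can be passed directly.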
Re: Region servers down...
Posted by Ted Yu <yu...@gmail.com>.
It would be beneficial to separate the RS on 192.168.158.179 onto another
machine.
2010/8/23 xiujin yang <xi...@hotmail.com>
>
> Hi
>
> My cluster is laid out as follows.
> Hadoop and HBase are deployed on different machines.
> HBase uses the HDFS of the Hadoop cluster.
>
> Machines
> 4 CPUs & 6 GB RAM
>
> Hadoop
> 192.168.158.171
> 192.168.158.172
> 192.168.158.174
> 192.168.158.177
> 192.168.158.176
> 192.168.158.180
> 192.168.158.186
>
> HBase
> 192.168.158.179 HMaster & RS
> 192.168.158.187 RS
> 192.168.158.188 RS
>
> At first we deployed Hadoop, HBase and MapReduce on every machine, but we
> found that region servers went down easily because of memory problems.
> Map/Reduce tasks ate too much memory, and HBase had to use swap.
> So we separated them.
>
> Is this the reason? Or is the memory too small?
>
RE: Region servers down...
Posted by xiujin yang <xi...@hotmail.com>.
Hi
My cluster is laid out as follows.
Hadoop and HBase are deployed on different machines.
HBase uses the HDFS of the Hadoop cluster.
Machines
4 CPUs & 6 GB RAM
Hadoop
192.168.158.171
192.168.158.172
192.168.158.174
192.168.158.177
192.168.158.176
192.168.158.180
192.168.158.186
HBase
192.168.158.179 HMaster & RS
192.168.158.187 RS
192.168.158.188 RS
At first we deployed Hadoop, HBase and MapReduce on every machine, but we found that region servers went down easily because of memory problems.
Map/Reduce tasks ate too much memory, and HBase had to use swap. So we separated them.
Is this the reason? Or is the memory too small?
RE: Region servers down...
Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D,
I posted today's whole RS log:
http://pastebin.com/djGnNJxk
GC log:
http://pastebin.com/AQH5kUCE
I don't see any messages starting with "We slept".
Re: Region servers down...
Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't really see the cause of the shutdown in there; it seems it was
already under way. Do you see messages starting with "We slept" and
then telling how long it slept? It should not be very far from the
shutdown in the log.
J-D
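A quick way to check a large region server log for those messages is a substring search. The Python sketch below assumes only the "We slept" prefix mentioned above; the function name and the sample lines are placeholders, not real HBase log output.

```python
def find_sleep_warnings(lines, marker="We slept"):
    """Return (line_number, text) for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


if __name__ == "__main__":
    # Placeholder lines; the real message text after "We slept" is elided.
    sample = [
        "placeholder: ordinary log line\n",
        "placeholder: We slept ... (rest of message elided)\n",
    ]
    for number, text in find_sleep_warnings(sample):
        print(f"line {number}: {text}")
```

If nothing is found, as in this case, a long pause (for example from GC or swapping) becomes a less likely explanation and the other logs are the next place to look.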
2010/8/23 xiujin yang <xi...@hotmail.com>:
>
> Hi,
>
> RS of HBase was frequently down when running. And job will failed after the region server down.
>
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed whitetable,com.cnet.download:http/Seal-Online/3640-7540_4-10816413.html,1282619615378
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region server is shutting down at: 192.168.158.187:60020
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:14,929 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 192.168.158.187:60020
> [regionserver/192.168.158.187:60020.worker] 2010-08-24 04:15:15,803 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,829 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010005 closed
> [regionserver/192.168.158.187:60020] 2010-08-24 04:15:15,928 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/192.168.158.187:60020 exiting
> [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
> [Thread-17] 2010-08-24 04:15:16,115 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> [HCM.shutdownHook] 2010-08-24 04:15:16,115 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12a9f2d85010041 closed
>
>
> Could anyone help me?
>
> Here is snippet from the region server log:
> http://pastebin.com/YCUDLqc3
>
> Version:
> HBase: 0.20.5
> Hadoop: 0.20.2
> Zookeeper: 3.3.0
>
>
>