Posted to user@hbase.apache.org by xiujin yang <xi...@hotmail.com> on 2010/09/01 07:28:13 UTC

RE: Region servers down...


HBase: 0.20.6
Hadoop: 0.20.2

After I upgraded to 0.20.6,
it ran for less than a week before one region server went down again.

Please check the HBase log:


http://pastebin.com/J9LugZ17



HBase .out file:
http://pastebin.com/QKbpSMwq


Thank you in advance. 

Best,

-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang 




Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
What Stack said, and try setting a bigger split size on your tables
in order to limit the number of files. Bigger files = fewer small
files = fewer xcievers occupied answering requests on those files. See
the help in the shell for "alter" and look for the MAX_FILESIZE value
(which is in bytes and defaults to 256MB; try 1GB).
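
For illustration, a sketch of what that might look like in the shell
(the table name 'mytable' is a placeholder, and in 0.20 the table has
to be disabled before altering it):

  hbase> disable 'mytable'
  hbase> alter 'mytable', {METHOD => 'table_att', MAX_FILESIZE => '1073741824'}
  hbase> enable 'mytable'

1073741824 bytes = 1GB.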

J-D

On Wed, Sep 1, 2010 at 10:25 PM, Stack <st...@duboce.net> wrote:
> Sounds like 2047 is not enough.  Up it again.  4k?
> St.Ack
>
> 2010/9/1 xiujin yang <xi...@hotmail.com>:
>>
>> Thank you J-D.
>>
>>
>> I've checked the two datanode logs and found the same error: "exceeds the limit of concurrent xcievers 2047"
>>
>>
>> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>>    at java.lang.Thread.run(Thread.java:619)
>>
>>
>>
>> 172:
>> http://pastebin.com/cdw2svHT
>>
>> 177:
>> http://pastebin.com/iA3jxfuq
>>
>> Our cluster (machines with 4 CPUs & 6GB RAM) is as follows:
>> The HBase region server nodes don't run HDFS datanodes. Is this related to the error?  Do I need to increase the xcievers limit from 2047 --> XX?
>>
>> Hadoop
>> 192.168.158.176 Master
>> 192.168.158.171 Slave
>> 192.168.158.172 Slave
>> 192.168.158.174 Slave
>> 192.168.158.177 Slave & SNN
>> 192.168.158.180 Slave
>> 192.168.158.186 Slave
>>
>> HBase Only
>> 192.168.158.179  HMaster & RS & ZK
>> 192.168.158.187  RS & ZK
>> 192.168.158.188  RS & ZK
>>
>>
>>
>>
>> Thank you in advance.
>>
>> -- Xiujin Yang.
>> -----------------------------------------------------------------
>> My linkedin: http://cn.linkedin.com/in/xiujinyang
>>
>>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>>> Subject: Re: Region servers down...
>>> From: jdcryans@apache.org
>>> To: user@hbase.apache.org
>>>
>>> These errors are coming from HDFS; I would start by looking at the datanode
>>> log on the same machine for any exceptions thrown at the same time.
>>> Also make sure your cluster is properly configured according to the
>>> last bullet point in the requirements
>>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>>
>>> J-D
>>>
>>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>>> >
>>> >
>>> > HBase: 0.20.6
>>> > Hadoop: 0.20.2
>>> >
>>> > After I upgraded to 0.20.6,
>>> > it ran for less than a week before one region server went down again.
>>> >
>>> > Please check the HBase log:
>>> >
>>> >
>>> > http://pastebin.com/J9LugZ17
>>> >
>>> >
>>> >
>>> > HBase .out file:
>>> > http://pastebin.com/QKbpSMwq
>>> >
>>> >
>>> > Thank you in advance.
>>> >
>>> > Best,
>>> >
>>> > -- Xiujin Yang.
>>> > -----------------------------------------------------------------
>>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>>> >
>>> >
>>> >
>>> >
>>
>

Re: Region servers down...

Posted by Stack <st...@duboce.net>.
Sounds like 2047 is not enough.  Up it again.  4k?
St.Ack
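
If you do bump it, that would be something like the following in
hdfs-site.xml on each datanode, followed by a datanode restart (4096
per the suggestion above):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>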

2010/9/1 xiujin yang <xi...@hotmail.com>:
>
> Thank you J-D.
>
>
> I've checked the two datanode logs and found the same error: "exceeds the limit of concurrent xcievers 2047"
>
>
> [2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
>    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>    at java.lang.Thread.run(Thread.java:619)
>
>
>
> 172:
> http://pastebin.com/cdw2svHT
>
> 177:
> http://pastebin.com/iA3jxfuq
>
> Our cluster (machines with 4 CPUs & 6GB RAM) is as follows:
> The HBase region server nodes don't run HDFS datanodes. Is this related to the error?  Do I need to increase the xcievers limit from 2047 --> XX?
>
> Hadoop
> 192.168.158.176 Master
> 192.168.158.171 Slave
> 192.168.158.172 Slave
> 192.168.158.174 Slave
> 192.168.158.177 Slave & SNN
> 192.168.158.180 Slave
> 192.168.158.186 Slave
>
> HBase Only
> 192.168.158.179  HMaster & RS & ZK
> 192.168.158.187  RS & ZK
> 192.168.158.188  RS & ZK
>
>
>
>
> Thank you in advance.
>
> -- Xiujin Yang.
> -----------------------------------------------------------------
> My linkedin: http://cn.linkedin.com/in/xiujinyang
>
>> Date: Wed, 1 Sep 2010 10:30:44 -0700
>> Subject: Re: Region servers down...
>> From: jdcryans@apache.org
>> To: user@hbase.apache.org
>>
>> These errors are coming from HDFS; I would start by looking at the datanode
>> log on the same machine for any exceptions thrown at the same time.
>> Also make sure your cluster is properly configured according to the
>> last bullet point in the requirements
>> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>>
>> J-D
>>
>> 2010/8/31 xiujin yang <xi...@hotmail.com>:
>> >
>> >
>> > HBase: 0.20.6
>> > Hadoop: 0.20.2
>> >
>> > After I upgraded to 0.20.6,
>> > it ran for less than a week before one region server went down again.
>> >
>> > Please check the HBase log:
>> >
>> >
>> > http://pastebin.com/J9LugZ17
>> >
>> >
>> >
>> > HBase .out file:
>> > http://pastebin.com/QKbpSMwq
>> >
>> >
>> > Thank you in advance.
>> >
>> > Best,
>> >
>> > -- Xiujin Yang.
>> > -----------------------------------------------------------------
>> > My linkedin: http://cn.linkedin.com/in/xiujinyang
>> >
>> >
>> >
>> >
>

RE: Region servers down...

Posted by xiujin yang <xi...@hotmail.com>.
Thank you J-D.


I've checked the two datanode logs and found the same error: "exceeds the limit of concurrent xcievers 2047"


[2010-08-31 10:43:26][ERROR][org.apache.hadoop.hdfs.server.datanode.DataXceiver@5a809419][org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:131)] DatanodeRegistration(192.168.158.172:50010, storageID=DS-1961101492-192.168.158.172-50010-1273570850144, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 2048 exceeds the limit of concurrent xcievers 2047
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
    at java.lang.Thread.run(Thread.java:619)



172:
http://pastebin.com/cdw2svHT

177:
http://pastebin.com/iA3jxfuq

Our cluster (machines with 4 CPUs & 6GB RAM) is as follows:
The HBase region server nodes don't run HDFS datanodes. Is this related to the error?  Do I need to increase the xcievers limit from 2047 --> XX?

Hadoop
192.168.158.176 Master
192.168.158.171 Slave
192.168.158.172 Slave
192.168.158.174 Slave
192.168.158.177 Slave & SNN
192.168.158.180 Slave
192.168.158.186 Slave

HBase Only
192.168.158.179  HMaster & RS & ZK
192.168.158.187  RS & ZK
192.168.158.188  RS & ZK




Thank you in advance. 

-- Xiujin Yang.
-----------------------------------------------------------------
My linkedin: http://cn.linkedin.com/in/xiujinyang

> Date: Wed, 1 Sep 2010 10:30:44 -0700
> Subject: Re: Region servers down...
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> 
> These errors are coming from HDFS; I would start by looking at the datanode
> log on the same machine for any exceptions thrown at the same time.
> Also make sure your cluster is properly configured according to the
> last bullet point in the requirements
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
> 
> J-D
> 
> 2010/8/31 xiujin yang <xi...@hotmail.com>:
> >
> >
> > HBase: 0.20.6
> > Hadoop: 0.20.2
> >
> > After I upgraded to 0.20.6,
> > it ran for less than a week before one region server went down again.
> >
> > Please check the HBase log:
> >
> >
> > http://pastebin.com/J9LugZ17
> >
> >
> >
> > HBase .out file:
> > http://pastebin.com/QKbpSMwq
> >
> >
> > Thank you in advance.
> >
> > Best,
> >
> > -- Xiujin Yang.
> > -----------------------------------------------------------------
> > My linkedin: http://cn.linkedin.com/in/xiujinyang
> >
> >
> >
> >

Re: Region servers down...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
These errors are coming from HDFS; I would start by looking at the datanode
log on the same machine for any exceptions thrown at the same time.
Also make sure your cluster is properly configured according to the
last bullet point in the requirements
http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
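
For example, something along these lines on each datanode host would
surface exceptions around the time of the crash (the log path is a
guess; adjust it for your install):

  grep -i "exception" $HADOOP_HOME/logs/hadoop-*-datanode-*.log | tail -50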

J-D

2010/8/31 xiujin yang <xi...@hotmail.com>:
>
>
> HBase: 0.20.6
> Hadoop: 0.20.2
>
> After I upgraded to 0.20.6,
> it ran for less than a week before one region server went down again.
>
> Please check the HBase log:
>
>
> http://pastebin.com/J9LugZ17
>
>
>
> HBase .out file:
> http://pastebin.com/QKbpSMwq
>
>
> Thank you in advance.
>
> Best,
>
> -- Xiujin Yang.
> -----------------------------------------------------------------
> My linkedin: http://cn.linkedin.com/in/xiujinyang
>
>
>
>