Posted to user@hbase.apache.org by Rui Xing <xi...@gmail.com> on 2008/10/07 11:41:37 UTC

region server problem

Hi All,

1). We are doing performance testing on HBase. The test environment is 3 data
nodes and 1 name node, distributed across 4 machines. We started one region
server on each data node. To insert the data, one insertion client was started
on each data node machine. But as the data was inserted, the region servers
crashed one by one. One of the causes is shown below:

*==>
2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_-806310822584979460 of
/hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*

... ...

*2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-806310822584979460 from any node:  java.io.IOException: No
live nodes contain current block*
2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region search1,r3_1_3_c157476,1223360357528 in
18mins, 39sec
2008-10-07 14:52:25,238 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0.0.0.0:60020.compactor exiting
2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed search1,r3_1_3_c157476,1223360357528
2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed -ROOT-,,0
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
10.2.6.104:60020
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
0.0.0.0:60020 exiting
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
===<

2). Another question is: under what circumstances will the region server
print thread information like the following in its log? It appears among the
normal log records.
===>
35 active threads
Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310
):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.util.Hashtable.remove(Hashtable.java:435)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
... ...
===<

We use Hadoop 0.17.1 and HBase 0.2.0. Any clues would be greatly
appreciated.

Regards,
-Ray

Re: region server problem

Posted by stack <st...@duboce.net>.
If you haven't upped the ulimit for file descriptors and you have more
than a handful of regions in your cluster, you start to experience
'weirdness'.  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  You might
not see the 'too many open files' message in your hbase logs (it might be
in your datanode logs instead), but the symptoms of not-enough-fds vary;
an OOME is one of them.
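
For example, on most Linux distributions you can raise the per-user limit in
/etc/security/limits.conf; the account name below is whatever user runs your
hadoop/hbase daemons, and 32768 is just a commonly used value, not a hard
requirement:

  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768

You need a fresh login (and a daemon restart) before 'ulimit -n' shows the
new limit.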

St.Ack


Slava Gorelik wrote:
> Hi.
> I'll send log little bit later, with all answers on your questions, but what
> do you mean - "You have upped your file descriptors?" ?
>
> Best Regards.
>
>
> On Wed, Oct 8, 2008 at 11:41 PM, stack <st...@duboce.net> wrote:
>
>   
>> You have DEBUG enabled?  Can I see log from the regionserver that went
>> down?  Can you tell me more about your cluster? Number of nodes, number of
>> regions?  What your uploader looks like (is it a MR job)?  You have upped
>> your file descriptors?
>>
>> Thanks Slava.
>> St.Ack
>>
>>
>>
>> Slava Gorelik wrote:
>>
>>     
>>> HI.I'm also encountering error like this.
>>> I'm using Hbase 0.18.0 an Hadoop 0.18.0.
>>> I addition to this error, i'm getting that sometimes region servers are
>>> died, in the log i see region server shutdown, after starting compaction,
>>> because that some data blocks are not found.
>>>
>>> Best Regards.
>>>
>>> On Wed, Oct 8, 2008 at 11:29 PM, stack <st...@duboce.net> wrote:
>>>
>>>
>>>
>>>       
>>>> You should update to 0.2.1 if you can.  Make sure you've upped your file
>>>> descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also
>>>> see
>>>> how to enable DEBUG in same FAQ.
>>>>
>>>> Something odd is up when you see messages like this out of HDFS: ': No
>>>> live
>>>> nodes contain current block*'.  Thats lost data.
>>>>
>>>> Or messages like this, 'compaction completed on region
>>>> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
>>>> compactions are taking so long -- would seem to indicate your machines
>>>> are
>>>> severly overloaded or underpowered or both.  Can you study load when the
>>>> upload is running on these machines?  Perhaps try  throttling back to see
>>>> if
>>>> hbase survives longer?
>>>>
>>>> The regionserver will output thread dump in its RPC layer if critical
>>>> error
>>>> -- OOME -- or its been hung up for a long time IIRC.
>>>>
>>>> Check the '.out' logs too for you hbase install to see if they contain
>>>> any
>>>> errors.  Grep the datanode logs too for OOME or "too many open file
>>>> handles".
>>>>
>>>> St.Ack
>>>>
>>>> Rui Xing wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> Hi All,
>>>>>
>>>>> 1). We are doing performance testing on hbase. The environment of the
>>>>> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
>>>>> started one region server on each data node respectively. To insert the
>>>>> data, one insertion client is started on each data node machine. But as
>>>>> the
>>>>> data inserted, the region servers crashed one by one. One of the reasons
>>>>> is
>>>>> listed as follows:
>>>>>
>>>>> *==>
>>>>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
>>>>> while reading from blk_-806310822584979460 of
>>>>> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
>>>>> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>>>>
>>>>> ... ...
>>>>>
>>>>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
>>>>> obtain block blk_-806310822584979460 from any node:
>>>>>  java.io.IOExceptionYou
>>>>>
>>>>> 2008-10-07 14:52:25,229 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
>>>>> 18mins, 39sec
>>>>> 2008-10-07 14:52:25,238 INFO
>>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>>>> regionserver/0.0.0.0:60020.compactor exiting
>>>>> 2008-10-07 14:52:25,284 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> closed search1,r3_1_3_c157476,1223360357528
>>>>> 2008-10-07 14:52:25,291 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> closed -ROOT-,,0
>>>>> 2008-10-07 14:52:25,291 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
>>>>> 10.2.6.104:60020
>>>>> 2008-10-07 14:52:25,291 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
>>>>> 0.0.0.0:60020 exiting
>>>>> 2008-10-07 14:52:25,511 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
>>>>> thread.
>>>>> 2008-10-07 14:52:25,511 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
>>>>> complete
>>>>> ===<
>>>>>
>>>>> 2). Another question is, under what circunstance will the region server
>>>>> print logs of the thread information as below? It appears among the
>>>>> normal
>>>>> log records.
>>>>> ===>
>>>>> 35 active threads
>>>>> Thread 1281 (IPC Client connection to
>>>>> d3v1.corp.alimama.com/10.2.6.101:54310
>>>>> ):
>>>>>  State: RUNNABLE
>>>>>  Blocked count: 0
>>>>>  Waited count: 0
>>>>>  Stack:
>>>>>   java.util.Hashtable.remove(Hashtable.java:435)
>>>>>   org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>>>>> ... ...
>>>>> ===<
>>>>>
>>>>> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated
>>>>> if
>>>>> any
>>>>> clues can be dropped.
>>>>>
>>>>> Regards,
>>>>> -Ray
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>         
>>>
>>>       
>>     
>
>   


Re: region server problem

Posted by Slava Gorelik <sl...@gmail.com>.
Hi.
I'll send the log a little bit later, with answers to all your questions, but
what do you mean by "You have upped your file descriptors?"?

Best Regards.


On Wed, Oct 8, 2008 at 11:41 PM, stack <st...@duboce.net> wrote:

> You have DEBUG enabled?  Can I see log from the regionserver that went
> down?  Can you tell me more about your cluster? Number of nodes, number of
> regions?  What your uploader looks like (is it a MR job)?  You have upped
> your file descriptors?
>
> Thanks Slava.
> St.Ack
>
>
>
> Slava Gorelik wrote:
>
>> HI.I'm also encountering error like this.
>> I'm using Hbase 0.18.0 an Hadoop 0.18.0.
>> I addition to this error, i'm getting that sometimes region servers are
>> died, in the log i see region server shutdown, after starting compaction,
>> because that some data blocks are not found.
>>
>> Best Regards.
>>
>> On Wed, Oct 8, 2008 at 11:29 PM, stack <st...@duboce.net> wrote:
>>
>>
>>
>>> You should update to 0.2.1 if you can.  Make sure you've upped your file
>>> descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also
>>> see
>>> how to enable DEBUG in same FAQ.
>>>
>>> Something odd is up when you see messages like this out of HDFS: ': No
>>> live
>>> nodes contain current block*'.  Thats lost data.
>>>
>>> Or messages like this, 'compaction completed on region
>>> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
>>> compactions are taking so long -- would seem to indicate your machines
>>> are
>>> severly overloaded or underpowered or both.  Can you study load when the
>>> upload is running on these machines?  Perhaps try  throttling back to see
>>> if
>>> hbase survives longer?
>>>
>>> The regionserver will output thread dump in its RPC layer if critical
>>> error
>>> -- OOME -- or its been hung up for a long time IIRC.
>>>
>>> Check the '.out' logs too for you hbase install to see if they contain
>>> any
>>> errors.  Grep the datanode logs too for OOME or "too many open file
>>> handles".
>>>
>>> St.Ack
>>>
>>> Rui Xing wrote:
>>>
>>>
>>>
>>>> Hi All,
>>>>
>>>> 1). We are doing performance testing on hbase. The environment of the
>>>> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
>>>> started one region server on each data node respectively. To insert the
>>>> data, one insertion client is started on each data node machine. But as
>>>> the
>>>> data inserted, the region servers crashed one by one. One of the reasons
>>>> is
>>>> listed as follows:
>>>>
>>>> *==>
>>>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
>>>> while reading from blk_-806310822584979460 of
>>>> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
>>>> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>>>
>>>> ... ...
>>>>
>>>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
>>>> obtain block blk_-806310822584979460 from any node:
>>>>  java.io.IOExceptionYou
>>>>
>>>> 2008-10-07 14:52:25,229 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
>>>> 18mins, 39sec
>>>> 2008-10-07 14:52:25,238 INFO
>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>>> regionserver/0.0.0.0:60020.compactor exiting
>>>> 2008-10-07 14:52:25,284 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> closed search1,r3_1_3_c157476,1223360357528
>>>> 2008-10-07 14:52:25,291 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> closed -ROOT-,,0
>>>> 2008-10-07 14:52:25,291 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
>>>> 10.2.6.104:60020
>>>> 2008-10-07 14:52:25,291 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
>>>> 0.0.0.0:60020 exiting
>>>> 2008-10-07 14:52:25,511 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
>>>> thread.
>>>> 2008-10-07 14:52:25,511 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
>>>> complete
>>>> ===<
>>>>
>>>> 2). Another question is, under what circunstance will the region server
>>>> print logs of the thread information as below? It appears among the
>>>> normal
>>>> log records.
>>>> ===>
>>>> 35 active threads
>>>> Thread 1281 (IPC Client connection to
>>>> d3v1.corp.alimama.com/10.2.6.101:54310
>>>> ):
>>>>  State: RUNNABLE
>>>>  Blocked count: 0
>>>>  Waited count: 0
>>>>  Stack:
>>>>   java.util.Hashtable.remove(Hashtable.java:435)
>>>>   org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>>>> ... ...
>>>> ===<
>>>>
>>>> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated
>>>> if
>>>> any
>>>> clues can be dropped.
>>>>
>>>> Regards,
>>>> -Ray
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

Re: region server problem

Posted by stack <st...@duboce.net>.
Do you have DEBUG enabled?  Can I see the log from the regionserver that went
down?  Can you tell me more about your cluster? Number of nodes, number
of regions?  What does your uploader look like (is it an MR job)?  Have you
upped your file descriptors?

Thanks Slava.
St.Ack


Slava Gorelik wrote:
> HI.I'm also encountering error like this.
> I'm using Hbase 0.18.0 an Hadoop 0.18.0.
> I addition to this error, i'm getting that sometimes region servers are
> died, in the log i see region server shutdown, after starting compaction,
> because that some data blocks are not found.
>
> Best Regards.
>
> On Wed, Oct 8, 2008 at 11:29 PM, stack <st...@duboce.net> wrote:
>
>   
>> You should update to 0.2.1 if you can.  Make sure you've upped your file
>> descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also see
>> how to enable DEBUG in same FAQ.
>>
>> Something odd is up when you see messages like this out of HDFS: ': No live
>> nodes contain current block*'.  Thats lost data.
>>
>> Or messages like this, 'compaction completed on region
>> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
>> compactions are taking so long -- would seem to indicate your machines are
>> severly overloaded or underpowered or both.  Can you study load when the
>> upload is running on these machines?  Perhaps try  throttling back to see if
>> hbase survives longer?
>>
>> The regionserver will output thread dump in its RPC layer if critical error
>> -- OOME -- or its been hung up for a long time IIRC.
>>
>> Check the '.out' logs too for you hbase install to see if they contain any
>> errors.  Grep the datanode logs too for OOME or "too many open file
>> handles".
>>
>> St.Ack
>>
>> Rui Xing wrote:
>>
>>     
>>> Hi All,
>>>
>>> 1). We are doing performance testing on hbase. The environment of the
>>> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
>>> started one region server on each data node respectively. To insert the
>>> data, one insertion client is started on each data node machine. But as
>>> the
>>> data inserted, the region servers crashed one by one. One of the reasons
>>> is
>>> listed as follows:
>>>
>>> *==>
>>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
>>> while reading from blk_-806310822584979460 of
>>> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
>>> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>>
>>> ... ...
>>>
>>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
>>> obtain block blk_-806310822584979460 from any node:
>>>  java.io.IOExceptionYou
>>>
>>> 2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
>>> 18mins, 39sec
>>> 2008-10-07 14:52:25,238 INFO
>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>> regionserver/0.0.0.0:60020.compactor exiting
>>> 2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> closed search1,r3_1_3_c157476,1223360357528
>>> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>> closed -ROOT-,,0
>>> 2008-10-07 14:52:25,291 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
>>> 10.2.6.104:60020
>>> 2008-10-07 14:52:25,291 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
>>> 0.0.0.0:60020 exiting
>>> 2008-10-07 14:52:25,511 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
>>> thread.
>>> 2008-10-07 14:52:25,511 INFO
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
>>> complete
>>> ===<
>>>
>>> 2). Another question is, under what circunstance will the region server
>>> print logs of the thread information as below? It appears among the normal
>>> log records.
>>> ===>
>>> 35 active threads
>>> Thread 1281 (IPC Client connection to
>>> d3v1.corp.alimama.com/10.2.6.101:54310
>>> ):
>>>  State: RUNNABLE
>>>  Blocked count: 0
>>>  Waited count: 0
>>>  Stack:
>>>    java.util.Hashtable.remove(Hashtable.java:435)
>>>    org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>>> ... ...
>>> ===<
>>>
>>> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if
>>> any
>>> clues can be dropped.
>>>
>>> Regards,
>>> -Ray
>>>
>>>
>>>
>>>       
>>     
>
>   


Re: region server problem

Posted by Slava Gorelik <sl...@gmail.com>.
Hi. I'm also encountering an error like this.
I'm using HBase 0.18.0 and Hadoop 0.18.0.
In addition to this error, I'm finding that region servers sometimes die:
in the log I see a region server shutdown, after a compaction starts,
because some data blocks are not found.

Best Regards.

On Wed, Oct 8, 2008 at 11:29 PM, stack <st...@duboce.net> wrote:

> You should update to 0.2.1 if you can.  Make sure you've upped your file
> descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also see
> how to enable DEBUG in same FAQ.
>
> Something odd is up when you see messages like this out of HDFS: ': No live
> nodes contain current block*'.  Thats lost data.
>
> Or messages like this, 'compaction completed on region
> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
> compactions are taking so long -- would seem to indicate your machines are
> severly overloaded or underpowered or both.  Can you study load when the
> upload is running on these machines?  Perhaps try  throttling back to see if
> hbase survives longer?
>
> The regionserver will output thread dump in its RPC layer if critical error
> -- OOME -- or its been hung up for a long time IIRC.
>
> Check the '.out' logs too for you hbase install to see if they contain any
> errors.  Grep the datanode logs too for OOME or "too many open file
> handles".
>
> St.Ack
>
> Rui Xing wrote:
>
>> Hi All,
>>
>> 1). We are doing performance testing on hbase. The environment of the
>> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
>> started one region server on each data node respectively. To insert the
>> data, one insertion client is started on each data node machine. But as
>> the
>> data inserted, the region servers crashed one by one. One of the reasons
>> is
>> listed as follows:
>>
>> *==>
>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
>> while reading from blk_-806310822584979460 of
>> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
>> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>
>> ... ...
>>
>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
>> obtain block blk_-806310822584979460 from any node:
>>  java.io.IOExceptionYou
>>
>> 2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
>> 18mins, 39sec
>> 2008-10-07 14:52:25,238 INFO
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>> regionserver/0.0.0.0:60020.compactor exiting
>> 2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> closed search1,r3_1_3_c157476,1223360357528
>> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> closed -ROOT-,,0
>> 2008-10-07 14:52:25,291 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
>> 10.2.6.104:60020
>> 2008-10-07 14:52:25,291 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
>> 0.0.0.0:60020 exiting
>> 2008-10-07 14:52:25,511 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
>> thread.
>> 2008-10-07 14:52:25,511 INFO
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
>> complete
>> ===<
>>
>> 2). Another question is, under what circunstance will the region server
>> print logs of the thread information as below? It appears among the normal
>> log records.
>> ===>
>> 35 active threads
>> Thread 1281 (IPC Client connection to
>> d3v1.corp.alimama.com/10.2.6.101:54310
>> ):
>>  State: RUNNABLE
>>  Blocked count: 0
>>  Waited count: 0
>>  Stack:
>>    java.util.Hashtable.remove(Hashtable.java:435)
>>    org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>> ... ...
>> ===<
>>
>> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if
>> any
>> clues can be dropped.
>>
>> Regards,
>> -Ray
>>
>>
>>
>
>

Re: region server problem

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ray,

A region is the basic unit of distribution in HBase. A new table starts with
only 1 region, and more are created through region splits. By default this
happens when a single family grows beyond 256MB. The threshold is configurable
in the conf files or per table; HBASE-903
<http://issues.apache.org/jira/browse/HBASE-903> describes a way of doing it
in the shell. For example, setting the value to 64MB will trigger a lot of
splits early in the uploading process; then, when you have sufficient
distribution, you can gradually raise it again (don't forget to lower the
memcache size too if you go near 64MB). Andrew Purtell is also working on a
sweet feature to force splits in
HBASE-902 <http://issues.apache.org/jira/browse/HBASE-902>.
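
For what it's worth, here is roughly what those two settings look like in
hbase-site.xml. The property names are from memory of the 0.18 line, so check
them against your hbase-default.xml before relying on them:

  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- split a region once a single family passes 64MB (default is 256MB) -->
    <value>67108864</value>
  </property>
  <property>
    <name>hbase.hregion.memcache.flush.size</name>
    <!-- keep memcache flushes well below the split size, e.g. 16MB -->
    <value>16777216</value>
  </property>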

Regarding load balancing in general, the master currently doesn't handle "hot
regions", so if one region gets all the hits there isn't much you can do
(apart from what I wrote above).

J-D

On Mon, Oct 13, 2008 at 6:19 AM, Rui Xing <xi...@gmail.com> wrote:

> It was solved by upping file descriptor number and upgrading hbase version.
> The data can be loaded successfully now. Thanks a million for the advices.
>
>
>
> But we have observed another weird problem. Nearly all insertion requests
> were directed to one region server. So this incurs another question. How
> does master do overload balance?
>
>
>
> We started 50 insertion clients calling hbase APIs and one table on server
> side was created to store data.
>
>
>
> Thanks
>
> -Ray
>
> On Mon, Oct 13, 2008 at 12:50 AM, Slava Gorelik <slava.gorelik@gmail.com
> >wrote:
>
> > In this thread i was asked to provide some information about my hbase
> > cluster and some logs. So :
> > 1) Hadoop cluster is :
> >
> > Cluster Summary
> >  * * * 4961 files and directories, 2922 blocks = 7883 total. Heap Size is
> > 10.38 MB / 888.94 MB (1%)
> > *   Capacity : 814.33 GB DFS Remaining : 693.71 GB  DFS Used : 42.46 GB
> DFS
> > Used% : 5.21 %
> >
> >  Live Datanodes : 7
> >
> >  Node Last Contact  Admin State Size (GB) Used (%) Used (%) Remaining
> (GB)
> > Blocksredhat010 2In Service 113.434.89 93.981053 redhat011 0In
> > Service115.26
> > 5.34 97.09 1273 redhat012 1In Service115.265.36 96.95 1162 redhat013 0In
> > Service115.264.91 97.69 1291 redhat014 1In Service115.265.48 96.99 1361
> > redhat015 2In Service115.265.39 97.12 1291 suse010 1In Service124.625.13
> > 113.88 1335
> >
> > 2) Hbase cluster, the filesize is changed to 64mb and also flushsize to
> > 16mb
> > (because of small data that is written frequently):
> > Master Attributes Attribute NameValueDescription HBase Version0.18.0,
> > r697626HBase version and svn revision HBase CompiledSun Sep 21 16:00:50
> PDT
> > 2008, stackWhen HBase version was compiled and by whom Hadoop
> > Version0.18.0,
> > r686010Hadoop version and svn revision Hadoop CompiledThu Aug 14 19:48:33
> > UTC 2008, hadoopqaWhen Hadoop version was compiled and by whom Filesystem
> > hdfs://REDHAT010:9000/hbaseFilesystem HBase is running on HBase Root
> > Directoryhdfs://REDHAT010:9000/hbaseLocation of HBase home directory Load
> > average43.0Average load across all region servers. Naive computation.
> > Catalog
> > Tables TableDescription -ROOT-The -ROOT- table holds references to all
> > .META. regions. .META.The .META. table holds references to all User Table
> > regions User Tables
> >
> > 1 table(s) in set.
> >  TableDescription BizDB {NAME => 'BizDB', IS_ROOT => 'false', IS_META =>
> > 'false', FAMILIES => [{NAME => 'BusinessObject', BLOOMFILTER => 'false',
> > VERSIONS => '3', COMPRESSION => 'NONE', LENGTH => '2147483647', TTL =>
> > '-1',
> > IN_MEMORY => 'false', BLOCKCACHE => 'false'}]} Region Servers
> AddressStart
> > CodeLoad redhat011:600201223827465065requests: 0 regions: 43
> > redhat012:60020
> > 1223827465975requests: 0 regions: 43
> redhat013:600201223827465712requests:
> > 0
> > regions: 43 redhat014:600201223827465249requests: 0 regions: 43
> > redhat015.:600201223827465108requests: 0 regions: 43 suse010:60020
> > 1223813153133requests: 0 regions: 43 Total: servers: 6 requests: 0
> regions:
> > 258
> >
> > 3) Uploader is a simple java program that user BatchUpdate to upload.
> > 4) Descriptors are not upped.
> > 5) Logs from region server, i found number of exception on the same
> region
> > server:
> > 2008-10-08 07:41:58,246 WARN org.apache.hadoop.dfs.DFSClient: Exception
> > while reading from blk_2538465098022552520_15050 of
> > /hbase/BizDB/486345958/BusinessObject/mapfiles/8802744696946937845/data
> > from
> > 10.26.237.141:50010: java.io.IOException: Premeture EOF from inputStream
> > at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> > at
> > org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:996)
> > at
> >
> >
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
> > at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
> > at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
> > at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:858)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1384)
> > at
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1420)
> > at java.io.DataInputStream.readFully(DataInputStream.java:178)
> > at
> >
> >
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> > at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1830)
> > at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:463)
> > at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
> > at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
> > at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
> > at
> org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> >
> > Another exception:
> >
> > 2008-10-08 08:19:22,218 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > starting compaction on region
> > BizDB,1.1.PerfBO1.5eaecb0b-235f-4d62-bae3-f8e42a3f65ee,1223410715671
> > 2008-10-08 08:19:22,285 INFO org.apache.hadoop.hbase.regionserver.HLog:
> New
> > log writer created at
> > /hbase/log_10.26.237.141_1223394485409_60020/hlog.dat.1223446762266
> > 2008-10-08 08:19:22,370 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:22,370 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-2877152584708860910_17060
> > 2008-10-08 08:19:22,427 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:22,427 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_8480966852058311110_17062
> > 2008-10-08 08:19:22,822 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:22,822 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_1836763064916871218_17062
> > 2008-10-08 08:19:28,402 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Bad connect ack with
> > firstBadLink 10.26.237.138:50010
> > 2008-10-08 08:19:28,403 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-6294347938555137047_17063
> > 2008-10-08 08:19:28,432 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Bad connect ack with
> > firstBadLink 10.26.237.137:50010
> > 2008-10-08 08:19:28,432 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_5692207445386295686_17063
> > 2008-10-08 08:19:28,828 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Bad connect ack with
> > firstBadLink 10.26.237.139:50010
> > 2008-10-08 08:19:28,828 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-5426084204552912284_17063
> > 2008-10-08 08:19:34,439 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Bad connect ack with
> > firstBadLink 10.26.237.139:50010
> > 2008-10-08 08:19:34,440 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-3084067451611865531_17065
> > 2008-10-08 08:19:34,941 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Bad connect ack with
> > firstBadLink 10.26.237.140:50010
> > 2008-10-08 08:19:34,941 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_8531979798217012059_17068
> > 2008-10-08 08:19:40,444 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:40,445 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-690757568573941572_17074
> > 2008-10-08 08:19:40,946 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:40,946 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-3282075547420544604_17074
> > 2008-10-08 08:19:46,447 WARN org.apache.hadoop.dfs.DFSClient:
> DataStreamer
> > Exception: java.io.IOException: Unable to create new block.
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
> >
> > 2008-10-08 08:19:46,447 WARN org.apache.hadoop.dfs.DFSClient: Error
> > Recovery
> > for block blk_-690757568573941572_17074 bad datanode[0]
> > 2008-10-08 08:19:46,459 ERROR
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> > failed for region
> > BizDB,1.1.PerfBO1.5eaecb0b-235f-4d62-bae3-f8e42a3f65ee,1223410715671
> > java.io.IOException: Could not get block locations. Aborting...
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
> > 2008-10-08 08:19:46,461 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > starting compaction on region
> > BizDB,1.1.PerfBO1.5c9d1b38-bb27-4693-9466-ded9b4e8c59e,1223412524168
> > 2008-10-08 08:19:46,564 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new compressor
> > 2008-10-08 08:19:46,569 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:46,569 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-2167445393797967261_17083
> > 2008-10-08 08:19:46,951 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:46,951 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_5449441848613806871_17083
> > 2008-10-08 08:19:52,573 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:52,574 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-7625939221720637541_17092
> > 2008-10-08 08:19:52,955 INFO org.apache.hadoop.dfs.DFSClient: Exception
> in
> > createBlockOutputStream java.io.IOException: Could not read from stream
> > 2008-10-08 08:19:52,955 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> > block blk_-1769229717555876257_17092
> > 2008-10-08 08:19:58,957 WARN org.apache.hadoop.dfs.DFSClient:
> DataStreamer
> > Exception: java.io.IOException: Unable to create new block.
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
> > at
> >
> >
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
> >
> > 2008-10-08 08:19:58,958 WARN org.apache.hadoop.dfs.DFSClient: Error
> > Recovery
> > for block blk_-1769229717555876257_17092 bad datanode[0]
> > 2008-10-08 08:19:58,958 FATAL
> org.apache.hadoop.hbase.regionserver.Flusher:
> > Replay of hlog required. Forcing server restart
> >
> > And another one:
> >
> > 2008-10-07 22:50:57,896 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Starting split of region
> > BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
> > 2008-10-07 22:50:58,163 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > closed
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
> > 2008-10-07 22:50:58,336 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 6 on 60020, call batchUpdate([B@154c8c3c, row =>
> > 1.1.PerfBO1.109900e7-af7b-4bf4-b682-50a46760701c, {column =>
> > BusinessObject:s2, value => '...', column => BusinessObject:s1, value =>
> > '...', column => BusinessObject:@@identifier@@, value => '...'}, -1)
> from
> > 10.26.237.185:37696: error:
> > org.apache.hadoop.hbase.NotServingRegionException: Region
> > BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
> closed
> > org.apache.hadoop.hbase.NotServingRegionException: Region
> > BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
> closed
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1810)
> > at
> org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1875)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1406)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1380)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1109)
> > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > at java.lang.reflect.Method.invoke(Method.java:597)
> > at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> > 2008-10-07 22:50:58,951 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > region
> >
> >
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223412657926/1465662157
> > available
> > 2008-10-07 22:50:58,952 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > closed
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223412657926
> >
> >
> > Best Regards.
> >
>

Re: region server problem

Posted by Rui Xing <xi...@gmail.com>.
It was solved by upping the file descriptor limit and upgrading the HBase
version. The data can be loaded successfully now. Thanks a million for the
advice.



But we have observed another weird problem: nearly all insertion requests
were directed to one region server. This raises another question: how does
the master do load balancing?



We started 50 insertion clients calling the HBase client API, and one table
was created on the server side to store the data.
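
For reference, each insertion client boils down to something like the sketch
below. This is illustrative only: the table name and column family come from
the logs earlier in this thread, everything else is made up, and the
HTable/BatchUpdate calls are the 0.2.x-era client API as I remember it, so
the exact signatures may differ slightly.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.io.BatchUpdate;

  public class InsertionClient {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "search1");
      for (int i = 0; i < 100000; i++) {
        // HBase range-partitions rows across regions, so clients writing
        // keys in a narrow, mostly sequential range all hit the one region
        // (and region server) that currently holds that range.
        BatchUpdate update = new BatchUpdate("r3_1_3_c" + i);
        update.put("col9:value", ("payload-" + i).getBytes());
        table.commit(update);
      }
    }
  }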



Thanks

-Ray

On Mon, Oct 13, 2008 at 12:50 AM, Slava Gorelik <sl...@gmail.com>wrote:

> In this thread i was asked to provide some information about my hbase
> cluster and some logs. So :
> 1) Hadoop cluster is :
>
> Cluster Summary
>  * * * 4961 files and directories, 2922 blocks = 7883 total. Heap Size is
> 10.38 MB / 888.94 MB (1%)
> *   Capacity : 814.33 GB DFS Remaining : 693.71 GB  DFS Used : 42.46 GB DFS
> Used% : 5.21 %
>
>  Live Datanodes : 7
>
>  Node Last Contact  Admin State Size (GB) Used (%) Used (%) Remaining (GB)
> Blocksredhat010 2In Service 113.434.89 93.981053 redhat011 0In
> Service115.26
> 5.34 97.09 1273 redhat012 1In Service115.265.36 96.95 1162 redhat013 0In
> Service115.264.91 97.69 1291 redhat014 1In Service115.265.48 96.99 1361
> redhat015 2In Service115.265.39 97.12 1291 suse010 1In Service124.625.13
> 113.88 1335
>
> 2) Hbase cluster, the filesize is changed to 64mb and also flushsize to
> 16mb
> (because of small data that is written frequently):
> Master Attributes Attribute NameValueDescription HBase Version0.18.0,
> r697626HBase version and svn revision HBase CompiledSun Sep 21 16:00:50 PDT
> 2008, stackWhen HBase version was compiled and by whom Hadoop
> Version0.18.0,
> r686010Hadoop version and svn revision Hadoop CompiledThu Aug 14 19:48:33
> UTC 2008, hadoopqaWhen Hadoop version was compiled and by whom Filesystem
> hdfs://REDHAT010:9000/hbaseFilesystem HBase is running on HBase Root
> Directoryhdfs://REDHAT010:9000/hbaseLocation of HBase home directory Load
> average43.0Average load across all region servers. Naive computation.
> Catalog
> Tables TableDescription -ROOT-The -ROOT- table holds references to all
> .META. regions. .META.The .META. table holds references to all User Table
> regions User Tables
>
> 1 table(s) in set.
>  TableDescription BizDB {NAME => 'BizDB', IS_ROOT => 'false', IS_META =>
> 'false', FAMILIES => [{NAME => 'BusinessObject', BLOOMFILTER => 'false',
> VERSIONS => '3', COMPRESSION => 'NONE', LENGTH => '2147483647', TTL =>
> '-1',
> IN_MEMORY => 'false', BLOCKCACHE => 'false'}]} Region Servers AddressStart
> CodeLoad redhat011:600201223827465065requests: 0 regions: 43
> redhat012:60020
> 1223827465975requests: 0 regions: 43 redhat013:600201223827465712requests:
> 0
> regions: 43 redhat014:600201223827465249requests: 0 regions: 43
> redhat015.:600201223827465108requests: 0 regions: 43 suse010:60020
> 1223813153133requests: 0 regions: 43 Total: servers: 6 requests: 0 regions:
> 258
>
> 3) Uploader is a simple java program that user BatchUpdate to upload.
> 4) Descriptors are not upped.
> 5) Logs from region server, i found number of exception on the same region
> server:
> 2008-10-08 07:41:58,246 WARN org.apache.hadoop.dfs.DFSClient: Exception
> while reading from blk_2538465098022552520_15050 of
> /hbase/BizDB/486345958/BusinessObject/mapfiles/8802744696946937845/data
> from
> 10.26.237.141:50010: java.io.IOException: Premeture EOF from inputStream
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> at
> org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:996)
> at
>
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
> at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:858)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1384)
> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1420)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at
>
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1830)
> at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:463)
> at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
> at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
> at
>
> org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
> at
>
> org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
> at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
> at org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>
> Another exception:
>
> 2008-10-08 08:19:22,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> BizDB,1.1.PerfBO1.5eaecb0b-235f-4d62-bae3-f8e42a3f65ee,1223410715671
> 2008-10-08 08:19:22,285 INFO org.apache.hadoop.hbase.regionserver.HLog: New
> log writer created at
> /hbase/log_10.26.237.141_1223394485409_60020/hlog.dat.1223446762266
> 2008-10-08 08:19:22,370 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:22,370 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-2877152584708860910_17060
> 2008-10-08 08:19:22,427 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:22,427 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_8480966852058311110_17062
> 2008-10-08 08:19:22,822 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:22,822 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_1836763064916871218_17062
> 2008-10-08 08:19:28,402 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.26.237.138:50010
> 2008-10-08 08:19:28,403 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-6294347938555137047_17063
> 2008-10-08 08:19:28,432 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.26.237.137:50010
> 2008-10-08 08:19:28,432 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_5692207445386295686_17063
> 2008-10-08 08:19:28,828 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.26.237.139:50010
> 2008-10-08 08:19:28,828 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-5426084204552912284_17063
> 2008-10-08 08:19:34,439 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.26.237.139:50010
> 2008-10-08 08:19:34,440 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-3084067451611865531_17065
> 2008-10-08 08:19:34,941 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.26.237.140:50010
> 2008-10-08 08:19:34,941 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_8531979798217012059_17068
> 2008-10-08 08:19:40,444 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:40,445 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-690757568573941572_17074
> 2008-10-08 08:19:40,946 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:40,946 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-3282075547420544604_17074
> 2008-10-08 08:19:46,447 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
> Exception: java.io.IOException: Unable to create new block.
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
>
> 2008-10-08 08:19:46,447 WARN org.apache.hadoop.dfs.DFSClient: Error
> Recovery
> for block blk_-690757568573941572_17074 bad datanode[0]
> 2008-10-08 08:19:46,459 ERROR
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> failed for region
> BizDB,1.1.PerfBO1.5eaecb0b-235f-4d62-bae3-f8e42a3f65ee,1223410715671
> java.io.IOException: Could not get block locations. Aborting...
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
> 2008-10-08 08:19:46,461 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> starting compaction on region
> BizDB,1.1.PerfBO1.5c9d1b38-bb27-4693-9466-ded9b4e8c59e,1223412524168
> 2008-10-08 08:19:46,564 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new compressor
> 2008-10-08 08:19:46,569 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:46,569 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-2167445393797967261_17083
> 2008-10-08 08:19:46,951 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:46,951 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_5449441848613806871_17083
> 2008-10-08 08:19:52,573 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:52,574 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-7625939221720637541_17092
> 2008-10-08 08:19:52,955 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Could not read from stream
> 2008-10-08 08:19:52,955 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_-1769229717555876257_17092
> 2008-10-08 08:19:58,957 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
> Exception: java.io.IOException: Unable to create new block.
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
> at
>
> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
>
> 2008-10-08 08:19:58,958 WARN org.apache.hadoop.dfs.DFSClient: Error
> Recovery
> for block blk_-1769229717555876257_17092 bad datanode[0]
> 2008-10-08 08:19:58,958 FATAL org.apache.hadoop.hbase.regionserver.Flusher:
> Replay of hlog required. Forcing server restart
>
> And another one:
>
> 2008-10-07 22:50:57,896 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting split of region
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
> 2008-10-07 22:50:58,163 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> closed BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
> 2008-10-07 22:50:58,336 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 60020, call batchUpdate([B@154c8c3c, row =>
> 1.1.PerfBO1.109900e7-af7b-4bf4-b682-50a46760701c, {column =>
> BusinessObject:s2, value => '...', column => BusinessObject:s1, value =>
> '...', column => BusinessObject:@@identifier@@, value => '...'}, -1) from
> 10.26.237.185:37696: error:
> org.apache.hadoop.hbase.NotServingRegionException: Region
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818 closed
> org.apache.hadoop.hbase.NotServingRegionException: Region
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818 closed
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1810)
> at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1875)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1406)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1380)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1109)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> 2008-10-07 22:50:58,951 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region
>
> BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223412657926/1465662157
> available
> 2008-10-07 22:50:58,952 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> closed BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223412657926
>
>
> Best Regards.
>

Re: region server problem

Posted by Slava Gorelik <sl...@gmail.com>.
In this thread I was asked to provide some information about my HBase
cluster and some logs. So:
1) Hadoop cluster:

Cluster Summary
  4961 files and directories, 2922 blocks = 7883 total.
  Heap Size: 10.38 MB / 888.94 MB (1%)
  Capacity: 814.33 GB   DFS Remaining: 693.71 GB   DFS Used: 42.46 GB   DFS Used%: 5.21 %

Live Datanodes: 7

  Node        Last Contact  Admin State  Size (GB)  Used (%)  Remaining (GB)  Blocks
  redhat010   2             In Service   113.43     4.89      93.98           1053
  redhat011   0             In Service   115.26     5.34      97.09           1273
  redhat012   1             In Service   115.26     5.36      96.95           1162
  redhat013   0             In Service   115.26     4.91      97.69           1291
  redhat014   1             In Service   115.26     5.48      96.99           1361
  redhat015   2             In Service   115.26     5.39      97.12           1291
  suse010     1             In Service   124.62     5.13      113.88          1335

2) HBase cluster; the filesize has been changed to 64MB and the flushsize to
16MB (because small pieces of data are written frequently):
Master Attributes
  HBase Version:         0.18.0, r697626 (HBase version and svn revision)
  HBase Compiled:        Sun Sep 21 16:00:50 PDT 2008, stack (when HBase was compiled and by whom)
  Hadoop Version:        0.18.0, r686010 (Hadoop version and svn revision)
  Hadoop Compiled:       Thu Aug 14 19:48:33 UTC 2008, hadoopqa (when Hadoop was compiled and by whom)
  Filesystem:            hdfs://REDHAT010:9000/hbase (filesystem HBase is running on)
  HBase Root Directory:  hdfs://REDHAT010:9000/hbase (location of HBase home directory)
  Load average:          43.0 (average load across all region servers; naive computation)

Catalog Tables
  -ROOT-  The -ROOT- table holds references to all .META. regions.
  .META.  The .META. table holds references to all User Table regions.

User Tables

1 table(s) in set.

  Table  Description
  BizDB  {NAME => 'BizDB', IS_ROOT => 'false', IS_META => 'false',
         FAMILIES => [{NAME => 'BusinessObject', BLOOMFILTER => 'false',
         VERSIONS => '3', COMPRESSION => 'NONE', LENGTH => '2147483647',
         TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}

Region Servers
  Address          Start Code     Load
  redhat011:60020  1223827465065  requests: 0, regions: 43
  redhat012:60020  1223827465975  requests: 0, regions: 43
  redhat013:60020  1223827465712  requests: 0, regions: 43
  redhat014:60020  1223827465249  requests: 0, regions: 43
  redhat015:60020  1223827465108  requests: 0, regions: 43
  suse010:60020    1223813153133  requests: 0, regions: 43

Total: servers: 6, requests: 0, regions: 258

3) The uploader is a simple Java program that uses BatchUpdate to upload.
4) Descriptors are not upped.
5) Logs from the region server; I found a number of exceptions on the same
region server:
2008-10-08 07:41:58,246 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_2538465098022552520_15050 of
/hbase/BizDB/486345958/BusinessObject/mapfiles/8802744696946937845/data from
10.26.237.141:50010: java.io.IOException: Premeture EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:996)
at
org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:858)
at
org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1384)
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1420)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1830)
at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:463)
at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
at
org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
at
org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
at org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

Another exception:

2008-10-08 08:19:22,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting compaction on region
BizDB,1.1.PerfBO1.5eaecb0b-235f-4d62-bae3-f8e42a3f65ee,1223410715671
2008-10-08 08:19:22,285 INFO org.apache.hadoop.hbase.regionserver.HLog: New
log writer created at
/hbase/log_10.26.237.141_1223394485409_60020/hlog.dat.1223446762266
2008-10-08 08:19:22,370 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:22,370 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-2877152584708860910_17060
2008-10-08 08:19:22,427 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:22,427 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_8480966852058311110_17062
2008-10-08 08:19:22,822 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:22,822 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_1836763064916871218_17062
2008-10-08 08:19:28,402 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.26.237.138:50010
2008-10-08 08:19:28,403 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-6294347938555137047_17063
2008-10-08 08:19:28,432 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.26.237.137:50010
2008-10-08 08:19:28,432 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_5692207445386295686_17063
2008-10-08 08:19:28,828 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.26.237.139:50010
2008-10-08 08:19:28,828 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-5426084204552912284_17063
2008-10-08 08:19:34,439 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.26.237.139:50010
2008-10-08 08:19:34,440 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-3084067451611865531_17065
2008-10-08 08:19:34,941 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 10.26.237.140:50010
2008-10-08 08:19:34,941 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_8531979798217012059_17068
2008-10-08 08:19:40,444 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:40,445 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-690757568573941572_17074
2008-10-08 08:19:40,946 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:40,946 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-3282075547420544604_17074
2008-10-08 08:19:46,447 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
Exception: java.io.IOException: Unable to create new block.
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)

2008-10-08 08:19:46,447 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery
for block blk_-690757568573941572_17074 bad datanode[0]
2008-10-08 08:19:46,459 ERROR
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
failed for region
BizDB,1.1.PerfBO1.5eaecb0b-235f-4d62-bae3-f8e42a3f65ee,1223410715671
java.io.IOException: Could not get block locations. Aborting...
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
2008-10-08 08:19:46,461 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting compaction on region
BizDB,1.1.PerfBO1.5c9d1b38-bb27-4693-9466-ded9b4e8c59e,1223412524168
2008-10-08 08:19:46,564 INFO org.apache.hadoop.io.compress.CodecPool: Got
brand-new compressor
2008-10-08 08:19:46,569 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:46,569 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-2167445393797967261_17083
2008-10-08 08:19:46,951 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:46,951 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_5449441848613806871_17083
2008-10-08 08:19:52,573 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:52,574 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-7625939221720637541_17092
2008-10-08 08:19:52,955 INFO org.apache.hadoop.dfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Could not read from stream
2008-10-08 08:19:52,955 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
block blk_-1769229717555876257_17092
2008-10-08 08:19:58,957 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
Exception: java.io.IOException: Unable to create new block.
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)

2008-10-08 08:19:58,958 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery
for block blk_-1769229717555876257_17092 bad datanode[0]
2008-10-08 08:19:58,958 FATAL org.apache.hadoop.hbase.regionserver.Flusher:
Replay of hlog required. Forcing server restart

And another one:

2008-10-07 22:50:57,896 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting split of region
BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
2008-10-07 22:50:58,163 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818
2008-10-07 22:50:58,336 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 60020, call batchUpdate([B@154c8c3c, row =>
1.1.PerfBO1.109900e7-af7b-4bf4-b682-50a46760701c, {column =>
BusinessObject:s2, value => '...', column => BusinessObject:s1, value =>
'...', column => BusinessObject:@@identifier@@, value => '...'}, -1) from
10.26.237.185:37696: error:
org.apache.hadoop.hbase.NotServingRegionException: Region
BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818 closed
org.apache.hadoop.hbase.NotServingRegionException: Region
BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223403629818 closed
at
org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1810)
at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1875)
at
org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1406)
at
org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1380)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1109)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-10-07 22:50:58,951 INFO org.apache.hadoop.hbase.regionserver.HRegion:
region
BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223412657926/1465662157
available
2008-10-07 22:50:58,952 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed BizDB,1.1.PerfBO1.103c5752-efcd-4510-85eb-d491d5ca1fa9,1223412657926


Best Regards.

Re: region server problem

Posted by Andrew Purtell <ap...@yahoo.com>.
I've seen "No live nodes contain current block" from DFS as a
symptom of what looks like (at a minimum) a race during
compaction when DFS is coming apart under load. This is my
hypothesis based on log examination at the time: Certain
mapfile data and/or index files are apparently deleted before
they should be. The namenode instructs the data nodes to
delete all block replicas associated with the file, yet
somewhere a region server still thinks it has a lease on
one or more of those blocks even though every replica is
gone... and that's that. The region server goes down, and
the region is toast.

I was in firefighting mode, so I stamped out the problem in our
deployment by adding additional nodes to spread load, and by
increasing CPU resources available to the DFS name node. So
unfortunately I did not have the ability/time to dive deep on
an analysis of this. 

Probably there are some actions that HBase should not attempt
when DFS is severely loaded. For example, compaction, or 
optional flushes. I don't know much about the namenode
protocol. Is there a way to get load estimates? 
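
For what it's worth, the closest thing I know of is the datanode
report a client can pull from the namenode -- it won't tell you
namenode CPU, but it does show per-datanode fullness and heartbeat
staleness, which correlate with the trouble above. Rough, untested
sketch against the 0.17-era API (class and method names are from
memory, so treat them as assumptions and check your javadoc):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.dfs.DistributedFileSystem;
import org.apache.hadoop.dfs.DatanodeInfo;

public class DfsLoadReport {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) {
      System.err.println("not talking to HDFS");
      return;
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    // Cluster-wide raw capacity vs. usage, roughly what
    // 'hadoop dfsadmin -report' prints.
    System.out.println("raw capacity: " + dfs.getRawCapacity());
    System.out.println("raw used:     " + dfs.getRawUsed());
    // Per-datanode view; a node that is nearly full or has a stale
    // heartbeat is a likely source of the bad-connect-ack errors.
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      System.out.println(dn.getName()
          + " remaining=" + dn.getRemaining()
          + " lastUpdate=" + dn.getLastUpdate());
    }
  }
}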

   - Andy

> From: stack <st...@duboce.net>
> Subject: Re: region server problem
> To: hbase-user@hadoop.apache.org
> Date: Wednesday, October 8, 2008, 2:29 PM
> You should update to 0.2.1 if you can.  Make sure you've
> upped your file descriptors too:  See
> http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also 
> see how to enable DEBUG in same FAQ.
> 
> Something odd is up when you see messages like this out of
> HDFS: ': No live nodes contain current block*'.  That's lost
> data.
> 
> Or messages like this, 'compaction completed on region 
> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec'
> -- i.e. that compactions are taking so long -- would seem to
> indicate your machines are severely overloaded or underpowered
> or both.  Can you study load when the upload is running on
> these machines?  Perhaps try throttling back to see if hbase
> survives longer?
> 
> The regionserver will output a thread dump in its RPC layer
> on a critical error -- OOME -- or if it's been hung up for a long
> time IIRC.
> 
> Check the '.out' logs too for your hbase install to see if
> they contain any errors.  Grep the datanode logs too for OOME
> or "too many open file handles".
> 
> St.Ack
> 
> Rui Xing wrote:
> > Hi All,
> >
> > 1). We are doing performance testing on hbase. The
> environment of the
> > testing is 3 data nodes, and 1 name node distributed
> on 4 machines. We
> > started one region server on each data node
> respectively. To insert the
> > data, one insertion client is started on each data
> node machine. But as the
> > data inserted, the region servers crashed one by one.
> One of the reasons is
> > listed as follows:
> >
> > *==>
> > 2008-10-07 14:47:01,519 WARN
> org.apache.hadoop.dfs.DFSClient: Exception
> > while reading from blk_-806310822584979460 of
> >
> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data
> from
> > 10.2.6.102:50010: java.io.IOException: Premeture EOF
> from inputStream*
> >
> > ... ...
> >
> > *2008-10-07 14:47:01,521 INFO
> org.apache.hadoop.dfs.DFSClient: Could not
> > obtain block blk_-806310822584979460 from any node: 
> java.io.IOException: No live nodes contain current block*
> > 2008-10-07 14:52:25,229 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > compaction completed on region
> search1,r3_1_3_c157476,1223360357528 in
> > 18mins, 39sec
> > 2008-10-07 14:52:25,238 INFO
> >
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> > regionserver/0.0.0.0:60020.compactor exiting
> > 2008-10-07 14:52:25,284 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > closed search1,r3_1_3_c157476,1223360357528
> > 2008-10-07 14:52:25,291 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > closed -ROOT-,,0
> > 2008-10-07 14:52:25,291 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> aborting server at:
> > 10.2.6.104:60020
> > 2008-10-07 14:52:25,291 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> regionserver/
> > 0.0.0.0:60020 exiting
> > 2008-10-07 14:52:25,511 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> Starting shutdown
> > thread.
> > 2008-10-07 14:52:25,511 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> Shutdown thread complete
> > ===<
> >
> > 2). Another question is, under what circunstance will
> the region server
> > print logs of the thread information as below? It
> appears among the normal
> > log records.
> > ===>
> > 35 active threads
> > Thread 1281 (IPC Client connection to
> d3v1.corp.alimama.com/10.2.6.101:54310
> > ):
> >   State: RUNNABLE
> >   Blocked count: 0
> >   Waited count: 0
> >   Stack:
> >     java.util.Hashtable.remove(Hashtable.java:435)
> >    
> org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
> > ... ...
> > ===<
> >
> > We use hadoop 0.17.1 and hbase 0.2.0. It would be
> greatly appreciated if any
> > clues can be dropped.
> >
> > Regards,
> > -Ray
> >
> >



Re: region server problem

Posted by stack <st...@duboce.net>.
You should update to 0.2.1 if you can.  Make sure you've upped your file 
descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also 
see how to enable DEBUG in same FAQ.
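
If you want to confirm the new limit actually applies to the running
daemons (a ulimit set in a shell profile is easy to miss for processes
started from the start scripts), something like the following, run
inside the same JVM, will show it.  This is a sketch only -- it needs
a 1.6 Sun JVM on unix, since it casts to the Sun-specific mbean:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      // If max is still the default 1024, the ulimit change has not
      // reached the process that launched this JVM.
      System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
      System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
    } else {
      System.out.println("not a unix Sun JVM; check ulimit -n in the shell");
    }
  }
}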

Something odd is up when you see messages like this out of HDFS: ': No 
live nodes contain current block*'.  That's lost data.

Or messages like this, 'compaction completed on region 
search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that 
compactions are taking so long -- would seem to indicate your machines 
are severely overloaded or underpowered or both.  Can you study load when 
the upload is running on these machines?  Perhaps try throttling back 
to see if hbase survives longer?
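
By throttling I mean something as crude as pausing the insert loop
every so many rows.  Untested sketch against the 0.2 client API (the
table and column names below are made up -- substitute your own -- and
the signatures are from memory, so check them against your javadoc):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class ThrottledLoader {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    HTable table = new HTable(new HBaseConfiguration(), "search1");
    byte[] value = new byte[1024];            // placeholder payload
    for (int i = 0; i < 1000000; i++) {
      BatchUpdate bu = new BatchUpdate("row" + i);
      bu.put("col9:data", value);             // made-up family:qualifier
      table.commit(bu);
      if (i % 1000 == 0) {
        Thread.sleep(500);                    // crude throttle every 1000 rows
      }
    }
  }
}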

The regionserver will output a thread dump in its RPC layer on a critical 
error -- OOME -- or if it's been hung up for a long time IIRC.
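
The dump format in your mail ("35 active threads", blocked/waited
counts, a stack per thread) looks like the generic dump the Hadoop
code produces; I haven't chased the exact call site, but you can get
an equivalent picture from the standard ThreadMXBean, roughly:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
  public static void dump() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    System.out.println(bean.getThreadCount() + " active threads");
    for (ThreadInfo info :
        bean.getThreadInfo(bean.getAllThreadIds(), Integer.MAX_VALUE)) {
      if (info == null) continue;   // thread exited between the two calls
      System.out.println("Thread " + info.getThreadId()
          + " (" + info.getThreadName() + "):");
      System.out.println("  State: " + info.getThreadState());
      System.out.println("  Blocked count: " + info.getBlockedCount());
      System.out.println("  Waited count: " + info.getWaitedCount());
      System.out.println("  Stack:");
      for (StackTraceElement frame : info.getStackTrace()) {
        System.out.println("    " + frame);
      }
    }
  }
}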

Check the '.out' logs too for your hbase install to see if they contain 
any errors.  Grep the datanode logs too for OOME or "too many open file 
handles".

St.Ack

Rui Xing wrote:
> Hi All,
>
> 1). We are doing performance testing on hbase. The environment of the
> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
> started one region server on each data node respectively. To insert the
> data, one insertion client is started on each data node machine. But as the
> data inserted, the region servers crashed one by one. One of the reasons is
> listed as follows:
>
> *==>
> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
> while reading from blk_-806310822584979460 of
> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>
> ... ...
>
> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
> obtain block blk_-806310822584979460 from any node:  java.io.IOException: No live nodes contain current block*
> 2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
> 18mins, 39sec
> 2008-10-07 14:52:25,238 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/0.0.0.0:60020.compactor exiting
> 2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> closed search1,r3_1_3_c157476,1223360357528
> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> closed -ROOT-,,0
> 2008-10-07 14:52:25,291 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
> 10.2.6.104:60020
> 2008-10-07 14:52:25,291 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
> 0.0.0.0:60020 exiting
> 2008-10-07 14:52:25,511 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
> 2008-10-07 14:52:25,511 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
> ===<
>
> 2). Another question is, under what circunstance will the region server
> print logs of the thread information as below? It appears among the normal
> log records.
> ===>
> 35 active threads
> Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310
> ):
>   State: RUNNABLE
>   Blocked count: 0
>   Waited count: 0
>   Stack:
>     java.util.Hashtable.remove(Hashtable.java:435)
>     org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
> ... ...
> ===<
>
> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if any
> clues can be dropped.
>
> Regards,
> -Ray
>
>