Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2008/11/20 19:26:43 UTC

Re: Full table scan fails during map

David Alves wrote:
> Hi guys
>
>     We've had HBase (0.18.0, r695089) and Hadoop (0.18.0, r686010) 
> running for a while, and apart from the occasional regionserver 
> stopping without notice (and without explanation from what we can 
> see in the logs), a problem that we solve easily just by restarting it, 
> we have now come to face a more serious problem of what I think is 
> data loss.

What do you think it is, David?  A hang?  We've seen occasional hangups on 
HDFS.  You could try thread dumping and see if you can figure out where 
things are blocked (you can do it in the UI on the problematic regionserver 
or by sending QUIT to the JVM PID).
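
For example, assuming a stock install where the pid file lives under /tmp 
and the daemons log to $HBASE_HOME/logs (both assumptions; adjust to your 
layout), something like this dumps every thread stack into the 
regionserver's .out file:

  # send SIGQUIT to the regionserver JVM; the dump goes to its stdout (.out)
  kill -QUIT $(cat /tmp/hbase-$(whoami)-regionserver.pid)
  # then read the dump at the tail of the .out file
  less $HBASE_HOME/logs/hbase-$(whoami)-regionserver-$(hostname).out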


>     We use HBase as a links and documents database (similar to Nutch) 
> in a 3-node cluster (4GB of memory on each node); the links database has 4 
> regions and the document database now has 200 regions, for a total of 
> 216 (with meta and root).

How much RAM is allocated to HBase?  Does each database have a single family or more?
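
For reference, the HBase heap is normally set through HBASE_HEAPSIZE in 
conf/hbase-env.sh; a minimal sketch (the value is in MB, and 2000 is only 
an example, not a recommendation from this thread):

  # conf/hbase-env.sh: heap size for the HBase daemons, in MB
  export HBASE_HEAPSIZE=2000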

>     After the crawl task, which went OK (we now have 60GB/300GB full 
> in HDFS), we proceeded to do a full table scan to create the indexes, 
> and that's where things started to fail.
>     We are seeing a problem in the logs (at the end of this email). 
> This repeats until there is a RetriesExhaustedException and the task 
> fails in the map phase. The Hadoop fsck tool tells us that HDFS is OK. 
> I still have to explore the rest of the logs for some kind of error; 
> I will post a new mail if I find anything.
>
>     Any help would be greatly appreciated.

Is this file in your HDFS: 
hdfs://cyclops-prod-1:9000/hbase/document/153945136/docDatum/mapfiles/5163556575658593611/data?  
If so, can you fetch it using ./bin/hadoop fs -get FILENAME?
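
For example, with the path above and an arbitrary local destination:

  ./bin/hadoop fs -get hdfs://cyclops-prod-1:9000/hbase/document/153945136/docDatum/mapfiles/5163556575658593611/data /tmp/data

If the copy fails with a missing-block error, the problem is on the HDFS 
side rather than in HBase.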

What crawler are you using (out of interest)?
St.Ack

Re: Full table scan fails during map

Posted by David Alves <da...@student.dei.uc.pt>.
Hi Stack

	Just so you know, you can spell :) I checked some other mails on 
the Hadoop mailing list and it's really spelled like that.

regards
David
On Nov 20, 2008, at 7:08 PM, stack wrote:

> David Alves wrote:
>> Hi stack
>>
>>    Regarding the missing block part, my bad, I didn't install this 
>> cluster and didn't verify the ulimit first; then a new problem 
>> arose because of the number of xceivers. Regarding this property, I 
>> can't find the default value in hadoop-default.xml; is this normal?
> I think you'll find that not all configuration gets a mention in 
> hadoop-default.xml.  If I were to guess, the developer is thinking 
> that "...no one will ever want to change this config, but just in 
> case they ever do and are dogged enough to read the source..."  HBase 
> has a few of these buried in source code.
>
>> I ask only because some time ago you answered someone to increase 
>> the dfs.datanode.max.xcievers property, but actually the exceptions 
>> refer to an *Xceiver (notice the change in the i,e order).
>
> Oh.  Bad.  I can't spell (nor practise what the mnemonic says).
>
>> Anyway, the problem seemed to go away, so that probably fixed it.
>>    Regarding the hang part, the regionserver actually comes down (no 
>> more JVM process); next time it happens I'll investigate further.
> Please do.  Check the .out file if you haven't in the past.
>
> Good stuff,
> St.Ack


Re: Full table scan fails during map

Posted by stack <st...@duboce.net>.
David Alves wrote:
> Hi stack
>
>     Regarding the missing block part, my bad, I didn't install this 
> cluster and didn't verify the ulimit first; then a new problem arose 
> because of the number of xceivers. Regarding this property, I can't 
> find the default value in hadoop-default.xml; is this normal? 
I think you'll find that not all configuration gets a mention in 
hadoop-default.xml.  If I were to guess, the developer is thinking that 
"...no one will ever want to change this config, but just in case they 
ever do and are dogged enough to read the source..."  HBase has a few of 
these buried in source code.
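
Even so, such a property can still be set in hadoop-site.xml on the 
datanodes; a sketch (the 4096 figure is only an example, not a value 
taken from this thread):

  <!-- hadoop-site.xml on each datanode; restart the datanode afterwards -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>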

> I ask only because some time ago you answered someone to increase 
> the dfs.datanode.max.xcievers property, but actually the exceptions 
> refer to an *Xceiver (notice the change in the i,e order).

Oh.  Bad.  I can't spell (nor practise what the mnemonic says).

> Anyway, the problem seemed to go away, so that probably fixed it.
>     Regarding the hang part, the regionserver actually comes down (no 
> more JVM process); next time it happens I'll investigate further.
Please do.  Check the .out file if you haven't in the past.

Good stuff,
St.Ack

Re: Full table scan fails during map

Posted by David Alves <da...@student.dei.uc.pt>.
Hi stack

	Regarding the missing block part, my bad, I didn't install this 
cluster and didn't verify the ulimit first; then a new problem arose 
because of the number of xceivers. Regarding this property, I can't 
find the default value in hadoop-default.xml; is this normal? I ask 
only because some time ago you answered someone to increase the 
dfs.datanode.max.xcievers property, but actually the exceptions refer 
to an *Xceiver (notice the change in the i,e order). Anyway, the 
problem seemed to go away, so that probably fixed it.
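
For anyone hitting the same thing: the usual fix for the ulimit part is to 
raise the open-file limit for the user that runs the Hadoop/HBase daemons. 
A sketch, assuming a user named 'hadoop' and a limit of 32768 (both just 
examples, not the values from our cluster):

  # /etc/security/limits.conf
  hadoop  -  nofile  32768

  # or, in the shell that starts the daemons:
  ulimit -n 32768
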
	Regarding the hang part, the regionserver actually comes down (no more 
JVM process); next time it happens I'll investigate further.
	Btw, HBase has 2048 MB allocated; we have lots of columns but only 
three CFs, and we use a home-grown crawler because we are crawling SVN, 
CVS and other exotic file systems.

Regards, and thanks for your prompt help again :)
David Alves


On Nov 20, 2008, at 6:26 PM, stack wrote:

> David Alves wrote:
>> Hi guys
>>
>> We've had HBase (0.18.0, r695089) and Hadoop (0.18.0, r686010) 
>> running for a while, and apart from the occasional regionserver 
>> stopping without notice (and without explanation from what we can 
>> see in the logs), a problem that we solve easily just by restarting 
>> it, we have now come to face a more serious problem of what I think 
>> is data loss.
>
> What do you think it is, David?  A hang?  We've seen occasional hangups 
> on HDFS.  You could try thread dumping and see if you can figure 
> out where things are blocked (you can do it in the UI on the problematic 
> regionserver or by sending QUIT to the JVM PID).
>
>
>>    We use HBase as a links and documents database (similar to 
>> Nutch) in a 3-node cluster (4GB of memory on each node); the links 
>> database has 4 regions and the document database now has 200 
>> regions, for a total of 216 (with meta and root).
>
> How much RAM is allocated to HBase?  Does each database have a single 
> family or more?
>
>>    After the crawl task, which went OK (we now have 60GB/300GB 
>> full in HDFS), we proceeded to do a full table scan to create the 
>> indexes, and that's where things started to fail.
>>    We are seeing a problem in the logs (at the end of this email). 
>> This repeats until there is a RetriesExhaustedException and the task 
>> fails in the map phase. The Hadoop fsck tool tells us that HDFS is 
>> OK. I still have to explore the rest of the logs for some kind 
>> of error; I will post a new mail if I find anything.
>>
>>    Any help would be greatly appreciated.
>
> Is this file in your HDFS: hdfs://cyclops-prod-1:9000/hbase/document/ 
> 153945136/docDatum/mapfiles/5163556575658593611/data?  If so, can  
> you fetch it using ./bin/hadoop fs -get FILENAME?
>
> What crawler are you using (out of interest)?
> St.Ack