Posted to user@hbase.apache.org by Daniel Leffel <da...@gmail.com> on 2008/05/05 20:51:53 UTC

Should I pass on HBase for this project? (for now)

Hi All (and St.Ack),
I've spent the last few weeks figuring out how to use HBase for my project.
HBase on its surface has seemed like the dream solution for this project
and had me very excited from the beginning.

However, from the moment I've begun to implement the project, I've had a
frustrating go at it. I've spent weeks just simply trying to construct the
environment under which my application will need to run. I've sent countless
messages to this group (and thank you all so much for answering so many of
them, especially St.Ack).

At this point, I can't seem to tell which one(s) of the following is true:

   - Maybe I'm just a freaking idiot
   - Maybe HBase is just not equipped to do what I want it to do
   - Maybe HBase is just still too unstable and it will do what I need it to
   do at some point in the future
   - Maybe I have the wrong expectations for the amount of hardware I need
   to throw at the situation.

I have Hadoop 0.16.3 running on 4 boxes (all 4 running DFS and 3 of them
running MapRed). I'm running HBase 0.1.2 (most recent release candidate)
with the master running on the same box as namenode and 3 region servers
(running on the same MapRed boxes).

My first and very simple task is to load a sparse table with 220 million
rows. The average row has 2 columns or so (very low byte count per row). I
have attempted to do this with a simple MapReduce job. In the Map phase, I'm
simply parsing through a text file and using the standard TableReduce to
load the table.

I've attempted to do this with various numbers of reduce tasks and various
configurations of which machines run each daemon.

The end result is always the same. At some point, region servers go offline -
the most recent behavior is that region servers just quit responding and
logs set to debug give no useful information. If I had to guess, this is
typical deadlock behavior.

A simple table scan (just so I can find out how many rows were successfully
inserted before all the region servers died) usually causes the same
behavior (one by one, region servers just die - even with no MapRed jobs
running).

At this point, I'm at a crossroads and beginning to think that I will need
to leave HBase behind because I can't spend another week with no progress on
this project.

So, I ask the question(s) I posed in the beginning.

   - Maybe I'm just a freaking idiot
   - Maybe HBase is just not equipped to do what I want it to do
   - Maybe HBase is just still too unstable and it will do what I need it to
   do at some point in the future
   - Maybe I have the wrong expectations for the amount of hardware I need
   to throw at the situation.

Can someone please point me in the right direction?

Danny

Re: Should I pass on HBase for this project? (for now)

Posted by stack <st...@duboce.net>.
Daniel Leffel wrote:
> At this point, I'm at a crossroads and beginning to think that I will need
> to leave HBase behind because I can't spend another week with no progress on
> this project.
>
> So, I ask the question(s) I posed in the beginning.
>   
>    - Maybe HBase is just not equipped to do what I want it to do
>   
220M rows might be a bit much for 3 servers. At least, I do not know of
anyone who has loaded these numbers into a 3-node cluster.
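
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It is not from the thread: it takes the 220 million rows and "2 columns or so" from the original post and assumes, purely for illustration, that rows spread evenly across the 3 region servers (real region assignment depends on key ranges, so the actual split may be uneven).

```python
# Figures taken from the original post.
TOTAL_ROWS = 220_000_000
COLUMNS_PER_ROW = 2          # "2 columns or so" on average
REGION_SERVERS = 3

def per_server(total, servers):
    """Even split of a total across servers (an assumption, not a guarantee)."""
    return total / servers

rows_each = per_server(TOTAL_ROWS, REGION_SERVERS)
cells_total = TOTAL_ROWS * COLUMNS_PER_ROW
cells_each = per_server(cells_total, REGION_SERVERS)

if __name__ == "__main__":
    print(f"rows per server:  {rows_each:,.0f}")
    print(f"cells in total:   {cells_total:,}")
    print(f"cells per server: {cells_each:,.0f}")
```

Under that even-split assumption, each region server ends up holding roughly a third of the rows and cells.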

>    - Maybe HBase is just still too unstable and it will do what I need it to
>    do at some point in the future
>   

HBase is green for sure but stability is our immediate, primary target.
We'll drop all to fix correctness and stability bugs in the 0.1 branch.

Unfortunately, you have tripped over a couple of our ugly issues of 
late.  We're trying to fix them as fast as they turn up.  The last time 
we chatted, HBASE-478 was in your way.  We hope to put up a new 0.1.2 
candidate release this evening or so with a fix for it.

>   - Maybe I'm just a freaking idiot


This I cannot help you with but from correspondence so far, I'd guess 
you are not (smile).

> Can someone please point me in the right direction?
>   

Any chance of our working closer together?  Can you pass us full logs at 
DEBUG from one of your upload runs?  We'd like to help you (and 
ourselves) by figuring out what's failing.  Can you hang on the hbase IRC 
channel so we can give you faster turnaround?

Out of interest, have you upped the filehandle ulimit on your machines 
above the default 1024 (see the FAQ for more on this)?
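
One way to check this is a small Python sketch like the one below. It is an illustration, not something from the thread or the FAQ: it reads the open-file limit of the current process through the standard `resource` module, so it would need to be run on each machine as the user that starts the HBase and Hadoop daemons. The helper names and the 1024 threshold constant are chosen here for the example; 1024 is just the default mentioned above.

```python
import resource

# The default mentioned in this thread; used only as a comparison point.
DEFAULT_NOFILE = 1024

def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def is_above_default(soft, default=DEFAULT_NOFILE):
    """True if the soft limit is unlimited or has been raised above `default`."""
    return soft == resource.RLIM_INFINITY or soft > default

if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} raised={is_above_default(soft)}")
```

The soft limit is the one a process actually runs into; the hard limit is the ceiling the soft limit can be raised to without extra privileges.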

I understand if you cannot devote more time to hbase, especially when 
it's been frustrating up to now, but we need fellas like you with your 
problems if we're going to make hbase better.   Let us help you out.

Thanks Daniel,
St.Ack



Re: Should I pass on HBase for this project? (for now)

Posted by Daniel Leffel <da...@gmail.com>.
Hi,
Planning on being there (short of any fires).

Things are going relatively well. I've learned quite a bit about massaging the
setup during high load (write-intensive) jobs. Everything seems to be
working well now and HBase is going to be a pivotal part of my project.

See you tomorrow!

Danny


On Mon, May 19, 2008 at 2:17 PM, stack <st...@duboce.net> wrote:

> Hey Daniel.  How are things going over there?  You coming to the user group
> meeting tomorrow evening?
> St.Ack

Re: Should I pass on HBase for this project? (for now)

Posted by stack <st...@duboce.net>.
Hey Daniel.  How are things going over there?  You coming to the user 
group meeting tomorrow evening?
St.Ack
