Posted to user@hbase.apache.org by Naama Kraus <na...@gmail.com> on 2008/05/13 19:13:08 UTC

Re: Blog post about when to use HBase

Hi,

Can anyone say a few words on when to use HBase as opposed to using plain
MapReduce on input files? In more detail, when does it make sense to put data
into HBase and then use HBase methods to access it, including running
MapReduce on the data in the tables, as opposed to simply putting the data
into HDFS and processing it with MapReduce?
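
To make the second option concrete, by running MapReduce on the tables I mean
a job roughly along these lines (a sketch using the TableMapReduceUtil and
TableMapper helpers from HBase's MapReduce integration in a newer release;
the table name, column family, and row-counting mapper are made up for
illustration):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCountSketch {

  // Each call to map() receives one row of the table as a Result.
  static class RowMapper extends TableMapper<Text, IntWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
        throws IOException, InterruptedException {
      ctx.getCounter("sketch", "rows").increment(1); // count rows via a job counter
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-hbase-table");
    job.setJarByClass(RowCountSketch.class);

    Scan scan = new Scan();              // full-table scan
    scan.addFamily(Bytes.toBytes("cf")); // only fetch the one family we care about
    scan.setCaching(500);                // rows fetched per round trip to a region server

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, RowMapper.class, Text.class, IntWritable.class, job);
    job.setNumReduceTasks(0);            // map-only job
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The alternative I have in mind is pointing essentially the same kind of job
at raw files in HDFS instead.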

Thanks, Naama

On Wed, Mar 12, 2008 at 12:15 AM, Bryan Duxbury <br...@rapleaf.com> wrote:

> I've written up a blog post discussing when I think it's appropriate to
> use HBase in response to some of the questions people usually ask. You can
> find it at http://blog.rapleaf.com/dev/?p=26.
>
> -Bryan
>



-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)

Re: Blog post about when to use HBase

Posted by Naama Kraus <na...@gmail.com>.
Yes, very much.
Thank you, Bryan.

Naama

On Tue, May 13, 2008 at 8:20 PM, Bryan Duxbury <br...@rapleaf.com> wrote:

> I think that the determining factor in when you should use HBase instead
> of HDFS files is really the consumption pattern. If you're only ever going
> to process the data in bulk, then chances are you'll get the most
> performance out of a raw HDFS file. However, if you need random access to
> some of the entries, then HBase will give you a significant benefit.
>
> There are other factors that go into this decision. One that I can think
> of off the top of my head is whether you'd like to take advantage of the
> versioning and semi-defined schema of HBase for your dataset. It would be a
> little complicated to duplicate all of that logic on your own on top of a
> flat file.
>
> Another factor is your system's workflow. If you use HDFS files, you need
> to be OK with rewriting the files whenever you do any "updates". So even if
> you only add 1MB worth of new data to a 1TB dataset, you have to rewrite the
> whole thing. HBase would let you "insert" it where it belongs. (Of course,
> HBase works under the same underlying constraints as your application would,
> except we've already done the work to manage random inserts.)
>
> Does this help you out?
>
> -Bryan
>

Re: Blog post about when to use HBase

Posted by Bryan Duxbury <br...@rapleaf.com>.
I think that the determining factor in when you should use HBase instead
of HDFS files is really the consumption pattern. If you're only ever
going to process the data in bulk, then chances are you'll get the most
performance out of a raw HDFS file. However, if you need random access to
some of the entries, then HBase will give you a significant benefit.
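
For the random-access case, a lookup is just a Get by row key. Here's a
minimal sketch against the HBase Java client (it uses the newer
Connection/Table classes rather than the client API of this era, and the
table, family, column, and row key are made-up placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadSketch {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml from the classpath for cluster settings.
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      // Fetch a single row by key; nothing else in the table is touched.
      Get get = new Get(Bytes.toBytes("user#12345"));
      get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("email"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("email"));
      System.out.println(value == null ? "(not found)" : Bytes.toString(value));
    }
  }
}

Getting the same single record out of a flat HDFS file means reading through
the file until you find it.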

There are other factors that go into this decision. One that I can think
of off the top of my head is whether you'd like to take advantage of the
versioning and semi-defined schema of HBase for your dataset. It would be
a little complicated to duplicate all of that logic on your own on top of
a flat file.
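
As a sketch of what I mean by versioning and a semi-defined schema: the same
cell can be written repeatedly and read back with its history, and new column
qualifiers can be used at any time without declaring them up front (only the
column family is fixed when the table is created). Again this uses the newer
client API with made-up names and explicit timestamps, and it assumes the
family keeps more than one version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    byte[] row = Bytes.toBytes("user#12345");
    byte[] cf  = Bytes.toBytes("cf");     // family must be created with VERSIONS > 1
    byte[] col = Bytes.toBytes("email");  // qualifier is not declared anywhere
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      // Two writes to the same cell with explicit timestamps; HBase keeps
      // both as separate versions instead of overwriting in place.
      table.put(new Put(row).addColumn(cf, col, 1L, Bytes.toBytes("old@example.com")));
      table.put(new Put(row).addColumn(cf, col, 2L, Bytes.toBytes("new@example.com")));

      Get get = new Get(row);
      get.readVersions(3);                // ask for up to 3 versions per cell
      Result result = table.get(get);
      for (Cell cell : result.getColumnCells(cf, col)) {
        System.out.println(cell.getTimestamp() + " -> "
            + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
  }
}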

Another factor is your system's workflow. If you use HDFS files, you need
to be OK with rewriting the files whenever you do any "updates". So even
if you only add 1MB worth of new data to a 1TB dataset, you have to
rewrite the whole thing. HBase would let you "insert" it where it belongs.
(Of course, HBase works under the same underlying constraints as your
application would, except we've already done the work to manage random
inserts.)
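
To make that concrete, adding a small batch of new rows to an existing table
is just a set of Puts; each row lands where it sorts, and the client never
rewrites what's already there. A rough sketch, again with the newer client
API and made-up names, using BufferedMutator to batch the writes:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementalLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator =
             conn.getBufferedMutator(TableName.valueOf("mytable"))) {
      List<Put> batch = new ArrayList<>();
      for (int i = 0; i < 1000; i++) {
        Put put = new Put(Bytes.toBytes(String.format("user#%08d", i)));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("email"),
            Bytes.toBytes("user" + i + "@example.com"));
        batch.add(put);
      }
      mutator.mutate(batch);  // queued client-side and sent in batches
      mutator.flush();        // push out anything still buffered
    }
  }
}

The flat-file equivalent is a job that reads the old file and writes a new
one with the extra records merged in.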

Does this help you out?

-Bryan

On May 13, 2008, at 10:13 AM, Naama Kraus wrote:

> Hi,
>
> Can anyone say a few words on when to use HBase as opposed to using plain
> MapReduce on input files? In more detail, when does it make sense to put
> data into HBase and then use HBase methods to access it, including running
> MapReduce on the data in the tables, as opposed to simply putting the data
> into HDFS and processing it with MapReduce?
>
> Thanks, Naama