
Posted to user@hbase.apache.org by Naama Kraus <na...@gmail.com> on 2008/10/06 10:48:49 UTC

HBase for small data sets

Hi,

Will HBase work reasonably for small data sets? E.g. for tens or hundreds of
gigabytes of data? Would it make sense to use HBase to store and access them?

I was thinking HDFS and M/R have an overhead and thus won't perform well for
small amounts of data. But say I use HBase w/o MapReduce (get, set, scan
only) and use the local file system underneath: will I get reasonable
performance? E.g. as opposed to using an RDBMS such as Apache Derby or
MySQL? (Note that I am not thinking of complicated relational operations
or complicated schemas; I mainly need a key-value store.)
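The access pattern described here (get, set, scan over keys, no relational operations) maps onto a sorted map. The sketch below is a minimal in-memory stand-in, assuming string keys and values; the class and method names (`SortedKvStore`, `put`, `get`, `scan`) are hypothetical and are not HBase's client API. It mirrors one real property of HBase: rows are kept sorted by row key, and a scan runs from a start row (inclusive) to a stop row (exclusive).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a get/put/scan key-value store.
// Not HBase's API: it only illustrates the access pattern.
class SortedKvStore {
    // TreeMap keeps keys sorted (String order, which matches byte order
    // for ASCII keys), much as HBase keeps rows sorted by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Insert or overwrite the value stored under key.
    void put(String key, String value) {
        rows.put(key, value);
    }

    // Point lookup; empty if the key is absent.
    Optional<String> get(String key) {
        return Optional.ofNullable(rows.get(key));
    }

    // Range scan in key order: startKey inclusive, stopKey exclusive.
    List<Map.Entry<String, String>> scan(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).entrySet());
    }

    public static void main(String[] args) {
        SortedKvStore store = new SortedKvStore();
        store.put("user#002", "bob");
        store.put("user#001", "alice");
        store.put("user#010", "carol");
        System.out.println(store.get("user#001").orElse("<missing>"));
        // Only user#001 and user#002 fall in [user#000, user#005).
        for (Map.Entry<String, String> e : store.scan("user#000", "user#005")) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because keys are stored in order, a scan is a contiguous walk rather than a series of independent lookups, which is why scan and random-read performance are discussed separately later in the thread.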

Thanks, Naama

Re: HBase for small data sets

Posted by Andrew Purtell <ap...@yahoo.com>.

Also, there are distributed filesystems other than Hadoop DFS
supported by the Hadoop FS abstraction layer, such as KFS.
(Are the IBM Almaden folks running HBase on KFS?) So query
response times will differ depending on the random read
performance of the underlying file system, and may improve
over what is achievable when running on top of HDFS.
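The FS abstraction mentioned here lets the same caller run over different storage backends; Hadoop's `FileSystem.get(URI, Configuration)` picks the implementation from the URI scheme (for example `file:` or `hdfs:`). The sketch below imitates only that selection step, assuming a hypothetical `FsRegistry` and `StorageBackend` interface; it is not Hadoop code. The point is that the caller depends on the interface, so swapping the backend changes performance characteristics such as random-read cost without changing the calling code.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a filesystem implementation.
interface StorageBackend {
    // Human-readable description of the backend.
    String describe();
}

// Hypothetical registry that picks a backend by URI scheme,
// loosely imitating how Hadoop resolves a filesystem from a URI.
class FsRegistry {
    private final Map<String, StorageBackend> byScheme = new HashMap<>();

    void register(String scheme, StorageBackend backend) {
        byScheme.put(scheme, backend);
    }

    // Resolve the backend for a URI such as file:///tmp/hbase.
    StorageBackend get(URI uri) {
        String scheme = uri.getScheme();
        StorageBackend backend = byScheme.get(scheme);
        if (backend == null) {
            throw new IllegalArgumentException("No filesystem for scheme: " + scheme);
        }
        return backend;
    }

    public static void main(String[] args) {
        FsRegistry registry = new FsRegistry();
        registry.register("file", () -> "local filesystem");
        registry.register("hdfs", () -> "Hadoop DFS");
        // The caller code is identical; only the URI decides the backend.
        for (String root : new String[] {"file:///tmp/hbase", "hdfs://namenode:9000/hbase"}) {
            System.out.println(root + " -> " + registry.get(URI.create(root)).describe());
        }
    }
}
```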

   - Andy

> From: stack <st...@duboce.net>
> Subject: Re: HBase for small data sets
> To: hbase-user@hadoop.apache.org
> Date: Wednesday, October 8, 2008, 2:35 PM

Re: HBase for small data sets

Posted by stack <st...@duboce.net>.

Naama Kraus wrote:
> Hi,
>
> Will HBase work reasonably for small data sets? E.g. for tens or hundreds of
> gigabytes of data? Would it make sense to use HBase to store and access them?
>   
> I was thinking HDFS and M/R have an overhead and thus won't perform well for
> small amounts of data. But say I use HBase w/o MapReduce (get, set, scan
> only) and use the local file system underneath: will I get reasonable
> performance?

It won't look 'reasonable' if stacked against an RDBMS. Might be 'fast
enough', though?

Scanning has recently improved in trunk (4-fold is what I'm seeing).
The coming batching facility should improve writes similarly.
Random reads continue to suffer, but might be OK going against the local
filesystem?
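One way to see why batching should help writes: each unbatched update pays a full client-to-server round trip, while a batch pays one round trip for many updates. The sketch below is a hypothetical client-side buffer (`BatchingWriter`, `flushThreshold`, and the `Sink` callback are made-up names, not the batching facility referred to above) that counts how many round trips a sequence of puts costs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver of batches; each send() stands for one round trip.
interface Sink {
    void send(List<String[]> batch);
}

// Hypothetical client-side write buffer that groups puts into batches.
class BatchingWriter {
    private final int flushThreshold;
    private final Sink sink;
    private final List<String[]> buffer = new ArrayList<>();
    private int roundTrips = 0;

    BatchingWriter(int flushThreshold, Sink sink) {
        if (flushThreshold < 1) throw new IllegalArgumentException("flushThreshold must be >= 1");
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    // Buffer a row/value pair; send automatically once the buffer is full.
    void put(String row, String value) {
        buffer.add(new String[] {row, value});
        if (buffer.size() >= flushThreshold) flush();
    }

    // Send whatever is buffered as a single batch (no-op when empty).
    void flush() {
        if (buffer.isEmpty()) return;
        sink.send(new ArrayList<>(buffer));
        buffer.clear();
        roundTrips++;
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        Sink discard = batch -> { };
        BatchingWriter unbatched = new BatchingWriter(1, discard);
        BatchingWriter batched = new BatchingWriter(100, discard);
        for (int i = 0; i < 1000; i++) {
            unbatched.put("row" + i, "v");
            batched.put("row" + i, "v");
        }
        unbatched.flush();
        batched.flush();
        System.out.println("unbatched round trips: " + unbatched.roundTrips()); // 1000
        System.out.println("batched round trips: " + batched.roundTrips());     // 10
    }
}
```

The same 1000 updates cost 1000 round trips unbatched and 10 with a batch size of 100; how much wall-clock time that saves depends on the per-round-trip latency of the actual deployment.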

Thanks Naama,
St.Ack