Posted to dev@hbase.apache.org by stack <st...@duboce.net> on 2009/02/07 02:37:02 UTC

New hbase file format (HBASE-61)

Ryan Rawson and I have been running performance experiments where we swap
out MapFile and put in its place a customized HADOOP-3315 tfile and a format
that Ryan wrote himself named rfile.  It's looking like Ryan's rfile has many
benefits over our current MapFile-based format and that we'll likely move on
to it over the next week or so.  If you're interested, see toward the end of
HBASE-61, the new file format discussion doc
http://wiki.apache.org/hadoop/Hbase/NewFileFormat, and the coarse
performance stats here:
http://wiki.apache.org/hadoop/Hbase/NewFileFormat/Performance.

Comments welcome either here or up in the issue,
Thanks,
St.Ack

Re: New hbase file format (HBASE-61)

Posted by Ryan Rawson <ry...@gmail.com>.
One of the important features of rfile is that it eschews streaming, on the
realization that, given HBase's memcache, every key and value must fit in
RAM at least once.  Stripping down complexity and going with a
block-oriented read also makes reliable, massive block caching an easy
reality.  With an unbounded soft-ref-style block cache, HBase can use as much
RAM as you throw at it via the -Xmx parameter to cache blocks and improve
performance.
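
To make the soft-ref idea concrete, here is a minimal sketch of what such a
block cache could look like.  It is not the rfile code: the class name
SoftBlockCache, the getBlock method, and the choice to key blocks by file
offset are all assumptions made for illustration, and it uses modern Java
lambdas for brevity.  The only point it shows is that whole blocks are held
through java.lang.ref.SoftReference, so the JVM may reclaim them under memory
pressure and the cache can otherwise grow up to whatever -Xmx allows.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical sketch of a soft-reference block cache; not the rfile code. */
final class SoftBlockCache {
    // Assumption: blocks are keyed by their offset within a single file.
    private final Map<Long, SoftReference<byte[]>> blocks = new HashMap<>();
    private long hits;
    private long misses;

    /**
     * Returns the cached block at the given offset. On a miss (never cached,
     * or cleared by the garbage collector) the whole block is read through
     * the caller-supplied loader and cached behind a new soft reference.
     */
    synchronized byte[] getBlock(long offset, LongFunction<byte[]> loader) {
        SoftReference<byte[]> ref = blocks.get(offset);
        byte[] block = (ref == null) ? null : ref.get();
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.apply(offset);
        blocks.put(offset, new SoftReference<>(block));
        return block;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }
}
```

A caller would pass a loader that reads one whole block from the file, for
example cache.getBlock(offset, off -> readBlockAt(off)), where readBlockAt is
again a hypothetical helper.  A real cache would also need to prune map
entries whose references have been cleared; that is left out here.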

Stack has posted performance numbers that validate that a simpler approach
is faster.  There are many tuning parameters (block size vs. expected key
size being one) that will affect performance, and other features that can
be added.

My goal has been to improve HBase's end-user read performance by 50-100x.
I'm hoping that with rfile this becomes a reality.

-ryan

On Fri, Feb 6, 2009 at 5:37 PM, stack <st...@duboce.net> wrote:

> Ryan Rawson and I have been running performance experiments where we swap
> out MapFile and put in its place a customized HADOOP-3315 tfile and a
> format that Ryan wrote himself named rfile.  It's looking like Ryan's rfile
> has many benefits over our current MapFile-based format and that we'll
> likely move on to it over the next week or so.  If you're interested, see
> toward the end of HBASE-61, the new file format discussion doc
> http://wiki.apache.org/hadoop/Hbase/NewFileFormat, and the coarse
> performance stats here:
> http://wiki.apache.org/hadoop/Hbase/NewFileFormat/Performance.
>
> Comments welcome either here or up in the issue,
> Thanks,
> St.Ack
>