
Posted to dev@hbase.apache.org by Rosemond Wu <ro...@gmail.com> on 2008/04/09 17:23:24 UTC

questions about overloading row key comparator method and some storage issues

Hello,
I have an application in which the data is sparse, is seldom deleted, and
requires an extensible table schema. It looks to me like HBase is a good fit.
I have several questions:

* The row keys in my application require their own comparator. Where is a
good place to overload the key's comparator method? It looks to me like
HBase uses a TreeMap in HStore.java to sort and save the tuples. Is this the
place I should start?
* How does HBase keep rows in order? Does it use something similar to the
SSTable described in the Google Bigtable paper? If my application inserts many
rows whose keys must be sorted among rows already stored on disk, will that
result in lots of index reconstruction and tablet splits?

BTW, why can't I access the archives of the old mailing list? When I follow
the link http://hadoop.apache.org/mail/hbase-dev/, I get the following error
message. The same problem happens with the other mailing lists as well.

 Forbidden

You don't have permission to access /mail/hbase-dev/ on this server.
------------------------------
Apache/2.2.8 (Unix) Server at hadoop.apache.org Port 80

Regards,
RoseMond

Re: questions about overloading row key comparator method and some storage issues

Posted by stack <st...@duboce.net>.

Rosemond Wu wrote:
> Hello,
> I have an application in which the data is sparse, is seldom deleted, and
> requires an extensible table schema. It looks to me like HBase is a good fit.
> I have several questions:
>
> * The row keys in my application require their own comparator. Where is a
> good place to overload the key's comparator method? It looks to me like
> HBase uses a TreeMap in HStore.java to sort and save the tuples. Is this the
> place I should start?
>   

Keys are currently hard-coded as type Text. HBASE-82 is about making
keys plain bytes, with the user supplying a Comparator. It's on our
near-term list of things to do.
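HBASE-82 was not yet implemented at the time of this reply, so there is no real HBase API to show here. As a rough, hypothetical illustration of the idea (keys as raw bytes, ordering supplied by the user), here is a plain-Java sketch of a sorted store built on a TreeMap with a pluggable Comparator<byte[]>, similar in spirit to the TreeMap mentioned in the question. ByteKeyStore and NUMERIC_SUFFIX are invented names, and the comparator assumes every key has the form "<prefix>-<decimal number>".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class ByteKeyStore {
    // Hypothetical user comparator: keys look like "<prefix>-<number>" and are
    // ordered by prefix first, then numerically by the number after the last '-'.
    // Assumption: every key contains a '-' followed by a decimal number.
    static final Comparator<byte[]> NUMERIC_SUFFIX = (a, b) -> {
        String sa = new String(a, StandardCharsets.UTF_8);
        String sb = new String(b, StandardCharsets.UTF_8);
        int da = sa.lastIndexOf('-');
        int db = sb.lastIndexOf('-');
        int byPrefix = sa.substring(0, da).compareTo(sb.substring(0, db));
        if (byPrefix != 0) return byPrefix;
        return Long.compare(Long.parseLong(sa.substring(da + 1)),
                            Long.parseLong(sb.substring(db + 1)));
    };

    // Rows kept sorted by whatever comparator the user supplies.
    private final TreeMap<byte[], byte[]> rows;

    ByteKeyStore(Comparator<byte[]> cmp) {
        this.rows = new TreeMap<>(cmp);
    }

    void put(String row, String value) {
        rows.put(row.getBytes(StandardCharsets.UTF_8), value.getBytes(StandardCharsets.UTF_8));
    }

    // Row keys in comparator order.
    List<String> rowKeys() {
        List<String> out = new ArrayList<>();
        for (byte[] k : rows.keySet()) out.add(new String(k, StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) {
        ByteKeyStore store = new ByteKeyStore(NUMERIC_SUFFIX);
        store.put("row-10", "a");
        store.put("row-2", "b");
        store.put("row-1", "c");
        System.out.println(store.rowKeys()); // [row-1, row-2, row-10]
    }
}
```

With plain byte-wise ordering, "row-10" would sort before "row-2"; the user-supplied comparator is what puts them in numeric order.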

> * How does HBase keep rows in order? Does it use something similar to the
> SSTable described in the Google Bigtable paper? If my application inserts many
> rows whose keys must be sorted among rows already stored on disk, will that
> result in lots of index reconstruction and tablet splits?
>   
Rows are sorted lexicographically in HBase (see Text.compareTo).
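To see what lexicographic ordering means for row keys, here is a minimal sketch that compares keys byte by byte, treating each byte as unsigned, with a shorter key sorting before any longer key it is a prefix of. This is our own approximation of byte-wise ordering for illustration (LexOrder, compareBytes and sortRows are hypothetical names), not the Text.compareTo source. It shows why numeric parts of a key need zero-padding if you want them in numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LexOrder {
    // Compare two byte arrays lexicographically, treating bytes as unsigned
    // (0x00..0xFF); if one array is a prefix of the other, the shorter sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    // Sort string row keys by their UTF-8 bytes.
    static String[] sortRows(String... rows) {
        byte[][] keys = new byte[rows.length][];
        for (int i = 0; i < rows.length; i++) keys[i] = rows[i].getBytes(StandardCharsets.UTF_8);
        Arrays.sort(keys, LexOrder::compareBytes);
        String[] out = new String[keys.length];
        for (int i = 0; i < keys.length; i++) out[i] = new String(keys[i], StandardCharsets.UTF_8);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortRows("row2", "row10", "row1")));
        // [row1, row10, row2]
        System.out.println(Arrays.toString(sortRows("row002", "row010", "row001")));
        // [row001, row002, row010]
    }
}
```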

HBase works like Bigtable: inserts go into memory first. The in-memory
store is flushed to disk when it reaches its size limit, and the flushed
files are compacted once enough of them accumulate. Edits are kept sorted
both in memory and on disk.
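The write path described above can be sketched as a toy model. Everything here is hypothetical and greatly simplified: MiniStore, FLUSH_SIZE and COMPACTION_THRESHOLD are invented names, "files" are in-memory lists, and the thresholds count edits and files rather than bytes. The point it illustrates is that a sorted in-memory buffer is flushed as an already-sorted run, and compaction is a k-way merge of sorted runs, so in this model new rows never force existing data to be re-sorted or re-indexed in place.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class MiniStore {
    // Hypothetical, tiny thresholds so the example flushes and compacts quickly.
    static final int FLUSH_SIZE = 3;            // edits buffered in memory before a flush
    static final int COMPACTION_THRESHOLD = 3;  // flushed files before a compaction

    // In-memory buffer, always sorted by row key.
    private TreeMap<String, String> memory = new TreeMap<>();
    // Each "file" is an immutable run of entries sorted by row key; newest file last.
    private final List<List<Map.Entry<String, String>>> files = new ArrayList<>();

    void put(String row, String value) {
        memory.put(row, value);
        if (memory.size() >= FLUSH_SIZE) flush();
    }

    private void flush() {
        // The buffer is already sorted, so the flushed run is written in one pass.
        List<Map.Entry<String, String>> run = new ArrayList<>();
        for (Map.Entry<String, String> e : memory.entrySet()) {
            run.add(new AbstractMap.SimpleImmutableEntry<>(e));
        }
        files.add(run);
        memory = new TreeMap<>();
        if (files.size() >= COMPACTION_THRESHOLD) compact();
    }

    private String keyAt(int[] cursor) {
        return files.get(cursor[0]).get(cursor[1]).getKey();
    }

    private void compact() {
        // k-way merge over sorted runs; a cursor is {file index, position in file}.
        // Ties on row key are broken in favour of the newer (higher-index) file.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> {
            int c = keyAt(a).compareTo(keyAt(b));
            return c != 0 ? c : Integer.compare(b[0], a[0]);
        });
        for (int f = 0; f < files.size(); f++) {
            if (!files.get(f).isEmpty()) heap.add(new int[] {f, 0});
        }
        List<Map.Entry<String, String>> merged = new ArrayList<>();
        String last = null;
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            Map.Entry<String, String> e = files.get(cur[0]).get(cur[1]);
            if (!e.getKey().equals(last)) { // newest copy of this row; older copies are dropped
                merged.add(e);
                last = e.getKey();
            }
            if (cur[1] + 1 < files.get(cur[0]).size()) heap.add(new int[] {cur[0], cur[1] + 1});
        }
        files.clear();
        files.add(merged);
    }

    String get(String row) {
        if (memory.containsKey(row)) return memory.get(row);
        for (int f = files.size() - 1; f >= 0; f--) { // newest file first
            for (Map.Entry<String, String> e : files.get(f)) {
                if (e.getKey().equals(row)) return e.getValue();
            }
        }
        return null;
    }

    int fileCount() { return files.size(); }

    public static void main(String[] args) {
        MiniStore s = new MiniStore();
        String[] rows = {"m", "c", "x", "a", "c", "q", "b", "z", "a"};
        for (int i = 0; i < rows.length; i++) s.put(rows[i], "v" + i);
        System.out.println(s.fileCount() + " " + s.get("a") + " " + s.get("c")); // 1 v8 v4
    }
}
```

Nine puts produce three flushed runs, which trigger one compaction into a single sorted run; for a repeated row the newest value wins, which is how a later edit shadows an earlier one in this model.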

If I understand the question: whether inserts arrive sorted or not, the
same relative amount of work is done. Only the character of the upload,
as the server sees it, differs: unsorted inserts require the server to
juggle more resources concurrently.
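One way to picture that last point (our reading, not a description of HBase internals): if a table is split into regions by row-key range, a sorted upload touches one region at a time, while an unsorted upload of the same rows spreads each batch across many regions, each with its own in-memory buffer to fill and flush. The region boundaries, region names, RegionSpread class and batch size below are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class RegionSpread {
    // Hypothetical table split into 4 regions, keyed by each region's start row.
    static final TreeMap<String, String> REGIONS = new TreeMap<>();
    static {
        REGIONS.put("", "region-A");
        REGIONS.put("k04", "region-B");
        REGIONS.put("k08", "region-C");
        REGIONS.put("k12", "region-D");
    }

    // The region holding a row is the last region whose start row is <= the row.
    static String regionFor(String row) {
        return REGIONS.floorEntry(row).getValue();
    }

    // For each batch of batchSize consecutive inserts, count the distinct regions hit.
    static List<Integer> regionsPerBatch(List<String> rows, int batchSize) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            Set<String> hit = new HashSet<>();
            for (String r : rows.subList(i, Math.min(i + batchSize, rows.size()))) hit.add(regionFor(r));
            out.add(hit.size());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sorted = new ArrayList<>();
        for (int i = 0; i < 16; i++) sorted.add(String.format("k%02d", i));
        // An unsorted upload: the same 16 rows, interleaved across the key space.
        List<String> unsorted = new ArrayList<>();
        for (int j = 0; j < 4; j++)
            for (int r = 0; r < 4; r++) unsorted.add(sorted.get(r * 4 + j));
        System.out.println(regionsPerBatch(sorted, 4));   // [1, 1, 1, 1]
        System.out.println(regionsPerBatch(unsorted, 4)); // [4, 4, 4, 4]
    }
}
```

Both uploads write the same 16 rows, so the total work is the same; only the number of regions active at once differs.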

> BTW, why can't I access the archives of the old mailing list? When I follow
> the link http://hadoop.apache.org/mail/hbase-dev/, I get the following error
> message. The same problem happens with the other mailing lists as well.
>
>   
>  Forbidden
>   
Thanks for pointing out the broken link (it looks like it's broken for
Hadoop Core too). Let me try to fix it.

St.Ack