Posted to user@hbase.apache.org by cu...@poczta.xg.pl on 2008/02/25 20:42:05 UTC

how to - get last inserted row id

   Hi

    Sorry for my maybe stupid question, but I can't find the answer. Is there
any way to get the last inserted row id?

    I have a really big table (right now about 480,000,000 rows) and I
need to get the last 100 inserted.

   Maybe there is some functionality like SQL "order by"?

   Unfortunately, my row ids are random GUIDs.

   I think it would be very useful to create a wiki page with
frequently used patterns, like how to join two tables, how to implement
"where", and how to use versions. But that's only my own suggestion.

     Thx Antony

Re: how to - get last inserted row id

Posted by cu...@poczta.xg.pl.

On Mon, 25 Feb 2008, Bryan Duxbury wrote:

>
> It seems to me that the only time you should use random GUIDs as row
> keys is if you don't care about the keys or the ordering of rows,
> since they're sorted lexically, and GUIDs are distributed randomly.
> If you want to find the last 100 inserted, you should make your keys
> timestamps instead. Then it'd be easier to find your way to the end.
> You'd still have to scan the last region of the table to find the
> last 100, but at least it would be doable.

  Thanks, this could be helpful; I will check it.

> There is also no way to join tables in HBase. HBase is not a
> relational database, so if you want to "join" two tables, you need to
> do so on your own with two scanners, and probably in a map/reduce
> job. However, you should consider denormalizing your data some, as
> HBase rows can store 1:n relationships ok with column families. See
> the Bigtable discussion of Webtable for some ideas of how this might
> work.

  Yeah, I know that there is no "join". I just suggested creating a
page like "thinking in column-based databases": maybe a comparison of RDBMS
and column-based databases, a study of examples, when to use each...
 It would be very useful for new users of HBase and other databases.

      Thx Antony


Re: how to - get last inserted row id

Posted by Bryan Duxbury <br...@rapleaf.com>.
There is no such thing as last inserted row id in HBase, unless you  
can produce it from your own application. There's no autoincrement- 
style magic ID in HBase, just the row keys you inserted with.

Also, the rows are already sorted by the key you used to write the  
rows in the first place. Unfortunately, there's no "last key of  
table" method in HBase right now, though we could support one.

It seems to me that the only time you should use random GUIDs as row  
keys is if you don't care about the keys or the ordering of rows,  
since they're sorted lexically, and GUIDs are distributed randomly.  
If you want to find the last 100 inserted, you should make your keys  
timestamps instead. Then it'd be easier to find your way to the end.  
You'd still have to scan the last region of the table to find the  
last 100, but at least it would be doable.
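A minimal sketch of the timestamp-key idea, assuming keys are built as a zero-padded millisecond timestamp followed by the original GUID (the key layout and the names rowKey and lastN are hypothetical, not an HBase API). A TreeMap stands in for the table because it also keeps keys in lexical order; its descending view is only a convenience of the stand-in, and on a real table you would still scan the last region forward as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// Sketch only: a TreeMap<String, String> stands in for an HBase table,
// since both keep rows sorted lexically by row key.
public class TimestampKeys {

    // Hypothetical key layout: the timestamp is zero-padded to 19 digits
    // (the width of Long.MAX_VALUE) so lexical order matches numeric order.
    // The GUID suffix keeps keys unique when two rows share a timestamp.
    static String rowKey(long timestampMillis, String guid) {
        return String.format("%019d-%s", timestampMillis, guid);
    }

    // Return up to n of the most recently inserted keys, newest first.
    static List<String> lastN(NavigableMap<String, String> table, int n) {
        List<String> result = new ArrayList<>();
        for (String key : table.descendingKeySet()) {
            if (result.size() >= n) break;
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        long start = 1203968525000L; // hypothetical fixed start time
        for (int i = 0; i < 1000; i++) {
            table.put(rowKey(start + i, UUID.randomUUID().toString()), "row " + i);
        }
        List<String> newest = lastN(table, 100);
        System.out.println(table.get(newest.get(0))); // prints "row 999"
    }
}
```

With random GUIDs alone, the same lastN call would return the lexically largest GUIDs, which have nothing to do with insertion order.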

There is also no way to join tables in HBase. HBase is not a  
relational database, so if you want to "join" two tables, you need to  
do so on your own with two scanners, and probably in a map/reduce  
job. However, you should consider denormalizing your data some, as  
HBase rows can store 1:n relationships ok with column families. See  
the Bigtable discussion of Webtable for some ideas of how this might  
work.
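Because both scanners return rows in ascending row-key order, a join on the row key can be done as a merge of the two scans. The sketch below assumes both tables use the same row key and stands in two in-memory SortedMaps for the tables (the class and method names are hypothetical). Joining on any other column would need an index or, as suggested above, a map/reduce job.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a client-side join on row key. Each SortedMap stands in for a
// table; its entry iterator stands in for a scanner (ascending key order).
public class ScannerJoin {

    // Merge join: advance whichever side has the smaller key, and emit a
    // joined row when both sides sit on the same key.
    static List<String> join(SortedMap<String, String> left, SortedMap<String, String> right) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> a = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> b = right.entrySet().iterator();
        Map.Entry<String, String> x = a.hasNext() ? a.next() : null;
        Map.Entry<String, String> y = b.hasNext() ? b.next() : null;
        while (x != null && y != null) {
            int cmp = x.getKey().compareTo(y.getKey());
            if (cmp < 0) {
                x = a.hasNext() ? a.next() : null;
            } else if (cmp > 0) {
                y = b.hasNext() ? b.next() : null;
            } else {
                // Row keys are unique per table, so both sides can advance.
                out.add(x.getKey() + ": " + x.getValue() + " | " + y.getValue());
                x = a.hasNext() ? a.next() : null;
                y = b.hasNext() ? b.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> users = new TreeMap<>();
        users.put("alice", "Alice A.");
        users.put("bob", "Bob B.");
        users.put("carol", "Carol C.");
        SortedMap<String, String> profiles = new TreeMap<>();
        profiles.put("bob", "likes HBase");
        profiles.put("carol", "likes Hadoop");
        profiles.put("dave", "likes Bigtable");
        // prints "bob: Bob B. | likes HBase" then "carol: Carol C. | likes Hadoop"
        join(users, profiles).forEach(System.out::println);
    }
}
```

Each scanner is read once, front to back, so the cost is linear in the two table sizes and nothing beyond the current row of each side has to be held in memory.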

Please continue to post questions to the list about where our  
documentation is insufficient so that we can make the needed changes  
on the wiki and elsewhere.

-Bryan



Re: how to - get last inserted row id

Posted by cu...@poczta.xg.pl.

On Mon, 25 Feb 2008, stack wrote:

>
> P.S. I would be interested in hearing more about your 500M-row cluster if you
> have the time to describe it: # of regionservers, data size, etc.
>
>

 I want to create an implementation of a "giant forum" engine based on HBase
or another column-based database, and right now we are researching whether
it's really a good idea to use it.

 Maybe creating my own index / filter would be a solution to the
reverse-order problem?
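One way such an index could look, offered as a sketch under stated assumptions rather than an HBase feature: a hypothetical secondary index table whose row key is a reversed timestamp (Long.MAX_VALUE minus the insert time, zero-padded to 19 digits) followed by the GUID, with the GUID as the value. Since rows sort in ascending key order, the newest entries come first, so reading the first 100 index rows yields the latest 100 GUIDs without seeking to the end of anything. A TreeMap stands in for the index table, and the names recordInsert and newestGuids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a hypothetical "newest first" index. The TreeMap stands in
// for an index table; its keys sort lexically, as HBase row keys do.
public class ReverseTimeIndex {

    private final TreeMap<String, String> index = new TreeMap<>();

    // Reversed timestamp, zero-padded to 19 digits so lexical order matches
    // numeric order: newer inserts get smaller keys and sort first.
    static String indexKey(long timestampMillis, String guid) {
        return String.format("%019d-%s", Long.MAX_VALUE - timestampMillis, guid);
    }

    // Assumed to be called alongside each write to the GUID-keyed main table.
    void recordInsert(long timestampMillis, String guid) {
        index.put(indexKey(timestampMillis, guid), guid);
    }

    // A forward scan from the start of the index yields the newest rows first.
    List<String> newestGuids(int n) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : index.entrySet()) {
            if (result.size() >= n) break;
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ReverseTimeIndex idx = new ReverseTimeIndex();
        idx.recordInsert(1000L, "guid-a");
        idx.recordInsert(2000L, "guid-b");
        idx.recordInsert(3000L, "guid-c");
        System.out.println(idx.newestGuids(2)); // prints [guid-c, guid-b]
    }
}
```

The trade-off is a second write per insert, and the application has to keep the index and the main table consistent itself.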

 About the test:

  I just ran a simple test of HBase capacity. I have 4 x Core Duo Athlon
with 32 GB RAM. There are 4 x 1 TB disks in RAID 0. I set HADOOP_HEAPSPACE to 230000.

Each row has only 2 columns, 10 bytes long.

Insert performance: the max insert rate was 19,000/sec. But there are some
problems with inserts, and sometimes the HDFS partition fails. I see that
you know about this bug; it happens completely at random.

I didn't check read performance, but I know that it is high.

   Antony

Re: how to - get last inserted row id

Posted by stack <st...@duboce.net>.
There is no 'order by' in hbase.

The best that I can come up with answering your question is to get the 
last region in the table to find its start key and then start scanning 
until you run off the end of the table.  In pseudo code:

HTable t = new HTable(conf, YOUR_TABLE_NAME);
Text[] startKeys = t.getStartKeys();
if (startKeys.length > 0) {
  // Open a scanner at the start key of the last region.
  HScannerInterface scanner =
      t.obtainScanner(COLUMNS, startKeys[startKeys.length - 1]);
  try {
    while (scanner.next(...)) {
      // Copy the current key aside (or keep the last 100) so you have
      // the last keys seen when the scanner runs out.
    }
  } finally {
    scanner.close();
  }
}
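The "keep the last 100" step inside that loop can be a bounded window of keys. This is a sketch in plain Java where an Iterator<String> stands in for the scanner loop (the class and method names are hypothetical). Note that it returns the last keys in sort order, which are the last inserted only if the row keys are timestamps as Bryan suggests, not random GUIDs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch: keep only the last n keys seen during a forward scan.
// The Iterator<String> stands in for the scanner loop above.
public class LastKeysSeen {

    static ArrayDeque<String> lastKeys(Iterator<String> scanner, int n) {
        ArrayDeque<String> window = new ArrayDeque<>();
        if (n <= 0) {
            return window; // nothing to keep
        }
        while (scanner.hasNext()) {
            if (window.size() == n) {
                window.removeFirst(); // drop the oldest key to stay at n
            }
            window.addLast(scanner.next());
        }
        return window; // up to n keys, in scan order
    }

    public static void main(String[] args) {
        List<String> keysInLastRegion = Arrays.asList("k1", "k2", "k3", "k4", "k5");
        System.out.println(lastKeys(keysInLastRegion.iterator(), 3)); // prints [k3, k4, k5]
    }
}
```

Memory stays bounded at n keys no matter how many rows the last region holds.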

We have an FAQ with answers to commonly asked questions such as how to
increase the heap size or how to enable DEBUG logging.  Would that be the place
to put 'frequently used patterns'?  If you think otherwise, please start a
new wiki page.

Thanks for writing,
St.Ack

P.S. I would be interested in hearing more about your 500M-row cluster if you
have the time to describe it: # of regionservers, data size, etc.

