Posted to user@hbase.apache.org by cu...@poczta.xg.pl on 2008/02/25 20:42:05 UTC
how to - get last inserted row id
Hi
Sorry for my maybe stupid question, but I can't find an answer. Is there
any way to get the last inserted row id?
I have a really big table - there are now about 480 000 000 rows - and I
need to get the last 100 inserted.
Maybe there is some functionality like SQL "order by"?
Unfortunately my row ids are random GUIDs.
I think it would be very useful to create a wiki page with
frequently used patterns, like how to join two tables, how to implement
"where", and how to use versions. But that's only my own suggestion.
Thx Antony
Re: how to - get last inserted row id
Posted by cu...@poczta.xg.pl.
On Mon, 25 Feb 2008, Bryan Duxbury wrote:
>
> It seems to me that the only time you should use random GUIDs as row
> keys is if you don't care about the keys or the ordering of rows,
> since they're sorted lexically, and GUIDs are distributed randomly.
> If you want to find the last 100 inserted, you should make your keys
> timestamps instead. Then it'd be easier to find your way to the end.
> You'd still have to scan the last region of the table to find the
> last 100, but at least it would be doable.
Thanks - this could be helpful, and I will check it.
> There is also no way to join tables in HBase. HBase is not a
> relational database, so if you want to "join" two tables, you need to
> do so on your own with two scanners, and probably in a map/reduce
> job. However, you should consider denormalizing your data some, as
> HBase rows can store 1:n relationships ok with column families. See
> the Bigtable discussion of Webtable for some ideas of how this might
> work.
Yeah, I know that there is no "join" - I was just suggesting that someone
create a page: "thinking in column-based databases". Maybe some comparison of
RDBMS and column-based stores, a study of examples, when to use which ...
It would be very useful for new users of HBase and other such databases.
Thx Antony
Re: how to - get last inserted row id
Posted by Bryan Duxbury <br...@rapleaf.com>.
There is no such thing as last inserted row id in HBase, unless you
can produce it from your own application. There's no autoincrement-
style magic ID in HBase, just the row keys you inserted with.
Also, the rows are already sorted by the key you used to write the
rows in the first place. Unfortunately, there's no "last key of
table" method in HBase right now, though we could support one.
It seems to me that the only time you should use random GUIDs as row
keys is if you don't care about the keys or the ordering of rows,
since they're sorted lexically, and GUIDs are distributed randomly.
If you want to find the last 100 inserted, you should make your keys
timestamps instead. Then it'd be easier to find your way to the end.
You'd still have to scan the last region of the table to find the
last 100, but at least it would be doable.
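A minimal sketch of what such timestamp keys could look like, assuming row keys are plain strings compared lexically. The class TimeOrderedKeys and both method names are hypothetical and not part of HBase. Zero-padding keeps string order equal to numeric order, and the GUID suffix keeps keys unique when two rows share a millisecond. The descending variant is a different technique from the one described above: it subtracts the timestamp from Long.MAX_VALUE so the newest rows sort first and a scan from the start of the table reaches them immediately.

```java
import java.util.Locale;
import java.util.UUID;

/** Hypothetical helper, not part of HBase: builds row keys that sort by insert time. */
final class TimeOrderedKeys {
    private TimeOrderedKeys() {}

    /** Oldest-first key: zero-padded milliseconds, then the GUID for uniqueness. */
    static String ascendingKey(long millis, UUID guid) {
        // 19 digits covers every non-negative long, so lexical order matches numeric order.
        return String.format(Locale.ROOT, "%019d-%s", millis, guid);
    }

    /** Newest-first key: larger timestamps produce smaller keys (assumes millis >= 0). */
    static String descendingKey(long millis, UUID guid) {
        return String.format(Locale.ROOT, "%019d-%s", Long.MAX_VALUE - millis, guid);
    }
}
```

A caller might build a key with TimeOrderedKeys.ascendingKey(System.currentTimeMillis(), UUID.randomUUID()). One trade-off to keep in mind is that keys which always increase send every new write to the last region of the table.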
There is also no way to join tables in HBase. HBase is not a
relational database, so if you want to "join" two tables, you need to
do so on your own with two scanners, and probably in a map/reduce
job. However, you should consider denormalizing your data some, as
HBase rows can store 1:n relationships ok with column families. See
the Bigtable discussion of Webtable for some ideas of how this might
work.
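The "two scanners" approach can be sketched as a sort-merge join. This is only an illustration under stated assumptions: the two iterators stand in for scanners over two tables, each yields (row key, value) pairs already sorted by key, keys are unique within each input, and Java string comparison stands in for HBase's key ordering. The class MergeJoin and its join method are hypothetical names, not an HBase API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: joins two key-sorted streams the way two scanners could be merged. */
final class MergeJoin {
    private MergeJoin() {}

    /** Returns {key, leftValue, rightValue} for every key present in both inputs. */
    static List<String[]> join(Iterator<Map.Entry<String, String>> left,
                               Iterator<Map.Entry<String, String>> right) {
        List<String[]> out = new ArrayList<>();
        Map.Entry<String, String> l = left.hasNext() ? left.next() : null;
        Map.Entry<String, String> r = right.hasNext() ? right.next() : null;
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp == 0) {
                // Same row key on both sides: emit the joined row, advance both.
                out.add(new String[] {l.getKey(), l.getValue(), r.getValue()});
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            } else if (cmp < 0) {
                // Left key is smaller, so it has no partner on the right.
                l = left.hasNext() ? left.next() : null;
            } else {
                // Right key is smaller, so it has no partner on the left.
                r = right.hasNext() ? right.next() : null;
            }
        }
        return out;
    }
}
```

Each input is read once from start to end, which is why this only works when both tables share the same row key. If they do not, denormalizing as suggested above is likely the simpler route.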
Please continue to post questions to the list about where our
documentation is insufficient so that we can make the needed changes
on the wiki and elsewhere.
-Bryan
Re: how to - get last inserted row id
Posted by cu...@poczta.xg.pl.
On Mon, 25 Feb 2008, stack wrote:
>
> P.S. Would be interested in hearing more about your 500M cluster if you
> have the time to describe: # of regionservers, data size, etc.
>
>
I want to create an implementation of a "giant forum" engine based on HBase
or another column-based database, and we are now researching whether it's
really a good idea to use it.
Maybe creating my own index / filter would solve the reverse-order
problem?
About the test:
I just ran a simple test of HBase capacity. I have 4 * core Duo Athlon
with 32G RAM. There are 4*1TG in raid 0. I set HADOOP_HEAPSPACE to 230000.
Each row has only 2 columns - length 10 bytes.
Insert performance - max insert rate - 19000/sec - but ... there are some
problems with inserts, and sometimes the hdf partition fails - but I
see that you know about this bug - it's completely random when it happens.
I didn't check read performance, but I know that it is high.
Antony
Re: how to - get last inserted row id
Posted by stack <st...@duboce.net>.
There is no 'order by' in hbase.
The best that I can come up with answering your question is to get the
last region in the table to find its start key and then start scanning
until you run off the end of the table. In pseudo code:
    HTable t = new HTable(conf, YOUR_TABLE_NAME);
    Text [] startKeys = t.getStartKeys();
    if (startKeys.length > 0) {
      HScannerInterface scanner = t.obtainScanner(COLUMNS,
        startKeys[startKeys.length - 1]);
      while (scanner.next(...)) {
        // Copy current key aside -- or the last 100 -- so you have the
        // last-keys-seen when scanner runs out
      }
    }
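The "copy the last 100 aside" step inside the loop can be sketched with a bounded queue. This is a standalone illustration, not HBase code: the iterator stands in for the scanner and is assumed to yield row keys in scan order, and the class LastKeys and its lastN method are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: remembers only the final n keys seen while scanning. */
final class LastKeys {
    private LastKeys() {}

    /** Returns the last n keys produced by the iterator, in scan order. */
    static List<String> lastN(Iterator<String> keysInScanOrder, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>(n);
        while (keysInScanOrder.hasNext()) {
            if (window.size() == n) {
                // Window is full: drop the oldest key before adding the new one.
                window.removeFirst();
            }
            window.addLast(keysInScanOrder.next());
        }
        return new ArrayList<>(window);
    }
}
```

Memory stays bounded at n keys however long the last region is, so LastKeys.lastN(keys, 100) would hold at most 100 entries when the scan runs out.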
We have an FAQ with answers to commonly asked questions such as how to
up heap size or how to enable DEBUG. Would this be the place to put
'frequently used patterns'? If you think otherwise, please start a
new wiki page.
Thanks for writing,
St.Ack
P.S. Would be interested in hearing more about your 500M cluster if you
have the time to describe: # of regionservers, data size, etc.