Posted to user@hbase.apache.org by Eran Bergman <er...@gmail.com> on 2009/03/05 10:00:07 UTC

Data uniqueness

Hello,

Lately I have been experimenting with HBase and I came across a problem I
don't know how to solve yet.
My problem is data uniqueness: I would like the values in a specified
column to be unique (across all of my rows, or across some subset of them).
I would like that for any number of columns I choose, holding various
types of data.

The usual way to do this is with some sort of index, but that means extra
round trips to the server for uniqueness checks before each commit, which
are very costly.

Does anyone have any thoughts on how to do this?


Thanks,
Eran

Re: Data uniqueness

Posted by Ryan Rawson <ry...@gmail.com>.
The only general way to determine uniqueness of data in HBase is via the
row key. Just like a primary key in a database, you can use it to verify
uniqueness, and do index scans and gets.
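
To make the row-key idea concrete, here is a minimal Java sketch of
enforcing a unique column with a second "index" table whose row key is the
column value, so that only one row can ever claim a given value. It assumes
a hypothetical in-memory FakeTable (not an HBase class) in place of a real
table, and uses ConcurrentSkipListMap.putIfAbsent to play the role of an
atomic check-and-put. Later HBase client APIs offer such an operation (for
example checkAndPut, and checkAndMutate in newer releases), but the exact
calls depend on the version. The "email" column and table names are made up.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical in-memory stand-in for an HBase table: rows are kept sorted
// by row key, and each row key holds at most one row, as in HBase.
final class FakeTable {
    private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();

    // Atomically stores the row only if the row key is not taken yet.
    // Stands in for a check-and-put with an "if not exists" condition.
    boolean putIfAbsent(String rowKey, String value) {
        return rows.putIfAbsent(rowKey, value) == null;
    }

    Optional<String> get(String rowKey) {
        return Optional.ofNullable(rows.get(rowKey));
    }

    void delete(String rowKey) {
        rows.remove(rowKey);
    }
}

// Keeps one column ("email", hypothetical) unique by using each email as
// the row key of a separate index table that points back at the data row.
final class UniqueColumn {
    private final FakeTable data = new FakeTable();
    private final FakeTable emailIndex = new FakeTable();

    // Returns false, leaving no new rows behind, if the email is already
    // owned by another row or the data row key is already in use.
    boolean insert(String rowKey, String email) {
        // Step 1: claim the value. Only one writer can win this row key.
        if (!emailIndex.putIfAbsent(email, rowKey)) {
            return false;
        }
        // Step 2: write the data row itself.
        if (!data.putIfAbsent(rowKey, email)) {
            emailIndex.delete(email); // undo the claim made in step 1
            return false;
        }
        return true;
    }

    // Which data row owns this email, if any.
    Optional<String> ownerOf(String email) {
        return emailIndex.get(email);
    }
}
```

For example, after insert("user1", "a@example.com") succeeds, a later
insert("user2", "a@example.com") returns false and ownerOf("a@example.com")
still reports "user1".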

So generally speaking, yes, you will have to make multiple trips to the
server to use a secondary index. The situation might not be as dire as it
seems, since in 0.20 the latency target for small gets/sets is really
low (maybe around 1 ms?).
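
A rough way to see the cost being described: with a secondary index,
finding a row by a non-key value takes one get against the index table to
learn the row key, then a second get against the data table. The sketch
below assumes a hypothetical CountingClient that keeps tables in memory and
counts every get or put as one round trip; it is not the HBase client API,
and the "users"/"email_index" schema is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client: tables live in memory, and every get or put counts
// as one round trip to the server. This is not the HBase client API.
final class CountingClient {
    private final Map<String, Map<String, String>> tables = new HashMap<>();
    private int roundTrips = 0;

    String get(String table, String rowKey) {
        roundTrips++;
        Map<String, String> rows = tables.get(table);
        return rows == null ? null : rows.get(rowKey);
    }

    void put(String table, String rowKey, String value) {
        roundTrips++;
        tables.computeIfAbsent(table, name -> new HashMap<>()).put(rowKey, value);
    }

    int roundTrips() {
        return roundTrips;
    }
}

final class SecondaryIndexLookup {
    // Looks up a user row by email through an index table whose row key is
    // the email and whose value is the data row key (hypothetical schema).
    static String findByEmail(CountingClient client, String email) {
        String rowKey = client.get("email_index", email); // round trip 1
        if (rowKey == null) {
            return null; // unknown email: only one round trip was spent
        }
        return client.get("users", rowKey); // round trip 2
    }
}
```

If each small get really costs around 1 ms, such a lookup would be on the
order of 2 ms, and a unique insert likewise pays at least one extra write
to claim the index row.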

The usual answer to "I need to do more" in HBase is "well, use
map-reduce"... which is the solution I will offer you as well.
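
One way to read the map-reduce suggestion is as a batch check: rather than
testing uniqueness before every commit, a periodic job scans the rows and
reports values that appear more than once. The sketch below imitates the
map, shuffle and reduce steps with plain Java collections; it is not the
Hadoop MapReduce API, and DuplicateFinder and the sample data are
hypothetical. In a real job the map input would come from a scan over the
HBase table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DuplicateFinder {
    // Input: row key -> value of the column that should be unique.
    // Output: each duplicated value -> the row keys that share it.
    static Map<String, List<String>> findDuplicates(Map<String, String> rows) {
        // Map + shuffle: emit (columnValue, rowKey) and group by columnValue.
        // A TreeMap keeps the keys sorted, as MapReduce does for reducers.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            grouped.computeIfAbsent(row.getValue(), v -> new ArrayList<>()).add(row.getKey());
        }
        // Reduce: a value is a duplicate if more than one row key holds it.
        Map<String, List<String>> duplicates = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            if (group.getValue().size() > 1) {
                duplicates.put(group.getKey(), group.getValue());
            }
        }
        return duplicates;
    }
}
```

The trade-off is that duplicates are detected after they have been
written, not prevented at write time.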

Hopefully this answers some of your questions.

Good luck!
-ryan
