Posted to user@hbase.apache.org by Stuti Awasthi <st...@hcl.com> on 2011/08/24 07:36:04 UTC

Search query in Hbase

Hi Friends,

I was wondering what the possible solutions are for various search options.

For example:

In the user table we store the name and email of users. I want to check whether a user name already exists, and if it does, I will not put it in the database.

User table, info: name
            info: email

A possible solution:


1)      I scan through all the user rows to get the name and apply logic to match the name against the existing names.
Personally I do not like this approach, as for a huge user set this is quite inefficient.

Is there some other way to perform this in HBase?

Thanks


________________________________
::DISCLAIMER::
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
received this email in error please delete it and notify the sender immediately. Before opening any mail and
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

RE: Search query in Hbase

Posted by Stuti Awasthi <st...@hcl.com>.
Thanks Lars,

This certainly helps. I will try this solution.


Re: Search query in Hbase

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Stuti,

One of the main design tasks in HBase is to structure the key space correctly.
HBase does not maintain tables in the relational sense but keeps *sorted* tuples of the form:
(row-key, column family name, column name, timestamp, value)

A table is nothing more than an isolated key space.


So in order to find your information quickly, make the user name the row-key (or at least the prefix of the row-key).
That is, store your information as (<user name>, <column family>, "email", ts, <email>). The ts is system-generated by default.

If each user name can exist only once (as you seem to imply), you can then simply Get (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html) the "row" by its row-key (i.e. the user name).
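To make the lookup-by-key idea concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: a sorted in-memory map stands in for the table, and the class and method names (UserNameCheck, exists, registerIfAbsent) are made up for illustration. The only point is that, once the user name is the key, "does this name exist?" is a single keyed lookup rather than a walk over every row.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class UserNameCheck {
    // Stand-in for the user table (an assumption, not HBase itself):
    // row key (user name) -> email, kept sorted by key.
    private final ConcurrentSkipListMap<String, String> rows =
            new ConcurrentSkipListMap<>();

    /** Point lookup by row key, analogous to a Get on the user name. */
    public boolean exists(String userName) {
        return rows.containsKey(userName);
    }

    /** Stores the user only if the name is not taken; returns true if stored. */
    public boolean registerIfAbsent(String userName, String email) {
        // putIfAbsent returns null when there was no previous mapping.
        return rows.putIfAbsent(userName, email) == null;
    }

    public static void main(String[] args) {
        UserNameCheck table = new UserNameCheck();
        System.out.println(table.registerIfAbsent("stuti", "stuti@example.com")); // true
        System.out.println(table.registerIfAbsent("stuti", "other@example.com")); // false
        System.out.println(table.exists("lars"));                                 // false
    }
}
```

Against a real cluster the same check would go through the client API linked above; this sketch only shows the access pattern.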

If not, you Scan (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html) starting with the user name as the start-key, and continue to scan until the user name changes. Alternatively, you can set an end-key.

Both will be very efficient (because the data is sorted by what you are looking for).
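
The two scan variants can be sketched the same way. Again this is plain Java with a TreeMap standing in for the sorted key space, not HBase client code. The row-key layout "<user name>|<suffix>" and the names PrefixScan, scanUntilPrefixChanges and scanWithEndKey are assumptions for the example. The end key is treated as exclusive, and Java string order matches byte order only for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrefixScan {
    // Stand-in for the sorted key space (an assumption): row key -> value.
    private final TreeMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Start at the prefix and stop as soon as a key no longer begins with it. */
    public List<String> scanUntilPrefixChanges(String prefix) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order guarantees no later key can match
            }
            keys.add(e.getKey());
        }
        return keys;
    }

    /** The same range expressed with an explicit, exclusive end key. */
    public List<String> scanWithEndKey(String startKey, String endKey) {
        return new ArrayList<>(rows.subMap(startKey, true, endKey, false).keySet());
    }

    public static void main(String[] args) {
        PrefixScan table = new PrefixScan();
        table.put("lars|1", "lars@example.com");
        table.put("stuti|1", "stuti@example.com");
        table.put("stuti|2", "stuti.alt@example.com");
        table.put("stuti2|1", "someone.else@example.com");
        table.put("tom|1", "tom@example.com");

        // '}' is the character right after '|', so "stuti}" bounds the "stuti|" range.
        System.out.println(table.scanUntilPrefixChanges("stuti|")); // [stuti|1, stuti|2]
        System.out.println(table.scanWithEndKey("stuti|", "stuti}")); // [stuti|1, stuti|2]
    }
}
```

Note that including the separator in the prefix keeps "stuti2|1" out of the result; scanning on "stuti" alone would also pick up other user names that merely begin with it.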

Does this help?


-- Lars


