Posted to common-user@hadoop.apache.org by Bin YANG <ya...@gmail.com> on 2008/03/06 09:39:30 UTC

Does HBase have an index?

Dear colleagues,

I have a question about HBase's index implementation.

How does HBase find data given a row key? Does it use an index, as a
database does, or a hash function?
I suppose that a hash function mapping the row key to a physical address
would be more efficient.

As we know, a big table in HBase is stored as several smaller tables,
each of which stores the attributes of one column family.
So each row may be stored across several small tables.
Does the hash function map a row key to several physical addresses, each
corresponding to a small table that contains the row key?

Does anybody have an idea of how to create an index on other attributes?

Best,
Bin YANG
-- 
Bin YANG
Department of Computer Science and Engineering
Fudan University
Shanghai, P. R. China
EMail: yangbinisme82@gmail.com

RE: Does HBase have an index?

Posted by Jim Kellerman <ji...@powerset.com>.
Each cell in HBase has a key that is a tuple consisting of
(row-key, column-family:column-member, timestamp).
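As a minimal sketch (not HBase's own key class), the cell key can be modeled as a comparable tuple. The class names `CellKey` and `CellKeyDemo` are hypothetical, and Java Strings stand in for the byte arrays a real store would compare. The newest-first ordering of timestamps follows the Bigtable paper, which stores versions of a cell in decreasing timestamp order so the most recent version is read first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a cell key: (row, family:member, timestamp).
// Not HBase's own key class; Java Strings stand in for byte arrays.
final class CellKey implements Comparable<CellKey> {
    final String row;
    final String column;   // "family:member", e.g. "info:name"
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    // The column family is the part of the column name before the colon.
    String family() {
        return column.substring(0, column.indexOf(':'));
    }

    // Sort by row, then column, then timestamp with the newest version first.
    @Override
    public int compareTo(CellKey other) {
        int c = row.compareTo(other.row);
        if (c != 0) return c;
        c = column.compareTo(other.column);
        if (c != 0) return c;
        return Long.compare(other.timestamp, timestamp);
    }

    @Override
    public String toString() {
        return row + "/" + column + "/" + timestamp;
    }
}

class CellKeyDemo {
    public static void main(String[] args) {
        List<CellKey> keys = new ArrayList<>();
        keys.add(new CellKey("row2", "info:name", 5));
        keys.add(new CellKey("row1", "info:name", 3));
        keys.add(new CellKey("row1", "info:name", 7));
        Collections.sort(keys);
        System.out.println(keys); // [row1/info:name/7, row1/info:name/3, row2/info:name/5]
    }
}
```

Because all cells of a row sort next to each other, and all versions of a cell sort newest first, a sorted file of these keys can be searched by row and then read sequentially.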

Large tables are broken into row ranges called regions.

All the members of a single column family are stored together
in a file.

Thus the row key is used to find the region, and the column family
is used to find the file.
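The two-step lookup can be sketched with an in-memory sorted map. This is a simplified model, not HBase code: `Region`, `RegionLocator`, and the file names are hypothetical, and the sketch keeps one file per column family for simplicity. Each region is keyed by its inclusive start row, so `TreeMap.floorEntry` finds the region whose row range contains a given key; the region then maps the column family to its file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region: a contiguous row range [startRow, endRow)
// plus one store file per column family.
final class Region {
    final String startRow;
    final String endRow;            // exclusive; "" means "end of table"
    final Map<String, String> familyToFile = new HashMap<>();

    Region(String startRow, String endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }

    boolean contains(String row) {
        return row.compareTo(startRow) >= 0
                && (endRow.isEmpty() || row.compareTo(endRow) < 0);
    }
}

final class RegionLocator {
    // Regions sorted by start row; the first region starts at "".
    private final TreeMap<String, Region> byStartRow = new TreeMap<>();

    void addRegion(Region r) {
        byStartRow.put(r.startRow, r);
    }

    // Step 1: the row key selects the region (greatest start row <= key).
    Region regionFor(String row) {
        Map.Entry<String, Region> e = byStartRow.floorEntry(row);
        if (e == null || !e.getValue().contains(row)) {
            throw new IllegalArgumentException("no region holds row " + row);
        }
        return e.getValue();
    }

    // Step 2: the column family selects the file inside that region.
    String fileFor(String row, String family) {
        String file = regionFor(row).familyToFile.get(family);
        if (file == null) {
            throw new IllegalArgumentException("unknown family " + family);
        }
        return file;
    }
}
```

No hashing is involved: since regions cover sorted, non-overlapping row ranges, a floor search over the start keys is enough to route a key.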

Each file has a sparse index composed of the row/column/timestamp
keys, so finding a particular cell involves binary searching the
index (which is kept in memory).
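A sparse index keeps only every Nth key together with the position where that stretch of the file begins. In this sketch, the class `SparseIndexFile`, the sorted in-memory array standing in for the on-disk file, and the interval of 4 are all assumptions for illustration. A lookup binary-searches the in-memory index for the last indexed key not greater than the target, then scans at most N entries of the file.

```java
import java.util.Arrays;

// Hypothetical sketch: a sorted "file" of keys with a sparse in-memory index.
final class SparseIndexFile {
    static final int INTERVAL = 4;       // index every 4th key (assumed)
    private final String[] sortedKeys;   // stands in for the sorted on-disk file
    private final String[] indexKeys;    // every INTERVAL-th key, kept in memory
    private final int[] indexOffsets;    // position of each indexed key in the file

    SparseIndexFile(String[] keys) {
        sortedKeys = keys.clone();
        Arrays.sort(sortedKeys);
        int n = (sortedKeys.length + INTERVAL - 1) / INTERVAL;
        indexKeys = new String[n];
        indexOffsets = new int[n];
        for (int i = 0; i < n; i++) {
            indexOffsets[i] = i * INTERVAL;
            indexKeys[i] = sortedKeys[i * INTERVAL];
        }
    }

    // Returns the position of key in the file, or -1 if absent.
    int find(String key) {
        // Binary search the in-memory index.
        int i = Arrays.binarySearch(indexKeys, key);
        if (i < 0) {
            i = -i - 2;            // insertion point minus one: last indexed key < key
            if (i < 0) return -1;  // key sorts before the first key in the file
        }
        // Scan at most INTERVAL entries of the "file" from that offset.
        int start = indexOffsets[i];
        int end = Math.min(start + INTERVAL, sortedKeys.length);
        for (int p = start; p < end; p++) {
            if (sortedKeys[p].equals(key)) return p;
        }
        return -1;
    }
}
```

The trade-off is memory for I/O: the index holds only 1/N of the keys, and each lookup reads at most N entries after the binary search.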

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Bin YANG [mailto:yangbinisme82@gmail.com]
> Sent: Thursday, March 06, 2008 12:40 AM
> To: hadoop-user@lucene.apache.org
> Subject: Does HBase have a index?


Re: Does HBase have an index?

Posted by stack <st...@duboce.net>.
Bin:

FYI, there is now an HBase mailing list; see
http://hadoop.apache.org/hbase/mailing_lists.html#Developers. Your
questions (and Ma's on 'connection to HBase using HTable') would fit
better there.

St.Ack


Re: Does HBase have an index?

Posted by edward yoon <ed...@udanax.org>.
Nice review, Bin Yang.

HQL may/will be moved to the HRdfStore project
(http://wiki.apache.org/incubator/HRdfStoreProposal) because it is not
a goal of the HBase community (a BigTable clone), and I'm not a leader
of HBase, so I can't answer your question.

Instead, HRdfStore and its query languages will show the answer.


-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Does HBase have an index?

Posted by Bin YANG <ya...@gmail.com>.
Dear Edward Yoon,

What I want to know is how HBase executes an HQL query.

I know that SELECT in HQL can only specify row keys in the WHERE clause.
If I want WHERE to support column = ***, I think HBase needs indices.
So I want to know how "WHERE row key = ***" works now.
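To make the distinction concrete, here is a sketch that uses a sorted map as a stand-in for a table; `TinyTable` and its methods are hypothetical and are not HQL or HBase API. A `WHERE row = ...` lookup goes straight to the sorted row keys, whereas a `WHERE column = ...` predicate has no index to consult and must examine every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical table model: row key -> (column -> value), sorted by row key.
final class TinyTable {
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    // "WHERE row = key": a direct O(log n) lookup in the sorted row keys.
    Map<String, String> getRow(String key) {
        return rows.get(key);
    }

    // "WHERE column = value": the column is not indexed, so every row
    // must be examined, which is O(n) in the number of rows.
    List<String> rowsWhere(String column, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }
}
```

One way to avoid the full scan would be a second sorted map from column value to row keys, updated on every `put`; that extra structure is what an index on another attribute would amount to.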

Regarding column-oriented databases, I know C-Store is an example.
C-Store is a complete relational database, while Bigtable is not a
relational database; Bigtable only supports storing simple relational
data. So I think Bigtable should have different index models from
column-oriented databases.

I think CAN is a Distributed Hash Table (DHT), and Chord and Pastry are
probably similar to CAN.
A DHT can store data under a key and find the data by that key, but it
cannot store data with locality.
For example, if you want to store two web pages from the same site on one
node, Bigtable can guarantee this, but a DHT cannot. With a DHT, the two
pages may be stored on two different nodes, because the contents of the
two pages are different.
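The locality argument can be illustrated by comparing the two placement schemes on the same keys. Everything here is an assumption for illustration: the class `Placement`, three nodes, `String.hashCode()` as the hash function, and hand-picked range boundaries. Following the Bigtable paper's convention, row keys are URLs with the host name reversed (`com.cnn.www/...`), so pages from one site share a key prefix and fall in the same range.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical comparison of hash placement (DHT-style) and range
// placement (Bigtable-style) for the same row keys.
final class Placement {
    static final int NODES = 3;

    // DHT-style: the node is chosen by hashing the key, so keys that
    // share a prefix get no guarantee of landing on the same node.
    static int hashNode(String key) {
        return Math.floorMod(key.hashCode(), NODES);
    }

    // Range-style: each node owns a contiguous range of sorted keys,
    // identified here by the range's start key.
    static final TreeMap<String, Integer> RANGES = new TreeMap<>(Map.of(
            "", 0,        // keys below "com.d"        -> node 0
            "com.d", 1,   // keys in ["com.d", "org")  -> node 1
            "org", 2));   // keys from "org" onward    -> node 2

    static int rangeNode(String key) {
        return RANGES.floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        String[] pages = {"com.cnn.www/index.html", "com.cnn.www/world.html"};
        for (String p : pages) {
            System.out.println(p + " hash->" + hashNode(p) + " range->" + rangeNode(p));
        }
    }
}
```

Under the range scheme both cnn.com pages land on node 0 by construction; under the hash scheme their nodes depend on the hash values and may differ.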

What is your opinion on how to support a powerful HQL in HBase or Bigtable?

Cheers,
Bin YANG

On Thu, Mar 6, 2008 at 5:55 PM, edward yoon <ed...@udanax.org> wrote:
> >> I suppose that a hash function which hash row key to physical address
>  is more efficient.
>
>  Actually, I thought and consider about CAN (Content-Addressable
>  Network) because BigTable has a immutable meta tree as a b+tree.
>  I don't exactly know what do you think, but it is a revolutionary idea.
>
>  If I may digress from my theme for a moment,
>  For 30 years, The benefits of column-store development has been the
>  subject of much/some debate. but, the availability of column-store has
>  not been authenticated.
>  And also, i couldn't proof an benefits of Hbase (BigTable clone)
>  because there is an various alternative suggestions. But, I recently
>  find a answer that they only made a BigTable for fun.
>
>
>
>
>
>  --
>  B. Regards,
>  Edward yoon @ NHN, corp.
>



-- 
Bin YANG
Department of Computer Science and Engineering
Fudan University
Shanghai, P. R. China
EMail: yangbinisme82@gmail.com

Re: Does HBase have an index?

Posted by edward yoon <ed...@udanax.org>.
>> I suppose that a hash function which hash row key to physical address
>> is more efficient.

Actually, I have thought about CAN (Content-Addressable Network),
because BigTable has an immutable meta tree structured like a B+-tree.
I don't know exactly what you have in mind, but it is a revolutionary idea.

If I may digress for a moment:
for 30 years, the benefits of column stores have been the subject of
some debate, but the practical value of column stores has not been
verified.
I also couldn't prove the benefits of HBase (a BigTable clone),
because there are various alternative proposals. But I recently
found the answer: they made BigTable just for fun.


-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Does HBase have an index?

Posted by Michael Bieniosek <mi...@powerset.com>.
HBase is a clone of Bigtable: http://labs.google.com/papers/bigtable.html

There is a META table that contains a mapping from row ranges to the regionserver where those rows are stored.  Only the row key is indexed, and I don't think Bigtable/HBase is designed to have indexes on other attributes.

-Michael
