Posted to user@hbase.apache.org by Lin Ma <li...@gmail.com> on 2012/08/05 15:04:26 UTC

column based or row based storage for HBase?

Hi guys,

I am wondering whether HBase uses column-based or row-based storage.

   - Some technical documents I read say that an advantage of HBase is its
   column-based storage, which stores similar data together to improve
   compression. That would mean the same column of different rows is stored
   together;
   - But I also learned that HBase is a sorted key-value map in the underlying
   HFile. It uses the key to address all related columns for that key (row),
   so it seems to be row-based storage?

I would appreciate it if anyone could clarify my confusion. Any related
documents or code with more details are welcome.

thanks in advance,

Lin

Re: column based or row based storage for HBase?

Posted by Lin Ma <li...@gmail.com>.
Thank you Yong,

Just to clarify one thing about your comment that a "column family stores
continuously": it does not mean the data is physically stored *column after
column* (e.g. col 1 of row 1, then col 1 of row 2, then col 1 of row 3, then
col 2 of row 1, then col 2 of row 2, and finally col 2 of row 3), but *row
after row* (col 1 of row 1, then col 2 of row 1, then col 1 of row 2, then
col 2 of row 2, then col 1 of row 3, then col 2 of row 3)?
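
The two orderings contrasted above can be written out explicitly. This is a
minimal sketch in plain Python, not HBase code; the function names are made
up for illustration. The other replies in this thread indicate that the
second ordering (row after row) is the one used within a column family.

```python
# Toy table: 3 rows x 2 columns, as in the question above.
rows = ["row1", "row2", "row3"]
cols = ["col1", "col2"]


def column_after_column(rows, cols):
    """Ordering of a classic columnar store: every row of col1, then col2."""
    return [(r, c) for c in cols for r in rows]


def row_after_row(rows, cols):
    """Ordering within one column family: every column of row1, then row2."""
    return [(r, c) for r in rows for c in cols]


print(column_after_column(rows, cols))
print(row_after_row(rows, cols))
```

Note that the row-after-row ordering is exactly what you get by sorting the
(row, column) pairs, which matches the "sorted key-value map" description.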

regards,
Lin

On Mon, Aug 6, 2012 at 11:37 AM, yonghu <yo...@gmail.com> wrote:

> In my understanding of column-oriented structure of hbase, the first
> thing is the term column-oriented. The meaning is that the data which
> belongs to the same column family stores continuously in the disk. For
> each column-family, the data is stored as row store. If you want to
> understand the internal mechanism of HBase, you'd better take a look
> at the content of HFile.
>
> regards!
>
> Yong

Re: column based or row based storage for HBase?

Posted by yonghu <yo...@gmail.com>.
As I understand HBase's column-oriented structure, the first thing to
pin down is the term "column-oriented". It means that the data belonging to
the same column family is stored contiguously on disk. Within each column
family, the data is stored as a row store. If you want to understand the
internal mechanism of HBase, take a look at the contents of an HFile.

regards!

Yong

On Mon, Aug 6, 2012 at 5:03 AM, Lin Ma <li...@gmail.com> wrote:
> Thank you for the informative reply, Mohit!
>
> Some more comments,
>
> 1. actually my confusion about column based storage is from the book "HBase
> The Definitive Guide", chapter 1, section "the Dawn of Big Data", which
> draw a picture showing HBase store the same column of all different rows
> continuously physically in storage. Any comments?
>
> 2. I want to confirm my understanding is correct -- supposing I have only
> one column family with 10 columns, the physical storage is row (with all
> related columns) after row, other than store 1st column of all rows, then
> store 2nd columns of all rows, etc?
>
> 3. It seems when we say column based storage, there are two meanings, (1)
> column oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> where the same column of different rows stored together, (2) and column
> oriented architecture, e.g. how Hbase is designed, which is used to
> describe the pattern to store sparse, large number of columns (with NULL
> for free). Any comments?
>
> regards,
> Lin

Re: column based or row based storage for HBase?

Posted by Mohit Anchlia <mo...@gmail.com>.
On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma <li...@gmail.com> wrote:

> Thank you for the informative reply, Mohit!
>
> Some more comments,
>
> 1. actually my confusion about column based storage is from the book
> "HBase The Definitive Guide", chapter 1, section "the Dawn of Big Data",
> which draw a picture showing HBase store the same column of all different
> rows continuously physically in storage. Any comments?
>
> 2. I want to confirm my understanding is correct -- supposing I have only
> one column family with 10 columns, the physical storage is row (with all
> related columns) after row, other than store 1st column of all rows, then
> store 2nd columns of all rows, etc?
>
> 3. It seems when we say column based storage, there are two meanings, (1)
> column oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> where the same column of different rows stored together, (2) and column
> oriented architecture, e.g. how Hbase is designed, which is used to
> describe the pattern to store sparse, large number of columns (with NULL
> for free). Any comments?
>
>
In simple terms, HBase is not a column-oriented store. All the data for a
row is stored together, but a separate store file is created for each column
family.


> regards,
> Lin

Re: column based or row based storage for HBase?

Posted by Lin Ma <li...@gmail.com>.
Hi Jason,

This is a very good reference. I read it from beginning to end and learned a
lot. Thanks, and have a good weekend.

regards,
Lin

On Tue, Aug 7, 2012 at 2:00 AM, Jason Frantz <jf...@maprtech.com> wrote:

> Lin,
>
> Looks like your questions may already be answered, but you might find the
> following link comparing "traditional" columnar databases against
> HBase/BigTable interesting:
>
>
> http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
>
> -Jason

Re: column based or row based storage for HBase?

Posted by Jason Frantz <jf...@maprtech.com>.
Lin,

Looks like your questions may already be answered, but you might find the
following link comparing "traditional" columnar databases against
HBase/BigTable interesting:

http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html

-Jason

On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma <li...@gmail.com> wrote:

> Thank you for the informative reply, Mohit!
>
> Some more comments,
>
> 1. actually my confusion about column based storage is from the book "HBase
> The Definitive Guide", chapter 1, section "the Dawn of Big Data", which
> draw a picture showing HBase store the same column of all different rows
> continuously physically in storage. Any comments?
>
> 2. I want to confirm my understanding is correct -- supposing I have only
> one column family with 10 columns, the physical storage is row (with all
> related columns) after row, other than store 1st column of all rows, then
> store 2nd columns of all rows, etc?
>
> 3. It seems when we say column based storage, there are two meanings, (1)
> column oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> where the same column of different rows stored together, (2) and column
> oriented architecture, e.g. how Hbase is designed, which is used to
> describe the pattern to store sparse, large number of columns (with NULL
> for free). Any comments?
>
> regards,
> Lin

Re: column based or row based storage for HBase?

Posted by Lin Ma <li...@gmail.com>.
Thank you for the informative reply, Mohit!

Some more comments,

1. My confusion about column-based storage actually comes from the book
"HBase: The Definitive Guide", chapter 1, section "The Dawn of Big Data",
which has a picture showing HBase storing the same column of all the
different rows contiguously in physical storage. Any comments?

2. I want to confirm that my understanding is correct: supposing I have only
one column family with 10 columns, the physical storage is row (with all
related columns) after row, rather than the 1st column of all rows, then
the 2nd column of all rows, etc.?

3. It seems that "column-based storage" has two meanings: (1) a
column-oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
where the same column of different rows is stored together, and (2) a
column-oriented architecture, e.g. how HBase is designed, which describes a
pattern for storing a large number of sparse columns (with NULL for free).
Any comments?
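
To make the "NULL for free" idea in point 3 concrete, here is a minimal
sketch in plain Python. It is not HBase code; the row contents and the helper
name to_cells are invented for illustration. It assumes each stored cell
carries its own row key and column name, so a column that a row does not have
simply produces no cell.

```python
# Logical rows; each row only sets the columns it actually has.
logical_rows = {
    "row1": {"name": "a", "email": "a@example.com"},
    "row2": {"name": "b"},
    "row3": {"phone": "555-0100"},
}


def to_cells(rows):
    """Flatten rows into sorted (row key, column, value) cells.

    Absent columns produce no cell at all, so a NULL takes no space.
    """
    return sorted(
        (row_key, column, value)
        for row_key, columns in rows.items()
        for column, value in columns.items()
    )


cells = to_cells(logical_rows)
all_columns = {c for columns in logical_rows.values() for c in columns}
dense_slots = len(logical_rows) * len(all_columns)
print(len(cells), "cells stored vs", dense_slots, "slots in a dense table")
```

The trade-off in this model is that every cell repeats its row key and column
name, which is the price paid for not reserving space for missing columns.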

regards,
Lin

On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <mo...@gmail.com>wrote:

> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <li...@gmail.com> wrote:
>
> > Hi guys,
> >
> > I am wondering whether HBase is using column based storage or row based
> > storage?
> >
> >    - I read some technical documents and mentioned advantages of HBase is
> >    using column based storage to store similar data together to foster
> >    compression. So it means same columns of different rows are stored
> > together;
>
>
> Probably what you read was in context of Column Families. HBase has concept
> of column family similar to Google's bigtable. And the store files on disk
> is per column family. All columns of a given column family are in one store
> file and columns of different column family is a different file.
>
>
> >    - But I also learned HBase is a sorted key-value map in underlying
> >    HFile. It uses key to address all related columns for that key (row),
> > so it
> >    seems to be a row based storage?
> >
> HBase stores entire row together along with columns represented by
> KeyValue. This is also called cell in HBase.
>
>
> > It is appreciated if anyone could clarify my confusions. Any related
> > documents or code for more details are welcome.
> >
> > thanks in advance,
> >
> > Lin
> >
>

Re: column based or row based storage for HBase?

Posted by Mohit Anchlia <mo...@gmail.com>.
On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <li...@gmail.com> wrote:

> Hi guys,
>
> I am wondering whether HBase is using column based storage or row based
> storage?
>
>    - I read some technical documents and mentioned advantages of HBase is
>    using column based storage to store similar data together to foster
>    compression. So it means same columns of different rows are stored
> together;


What you read was probably in the context of column families. HBase has the
concept of a column family, similar to Google's Bigtable, and the store files
on disk are per column family. All columns of a given column family are in the
same store file(s), and columns of a different column family are in different
files.


>    - But I also learned HBase is a sorted key-value map in underlying
>    HFile. It uses key to address all related columns for that key (row),
> so it
>    seems to be a row based storage?
>
HBase stores an entire row together, with each column represented by a
KeyValue, which is also called a cell in HBase.


> It is appreciated if anyone could clarify my confusions. Any related
> documents or code for more details are welcome.
>
> thanks in advance,
>
> Lin
>

Re: column based or row based storage for HBase?

Posted by Lin Ma <li...@gmail.com>.
Thank you lars.

My question is answered.

regards,
Lin

On Mon, Aug 6, 2012 at 12:30 PM, lars hofhansl <lh...@yahoo.com> wrote:

> A key in HBase looks like this: (rowkey, column family, column, timestamp)
>
> HBase will do two things for you:
> 1. All keys that have the same row key are stored in the same region
> 2. All keys are sorted
>
>
> (The column family is special in that each column family has its own store
> file, but the logical sort order still holds).
>
> Think of it this way.
> Say you have two column families and two regions (A and B). You find the
> following ordering:
> Storefile(s) for column family 1 in Region A:
> (row1, column family1, column1, ts)->value
> (row1, column family1, column2, ts)->value
> (row2, column family1, column1, ts)->value
> (row2, column family1, column2, ts)->value
>
> Storefile(s) for column family 1 in Region B:
> (row3, column family1, column1, ts)->value
> (row3, column family1, column2, ts)->value
>
> Storefile(s) for column family 2: in Region A:
> (row1, column family2, column1, ts)->value
> (row1, column family2, column2, ts)->value
> (row2, column family2, column1, ts)->value
> (row2, column family2, column2, ts)->value
>
> Storefile(s) for column family 2 in Region B:
> (row3, column family2, column1, ts)->value
> (row3, column family2, column2, ts)->value
>
> So region A has rows row1 and row2, region B has row3.
> A region is shard of a table based on the row key and just
>
> #1 above means that HBase will never place key value for "row1" in
> different regions.
> #2 means you very efficiently locate specific keys, as they are always
> stored sorted.
>
> You should work through the topic in the HBase book:
> http://hbase.apache.org/book/datamodel.html.
>
> -- Lars

Re: column based or row based storage for HBase?

Posted by lars hofhansl <lh...@yahoo.com>.
A key in HBase looks like this: (rowkey, column family, column, timestamp)

HBase will do two things for you:
1. All keys that have the same row key are stored in the same region
2. All keys are sorted


(The column family is special in that each column family has its own store file, but the logical sort order still holds.)

Think of it this way.
Say you have two column families and two regions (A and B). You find the following ordering:
Storefile(s) for column family 1 in Region A:
(row1, column family1, column1, ts)->value
(row1, column family1, column2, ts)->value
(row2, column family1, column1, ts)->value
(row2, column family1, column2, ts)->value

Storefile(s) for column family 1 in Region B:
(row3, column family1, column1, ts)->value
(row3, column family1, column2, ts)->value

Storefile(s) for column family 2 in Region A:
(row1, column family2, column1, ts)->value
(row1, column family2, column2, ts)->value
(row2, column family2, column1, ts)->value
(row2, column family2, column2, ts)->value

Storefile(s) for column family 2 in Region B:
(row3, column family2, column1, ts)->value
(row3, column family2, column2, ts)->value

So region A has rows row1 and row2, region B has row3.
A region is shard of a table based on the row key and just 

#1 above means that HBase will never place key values for "row1" in different regions.
#2 means you can locate specific keys very efficiently, as they are always stored sorted.
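
The listing above can be reproduced with a small model. This is only a sketch
in plain Python, not HBase code: the split point ("row3") between regions A
and B, the placeholder timestamp 1, and the names region_for and layout are
all assumptions made for illustration.

```python
from collections import defaultdict


def region_for(row_key):
    """Hypothetical split: region A holds row keys below "row3", B the rest."""
    return "A" if row_key < "row3" else "B"


# Keys have the shape (row key, column family, column, timestamp).
cells = [
    (row, family, column, 1)
    for row in ("row1", "row2", "row3")
    for family in ("family1", "family2")
    for column in ("column1", "column2")
]


def layout(cells):
    """Group keys into one sorted store file per (region, column family)."""
    store_files = defaultdict(list)
    for key in cells:
        row, family, _column, _ts = key
        store_files[(region_for(row), family)].append(key)
    for keys in store_files.values():
        keys.sort()
    return dict(store_files)


files = layout(cells)
for (region, family), keys in sorted(
    files.items(), key=lambda kv: (kv[0][1], kv[0][0])
):
    print(f"Storefile(s) for {family} in Region {region}:")
    for key in keys:
        print("  ", key, "-> value")
```

Because the region is chosen from the row key alone, all keys for one row
land in the same region, whichever column family they belong to.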

You should work through the topic in the HBase book: http://hbase.apache.org/book/datamodel.html.

-- Lars


----- Original Message -----
From: Lin Ma <li...@gmail.com>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Cc: 
Sent: Sunday, August 5, 2012 8:44 PM
Subject: Re: column based or row based storage for HBase?

Hi Lars,

What do you mean a set of "keys that have the same row key" and
"colocated"? It will be appreciated if you could show an example or provide
more information.

regards,
Lin



Re: column based or row based storage for HBase?

Posted by Lin Ma <li...@gmail.com>.
Hi Lars,

What do you mean by a set of "keys that have the same row key" and
"colocated"? I would appreciate it if you could show an example or provide
more information.

regards,
Lin

On Mon, Aug 6, 2012 at 3:42 AM, lars hofhansl <lh...@yahoo.com> wrote:

> Hi Lin,
>
> HBase stores key -> value mappings sorted by key. So it is a key value
> store.
>
> The key has internal structure, for example it starts with a row key.
> HBase makes extra guarantees about a set of keys that have the same row
> key (keeps them colocated, allows atomic operations, etc).
>
> I tried to write this up a while back:
> http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html
>
> -- Lars

Re: column based or row based storage for HBase?

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Lin,

HBase stores key -> value mappings sorted by key. So it is a key value store.

The key has internal structure, for example it starts with a row key.
HBase makes extra guarantees about a set of keys that have the same row key (keeps them colocated, allows atomic operations, etc).
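
As a rough illustration of why sorted keys that start with the row key keep a
row's cells together, here is a minimal sketch in plain Python. It is not
HBase code; the sample data, the single family name "cf", and the helper
get_row are made up for the example.

```python
import bisect

# Sorted list of (key, value); key = (row key, column family, column, timestamp).
store = sorted([
    (("row2", "cf", "b", 1), "v4"),
    (("row1", "cf", "a", 1), "v1"),
    (("row3", "cf", "a", 1), "v5"),
    (("row1", "cf", "b", 1), "v2"),
    (("row2", "cf", "a", 1), "v3"),
])
keys = [key for key, _ in store]


def get_row(row_key):
    """Return all cells of one row using binary search on the sorted keys."""
    # (row_key,) sorts just before every full key that starts with row_key.
    start = bisect.bisect_left(keys, (row_key,))
    end = start
    while end < len(keys) and keys[end][0] == row_key:
        end += 1
    return store[start:end]


print(get_row("row2"))
```

Since the keys are sorted, all cells of a row sit next to each other, so one
binary search followed by a short sequential read is enough to fetch a row.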

I tried to write this up a while back: http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html

-- Lars



----- Original Message -----
From: Lin Ma <li...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Sunday, August 5, 2012 6:04 AM
Subject: column based or row based storage for HBase?

Hi guys,

I am wondering whether HBase is using column based storage or row based
storage?

   - I read some technical documents and mentioned advantages of HBase is
   using column based storage to store similar data together to foster
   compression. So it means same columns of different rows are stored together;
   - But I also learned HBase is a sorted key-value map in underlying
   HFile. It uses key to address all related columns for that key (row), so it
   seems to be a row based storage?

It is appreciated if anyone could clarify my confusions. Any related
documents or code for more details are welcome.

thanks in advance,

Lin