Posted to user@hbase.apache.org by y_...@tsmc.com on 2009/07/31 07:29:34 UTC

Column-oriented data modal

Hi,
Can anyone tell me the benefits of a column-oriented data model?
Thank you

Fleming
宏明
 --------------------------------------------------------------------------- 
                                                         TSMC PROPERTY       
 This email communication (and any attachments) is proprietary information   
 for the sole use of its                                                     
 intended recipient. Any unauthorized review, use or distribution by anyone  
 other than the intended                                                     
 recipient is strictly prohibited.  If you are not the intended recipient,   
 please notify the sender by                                                 
 replying to this email, and then delete this email and any copies of it     
 immediately. Thank you.                                                     
 --------------------------------------------------------------------------- 




Re: Column-oriented data modal

Posted by Ryan Rawson <ry...@gmail.com>.
Read:  http://labs.google.com/papers/bigtable.html

My thoughts:
* no schema, more flexible
* can avoid joins for simple things (such as lists of phone numbers in
an entity row)
* column names can carry data, useful for doing sparse graph stuff
* and much much more
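
To make the second and third points concrete, here is a rough sketch. It
models one wide row as nested Python dictionaries and is not the HBase
client API; the `table` layout, the `put` helper, and the sample keys are
all made up for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory model: table[row_key][family][qualifier] = value
table = defaultdict(lambda: defaultdict(dict))

def put(row, family, qualifier, value=b""):
    """Store one cell; qualifiers are created on the fly, no schema change."""
    table[row][family][qualifier] = value

# No join needed: every phone number lives in the entity's own row.
put("user:42", "phones", "home", b"555-0100")
put("user:42", "phones", "mobile", b"555-0101")

# Column names carry data: each qualifier is the id of a followed user,
# so the row acts as a sparse adjacency list and the value can stay empty.
put("user:42", "follows", "user:7")
put("user:42", "follows", "user:99")

phones = sorted(table["user:42"]["phones"].items())
neighbours = sorted(table["user:42"]["follows"])
print(phones)
print(neighbours)
```

Only the family names ("phones", "follows") would need to exist up front;
the qualifiers are whatever the client writes.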

All for the low price of $19.95.

2009/7/30  <y_...@tsmc.com>:
> Hi,
> Can anyone tell me the benefits of a column-oriented data model?
> Thank you
>
> Fleming
> 宏明

Re: Column-oriented data modal

Posted by Angus He <an...@gmail.com>.
On Fri, Jul 31, 2009 at 4:23 PM, Ryan Rawson<ry...@gmail.com> wrote:
> Not really, only storing 1 value per column family is a fairly
> degenerate case and not really the primary mechanism by which people
> use hbase.  The column family storage model may superficially appear
> to be like a column-store, but it can do so much more and is much more
> flexible.

Yes, I couldn't agree more, Ryan.

And that's why we choose HBase instead of other column-oriented DBMSs:
it gives us much more flexibility.

But from a conceptual point of view, HBase and Google Bigtable are
indeed column-family-oriented database systems, and consequently they
share the benefits described in
http://en.wikipedia.org/wiki/Column-oriented_DBMS .



> On Fri, Jul 31, 2009 at 1:20 AM, Angus He<an...@gmail.com> wrote:
>>> If you stored only 1 column per family, it would resemble a
>>> column-store, however as you stored more columns per family, they
>>> would be stored in "row order", ie: columns from the same row are
>>> stored next to each other.
>>
>> I know. In my previous post I said: "You cannot equate the
>> "column" in that Wikipedia article to the
>> "column" in HBase.
>> So we should consider the "column" in Wikipedia as "column-family" in
>> HBase".
>>
>> Anyway,
>> Ryan, do you agree that HBase is a "column-family oriented db system"?
>>
>>
>>
>>
>>>
>>> On Fri, Jul 31, 2009 at 1:05 AM, Angus He<an...@gmail.com> wrote:
>>>> OK,OK,OK.
>>>>
>>>> If data is stored row-by-row in HBase, how do you explain the text
>>>> under the section "Physical Storage View" in
>>>> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture ?
>>>> Is the page stale, or is something else wrong?
>>>>
>>>> On Fri, Jul 31, 2009 at 3:50 PM, Ryan Rawson<ry...@gmail.com> wrote:
>>>>> Data is stored row-by-row in the hbase store files (aka hfiles).
>>>>> HBase is not a column-oriented-store as described in the wikipedia
>>>>> article: http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>
>>>>> Have a look at the bigtable paper, do some searches, lots of material
>>>>> out there describing the benefits of a flexible store like
>>>>> bigtable/hbase.
>>>>>
>>>>> -ryan
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 31, 2009 at 12:42 AM, Angus He<an...@gmail.com> wrote:
>>>>>> Hi Ryan,
>>>>>>
>>>>>> You cannot equate the "column" in that article of wikipedia to the
>>>>>> "column" in HBase.
>>>>>>
>>>>>> We should assume that the word "column" in "column-oriented" is
>>>>>> predefined, otherwise, it is meaningless.
>>>>>>
>>>>>> So we should consider the "column" in wikipedia as "column-family" in
>>>>>> HBase.  In this way, the article can answer 宏明's question.
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<ry...@gmail.com> wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> The bigtable paper talks more about column families, but in HBase each
>>>>>>> column family is stored in its own file.  That means there is disk
>>>>>>> locality for different column families.  The canonical use is to put
>>>>>>> web crawl data in one family, and meta data (like derived meta data)
>>>>>>> in another.  That way scanning just the meta data is not as expensive
>>>>>>> as scanning the web page crawl dump.
>>>>>>>
>>>>>>> Column families are pre-defined - the "schema" for what it's worth -
>>>>>>> but the 'qualifier' within a family is dynamically determined by the
>>>>>>> client.
>>>>>>>
>>>>>>> In the terminology of the article, hbase would be more 'row oriented',
>>>>>>> but with the column family snag, it isn't that simple.  Since data from
>>>>>>> different families is stored in different files, read efficiency
>>>>>>> depends on which column families you are reading in a query.
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<an...@gmail.com> wrote:
>>>>>>>> Hi Ryan,
>>>>>>>>
>>>>>>>> 1. If that is not the case, what is the purpose of introducing
>>>>>>>> "column families"?
>>>>>>>> Are the contents of different column families stored in different
>>>>>>>> files in HBase?
>>>>>>>>
>>>>>>>> BTW, in the bigtable paper, we can find the following text:
>>>>>>>> "Access control and both disk and memory accounting are performed at
>>>>>>>> the column-family level."
>>>>>>>>
>>>>>>>> 2. I was wondering whether HBase shares the benefits described in the
>>>>>>>> "Benefits" section of the Wikipedia article. If not, what does
>>>>>>>> "column-store" mean for HBase?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<ry...@gmail.com> wrote:
>>>>>>>>> HBase and bigtable are referred to as column-stores, but we aren't a
>>>>>>>>> 'column oriented dbms' as described in the wikipedia.
>>>>>>>>>
>>>>>>>>> At the storage level, hbase stores key-values, where the key is a
>>>>>>>>> triple of row / column / timestamp.  Files are ordered lists of these
>>>>>>>>> key/values, and they are sorted in that order, hence rows are stored
>>>>>>>>> together, then sorted by column then reverse by timestamp (newest on
>>>>>>>>> top).
>>>>>>>>>
>>>>>>>>> Thus hbase is not a 'column store' in the sense listed in the wikipedia entry.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<an...@gmail.com> wrote:
>>>>>>>>>> Why don't you try to google it first?
>>>>>>>>>> After googling with the keyword "Column-oriented", the first result is
>>>>>>>>>> exactly what you want.
>>>>>>>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2009/7/31  <y_...@tsmc.com>:
>>>>>>>>>>> Hi,
>>>>>>>>>>> Can anyone tell me the benefits of a column-oriented data model?
>>>>>>>>>>> Thank you
>>>>>>>>>>>
>>>>>>>>>>> Fleming
>>>>>>>>>>> 宏明



-- 
Regards
Angus

Re: Column-oriented data modal

Posted by Ryan Rawson <ry...@gmail.com>.
Not really, only storing 1 value per column family is a fairly
degenerate case and not really the primary mechanism by which people
use hbase.  The column family storage model may superficially appear
to be like a column-store, but it can do so much more and is much more
flexible.



On Fri, Jul 31, 2009 at 1:20 AM, Angus He<an...@gmail.com> wrote:
>> If you stored only 1 column per family, it would resemble a
>> column-store, however as you stored more columns per family, they
>> would be stored in "row order", ie: columns from the same row are
>> stored next to each other.
>
> I know. In my previous post I said: "You cannot equate the
> "column" in that Wikipedia article to the
> "column" in HBase.
> So we should consider the "column" in Wikipedia as "column-family" in
> HBase".
>
> Anyway,
> Ryan, do you agree that HBase is a "column-family oriented db system"?

Re: Column-oriented data modal

Posted by Angus He <an...@gmail.com>.
> If you stored only 1 column per family, it would resemble a
> column-store, however as you stored more columns per family, they
> would be stored in "row order", ie: columns from the same row are
> stored next to each other.

I know. In my previous post I said: "You cannot equate the
"column" in that Wikipedia article to the
"column" in HBase.
So we should consider the "column" in Wikipedia as "column-family" in
HBase".

Anyway,
Ryan, do you agree that HBase is a "column-family oriented db system"?







-- 
Regards
Angus

Re: Column-oriented data modal

Posted by Ryan Rawson <ry...@gmail.com>.
The diagram is mostly correct; the thing it doesn't show is the other
rows that bracket the row shown there.  Each column family
gets stored in a different file, but within each file, things are
stored in row order.

If you stored only 1 column per family, it would resemble a
column-store, however as you stored more columns per family, they
would be stored in "row order", ie: columns from the same row are
stored next to each other.
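
A small sketch of that layout, for illustration only. It is plain Python,
not HBase code: the sample cells, the family names and the
`build_store_files` helper are invented, and each "file" is just a sorted
list. The sort order (row, then column, then newest timestamp first) is
the one described earlier in this thread.

```python
from collections import defaultdict

# Hypothetical sample cells: (row, family, qualifier, timestamp, value)
cells = [
    ("row2", "meta", "lang", 1, "en"),
    ("row1", "content", "html", 1, "<html>v1</html>"),
    ("row1", "meta", "lang", 1, "en"),
    ("row1", "meta", "size", 1, "2048"),
    ("row1", "content", "html", 2, "<html>v2</html>"),
    ("row2", "content", "html", 1, "<html>other</html>"),
]

def build_store_files(cells):
    """Group cells into one 'file' per column family, each in row order."""
    files = defaultdict(list)
    for row, family, qualifier, ts, value in cells:
        files[family].append((row, qualifier, ts, value))
    for entries in files.values():
        # Sort by row, then qualifier, then timestamp descending (newest first).
        entries.sort(key=lambda e: (e[0], e[1], -e[2]))
    return dict(files)

store_files = build_store_files(cells)
meta_keys = [(r, q, ts) for r, q, ts, _ in store_files["meta"]]
content_keys = [(r, q, ts) for r, q, ts, _ in store_files["content"]]
print(meta_keys)
print(content_keys)
```

In the "meta" list, row1's `lang` and `size` cells sit next to each other
ahead of row2's, which is the "row order" point; a scan that only needs
"meta" never touches the "content" list.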

-ryan

On Fri, Jul 31, 2009 at 1:05 AM, Angus He<an...@gmail.com> wrote:
> OK,OK,OK.
>
> If data is stored row-by-row in HBase, how do you explain the text
> under the section "Physical Storage View" in
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture ?
> Is the page stale, or is something else wrong?

Re: Column-oriented data modal

Posted by tim robertson <ti...@gmail.com>.
That link is a pictorial view of what is represented in the HFile.
My limited understanding is that what is actually written to the HFile,
in terms of bytes, is laid out row by row, but you are not going to need
to get into HFiles.

Cheers,
Tim


On Fri, Jul 31, 2009 at 10:05 AM, Angus He<an...@gmail.com> wrote:
> OK,OK,OK.
>
> If data is stored row-by-row in HBase, how do you explain the text
> under the section "Physical Storage View" in
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture ?
> Is the page stale, or is something else wrong?

Re: Column-oriented data modal

Posted by Angus He <an...@gmail.com>.
OK, OK, OK.

If data is stored row-by-row in HBase, how do you explain the text
under the "Physical Storage View" section of the architecture page?
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
Is that page stale, or is something else wrong?




-- 
Regards
Angus

Re: Column-oriented data modal

Posted by Ryan Rawson <ry...@gmail.com>.
Data is stored row-by-row in the HBase store files (aka HFiles).
HBase is not a column-oriented store as described in the Wikipedia
article: http://en.wikipedia.org/wiki/Column-oriented_DBMS

Have a look at the Bigtable paper and do some searches; there is lots
of material out there describing the benefits of a flexible store like
Bigtable/HBase.

-ryan




Re: Column-oriented data modal

Posted by Angus He <an...@gmail.com>.
Hi Ryan,

You cannot equate the "column" in that Wikipedia article with the
"column" in HBase.

We should assume that the word "column" in "column-oriented" means
something predefined; otherwise, it is meaningless.

So we should treat the "column" in Wikipedia as the "column family" in
HBase.  That way, the article can answer 宏明's question.





-- 
Regards
Angus

Re: Column-oriented data modal

Posted by Andrew Purtell <ap...@apache.org>.
"The canonical use is to put
web crawl data in one family, and meta data (like derived meta data)
in another.  That way scanning just the meta data is not as expensive
as scanning the web page crawl dump."

I have this very same canonical use case. HBase provides very clear
benefits here. One can have a very deep archival store of web content --
and with multiversioning and timestamps, the ability to reconstitute
snapshots of change over time -- and yet run very efficient analytics
over derived metadata. The metadata and archival data are colocated in
the sense that they are both in the same table, but because they are
separated into different column families, access to one is I/O
independent of access to the other. So I might get a few K ops/sec/node
scanning over content, but can get > 100K ops/sec/node scanning over
metadata only.
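
The "snapshots of change over time" idea can be illustrated with a small
Python sketch. This is a toy model only, not the HBase client API: the
`versions` dict, its timestamps and the `value_as_of` helper are all
made up for illustration. It assumes each cell keeps several timestamped
versions and that a snapshot means "the newest version written at or
before a given time".

```python
# Hypothetical version history for a single cell: timestamp -> value.
versions = {
    100: "<html>first crawl</html>",
    200: "<html>second crawl</html>",
    300: "<html>third crawl</html>",
}


def value_as_of(versions, timestamp):
    """Return the newest value written at or before `timestamp`, or None."""
    eligible = [ts for ts in versions if ts <= timestamp]
    if not eligible:
        # Nothing had been written yet at that point in time.
        return None
    return versions[max(eligible)]


print(value_as_of(versions, 250))
```

Applying the same lookup to every cell of a row with one shared
timestamp would give a view of that row as it looked at that time.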

   - Andy






Re: Column-oriented data modal

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

The Bigtable paper talks more about column families, but in HBase each
column family is stored in its own file.  That means there is disk
locality for different column families.  The canonical use is to put
web crawl data in one family, and meta data (like derived meta data)
in another.  That way scanning just the meta data is not as expensive
as scanning the web page crawl dump.

Column families are pre-defined - the "schema" for what it's worth -
but the 'qualifier' within a family is dynamically determined by the
client.

In the terminology of the article, HBase would be more 'row oriented',
but with the column family snag, it isn't that simple.  Since data from
different families is stored in different files, read efficiency
depends on which column families a query reads.
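
A rough Python sketch of the two points above: families are fixed up
front while qualifiers are free-form, and a scan of one family never
touches the other family's data. This is a toy model under stated
assumptions, not the HBase API; `ToyTable`, its methods, the row key and
the family names "content" and "meta" are all hypothetical, and each
in-memory list merely stands in for a per-family file.

```python
class ToyTable:
    """Toy model: one separate store (standing in for a file) per column family."""

    def __init__(self, families):
        # Families are fixed when the table is created, like a schema.
        self.stores = {family: [] for family in families}

    def put(self, row, family, qualifier, value):
        if family not in self.stores:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are not declared anywhere; the client picks them freely.
        self.stores[family].append((row, qualifier, value))

    def scan(self, family):
        # Only the requested family's store is read.
        return sorted(self.stores[family])


table = ToyTable(["content", "meta"])
table.put("com.example/", "content", "html", "<html>" + "x" * 10_000 + "</html>")
table.put("com.example/", "meta", "lang", "en")
table.put("com.example/", "meta", "outlinks", "12")

meta_rows = table.scan("meta")
meta_bytes = sum(len(value) for _row, _qualifier, value in meta_rows)
content_bytes = sum(len(value) for _row, _qualifier, value in table.scan("content"))
print(meta_rows)
print(meta_bytes, content_bytes)
```

In this toy the meta scan reads a handful of bytes while the content
scan reads the whole page dump, which is the cost difference described
above.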

-ryan


Re: Column-oriented data modal

Posted by Angus He <an...@gmail.com>.
Hi Ryan,

1. If that is not the case, what is the purpose of introducing the
"column family"?
Is the content of different column families stored in different
files in HBase?

BTW, the Bigtable paper contains the following text:
"Access control and both disk and memory accounting are performed at
the column-family level."

2. I was wondering whether HBase shares the benefits described in the
"Benefits" section of the Wikipedia article. If not, what does
"column-store" mean in HBase?








-- 
Regards
Angus

Re: Column-oriented data modal

Posted by Ryan Rawson <ry...@gmail.com>.
HBase and Bigtable are referred to as column-stores, but we aren't a
'column oriented dbms' as described in the Wikipedia article.

At the storage level, HBase stores key-values, where the key is a
triple of row / column / timestamp.  Files are ordered lists of these
key/values, sorted in that order: rows are stored together, then
sorted by column, then in reverse by timestamp (newest on top).

Thus HBase is not a 'column store' in the sense listed in the Wikipedia entry.
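
The sort order described above can be sketched in a few lines of Python.
This is only an illustration with made-up cells, not HBase code: it
compares plain strings and integers, and the row names, the "meta:lang"
and "meta:title" columns and the timestamps are hypothetical.

```python
# Hypothetical cells: (row, column, timestamp, value).
cells = [
    ("row2", "meta:lang", 5, "en"),
    ("row1", "meta:title", 7, "New title"),
    ("row1", "meta:lang", 3, "en"),
    ("row1", "meta:title", 4, "Old title"),
    ("row1", "meta:lang", 9, "fr"),
]


def sort_key(cell):
    row, column, timestamp, _value = cell
    # Ascending by row, then by column; negating the timestamp puts
    # the newest version of each column first.
    return (row, column, -timestamp)


ordered = sorted(cells, key=sort_key)
for row, column, timestamp, value in ordered:
    print(row, column, timestamp, value)
```

All cells of row1 end up next to each other, which is why the layout is
closer to row-oriented than to a column store in the Wikipedia sense.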


Re: Column-oriented data modal

Posted by Angus He <an...@gmail.com>.
Why don't you try to google it first?
After googling with the keyword "Column-oriented", the first result is
exactly what you want.
http://en.wikipedia.org/wiki/Column-oriented_DBMS






-- 
Regards
Angus