Posted to user@hbase.apache.org by JohnJohnGa <Jo...@gmail.com> on 2011/04/23 09:56:40 UTC

HBase - Column family

Hi, I'm a beginner in HBase. I need to design my table. I want to play with the 
following information:

At the date XX-XX-XXXX, the word 'HELLO' is in document 2,3,4 and the weight of 
each doc is 12,45,36 - My raw data: doc:D title:'i like potatoes',weight:W,date:D

I created a table with row: word, column: date, value: doc. But I can't store
multiple rows with the same date for the same word.

Can we create multiple column families for a table? What can be the best way to 
design the schema?

Thanks a lot

Re: HBase - Column family

Posted by Bernd Fondermann <be...@googlemail.com>.

2011/4/23 Panayotis Antonopoulos <an...@hotmail.com>:
>
> I am also a beginner, so I would like to ask you something about the method you proposed.
> HBase is column-oriented. This means (as far as I know from databases) that it stores its data column by column and not row by row.

Fortunately, this is an oversimplification. HBase keeps data efficiently
accessible by row. Strictly speaking, it is not even a column-oriented
database. It's a column-family-oriented database. From the docs:
"Physically they are stored on a per-column family basis."
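
To make the per-column-family point concrete, here is a toy Python model. It is not HBase code: the `stores` dict, the helper names and the extra APPLE and WORLD rows are made up for illustration. The only assumption it encodes is that each family keeps its cells sorted by row key and then by column qualifier, so all cells of one row sit next to each other.

```python
from bisect import bisect_left, bisect_right

# Toy model (not HBase code): one sorted store per column family.
# Each cell is ((row key, column qualifier), value).
stores = {
    "doc_id": sorted([
        (("HELLO", "2"), 12),
        (("HELLO", "3"), 45),
        (("HELLO", "4"), 36),
        (("WORLD", "2"), 7),   # hypothetical extra row
        (("APPLE", "9"), 3),   # hypothetical extra row
    ]),
}

def row_slice(family, row):
    """Return the (start, end) index range that holds the cells of one row."""
    keys = [key for key, _ in stores[family]]
    start = bisect_left(keys, (row, ""))
    end = bisect_right(keys, (row, "\uffff"))
    return start, end

def read_row(family, row):
    """Return the cells of one row in one family as {qualifier: value}."""
    start, end = row_slice(family, row)
    return {qualifier: value for (_, qualifier), value in stores[family][start:end]}

print(row_slice("doc_id", "HELLO"))  # one contiguous range
print(read_row("doc_id", "HELLO"))
```

In this model, reading every document for HELLO is a single contiguous slice of the family's store rather than a scan over scattered columns.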

> If we use the schema you suggested then when we want some of the documents for a single word we will have to access many columns and I think this will cost us a lot.

No, it is very efficient, even more so if you access columns from a
single column family only.
AFAIK, there is no way to access HBase by column only, outside the
context of a specific row.

> I think that the locality of the data is lost using this schema.

No, I don't think so.

> I repeat that I am a beginner so please correct me if I am wrong.

This presentation might help:
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies

  Bernd


Re: HBase - Column family

Posted by Suraj Varma <sv...@gmail.com>.

If you only want some of the columns, you could return a subset by
using server-side Filters.
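
As a rough illustration of what such a filter does, here is a toy Python sketch. It is not the HBase Filter API: `row_hello` and `filter_columns` are hypothetical names, and the row is just a plain dict. In real HBase the predicate would be evaluated on the region server by a filter class (for example something like ColumnPrefixFilter or ValueFilter), so only the matching cells travel back to the client.

```python
# Toy model (not HBase code): one row of the proposed table.
row_hello = {"doc_id:2": 12, "doc_id:3": 45, "doc_id:4": 36}

def filter_columns(row, predicate):
    """Keep only the cells for which predicate(column, value) is true."""
    return {col: val for col, val in row.items() if predicate(col, val)}

# Select by value: only documents with weight of at least 30.
heavy = filter_columns(row_hello, lambda col, val: val >= 30)

# Select by column name: only document 3.
only_doc_3 = filter_columns(row_hello, lambda col, val: col == "doc_id:3")

print(heavy)
print(only_doc_3)
```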

Your schema can be designed in multiple ways - it all depends on what
your access patterns are.
Here's a good thread on various schema design alternatives for
one-to-many relationships. There are many other such threads that you
can search the mailing lists for.
http://search-hadoop.com/m/Yj4TE1g3ZX51

--Suraj


RE: HBase - Column family

Posted by Panayotis Antonopoulos <an...@hotmail.com>.

I am also a beginner, so I would like to ask you something about the method you proposed.
HBase is column-oriented. This means (as far as I know from databases) that it stores its data column by column and not row by row.
If we use the schema you suggested then when we want some of the documents for a single word we will have to access many columns and I think this will cost us a lot.
I think that the locality of the data is lost using this schema.

I repeat that I am a beginner so please correct me if I am wrong.

Regards,
Panagiotis.

> Date: Sat, 23 Apr 2011 11:25:47 +0200
> Subject: Re: HBase - Column family
> From: bernd.fondermann@googlemail.com
> To: user@hbase.apache.org
> 
> That's how I would do it:
> What's nice in HBase is that you can store all the data for one of
> your keywords in a single row.
> Create a column family "doc_id".
> Now, for each word, you create one row.
> In this row, for each matching document you create one column (that's
> the gotcha compared to a RDB design).
> The name of the column is the doc id. The column's cell content is the weight.
> 
> So, following your example you'd get:
> 
> row id | column-family:column....
> HELLO |  doc_id:2 | doc_id:3 | doc_id:4
> 
> and column values:
> doc_id:2 | doc_id:3 | doc_id:4
> 12 | 45 | 36
> 
> HTH,
> 
>   Bernd

Re: HBase - Column family

Posted by Bernd Fondermann <be...@googlemail.com>.

Here's how I would do it:
What's nice in HBase is that you can store all the data for one of
your keywords in a single row.
Create a column family "doc_id".
Now, for each word, you create one row.
In this row, for each matching document you create one column (that's
the gotcha compared to an RDB design).
The name of the column is the doc id. The column's cell content is the weight.

So, following your example you'd get:

row id | column-family:column ...
HELLO  | doc_id:2 | doc_id:3 | doc_id:4

and column values:
doc_id:2 | doc_id:3 | doc_id:4
12       | 45       | 36
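
The same layout can be sketched in a few lines of Python. This is only a model of the schema using nested dicts, not HBase client code: `postings` and `build_table` are hypothetical names, and the date from the original question is left out because the layout above does not use it.

```python
from collections import defaultdict

# Raw (word, document id, weight) records from the example in the question.
postings = [("HELLO", 2, 12), ("HELLO", 3, 45), ("HELLO", 4, 36)]

def build_table(postings, family="doc_id"):
    """Build {row key: {"family:qualifier": value}} with one row per word."""
    table = defaultdict(dict)
    for word, doc, weight in postings:
        # The column name carries the doc id; the cell holds the weight.
        table[word][f"{family}:{doc}"] = weight
    return dict(table)

table = build_table(postings)
print(table["HELLO"])
```

A new document for an existing word simply adds one more column to that word's row, with no schema change needed.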

HTH,

  Bernd
