Posted to user@hbase.apache.org by Imran M Yousuf <im...@gmail.com> on 2011/09/09 07:07:21 UTC

Using multiple column families

Hi,

First, I have read on this mailing list that having more than one
column family is not recommended. I would like to know whether it is
also a problem in my use case.

I have a strong entity with 6 weak entities, each in a 1-to-many
relationship with the strong entity. Furthermore, they are all loaded
in a mutually exclusive manner: if A is the strong entity and its weak
entities are P, Q, R, S, T, U, then no two weak entities are accessed
at once. Moreover, their lifecycles are independent of each other. In
my current implementation I have one column family for the strong
entity and one for each weak entity, so for a given row I only load
one column family at a time. The obvious advantages are:
- deleting the strong entity automatically deletes its weak entities,
  as they are all in a single row;
- deleting all weak entities of one kind for a specific strong entity
  is as simple as deleting all cells in a column family for a row.
Our assumption (deliberately higher than what we expect) is that we
will not have more than 20k rows in that table. Under these
circumstances, how bad is it to have 7 column families?
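
To make the layout above concrete, here is a minimal sketch of it as a toy
in-memory model. It is not the HBase client API: the class ToyTable, its
method names, the family names and the row key "A-1" are all hypothetical
and exist only to show the three operations described (load one family,
delete one family of a row, delete the whole row).

```python
from collections import defaultdict

# Toy in-memory stand-in for one table; NOT the HBase client API.
# Layout: row key -> column family -> qualifier -> value.
FAMILIES = ("a", "p", "q", "r", "s", "t", "u")  # hypothetical family names


class ToyTable:
    def __init__(self, families):
        self.families = set(families)
        # Every new row starts with an empty map for each declared family.
        self.rows = defaultdict(lambda: {f: {} for f in families})

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier] = value

    def get_family(self, row, family):
        """Load a single column family of one row (the read pattern described)."""
        if row not in self.rows:
            return {}
        return dict(self.rows[row][family])

    def delete_family(self, row, family):
        """Drop every weak entity of one kind for one strong entity."""
        if row in self.rows:
            self.rows[row][family].clear()

    def delete_row(self, row):
        """Drop the strong entity together with all of its weak entities."""
        self.rows.pop(row, None)


table = ToyTable(FAMILIES)
table.put("A-1", "a", "name", "strong entity 1")
table.put("A-1", "p", "p-1", "first P")
table.put("A-1", "p", "p-2", "second P")
table.put("A-1", "q", "q-1", "first Q")

table.delete_family("A-1", "p")             # removes only the P entities
remaining_q = table.get_family("A-1", "q")  # the Q entities are untouched
table.delete_row("A-1")                     # removes everything for A-1
```

In this model both deletes are single operations on one row, which is the
convenience the 7-family design is buying.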

We would be glad if you would kindly share thoughts and feedback on this issue.

Thank you,

-- 
Imran M Yousuf
Entrepreneur & CEO
Smart IT Engineering Ltd.
Dhaka, Bangladesh
Twitter: @imyousuf - http://twitter.com/imyousuf
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557

RE: Using multiple column families

Posted by Stuti Awasthi <st...@hcl.com>.
Thanks St.Ack


Re: Using multiple column families

Posted by Stack <st...@duboce.net>.
It depends on how you access the table.  Three to four column families
may be an appropriate schema if you are mostly accessing individual cfs.
It's when you do cross-cf accesses that things can slow down (if most of
your accesses fetch all the data, then just have one cf).  Multiple cfs,
if all active at the same time, can also make the server's internal
accounting a little messy.  We've not spent much time studying and
optimizing for this case, e.g. multi-cf flushing, compacting, querying.
Because of this, query times can be slower.
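
One way to picture the multi-cf flushing point is the toy model below. It
assumes that the flush decision is made for the region as a whole and that
a flush writes out every family at once, which is how the HBase reference
guide of that period described it; the class ToyRegion, the threshold and
all sizes are made up for illustration.

```python
# Toy model of region-wide flushing; sizes and the threshold are made up.
FLUSH_THRESHOLD = 100  # hypothetical per-region limit, arbitrary units


class ToyRegion:
    def __init__(self, families):
        self.memstore = {f: 0 for f in families}      # buffered size per family
        self.store_files = {f: [] for f in families}  # flushed file sizes per family

    def write(self, family, size):
        self.memstore[family] += size
        # Assumption: the flush decision looks at the region as a whole.
        if sum(self.memstore.values()) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Every family with buffered data writes a file, however small.
        for family, buffered in self.memstore.items():
            if buffered > 0:
                self.store_files[family].append(buffered)
                self.memstore[family] = 0


region = ToyRegion(["big", "small"])
for _ in range(30):
    region.write("big", 9)    # one busy family
    region.write("small", 1)  # one quiet family written alongside it

files_per_family = {f: len(sizes) for f, sizes in region.store_files.items()}
small_file_sizes = region.store_files["small"]
```

In this model the quiet family ends up with as many files as the busy one,
each a fraction of the size, so later compactions have more small files to
deal with.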

St.Ack


RE: Using multiple column families

Posted by Stuti Awasthi <st...@hcl.com>.
Hi,

I am also looking for an answer to a similar question. In my scenario we will have petabytes of data to handle. Currently I am working with a schema that has 3-4 column families. What are the major issues we can face if we have multiple column families?

I have read that each column family is stored in its own HFiles on the region server, so searching by row ID and column family is useful because the lookup goes only to the HFiles of that column family.
If we use a flat table structure instead, we will end up with more tables and with data replicated across them because of the dependencies between the data.
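
The per-family storage described above can be sketched with a toy model in
which each family keeps its own list of files and a read only consults the
files of the families it asks for. The family names, rows and values are
hypothetical, and the model ignores anything a real server might do to skip
files.

```python
# Toy model: each column family keeps its own list of store files.
# File contents are plain dicts of row -> {qualifier: value}; all names are made up.
stores = {
    "profile": [{"row1": {"name": "x"}}, {"row2": {"name": "y"}}],
    "events": [
        {"row1": {"e1": "login"}},
        {"row1": {"e2": "logout"}},
        {"row2": {"e1": "login"}},
    ],
    "blobs": [{"row1": {"doc": "payload"}}],
}


def read(row, families):
    """Return (cells, files_opened) for one row restricted to the given families."""
    cells, files_opened = {}, 0
    for family in families:
        for store_file in stores[family]:
            files_opened += 1  # in this model every file of a requested family is consulted
            for qualifier, value in store_file.get(row, {}).items():
                cells[(family, qualifier)] = value
    return cells, files_opened


one_family_cells, one_family_files = read("row1", ["profile"])
all_cells, all_files = read("row1", list(stores))
```

Reading one family touches only that family's files, while reading the
whole row touches the files of every family, which is the cost to weigh
when most accesses need all of the data.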

Please suggest



Re: Using multiple column families

Posted by Jean-Daniel Cryans <jd...@apache.org>.
OK, it's small enough that you won't be bothered.

J-D


Re: Using multiple column families

Posted by Imran M Yousuf <im...@gmail.com>.
Hi J-D,

Thanks for your feedback.

(replies inline)
On Sat, Sep 10, 2011 at 5:39 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> 20k rows? If this is your only use case, you don't need HBase :)
>

It's one of several.

> If it's 20k rows times a gazillion columns per row, then I would
> recommend flattening out the rows instead.
>

Well, our guess at the moment is that there will not be more than 500
cells per family to start with.

> If it's just one small table among others, then you probably won't be
> bothered by the multiple families.
>

We actually have many other tables which are flattened out to a single
column family and this is one table for which we are using more than 1
column family.

Thanks once again.

Imran


Re: Using multiple column families

Posted by Jean-Daniel Cryans <jd...@apache.org>.
20k rows? If this is your only use case, you don't need HBase :)

If it's 20k rows times a gazillion columns per row, then I would
recommend flattening out the rows instead.
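
One common reading of "flattening out the rows" is to trade a few very wide
rows for many narrow ones by moving the column name into the row key. The
sketch below shows that idea on plain dicts; it is an assumption about what
is meant, not a quote of any HBase API, and the key separator, the "v"
column and the sample data are hypothetical.

```python
# Wide layout: one row per strong entity, one column per weak entity.
wide = {
    "A-1": {"p:001": "first P", "p:002": "second P", "q:001": "first Q"},
    "A-2": {"p:001": "other P"},
}


def flatten(wide_rows, sep="/"):
    """Turn each (row, column) pair into its own row with a composite key."""
    tall = {}
    for row_key, columns in wide_rows.items():
        for column, value in columns.items():
            tall[f"{row_key}{sep}{column}"] = {"v": value}
    return tall


def scan_prefix(tall_rows, prefix):
    """Emulate a sorted prefix scan over row keys."""
    return [(k, tall_rows[k]["v"]) for k in sorted(tall_rows) if k.startswith(prefix)]


tall = flatten(wide)
p_entities_of_a1 = scan_prefix(tall, "A-1/p:")
```

The trade-off in this sketch is that removing a strong entity is no longer
a single-row delete; it becomes a delete over every key sharing the prefix.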

If it's just one small table among others, then you probably won't be
bothered by the multiple families.

J-D
