Posted to user@hbase.apache.org by Mohammad Tariq <do...@gmail.com> on 2011/12/28 08:53:48 UTC

No. of families

Hello all,

    Having fewer column families is advisable. Is it feasible to
have 2 or 3 sub column families within a single column family? I
want to store XML data in HBase, and I have sub tags that may go down
to 2 or 3 levels.

Regards,
    Mohammad Tariq

Re: No. of families

Posted by Doug Meil <do...@explorysmedical.com>.
I'm with Mike on this one.

Just like he said, hierarchical isn't related to the CF question, and it's
pretty straightforward in a NoSQL DB.





On 12/30/11 10:18 PM, "Michel Segel" <mi...@hotmail.com> wrote:

>Sorry, but you misunderstand.
>
>Implementing a hierarchical model in any NoSQL database is trivial.
>
>
>
>Sent from a remote device. Please excuse any typos...
>
>Mike Segel



Re: No. of families

Posted by Michel Segel <mi...@hotmail.com>.
Sorry, but you misunderstand.

Implementing a hierarchical model in any NoSQL database is trivial.  



Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 30, 2011, at 9:01 PM, Imran M Yousuf <im...@gmail.com> wrote:

> Hi Michael,
> 
> I totally agree. Thats what we tried to implement in Smart CMS making
> it easy for clients to retrieve and persist data with more added
> facility. BTW, I forgot to mention that Smart CMS persists data in
> HBase.
> 
> Regards,
> 
> Imran

Re: No. of families

Posted by Imran M Yousuf <im...@gmail.com>.
Hi Michael,

I totally agree. That's what we tried to implement in Smart CMS, making
it easy for clients to retrieve and persist data, with some added
facilities. BTW, I forgot to mention that Smart CMS persists data in
HBase.

Regards,

Imran

On Sat, Dec 31, 2011 at 8:56 AM, Michael Segel
<mi...@hotmail.com> wrote:
> Hierarchical data doesn't necessarily has anything to do w column families. You can do a hierarchical model in a single column family.
> It's pretty straight forward.
>
> Sent from my iPhone



-- 
Imran M Yousuf
Entrepreneur & CEO
Smart IT Engineering Ltd.
Dhaka, Bangladesh
Twitter: @imyousuf - http://twitter.com/imyousuf
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557

Re: No. of families

Posted by Michael Segel <mi...@hotmail.com>.
Hierarchical data doesn't necessarily have anything to do with column families. You can implement a hierarchical model in a single column family.
It's pretty straightforward.
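
One way this could look, as a minimal sketch: flatten the XML so that each leaf element's path becomes a column qualifier inside one family. The family name "d", the dotted-path naming convention, and the helper flatten_xml are assumptions made for illustration; none of them is HBase API, and the resulting cells would still have to be written with a real client.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten_xml(xml_text, family="d"):
    """Map every leaf element to a 'family:path' column name.

    The path joins tag names with '.', so nesting depth lives in the
    qualifier rather than in extra column families (hypothetical scheme).
    """
    root = ET.fromstring(xml_text)
    cells = {}

    def walk(node, path):
        children = list(node)
        if not children:
            # Leaf element: its text becomes the cell value.
            cells[f"{family}:{'.'.join(path)}"] = (node.text or "").strip()
            return
        # Repeated sibling tags get an index so their qualifiers stay unique.
        counts = Counter(child.tag for child in children)
        seen = Counter()
        for child in children:
            name = child.tag
            if counts[child.tag] > 1:
                name = f"{child.tag}[{seen[child.tag]}]"
                seen[child.tag] += 1
            walk(child, path + [name])

    walk(root, [root.tag])
    return cells


doc = (
    "<order><id>42</id><customer><name>Ann</name>"
    "<address><city>Pune</city></address></customer></order>"
)
for column, value in flatten_xml(doc).items():
    print(column, "=", value)
```

Here three levels of tags end up as three qualifiers in a single family, which is the sense in which no "sub column families" are needed.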

Sent from my iPhone

On Dec 30, 2011, at 6:34 PM, "Imran M Yousuf" <im...@gmail.com> wrote:

> Hi,
> 
> Rather than addressing the issue of how many column family may be used and
> query performance on them, I would like to address the problem of
> hierarchical data.
> 
> We were facing an issue of storing hierarchical data in one of our
> applications and for solving that, and many other features, we turned
> developed Smart CMS - smart-cms.org If it sounds interesting to your
> problem let me know, we can then collaborate in more details.
> 
> Thank you,
> 
> Imran

Re: No. of families

Posted by Imran M Yousuf <im...@gmail.com>.
Hi,

Rather than addressing the issue of how many column families may be used and
query performance on them, I would like to address the problem of
hierarchical data.

We were facing an issue of storing hierarchical data in one of our
applications, and to solve that, along with many other features, we
developed Smart CMS (smart-cms.org). If it sounds relevant to your
problem, let me know, and we can collaborate in more detail.

Thank you,

Imran

On 28 Dec 2011 13:55, "Mohammad Tariq" <do...@gmail.com> wrote:

> Hello all,
>
>    Having less no. of column families is advisable. It is feasible to
> have 2 or 3 sub column families within a single column family???I
> want to store xml data in Hbase and I have sub tags that may go down
> to 2 or 3 levels.
>
> Regards,
>     Mohammad Tariq

Re: No. of families

Posted by Jesse Yates <je...@gmail.com>.
Out of curiosity (haven't RTFM on this yet): do we have any hard bounds on the max number of column families/qualifiers, or any known performance impact from it? Has that behavior changed with the dynamic CF stuff that fairly recently got rolled in?

Further, any pointers on where to start digging into the code on this would be great!

Thanks!

- Jesse Yates

Sent from my iPhone.

On Dec 29, 2011, at 1:18 AM, lars hofhansl <lh...@yahoo.com> wrote:

> Less is not necessarily better. HBase can ignore stores (column families) during a scan or get if no columns in that family were requested.
> 
> So what you want to do is group columns that are typically queried together in a single column family, and put
> columns that are not typically queried together in separate families.
> 
> 
> -- Lars


Re: No. of families

Posted by Doug Meil <do...@explorysmedical.com>.
The recommendation for a lower number of CFs originated not so much from
reads as from writes: the MemStore flushes (and GC churn) that happen per Store.




On 12/29/11 1:18 AM, "lars hofhansl" <lh...@yahoo.com> wrote:

>Less is not necessarily better. HBase can ignore stores (column families)
>during a scan or get if no columns in that family were requested.
>
>So what you want to do is group columns that are typically queried
>together in a single column family, and put
>columns that are not typically queried together in separate families.
>
>
>-- Lars



Re: No. of families

Posted by lars hofhansl <lh...@yahoo.com>.
Less is not necessarily better. HBase can ignore stores (column families) during a scan or get if no columns in that family were requested.

So what you want to do is group columns that are typically queried together in a single column family, and put
columns that are not typically queried together in separate families.
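
A toy model may make the grouping rule concrete. The sketch below is plain Python, not HBase code: ToyRegion, its methods, and the family names "meta" and "body" are all made up for illustration. It keeps one store per family and records which stores a get has to open.

```python
class ToyRegion:
    """Simplified stand-in for a region holding one store per column family."""

    def __init__(self, families):
        self.stores = {family: {} for family in families}
        self.last_touched = []  # families opened by the most recent get()

    def put(self, row, family, qualifier, value):
        self.stores[family][(row, qualifier)] = value

    def get(self, row, columns):
        """columns is a list of (family, qualifier) pairs."""
        wanted = {}
        for family, qualifier in columns:
            wanted.setdefault(family, []).append(qualifier)

        result = {}
        self.last_touched = []
        # Only families named in the request are opened; the rest are skipped.
        for family, qualifiers in wanted.items():
            self.last_touched.append(family)
            store = self.stores[family]
            for qualifier in qualifiers:
                if (row, qualifier) in store:
                    result[f"{family}:{qualifier}"] = store[(row, qualifier)]
        return result


region = ToyRegion(["meta", "body"])
region.put("row1", "meta", "title", "HBase notes")
region.put("row1", "meta", "author", "someone")
region.put("row1", "body", "text", "a large document ...")

print(region.get("row1", [("meta", "title"), ("meta", "author")]))
print("stores opened:", region.last_touched)
```

Because both requested columns live in "meta", the large "body" store is never opened. If title and author were split across two families, the same read would have to open two stores.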


-- Lars


----- Original Message -----
From: Rohit Kelkar <ro...@gmail.com>
To: user@hbase.apache.org
Cc: 
Sent: Wednesday, December 28, 2011 9:01 PM
Subject: Re: No. of families

When we say less column families, how much is less? Is this guided by
a ratio of the number of rows stored in the Htable to number of column
families. Or number of tables to number of column families. If I
understand correctly, the content of each column family is stored in a
separate file. So does it have anything to do with the disk space
allocated to hadoop?

- Rohit Kelkar



Re: No. of families

Posted by Jahangir Mohammed <md...@gmail.com>.
Generally, a low number of C.F.s is better: the lower the number of C.F.s,
the better the performance. I am not sure whether any users go above 8
C.F.s; most keep the number of C.F.s below 4 or so. With more C.F.s, the
density of those C.F.s in terms of data may vary, and in that case it can
cause an unnecessary I/O burden when flushing or compacting.
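
To illustrate the flush side of this, here is a rough simulation in plain Python. It assumes a simplified model in which a flush is triggered by the total data buffered across a region and writes out every family at once, as the HBase book described at the time; simulate_flushes, the threshold, and the sizes are hypothetical, not real HBase settings.

```python
def simulate_flushes(writes, flush_threshold):
    """Return the sizes of the files each family ends up with after flushes.

    writes is an iterable of (family, size) pairs. When the total buffered
    across all families reaches flush_threshold, every family with buffered
    data is flushed together (simplified per-region flush model).
    """
    memstore = {}
    files = {}
    for family, size in writes:
        memstore[family] = memstore.get(family, 0) + size
        files.setdefault(family, [])
        if sum(memstore.values()) >= flush_threshold:
            for name, buffered in memstore.items():
                if buffered > 0:
                    files[name].append(buffered)
            memstore = {name: 0 for name in memstore}
    return files


# One dense family and one sparse family written side by side.
writes = [("big", 60), ("small", 1)] * 4
print(simulate_flushes(writes, flush_threshold=100))
```

In this model the sparse family is flushed every time the dense one fills the buffer, so it accumulates tiny files that later have to be compacted. That is the uneven-density cost described above.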

Below link should help too:
http://hbase.apache.org/book/number.of.cfs.html

Setting compaction per C.F. is WIP
https://issues.apache.org/jira/browse/HBASE-4770

Thanks,
Jahangir.

On Thu, Dec 29, 2011 at 12:01 AM, Rohit Kelkar <ro...@gmail.com> wrote:

> When we say less column families, how much is less? Is this guided by
> a ratio of the number of rows stored in the Htable to number of column
> families. Or number of tables to number of column families. If I
> understand correctly, the content of each column family is stored in a
> separate file. So does it have anything to do with the disk space
> allocated to hadoop?
>
> - Rohit Kelkar

Re: No. of families

Posted by Rohit Kelkar <ro...@gmail.com>.
When we say fewer column families, how few is few? Is this guided by
the ratio of the number of rows stored in the HTable to the number of
column families, or of the number of tables to the number of column
families? If I understand correctly, the content of each column family
is stored in a separate file. So does it have anything to do with the
disk space allocated to Hadoop?

- Rohit Kelkar

On Wed, Dec 28, 2011 at 10:14 PM, Mohammad Tariq <do...@gmail.com> wrote:
> Hi Doug,
>
>  Thanks a lot for the reply.Ya, I had asked a similar
> question.Actually I am stuck with some schema design issue.I am sorry,
> the intention was not to ask the same thing repeatedly.I'll try to
> figure it out with the help of guidelines provided.Many thanks.
>
> Regards,
>     Mohammad Tariq

Re: No. of families

Posted by Mohammad Tariq <do...@gmail.com>.
Hi Doug,

  Thanks a lot for the reply. Yes, I had asked a similar
question. Actually, I am stuck on a schema design issue. I am sorry,
the intention was not to ask the same thing repeatedly. I'll try to
figure it out with the help of the guidelines provided. Many thanks.

Regards,
    Mohammad Tariq



On Wed, Dec 28, 2011 at 7:24 PM, Doug Meil
<do...@explorysmedical.com> wrote:
>
> Hi there-
>
> re:  "number of CF's"
>
> Yes.  Fewer is better.
>
> http://hbase.apache.org/book.html#schema
>
> re:  "sub column families"
>
>
> There aren't "sub column families" - it's just columns (within a CF).
>
> http://hbase.apache.org/book.html#datamodel
>
>
> If I am not mistaken you asked a similar question to the dist-list a few
> weeks ago. The answers haven't changed.

Re: No. of families

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

re:  "number of CF's"

Yes.  Fewer is better.

http://hbase.apache.org/book.html#schema

re:  "sub column families"


There aren't "sub column families" - it's just columns (within a CF).

http://hbase.apache.org/book.html#datamodel


If I am not mistaken you asked a similar question to the dist-list a few
weeks ago. The answers haven't changed.






On 12/28/11 2:53 AM, "Mohammad Tariq" <do...@gmail.com> wrote:

>Hello all,
>
>    Having less no. of column families is advisable. It is feasible to
>have 2 or 3 sub column families within a single column family???I
>want to store xml data in Hbase and I have sub tags that may go down
>to 2 or 3 levels.
>
>Regards,
>    Mohammad Tariq
>