Posted to dev@hbase.apache.org by Namkyu Chang <na...@gmail.com> on 2013/06/20 03:01:47 UTC

HBase Column Family Limit Reasoning

Hi everyone,

I'm a newcomer to HBase, and as I was reading the documentation I wanted to
learn more about the reasoning behind the limit on the number of column
families that HBase supports.

I understand that HBase currently handles at most 2-3 column families well
because of flushing and compaction issues, and the excessive I/O load these
can cause for smaller column families. Since flushing and compaction are
done on a per-region basis, and each region contains most of the column
families, one full column family can trigger a flush, and the other,
non-full column families also have to participate when they could really
wait.
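
The flush behaviour described above can be illustrated with a toy model. This is only a sketch, not HBase code: the threshold, the write sizes, and the names `FLUSH_THRESHOLD` and `simulate` are made up for illustration, and it assumes (as a simplification) that a region flushes once the combined size of its memstores crosses the threshold. It compares flushing every family with flushing only the largest one.

```python
FLUSH_THRESHOLD = 128  # hypothetical flush threshold, in MB


def simulate(writes, per_family_flush):
    """Replay (family, size_mb) writes against a single region.

    Returns a dict mapping each family to the sizes of the files it flushed.
    """
    memstores = {}  # family -> MB currently buffered in memory
    files = {}      # family -> sizes (MB) of flushed files
    for family, size_mb in writes:
        memstores[family] = memstores.get(family, 0) + size_mb
        files.setdefault(family, [])
        # Assumption: the flush trigger looks at the region-wide total.
        if sum(memstores.values()) < FLUSH_THRESHOLD:
            continue
        if per_family_flush:
            # Hypothetical policy: flush only the largest memstore.
            targets = [max(memstores, key=memstores.get)]
        else:
            # Region-wide policy: every non-empty memstore is flushed.
            targets = [f for f, s in memstores.items() if s > 0]
        for f in targets:
            files[f].append(memstores[f])
            memstores[f] = 0
    return files


# One busy family and one quiet family: 10 MB vs 1 MB per round.
writes = [("big", 10), ("small", 1)] * 50

region_wide = simulate(writes, per_family_flush=False)
per_family = simulate(writes, per_family_flush=True)
print("region-wide flush, small family files:", region_wide["small"])
print("per-family flush, small family files:", per_family["small"])
```

In this toy run the region-wide policy writes four files of 11-12 MB for the quiet family, each of which later has to be compacted, while the per-family policy writes none for it within the 50 rounds. The per-family run also shows a cost: the unflushed small family keeps occupying memory, so the big family's flushes shrink from 120 MB to 80 MB.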

However, is this the only reason? I see that this is "To be addressed by
changing flushing and compaction to work on a per column family basis";
would this mean we can have as many CFs as we'd like after this fix? In
Google's Bigtable paper, they also limit the number of their CFs to around
100 at most. As such, are there any other factors to this limitation?

As well, are there any other ways of getting around this problem? I feel
that if there is still a limit after the flushing/compaction issue is
fixed, there must be some other way of doing this. But would that change
the entire architecture of HBase?

I've been trying to find out more about this problem online and in print,
but there seems to be very limited discussion on this topic.

Thank you in advance.

Daniel

Re: HBase Column Family Limit Reasoning

Posted by lars hofhansl <la...@apache.org>.
There's also some good discussion here: https://issues.apache.org/jira/browse/HBASE-3149
This mostly discusses the small HFiles created, since all CFs have to be flushed together, but it's still worth a read.

-- Lars
________________________________
From: Stack <st...@duboce.net>
To: HBase Dev List <de...@hbase.apache.org> 
Sent: Thursday, June 20, 2013 8:30 AM
Subject: Re: HBase Column Family Limit Reasoning


Some folks have more than 2-3 CFs; IIRC, FB Messages has 10-15 CFs.  In
Messages, reads and writes are carefully managed per CF so they avoid the
issues 'normal' users run into when they have many CFs.

The main issue, as you cite above, is that we flush all CFs on a region
flush rather than just the big CF.  Fixing this will get us to a new upper
bound. We'll have to see what it is (my guess is that it will be well below
100).

Other factors are that each CF consumes resources; each has its own
memstore for example.
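
To put rough numbers on the per-CF resource cost, here is a back-of-the-envelope sketch. The heap size, the fraction of heap given to memstores, and the regions-per-server count are all assumed values chosen for illustration, not recommendations, and the function names are hypothetical.

```python
def memstore_count(regions_per_server, families):
    """Each region keeps one memstore per column family."""
    return regions_per_server * families


def avg_memstore_budget_mb(heap_mb, memstore_fraction, regions_per_server, families):
    """Average heap per memstore if every memstore were equally active."""
    total_mb = heap_mb * memstore_fraction
    return total_mb / memstore_count(regions_per_server, families)


# Assumed setup: 8 GB heap, 40% of it for memstores, 100 regions per server.
for families in (1, 3, 15, 100):
    count = memstore_count(100, families)
    budget = avg_memstore_budget_mb(8192, 0.4, 100, families)
    print(f"{families:>3} CFs -> {count:>5} memstores, ~{budget:.2f} MB each")
```

Under these assumptions, going from 3 to 100 families takes the server from 300 to 10,000 memstores, and the average share of heap per memstore drops well below 1 MB.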

Be careful. In HBase, CFs are not the same as Bigtable CFs.  CFs in HBase
are more like the locality groups the Bigtable paper talks of.  If HBase
CFs were like Bigtable CFs, that would enable us to have more CFs too
(depending upon how they are implemented).
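
The difference can be sketched by counting how many separately stored units a schema produces. This is a toy illustration, assuming one unit per HBase CF and one per Bigtable locality group; the family names, the group names, and the `store_count` helper are hypothetical.

```python
def store_count(family_to_group):
    """Number of separately stored units: one per distinct group."""
    return len(set(family_to_group.values()))


families = ["contents", "anchor", "language", "title", "author"]

# HBase-like: every column family is its own storage unit.
hbase_like = {f: f for f in families}

# Bigtable-like: several families can share one locality group.
bigtable_like = {
    "contents": "page",
    "anchor": "meta",
    "language": "meta",
    "title": "meta",
    "author": "meta",
}

print("HBase-like units:", store_count(hbase_like))
print("Bigtable-like units:", store_count(bigtable_like))
```

With the same five families, the HBase-like layout ends up with five storage units and the grouped layout with two, which is why a family count that is reasonable in one system does not carry over directly to the other.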

St.Ack
