Posted to user@hbase.apache.org by Wei Liu <we...@stellarloyalty.com> on 2014/08/20 01:56:30 UTC

Multiple column families vs Multiple tables

We are doing schema design for our application. One thing we are not
clear about is whether to use multiple column families (more than 3,
probably 4 or 5) or multiple tables. In our use case, we will have the same
number of rows in all of these column families, but some column families
may be modified more often than others, and some will have far more columns
than others (thousands versus several).

The reason we are thinking about multiple column families is that they
can probably give us better performance when a search needs data from more
than one column family. For example, searching for a row with value X in
column family A and value Y in column family B.
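
To make the search above concrete, here is a toy in-memory sketch in plain
Java. It is not HBase client code: the class and method names, the use of
maps as stand-ins for tables, and the single value per family are all
assumptions made for illustration. It only shows the shape of the two
designs: with one table, a single pass over the rows can test both families;
with two tables sharing a row key, each candidate from the first table needs
a further lookup in the second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory model (hypothetical names), not HBase client code.
final class FamilySearchSketch {

    // One table: row key -> (family name -> value).
    // A single pass over the rows sees both families of each row.
    static List<String> searchOneTable(Map<String, Map<String, String>> table,
                                       String x, String y) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            Map<String, String> families = row.getValue();
            if (x.equals(families.get("A")) && y.equals(families.get("B"))) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Two tables keyed by the same row key: row key -> value.
    // Pass over table A, then do one extra lookup in table B per candidate.
    static List<String> searchTwoTables(Map<String, String> tableA,
                                        Map<String, String> tableB,
                                        String x, String y) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : tableA.entrySet()) {
            if (x.equals(row.getValue())) {
                // Extra lookup that the one-table design does not need.
                if (y.equals(tableB.get(row.getKey()))) {
                    hits.add(row.getKey());
                }
            }
        }
        return hits;
    }
}
```

Both methods return the same row keys for the same data. The sketch says
nothing about real HBase latency; it only shows where the extra per-row
lookup comes from in the two-table design.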

On the other hand, we saw the following paragraph in the user guide which
is scary to us:
"HBase currently does not do well with anything above two or three column
families so keep the number of column families in your schema low.
Currently, flushing and compactions are done on a per Region basis so if
one column family is carrying the bulk of the data bringing on flushes, the
adjacent families will also be flushed though the amount of data they carry
is small. When many column families the flushing and compaction interaction
can make for a bunch of needless i/o loading (To be addressed by changing
flushing and compaction to work on a per column family basis). For more
information on compactions, see Section 9.7.6.7, “Compaction”
<http://hbase.apache.org/book.html#compaction>."
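
The effect the quoted paragraph describes can be illustrated with a small
simulation in plain Java. This is a toy model, not HBase code: the class
name, the size units, and the rule that a flush is triggered when the sum of
all families' buffers reaches a threshold are assumptions made for the
sketch. The point it shows is that when flushing is per region, a family
holding very little data still writes a file every time the large family
fills the region's buffers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of per-region flushing (hypothetical names, arbitrary size units).
final class RegionFlushSketch {
    // Buffered size per column family (stand-in for a per-family memstore).
    private final Map<String, Long> buffered = new LinkedHashMap<>();
    // Number of files each family has written so far.
    private final Map<String, Integer> files = new LinkedHashMap<>();
    // Assumed trigger: flush when the total across all families reaches this.
    private final long regionFlushThreshold;

    RegionFlushSketch(long regionFlushThreshold, String... families) {
        this.regionFlushThreshold = regionFlushThreshold;
        for (String family : families) {
            buffered.put(family, 0L);
            files.put(family, 0);
        }
    }

    void write(String family, long size) {
        if (!buffered.containsKey(family)) {
            throw new IllegalArgumentException("unknown family: " + family);
        }
        buffered.merge(family, size, Long::sum);
        long total = buffered.values().stream().mapToLong(Long::longValue).sum();
        if (total >= regionFlushThreshold) {
            flushRegion();
        }
    }

    // Per-region flush: every family with buffered data writes a file,
    // however little it holds.
    private void flushRegion() {
        for (Map.Entry<String, Long> entry : buffered.entrySet()) {
            if (entry.getValue() > 0) {
                files.merge(entry.getKey(), 1, Integer::sum);
            }
            entry.setValue(0L);
        }
    }

    int fileCount(String family) {
        return files.get(family);
    }

    public static void main(String[] args) {
        RegionFlushSketch region = new RegionFlushSketch(100, "big", "small");
        for (int i = 0; i < 10; i++) {
            region.write("big", 99);   // carries the bulk of the data
            region.write("small", 1);  // carries almost nothing
        }
        System.out.println("big files:   " + region.fileCount("big"));
        System.out.println("small files: " + region.fileCount("small"));
    }
}
```

In this run both families end up with 10 files, even though "small" holds
only 1 unit per file. Those many tiny files are the needless I/O the guide
warns about, since they must later be read and merged by compaction.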

Can anyone please shed some light on this topic? Thanks in advance.

Thanks,
Wei