Posted to user@hbase.apache.org by Nishanth S <ni...@gmail.com> on 2014/09/22 22:43:37 UTC

Restructuring Hbase Table

Hi folks,

We have an HBase table with 4 column families that stores log data. The
columns and the content stored in each of these column families are the
same. The reason for having multiple families is that we needed 4 retention
buckets for messages and were using the TTL feature of HBase to achieve
this. Each HBase row has a predefined set of meta fields and a large blob
message.

I am considering restructuring the table with 2 column families: one
column family for the metadata and the other for the blob message, which is
the meatier chunk. The reason for this approach is that most of the
analytics queries would be directed at the metadata, which is in cf1, and
only a few at cf2, which has the blob message. There will be a few use cases
where you would need to query the data in both cf1 and cf2, but that is not
the dominant use case. We would then devise some method to purge the data
manually (using retention bucket + timestamp in the row key). How does this
look so far? Is there a better way to implement this?
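
To make the manual purge idea concrete, here is a minimal sketch of one
possible row key layout and of the key range a purge job would have to
delete. It is not HBase client code. The layout (1-byte bucket, 8-byte
big-endian timestamp, message id), the retention periods and the function
names are all assumptions made for illustration.

```python
import struct
from typing import Tuple

# Hypothetical retention buckets: bucket id -> retention period in days.
RETENTION_DAYS = {0: 7, 1: 30, 2: 90, 3: 365}
MS_PER_DAY = 24 * 60 * 60 * 1000


def make_row_key(bucket: int, ts_ms: int, msg_id: bytes) -> bytes:
    """Build <1-byte bucket><8-byte big-endian timestamp><message id>."""
    # Big-endian packing makes byte-wise key order match numeric timestamp order.
    return struct.pack(">BQ", bucket, ts_ms) + msg_id


def expired_range(bucket: int, now_ms: int) -> Tuple[bytes, bytes]:
    """Return (start_key, stop_key) covering every expired row of one bucket.

    start_key is inclusive and stop_key is exclusive.
    """
    cutoff_ms = max(now_ms - RETENTION_DAYS[bucket] * MS_PER_DAY, 0)
    start_key = struct.pack(">BQ", bucket, 0)
    stop_key = struct.pack(">BQ", bucket, cutoff_ms)
    return start_key, stop_key
```

Because the keys sort by bucket first and timestamp second, the expired rows
of a bucket form one contiguous range, so a purge job could scan from
start_key to stop_key and delete what it finds. One trade-off of this layout
is that new writes within a bucket always land at the end of that bucket's
key range.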


Thanks,
Nishanth
