Posted to user@hbase.apache.org by Neil Yalowitz <ne...@gmail.com> on 2012/02/14 18:33:41 UTC

length and size of a column family name or qualifier vs. amount of disk storage

Hi all, here's a (not-so) hypothetical question...  How does the length of a
column family name or a qualifier impact storage?  Would a long family or
qualifier like this:



my-descriptive-but-long-column-family-name:my-descriptive-but-long-qualifier

--vs. a short column family and qualifier:--

mycolfam1:myqual1


We are assuming the longer cf/qual would be written to HDFS billions of
times and would be wasteful.  Is that a correct assumption?

Does the answer change if you use Snappy compression?



Thanks,

Neil Yalowitz
neilyalowitz@gmail.com

Re: length and size of a column family name or qualifier vs. amount of disk storage

Posted by Doug Meil <do...@explorysmedical.com>.
Also see here...

http://hbase.apache.org/book.html#keyvalue

Compression will make it better on disk, but the data is decompressed before
it goes over the wire, so long names still cost you there.
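To put rough numbers on it, here is a minimal sketch of the uncompressed size of one cell, assuming the KeyValue layout described in the book section linked above (key length, value length, row length, row, family length, family, qualifier, timestamp, key type, value). The row key and value are made up for illustration, and later HBase versions may add fields this sketch ignores.

```python
# Fixed per-cell overhead in the assumed KeyValue layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate uncompressed size in bytes of a single KeyValue."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


# Hypothetical row key and value, chosen only for illustration.
ROW = b"row-00000001"
VALUE = b"12345678"

# The two naming schemes from the original question.
LONG = (b"my-descriptive-but-long-column-family-name",
        b"my-descriptive-but-long-qualifier")
SHORT = (b"mycolfam1", b"myqual1")

long_size = keyvalue_size(ROW, *LONG, VALUE)
short_size = keyvalue_size(ROW, *SHORT, VALUE)
extra_per_cell = long_size - short_size

print("long names :", long_size, "bytes per cell")
print("short names:", short_size, "bytes per cell")
print("extra      :", extra_per_cell, "bytes per cell")
print("extra for 1e9 cells:", extra_per_cell * 10**9 / 1e9, "GB (uncompressed)")
```

With the names from the question the difference is 59 bytes per cell, and it is paid on every cell because the family and qualifier are stored in each KeyValue. That is the uncompressed figure, before HDFS replication.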



On 2/14/12 12:40 PM, "Jean-Daniel Cryans" <jd...@apache.org> wrote:

>> We are assuming the longer cf/qual would be written to HDFS billions of
>> times and would be wasteful.  Is that a correct assumption?
>
>Yes, also that's covered a bit in:
>http://hbase.apache.org/book.html#keysize
>
>>
>> Does the answer change if you use Snappy compression?
>
>Any compression will make it better, also have a look at this jira
>which was recently committed to trunk (adding prefix compression):
>https://issues.apache.org/jira/browse/HBASE-4218
>
>J-D
>



Re: length and size of a column family name or qualifier vs. amount of disk storage

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> We are assuming the longer cf/qual would be written to HDFS billions of
> times and would be wasteful.  Is that a correct assumption?

Yes, also that's covered a bit in: http://hbase.apache.org/book.html#keysize

>
> Does the answer change if you use Snappy compression?

Any compression will make it better, also have a look at this jira
which was recently committed to trunk (adding prefix compression):
https://issues.apache.org/jira/browse/HBASE-4218

J-D
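To illustrate the compression point, here is a rough sketch that serializes a batch of cells in an approximation of the KeyValue layout and compresses them. Everything in it is an assumption for illustration: zlib stands in for Snappy (which is not in the Python standard library), the whole batch is compressed at once rather than per HFile block, the Put type code is assumed to be 4, and the row keys, timestamp, and values are invented. It is not HBase's actual write path.

```python
import struct
import zlib

KEY_TYPE_PUT = 4  # assumed type code for a Put


def encode_keyvalue(row, family, qualifier, timestamp, value):
    """Serialize one cell in an approximation of the KeyValue layout."""
    key = (struct.pack(">H", len(row)) + row
           + struct.pack(">B", len(family)) + family
           + qualifier
           + struct.pack(">qB", timestamp, KEY_TYPE_PUT))
    return struct.pack(">ii", len(key), len(value)) + key + value


def build_block(family, qualifier, cells=1000):
    """Concatenate `cells` KeyValues that share one family:qualifier."""
    return b"".join(
        encode_keyvalue(b"row-%08d" % i, family, qualifier,
                        1329240821000, b"value-%04d" % (i % 97))
        for i in range(cells)
    )


LONG_CF = b"my-descriptive-but-long-column-family-name"
LONG_Q = b"my-descriptive-but-long-qualifier"
SHORT_CF = b"mycolfam1"
SHORT_Q = b"myqual1"

raw_long = build_block(LONG_CF, LONG_Q)
raw_short = build_block(SHORT_CF, SHORT_Q)

# zlib is only a stand-in codec here; Snappy's ratios will differ.
zip_long = zlib.compress(raw_long)
zip_short = zlib.compress(raw_short)

print("raw       :", len(raw_long), "vs", len(raw_short))
print("compressed:", len(zip_long), "vs", len(zip_short))
```

The repeated names compress well, so the gap between long and short names shrinks considerably on disk. Snappy favors speed over ratio, so the saving with it would likely be smaller than what zlib shows here, and the uncompressed sizes still apply wherever the data is held or sent uncompressed.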