Posted to user@hbase.apache.org by Davey Yan <da...@gmail.com> on 2012/04/10 08:58:24 UTC

How many data versions should I keep in HBase?

Hi,

In my business case, it is unnecessary to keep more than one version of data.
The application code will never try to get/scan older versions.

Should I set MAX_VERSIONS => 1 for every table, instead of the default of 3?

The online HBase book says that compression will boost performance by
reducing the size of StoreFiles and thus reducing I/O
(http://hbase.apache.org/book/important_configurations.html).
I have enabled SNAPPY compression; ideally it will reduce the data to
22.2% of its original size.
So if I set MAX_VERSIONS => 1, will the data shrink to 1/3 of that again?

Thanks for your time.
Sincerely,


--
Davey Yan

Re: How many data versions should I keep in HBase?

Posted by Davey Yan <da...@gmail.com>.
Thank you for your reply, Alex.

In my business case, it is unnecessary to store or access more than
one version of data.
I will set MAX_VERSIONS => 1 for every table.
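
A minimal sketch of how that change might be applied from the HBase shell,
written as a small Python helper that only builds the command strings. The
table name 'mytable' and column family 'cf' are hypothetical. The sketch
assumes the shell attribute is spelled VERSIONS, that the table is disabled
around the alter (as older releases required), and that a major compaction
is what physically drops the older versions already on disk.

```python
def versions_commands(table, family, versions=1):
    """Build HBase shell commands that cap the versions kept for one column family."""
    return [
        f"disable '{table}'",  # assumed necessary before alter on older releases
        f"alter '{table}', {{NAME => '{family}', VERSIONS => {versions}}}",
        f"enable '{table}'",
        f"major_compact '{table}'",  # rewrites StoreFiles so excess versions are dropped
    ]

# 'mytable' and 'cf' are placeholders; substitute real table and family names.
for line in versions_commands("mytable", "cf"):
    print(line)
```

The printed lines are meant to be pasted into the HBase shell, once per
table and column family.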

On Tue, Apr 10, 2012 at 8:54 PM, Alex Baranau <al...@gmail.com> wrote:
> Compression applies to the files stored on disk. All versions of a column
> are stored the same way (HBase doesn't differentiate them at the time of
> writing, and they are not placed "near" each other in the file). Given that,
> yes, you are likely to get the same level of compression (compression ratio)
> if you increase the number of versions to store.
>
> May I ask what business case requires storing multiple
> versions that you are never going to access?
>
> Alex
> ------
> Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase



-- 
Davey Yan

Re: How many data versions should I keep in HBase?

Posted by Alex Baranau <al...@gmail.com>.
Compression applies to the files stored on disk. All versions of a column
are stored the same way (HBase doesn't differentiate them at the time of
writing, and they are not placed "near" each other in the file). Given that,
yes, you are likely to get the same level of compression (compression ratio)
if you increase the number of versions to store.
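
As a rough check on the arithmetic in the question, the sketch below
estimates on-disk size under two assumptions: the compression ratio does
not depend on how many versions are kept (as described above), and only up
to the configured maximum number of versions survive compaction. The
function name, the 1,000,000-byte figure, and the write counts are
hypothetical; 0.222 is the 22.2% figure from the question.

```python
def estimated_store_size(bytes_per_version, versions_written, max_versions,
                         compression_ratio):
    """Rough on-disk size after compaction; every input is an assumption."""
    # At most max_versions versions of each cell are retained.
    versions_kept = min(versions_written, max_versions)
    # Assume the same compression ratio regardless of the version count.
    return bytes_per_version * versions_kept * compression_ratio

raw = 1_000_000  # hypothetical uncompressed bytes for one version of every cell

# Every cell written three times: keeping 3 versions vs. keeping 1.
three = estimated_store_size(raw, 3, 3, 0.222)
one = estimated_store_size(raw, 3, 1, 0.222)

# Every cell written only once: lowering the limit saves nothing.
written_once = estimated_store_size(raw, 1, 3, 0.222)

print(three, one, one / three, written_once)
```

Under these assumptions the 1/3 saving only appears when cells really have
been written three or more times; data that is written once takes the same
space with either setting.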

May I ask what business case requires storing multiple
versions that you are never going to access?

Alex
------
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
