Posted to user@hbase.apache.org by Micah Whitacre <mk...@gmail.com> on 2011/10/04 18:20:20 UTC

Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

In reading the documentation [1], all I've seen are suggestions on how to
set the value and what the default value is.  However, I haven't seen any
indication of how to set the value to "i don't care store them all" or
whether there is a maximum bound aside from Integer.MAX_VALUE.  Does anyone
know?

Thanks,
Micah

[1] - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html#setMaxVersions(int)

Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by "Brush,Ryan" <RB...@CERNER.COM>.
No, if you have only 10 versions of a cell there is no additional overhead
to the maxVersion being 20 or 10,000. There shouldn't be a penalty for
setting max versions arbitrarily large as long as your number of actual,
physical versions of a row is less than that.


Max versions on the column family is used during compactions to restrict
the number of versions carried forward by the compaction.  So if your
value is greater than the number of actual versions, the behavior is
unchanged no matter how big it is.  Similarly with get or scan operations,
if the number of versions you request is much larger than the number that
actually exist, there is no additional overhead.
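
To illustrate the point above, here is a minimal plain-Java sketch of the
retention rule. It is not HBase code: the class MaxVersionsSketch and the
method compact are hypothetical names, and a cell is modeled simply as a list
of version timestamps. It only shows that a limit larger than the number of
stored versions changes nothing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MaxVersionsSketch {
    /**
     * Hypothetical model of what a compaction carries forward for one cell:
     * the newest versions, at most maxVersions of them (maxVersions >= 1 assumed).
     */
    public static List<Long> compact(List<Long> timestamps, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder());            // newest first
        int keep = Math.min(maxVersions, sorted.size());   // never more than exist
        return new ArrayList<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Long> cell = new ArrayList<>();
        for (long ts = 1; ts <= 10; ts++) {
            cell.add(ts);                                  // 10 actual versions
        }
        System.out.println(compact(cell, 20).size());                 // 10
        System.out.println(compact(cell, Integer.MAX_VALUE).size());  // 10
        System.out.println(compact(cell, 3));                         // [10, 9, 8]
    }
}
```

In this model the limit only matters once the number of stored versions
exceeds it, which is the case described in the next paragraph.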

Of course, if your number of actual versions exceeds some maxVersion
value, that has the semantics you'd expect. But if you really want to keep
all versions, there is no cost to setting the value arbitrarily high.

(There is a catch in that a single row can never span beyond a single
region, so if you have a _lot_ of versions in a row this could have
implications, but this same issue applies if you have a single "wide" row
with a huge amount of data in the columns.)


Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by Micah Whitacre <mk...@gmail.com>.
Are there any negative performance aspects to setting the max versions
to a large value if those extra versions are never actually stored?  If I
set the max to 10k but really only store 100, there is no extra
disk space/memory being consumed by the potential of having more
versions, is there?  Also, what about the inverse of writing, gets?  If
my "Gets" all call get.setMaxVersions(), does setting that value to
be extremely large, despite there not being that many versions, cause
performance problems?

Thanks for the help,
Micah


Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by lars hofhansl <lh...@yahoo.com>.
MaxVersions and MinVersions are different features.
MaxVersions identifies the max number of versions you want to keep (just to state the obvious).
MinVersions is used together with TTL (and soon with deletes - see HBASE-4536) to indicate the minimum number of versions you want to keep around even when they should be expired or were deleted.


There is no way to disable MaxVersions; just set it to a very large number.

MinVersions is disabled by default (the setting is 0), which means rows past their TTL and deleted rows will be removed during compaction.
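
As a rough illustration of how the two settings interact with TTL, here is a
minimal plain-Java sketch. It is not HBase code and makes no claim about the
actual compaction implementation: the class MinVersionsSketch, the method
retain and its parameters are hypothetical, and a cell is modeled as a list of
version timestamps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MinVersionsSketch {
    /**
     * Hypothetical retention rule for one cell:
     *  - never keep more than maxVersions versions;
     *  - drop versions older than ttl, unless fewer than minVersions
     *    have been kept so far (minVersions = 0 disables that protection).
     */
    public static List<Long> retain(List<Long> timestamps, long now, long ttl,
                                    int minVersions, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder());            // newest first
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.size() >= maxVersions) {
                break;                                     // MaxVersions cap reached
            }
            boolean expired = now - ts > ttl;
            if (!expired || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> cell = List.of(100L, 200L, 300L);
        // Every version is past the TTL (now = 1000, ttl = 50).
        System.out.println(retain(cell, 1000, 50, 0, Integer.MAX_VALUE)); // []
        System.out.println(retain(cell, 1000, 50, 2, Integer.MAX_VALUE)); // [300, 200]
    }
}
```

With minVersions left at 0, everything past the TTL is dropped; with
minVersions set to 2, the two newest versions survive even though they have
expired.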


I am thinking about how to state that more clearly in the documentation.


-- Lars




Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by Doug Meil <do...@explorysmedical.com>.
The default for versioning is 3.  Unfortunately, the sub-section also cites
(incorrectly) that the min is 0.  That sub-section is trying to indicate
the minimum legal value.  I am working on clearing that entry up with
another developer.







Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by Micah Whitacre <mk...@gmail.com>.
Are you surmising that from the description of setting a minimum version?


Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by Doug Meil <do...@explorysmedical.com>.
http://hbase.apache.org/book.html#schema.versions


I believe if you set that to 0 it should disable the versioning.





Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by Micah Whitacre <mk...@gmail.com>.
I guess what I'm asking is: is there a way to set "infinite" or no max
bound on versions (e.g. setMaxVersions(-1) possibly)?  Or do I have to
call setMaxVersions(Integer.MAX_VALUE) or setMaxVersions(<some large
guess>)?  If a large guess is the way to go, what sort of overhead
costs might we need to consider when finding the right balance point
between room to grow and the maintenance cost of needing to
expand later?

We plan on building MapReduce jobs to clean up versions based on some
conditions so the value shouldn't get that large but the conditions
for cleaning up those versions might be decided by other consumers of
the service.  So having room to grow is ideal.


Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

re:  "i don't care store them all"


What do you mean?


