Posted to user@hbase.apache.org by Jameson Lopp <ja...@bronto.com> on 2011/09/29 20:04:14 UTC

setTimeRange for HBase Increment

I wish to store a count of 30-day trailing event data (e.g. # of clicks 
in past 30 days) and ended up reading the documentation for setTimeRange 
in the Increment operation. 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29

I was hoping someone could clarify if it works as I'm imagining in this 
example scenario.

1) Current click count is 0

2) I process a click and I perform an increment operation with the time 
range set to minStamp = now and maxStamp = 30 days from now

3) I query for the value immediately and find it to be 1

4) Assuming no other clicks come in, if I query for the value in 31 
days, it will be returned as 0

In essence, I'm looking for a way to set a TTL on my increment 
operation. Is this how it actually works? The documentation is a bit 
vague and I could imagine several other scenarios.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc

Re: setTimeRange for HBase Increment

Posted by Gary Helmling <gh...@gmail.com>.
If you just need the increments to not be visible when > 30 days old, then
put the increment columns in their own column family and set TTL=2592000 (30
days in seconds).

Note that the timestamp is updated on each increment, so a column that
always receives increments before the TTL window runs out will never expire.
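The sketch below is a minimal in-memory model of the TTL behavior described above. It is written in Python for illustration only and uses no HBase API. It assumes a single counter cell with times in seconds, and a cell that stops being visible once its last write is more than TTL seconds old; the exact boundary handling is an assumption. It shows why a counter that keeps receiving increments never expires: each increment rewrites the cell with a fresh timestamp.

```python
# Simplified model of a column-family TTL applied to one counter cell.
# Not HBase code: times are in seconds and the expiry boundary is assumed.

TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days -> 2592000 seconds


class TtlCounter:
    """One counter cell whose timestamp is refreshed by every increment."""

    def __init__(self, ttl_seconds: int = TTL_SECONDS) -> None:
        self.ttl = ttl_seconds
        self.value = 0
        self.ts = None  # timestamp of the last write; None if never written

    def read(self, now: int) -> int:
        # Assumed rule: the cell is hidden once it is older than the TTL.
        if self.ts is None or now - self.ts > self.ttl:
            return 0
        return self.value

    def increment(self, amount: int, now: int) -> int:
        # An expired cell reads as absent, so the increment starts from 0.
        self.value = self.read(now) + amount
        self.ts = now  # every increment writes a new timestamp
        return self.value
```

For example, a counter incremented on days 0, 20 and 40 still reads 3 on day 45, and only disappears once 30 days pass with no increment at all.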

Is this the problem?  Are you looking to do rolling expiration of the
increment values?  If so you could do some combination of increments with
limited time ranges (always set minStamp to 12:00am of the current day to
roll over to a new version per day) or represent the truncated date in
either the column qualifier or row key.  This way you're incrementing
(aggregating) over limited periods to allow for data expiration, and can
easily sum over the period you're concerned with.  Again, OpenTSDB does
some smart things to construct keys efficiently for these kinds of
scenarios, so it's definitely worth looking at.
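Below is one way to sketch the per-day bucketing idea in Python. It is an in-memory stand-in, not HBase client code. Several choices here are assumptions and not from the thread: days are cut at UTC midnight, the truncated date is stored as a yyyyMMdd column qualifier within one row per counter, and `record_click` / `trailing_count` are hypothetical helper names. `day_start_millis` computes the kind of midnight timestamp one might pass as minStamp in the time-range variant.

```python
from datetime import datetime, timedelta, timezone


def day_start_millis(now: datetime) -> int:
    """Midnight (UTC, assumed) of the day containing the timezone-aware `now`,
    in epoch milliseconds: a candidate minStamp for per-day rollover."""
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp() * 1000)


def day_qualifier(now: datetime) -> str:
    """Truncated date used as a column qualifier, e.g. '20110929'."""
    return now.astimezone(timezone.utc).strftime("%Y%m%d")


def record_click(row: dict, now: datetime, amount: int = 1) -> None:
    """Hypothetical stand-in for an Increment on the day's column of one row."""
    q = day_qualifier(now)
    row[q] = row.get(q, 0) + amount


def trailing_count(row: dict, now: datetime, days: int = 30) -> int:
    """Sum the daily buckets for the last `days` days, today included."""
    wanted = {day_qualifier(now - timedelta(days=i)) for i in range(days)}
    return sum(v for q, v in row.items() if q in wanted)
```

With this layout, the trailing count is a read of a single row plus a sum over at most 30 columns, not a scan. Each daily column stops receiving increments once its day is over, so a 30-day column-family TTL would expire it roughly 30 days after its last increment.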

If neither of these really addresses what you're looking for, maybe you can
explain your requirements in a bit more detail?  HBase schema design is a
fine art, but it helps to be able to see the big picture.


--gh

On Tue, Oct 4, 2011 at 11:14 AM, Jameson Lopp <ja...@bronto.com> wrote:

> Thanks, that makes sense. Unfortunately, it sounds like this feature is
> unable to solve my particular problem...

Re: setTimeRange for HBase Increment

Posted by Jameson Lopp <ja...@bronto.com>.
Thanks, that makes sense. Unfortunately, it sounds like this feature is 
unable to solve my particular problem...

On 10/04/2011 01:36 PM, Gary Helmling wrote:
> Jameson,
>
> The TimeRange you set on the Increment is used in looking up the previous
> value that you'll be incrementing.  It's not stored with the incremented
> value as a data "lifetime" or anything.  If a previously stored value is
> found within the given time range, it will be incremented.  If no value is
> found within that range, a new value is stored using the value from
> your Increment.
>
> As others have already covered, if you're looking for auto-cleanup of data
> you would set a TTL on the column family.
>
> So let me tweak your scenario a bit to explain how it might work:
>
> 0) Say you have a previous value on column "c1" of 2, last incremented 31
> days ago
>
> 1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
> days, maxStamp = now
>
> 2) There is now a new version of "c1", with value=1, timestamp=now.  The
> previous version, with value=2, timestamp=now - 31 days, still exists and
> may be automatically cleaned up, subject to your settings for max versions
> and TTL.  So you would have:
>
> c1:
>    - v2: ts=now, value=1
>    - v1: ts=now-31days, value=2
>
> 3) Reading the current value of "c1" will return 1
>
> 4a) If you repeat step #1 in 31 days from now, you would wind up with a
> third version of "c1", again with value=1:
>
> c1:
>    - v3: ts=now, value=1
>    - v2: ts=now-31days, value=1
>    - v1: ts=now-62days, value=2
>
> 4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
> 60 days, maxStamp=now, then you would be incrementing the existing "v2" of
> "c1", since it falls within the time range:
>
> c1:
>    - v2: ts=now, value=2
>    - v1: ts=now-62days, value=2
>
>
> I hope this clarifies things.
>
> --gh

Re: setTimeRange for HBase Increment

Posted by Gary Helmling <gh...@gmail.com>.
Jameson,

The TimeRange you set on the Increment is used in looking up the previous
value that you'll be incrementing.  It's not stored with the incremented
value as a data "lifetime" or anything.  If a previously stored value is
found within the given time range, it will be incremented.  If no value is
found within that range, a new value is stored using the value from
your Increment.

As others have already covered, if you're looking for auto-cleanup of data
you would set a TTL on the column family.

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31
days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now.  The
previous version, with value=2, timestamp=now - 31 days, still exists and
may be automatically cleaned up, subject to your settings for max versions
and TTL.  So you would have:

c1:
  - v2: ts=now, value=1
  - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 in 31 days from now, you would wind up with a
third version of "c1", again with value=1:

c1:
  - v3: ts=now, value=1
  - v2: ts=now-31days, value=1
  - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
60 days, maxStamp=now, then you would be incrementing the existing "v2" of
"c1", since it falls within the time range:

c1:
  - v2: ts=now, value=2
  - v1: ts=now-62days, value=2
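The walkthrough above can be replayed with a small Python model of the lookup rule described here. The TimeRange only selects which existing version gets incremented, and the result is written at the current time. This is a simplified sketch, not HBase code. Two behaviors are assumptions of the model: maxStamp is treated as exclusive, and the incremented version is dropped from its old timestamp to match the version listings above. Times are in milliseconds from an arbitrary origin.

```python
DAY = 24 * 60 * 60 * 1000  # one day in milliseconds


def increment(versions: dict, amount: int, now: int,
              min_stamp: int, max_stamp: int) -> int:
    """Increment one column, modelled as {timestamp: value}.

    The time range is used only to look up the previous value;
    it is not stored with the result.
    """
    in_range = [ts for ts in versions if min_stamp <= ts < max_stamp]
    if in_range:
        # Increment the newest version inside the range; as in the
        # listings above, it now appears at the new timestamp.
        new_value = versions.pop(max(in_range)) + amount
    else:
        # Nothing in range: store a new version holding just the increment.
        new_value = amount
    versions[now] = new_value
    return new_value


def read_current(versions: dict) -> int:
    """A plain read returns the newest version."""
    return versions[max(versions)]


def replay():
    now = 62 * DAY                                 # "now" at step 1
    c1 = {now - 31 * DAY: 2}                       # step 0
    increment(c1, 1, now, now - 30 * DAY, now)     # step 1
    later = now + 31 * DAY
    step4a = dict(c1)
    increment(step4a, 1, later, later - 30 * DAY, later)
    step4b = dict(c1)
    increment(step4b, 1, later, later - 60 * DAY, later)
    return c1, step4a, step4b
```

The same model also covers the original question. In this model, an increment with minStamp = now and maxStamp = now + 30 days still reads 1 after 31 days, because the time range plays no part in expiry.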


I hope this clarifies things.

--gh


On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp <ja...@bronto.com> wrote:

> Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
> would play out in that manner? Just want to make sure I understand the
> functionality.

Re: setTimeRange for HBase Increment

Posted by Jameson Lopp <ja...@bronto.com>.
Thanks! Nevertheless, can anyone confirm / deny if the scenario I 
described would play out in that manner? Just want to make sure I 
understand the functionality.


On 09/29/2011 03:32 PM, Doug Meil wrote:
>
> Here are a few links on table cleanup and major compactions...
>
> http://hbase.apache.org/book.html#schema.minversions   (ttl related)
>
> http://hbase.apache.org/book.html#perf.deleting.queue
>
> http://hbase.apache.org/book.html#compaction

Re: setTimeRange for HBase Increment

Posted by Doug Meil <do...@explorysmedical.com>.
Here are a few links on table cleanup and major compactions...

http://hbase.apache.org/book.html#schema.minversions   (ttl related)

http://hbase.apache.org/book.html#perf.deleting.queue

http://hbase.apache.org/book.html#compaction





On 9/29/11 2:29 PM, "Ted Yu" <yu...@gmail.com> wrote:

>Doug Meil may point you to related doc.
>
>Take a look at this as well:
>https://issues.apache.org/jira/browse/HBASE-4241


Re: setTimeRange for HBase Increment

Posted by Ted Yu <yu...@gmail.com>.
Doug Meil may point you to related doc.

Take a look at this as well:
https://issues.apache.org/jira/browse/HBASE-4241

On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp <ja...@bronto.com> wrote:

> Hm, well I didn't mention a number of other requirements for the feature
> I'm building, but long story short, I need to keep track of millions to
> billions of these counters and need the lookup time to be as close to
> constant time as possible, thus I was really hoping to avoid doing table
> scans.
>
> I'll admit I know nothing of the dangers of auto-pruning; is there an
> article / documentation I could read about it? Google wasn't very helpful.

Re: setTimeRange for HBase Increment

Posted by Jameson Lopp <ja...@bronto.com>.
Hm, well I didn't mention a number of other requirements for the feature 
I'm building, but long story short, I need to keep track of millions to 
billions of these counters and need the lookup time to be as close to 
constant time as possible, thus I was really hoping to avoid doing table 
scans.

I'll admit I know nothing of the dangers of auto-pruning; is there an 
article / documentation I could read about it? Google wasn't very helpful.



On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
> My advice usually regarding timestamps is if it's part of your data
> model, it should appear somewhere in an HBase key. 99% of the time
> overloading the HBase timestamps is a bad idea, especially with
> counters since there's auto-pruning done in the Memstore!
>
> I would suggest you make time part of your row key, maybe one counter
> per day, and then set the TTL on your table to 30 days. Then all you
> need to do is a sequential scan for those 30 days maybe with a prefix
> that refers to some event id.
>
> OpenTSDB is another way of doing it: http://opentsdb.net/
>
> J-D

Re: setTimeRange for HBase Increment

Posted by Jean-Daniel Cryans <jd...@apache.org>.
My advice usually regarding timestamps is if it's part of your data
model, it should appear somewhere in an HBase key. 99% of the time
overloading the HBase timestamps is a bad idea, especially with
counters since there's auto-pruning done in the Memstore!

I would suggest you make time part of your row key, maybe one counter
per day, and then set the TTL on your table to 30 days. Then all you
need to do is a sequential scan for those 30 days maybe with a prefix
that refers to some event id.
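The following Python sketch illustrates the row-key layout suggested here, with a plain dict standing in for the table. The key format `<event_id>|<yyyyMMdd>`, the `|` separator, and the helper names are hypothetical choices for illustration. The fixed-width date keeps one event's daily rows contiguous and in chronological order, so the last 30 days form a single start/stop key range for a sequential scan. The 30-day TTL takes care of older rows.

```python
from datetime import date, timedelta


def counter_row_key(event_id: str, day: date) -> bytes:
    """Hypothetical row key: '<event_id>|<yyyyMMdd>' as UTF-8 bytes."""
    return f"{event_id}|{day:%Y%m%d}".encode("utf-8")


def trailing_scan_range(event_id: str, today: date, days: int = 30) -> tuple:
    """Start (inclusive) and stop (exclusive) keys for the last `days` days."""
    start = counter_row_key(event_id, today - timedelta(days=days - 1))
    stop = counter_row_key(event_id, today + timedelta(days=1))
    return start, stop


def scan(table: dict, start: bytes, stop: bytes) -> list:
    """Stand-in for a sequential scan: values whose keys fall in [start, stop)."""
    return [table[k] for k in sorted(table) if start <= k < stop]
```

The `|` byte sorts after letters and digits. As a result, another event whose id merely extends this one (say "clicks2" next to "click") stays outside the range, provided event ids never contain `|` themselves.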

OpenTSDB is another way of doing it: http://opentsdb.net/

J-D

On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp <ja...@bronto.com> wrote:
> I wish to store a count of 30-day trailing event data (e.g. # of clicks in
> past 30 days) and ended up reading the documentation for setTimeRange in the
> Increment operation.
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>
> I was hoping someone could clarify if it works as I'm imagining in this
> example scenario.
>
> 1) Current click count is 0
>
> 2) I process a click and I perform an increment operation with the time
> range set to minStamp = now and maxStamp = 30 days from now
>
> 3) I query for the value immediately and find it to be 1
>
> 4) Assuming no other clicks come in, if I query for the value in 31 days, it
> will be returned as 0
>
> In essence, I'm looking for a way to set a TTL on my increment operation. Is
> this how it actually works? The documentation is a bit vague and I could
> imagine several other scenarios.