Posted to user@hbase.apache.org by Shawn Hermans <sh...@gmail.com> on 2013/12/05 23:47:20 UTC

Practical Upper Limit on Number of Version Stored?

All,
I am working on an HBase application where we store user events in an HBase
table.  The row key is a user identifier and each column is an event
identifier.  Most users only have a handful of events (10 or fewer), but
some users have a few hundred thousand events or more, and this causes
issues when an HBase client tries to retrieve all those events.

We are looking at different ways of limiting the number of events returned.
One idea is to stop storing each event under its own column qualifier and
instead use HBase's versioning capability to store the last 100 to 200
events. It doesn't seem like we would run into issues with this approach,
but I want to see if anyone has had any practical experience in this area.
The advice given in http://hbase.apache.org/book/schema.versions.html is a
little ambiguous.

Thanks,
Shawn

Re: Practical Upper Limit on Number of Version Stored?

Posted by Shawn Hermans <sh...@gmail.com>.
My guess is 50 to 200 versions.  Row size is around 300KB of data. 


On Thu, Dec 5, 2013 at 6:41 PM, Jean-Marc Spaggiari
<je...@spaggiari.org> wrote:

> Hi Shawn,
> I personally like and often suggest this approach. However, you need to be
> aware that there is an issue in the current pagination over the versions.
> Other than that, your approach will allow you to get the current state by
> looking at the last version, and the history by looking at all the
> versions.
> How big is an event in your system?  How many versions are you planning to
> keep?
> JM

Re: Practical Upper Limit on Number of Version Stored?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Shawn,

I personally like and often suggest this approach. However, you need to be
aware that there is an issue in the current pagination over the versions.
Other than that, your approach will allow you to get the current state by
looking at the last version, and the history by looking at all the
versions.

How big is an event in your system?  How many versions are you planning to
keep?

JM

Re: Practical Upper Limit on Number of Version Stored?

Posted by Michael Segel <ms...@hotmail.com>.
You want the last n events?
Column name is (Epoch - timestamp)+event name or something
Then just return up to n columns
The events are in reverse order.
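
The column-naming idea above can be sketched as follows. This is only an illustration under stated assumptions, not code from the thread: the helper names `qualifier` and `firstN` are hypothetical, "Epoch" is read here as a fixed maximum value (`Long.MAX_VALUE`), and a `TreeMap` stands in for the sorted columns of one row. No HBase API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReverseTimestampQualifiers {
    // Hypothetical helper: zero-padded (Long.MAX_VALUE - timestamp) plus the event name.
    // Zero-padding to 19 digits makes lexicographic order match numeric order.
    static String qualifier(long timestampMillis, String eventName) {
        return String.format("%019d-%s", Long.MAX_VALUE - timestampMillis, eventName);
    }

    // Return the values of the first n columns in sort order, i.e. the newest n events.
    static List<String> firstN(TreeMap<String, String> row, int n) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (out.size() == n) break;
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // The TreeMap is a stand-in for one row's columns, which are kept sorted by name.
        TreeMap<String, String> row = new TreeMap<>();
        row.put(qualifier(1000L, "login"), "login");
        row.put(qualifier(3000L, "purchase"), "purchase");
        row.put(qualifier(2000L, "click"), "click");
        System.out.println(firstN(row, 2)); // prints [purchase, click]
    }
}
```

Because the newest event always sorts first, a reader can stop after n columns no matter how many events the row holds.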


Sent from a remote device. Please excuse any typos...

Mike Segel

> On Dec 5, 2013, at 7:27 PM, "Shawn Hermans" <sh...@gmail.com> wrote:
> 
> I guess I don't really understand why I wouldn't want to do this.  For our use case we only really care about the user's last 50 to 200 events.  We don't really care about deleting events explicitly.  More than likely we would enable a TTL to get rid of events older than a certain time.  
> 
> 
> 
> 
> I guess my question is whether or not there is an issue with storing this many versions.  Are there any measurable drawbacks?  
> 
> —
> Sent from Mailbox for iPhone
> 
> On Thu, Dec 5, 2013 at 7:11 PM, Michael Segel <mi...@hotmail.com>
> wrote:
> 
>> You really don't want to do this.
>> It's not what versioning was meant for, and it has a couple of serious flaws.
>> The biggest flaw... what happens when you want to delete a version? ...
>> There are other options... depending on your use case and how you use the events.
>> Truly, using versioning beyond versions of the same data... not a good idea.
>> The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
>> Use at your own risk. 
>> Michael Segel
>> michael_segel (AT) hotmail.com

Re: Practical Upper Limit on Number of Version Stored?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
I generally agree with Michael and avoid using versions for anything other
than versioning, but mostly out of personal preference.  That said, I also
agree with JM that 50-200 should be no problem at all.

We did do this in our early days of HBase, and eventually moved away from
it for a few reasons:

1) You get more control without it (we didn't in the beginning but
eventually wanted to do deletes, updates, and other things)

2) Once we had a case where someone got spammed for events, creating
thousands upon thousands of versions.  The excess versions only get cleaned
up on major compaction.  This row eventually grew to the size of a region
and became an operational nightmare, because you can't split inside the row
boundary and other bugs around this at the time (~2 years ago).

3) What do you do if, for some reason, a user has 2 events in the same
millisecond?  This generally doesn't happen or is an error when it does,
but it's nice to expose it or otherwise be able to handle it.  (Note we
initially got around this by munging the version timestamp a bit so we
could include a hashCode in it ... this was an ugly hack).

4) As we wrote more and more hbase code, and most hbase code does not work
in this manner, it became nicer to unify around more normal access patterns
(this is probably mostly preference).
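
Point 3 can be illustrated with a toy model. This is a sketch under assumptions, not HBase code: a `TreeMap` keyed by timestamp stands in for the versions of one cell, and the `qualifier` helper and event ids are hypothetical. It only shows why two events written with the same timestamp cannot both survive as versions of the same cell, while distinct column qualifiers keep them apart.

```java
import java.util.Comparator;
import java.util.TreeMap;

class SameMillisecondCollision {
    // Toy model of one versioned cell: timestamp -> value, newest first.
    static TreeMap<Long, String> versionedCell() {
        return new TreeMap<>(Comparator.reverseOrder());
    }

    // Hypothetical qualifier: reversed timestamp plus an event id that breaks ties.
    static String qualifier(long ts, String eventId) {
        return String.format("%019d-%s", Long.MAX_VALUE - ts, eventId);
    }

    public static void main(String[] args) {
        long ts = 1386287240000L;

        // Versions approach: the timestamp is the only thing that tells events apart.
        TreeMap<Long, String> cell = versionedCell();
        cell.put(ts, "event-a");
        cell.put(ts, "event-b"); // same timestamp: replaces event-a in this model
        System.out.println(cell.size()); // prints 1

        // Qualifier approach: the event id keeps both events distinct.
        TreeMap<String, String> columns = new TreeMap<>();
        columns.put(qualifier(ts, "a"), "event-a");
        columns.put(qualifier(ts, "b"), "event-b");
        System.out.println(columns.size()); // prints 2
    }
}
```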



Re: Practical Upper Limit on Number of Version Stored?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
And the answer is no.

You don't have that many versions. Up to 200 is not critical.

Also you can easily give that a try.

JM
On 2013-12-05 20:27, "Shawn Hermans" <sh...@gmail.com> wrote:

> I guess I don't really understand why I wouldn't want to do this.  For our
> use case we only really care about the user's last 50 to 200 events.  We
> don't really care about deleting events explicitly.  More than likely we
> would enable a TTL to get rid of events older than a certain time.
>
>
>
>
> I guess my question is whether or not there is an issue with storing this
> many versions.  Are there any measurable drawbacks?

Re: Practical Upper Limit on Number of Version Stored?

Posted by Shawn Hermans <sh...@gmail.com>.
I guess I don't really understand why I wouldn't want to do this.  For our use case we only really care about the user's last 50 to 200 events.  We don't really care about deleting events explicitly.  More than likely we would enable a TTL to get rid of events older than a certain time.  




I guess my question is whether or not there is an issue with storing this many versions.  Are there any measurable drawbacks?  


On Thu, Dec 5, 2013 at 7:11 PM, Michael Segel <mi...@hotmail.com>
wrote:

> You really don't want to do this.
> It's not what versioning was meant for, and it has a couple of serious flaws.
> The biggest flaw... what happens when you want to delete a version? ...
> There are other options... depending on your use case and how you use the events.
> Truly, using versioning beyond versions of the same data... not a good idea.

Re: Practical Upper Limit on Number of Version Stored?

Posted by lars hofhansl <la...@apache.org>.
To add some color... HBase will store the versions of a KeyValue next to each other (at least after a compaction).
If your queries typically request most of the versions of a KV, that works out nicely.

If, however, you typically query only the latest version or a specific version, then HBase will also load all the other versions of the KV that happen to be in the same block.
That can be pretty inefficient, up to the point where scanning would require loading a new block for each KV.

HBase currently does not have a good story for the latter scenario. Solutions include a custom compaction policy that separates data along date ranges. That way HBase can rule out entire HFiles if they do not fall into the requested time range of the query.
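
The file-pruning idea can be sketched with a toy model. This is not HBase's compaction or scan code: `FileRange` and `filesToRead` are hypothetical names, and each file is reduced to the minimum and maximum cell timestamps it contains. The sketch assumes a half-open query range [from, to).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TimeRangePruning {
    // Toy stand-in for a store file's metadata: its name and the oldest and
    // newest cell timestamps it holds.
    static final class FileRange {
        final String name;
        final long minTs;
        final long maxTs;

        FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Keep only the files whose [minTs, maxTs] overlaps the query range [from, to).
    static List<String> filesToRead(List<FileRange> files, long from, long to) {
        List<String> out = new ArrayList<>();
        for (FileRange f : files) {
            if (f.maxTs >= from && f.minTs < to) {
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Files separated along date ranges: a query can skip whole files.
        List<FileRange> byDate = Arrays.asList(
                new FileRange("day1", 0L, 99L),
                new FileRange("day2", 100L, 199L),
                new FileRange("day3", 200L, 299L));
        System.out.println(filesToRead(byDate, 150L, 250L)); // prints [day2, day3]

        // One file spanning all dates: nothing can be ruled out.
        List<FileRange> mixed = Arrays.asList(new FileRange("all", 0L, 299L));
        System.out.println(filesToRead(mixed, 150L, 250L)); // prints [all]
    }
}
```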

-- Lars




RE: Practical Upper Limit on Number of Version Stored?

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Michael,

Both columns and timestamps are valid choices. Events have sources; in my approach the source is in the rowkey and the time is in the timestamp.
In your approach you embed the time in the column qualifier.
It's easy to get the last N events in my approach using a "give first N key-values" type of Filter; in your approach you need the same type of filter.
TTL will expire old events in both cases.

>Suppose you have event A occurring at time X.
>Then you have event B occurring at time X2.
>Are they the same?
>Based on the OPs limited description A and B are not.
>So why store them as versions as if they were the same?

There are no such things as "same" events. Frankly speaking, I am not following you here. 

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Michael Segel [michael_segel@hotmail.com]
Sent: Friday, December 06, 2013 4:23 AM
To: user@hbase.apache.org
Subject: Re: Practical Upper Limit on Number of Version Stored?

Look,

Just because you can do something, doesn't mean it's a good idea.

From a design perspective it's not a good idea.

Ask yourself why does versioning exist?  What purpose does versioning serve in HBase?

From a design perspective you have to ask yourself what are you attempting to do.

Here the OP says ..
"I guess I don't really understand why I wouldn't want to do this.  For our use case we only really care about the user's last 50 to 200 events.  We don't really care about deleting events explicitly.  More than likely we would enable a TTL to get rid of events older than a certain time. "

So his goal is to get the last N events first.

Remember columns are in sort order.
So if you have Event-XXXX or XXXX-Event as your column identifier (name), where XXXX is (Epoch - timestamp) ...
You will have your events in last event first.

This not only achieves what the OP wants, but ... I seem to recall some people posting here about methods to only return N results from a row at a time?


And here's the kicker...

From a design perspective...

Suppose you have event A occurring at time X.
Then you have event B occurring at time X2.

Are they the same?

Based on the OPs limited description A and B are not.
So why store them as versions as if they were the same?

Versioning may make sense if we were talking about an RSVP to a function.
At time T, Bob, may RSVP 'yes'.
At time T1, Bob may RSVP 'tentative'.
At time T2, Bob may RSVP, 'no'.

Each version is describing the same object.
Does that make sense?

Good design is critical...

Just putting it out there.  ;-)




On Dec 5, 2013, at 9:50 PM, Vladimir Rodionov <vr...@carrieriq.com> wrote:

> Version is just a timestamp (event time) => naturally fits time-series (event) types of data.
> Besides this, events are immutable objects; if they are not, then they are not events.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>






Re: Practical Upper Limit on Number of Version Stored?

Posted by Shawn Hermans <sh...@gmail.com>.
We are evaluating a few different options at the moment.  One of them is to
refactor the column identifiers as you suggested.  The use of HBase versions
is another potential option. Rest assured we will end up doing a more
thorough analysis and take the overall design into account.  I just wanted
to make sure we wouldn't run into any performance issues with using a large
number of versions.  Based upon feedback, it looks like that shouldn't be a
problem.

Thanks,
Shawn



Re: Practical Upper Limit on Number of Version Stored?

Posted by Michael Segel <mi...@hotmail.com>.
Look, 

Just because you can do something doesn't mean it's a good idea.

From a design perspective, it's not a good idea.

Ask yourself: why does versioning exist?  What purpose does versioning serve in HBase?

From a design perspective you have to ask yourself what you are attempting to do.

Here the OP says .. 
"I guess I don't really understand why I wouldn't want to do this.  For our use case we only really care about the user's last 50 to 200 events.  We don't really care about deleting events explicitly.  More than likely we would enable a TTL to get rid of events older than a certain time. "

So his goal is to get the last N events first. 

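Remember, columns within a row are kept in sorted order by column name.
So if you have Event-XXXX or XXXX-Event as your column identifier (name), where XXXX is (Epoch - timestamp), that is, a fixed maximum value minus the event timestamp ...
You will have your events sorted with the last event first.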

This not only achieves what the OP wants, but ... I seem to recall some people posting here about methods to only return N results from a row at a time? 
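
The reversed-timestamp idea above can be sketched in a few lines. This is only a toy model in plain Python, not HBase client code: the qualifier layout (8-byte big-endian reversed timestamp followed by an event id), the names event_qualifier and latest_events, and the choice of the maximum signed 64-bit value as the "fixed maximum" are all assumptions made for illustration. The only HBase behavior it relies on is that column qualifiers within a row sort byte-wise.

```python
import struct

# Assumption: timestamps are non-negative milliseconds that fit in a signed 64-bit long.
MAX_LONG = 2**63 - 1


def event_qualifier(timestamp_ms: int, event_id: str) -> bytes:
    """Hypothetical qualifier layout: reversed timestamp (8 bytes, big-endian) + event id."""
    reversed_ts = MAX_LONG - timestamp_ms  # newer event => smaller number
    return struct.pack(">q", reversed_ts) + event_id.encode("utf-8")


def latest_events(row: dict, n: int) -> list:
    """Return the values of the first n columns in byte-wise qualifier order."""
    return [row[qualifier] for qualifier in sorted(row)[:n]]


# A dict stands in for one user's row; sorting its keys mimics qualifier order.
row = {}
for ts, event_id in [
    (1386287000000, "login"),
    (1386287005000, "click"),
    (1386287009000, "logout"),
]:
    row[event_qualifier(ts, event_id)] = event_id

newest_two = latest_events(row, 2)
print(newest_two)
```

With this layout, reading "the last N events" becomes reading the first N columns of the row. On a real cluster that cut-off would need to happen server-side; HBase's ColumnPaginationFilter (limit, offset) may be one of the methods referred to above, though that is a guess rather than something stated in this thread.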


And here's the kicker... 

From a design perspective... 

Suppose you have event A occurring at time X.
Then you have event B occurring at time X2. 

Are they the same? 

Based on the OP's limited description, A and B are not.
So why store them as versions as if they were the same?

Versioning may make sense if we were talking about an RSVP to a function.
At time T, Bob may RSVP 'yes'.
At time T1, Bob may RSVP 'tentative'.
At time T2, Bob may RSVP 'no'.

Each version is describing the same object. 
Does that make sense?
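
A small sketch may make the distinction concrete. The VersionedCell class below is a hypothetical toy model written for this illustration, not the HBase API, and it trims old versions eagerly on every put, which is a simplification. It assumes only two things about versioning: a cell keeps at most a configured number of versions ordered by timestamp, and a second write with the same timestamp replaces the first.

```python
class VersionedCell:
    """Toy model of a single versioned cell (illustration only, not HBase code)."""

    def __init__(self, max_versions: int):
        self.max_versions = max_versions
        self._versions = {}  # timestamp -> value

    def put(self, timestamp: int, value: str) -> None:
        # A write at an existing timestamp replaces the earlier value.
        self._versions[timestamp] = value
        # Keep only the newest max_versions entries (simplified: trimmed on write).
        for ts in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[ts]

    def get(self, versions: int = 1) -> list:
        newest_first = sorted(self._versions.items(), reverse=True)
        return [value for _, value in newest_first[:versions]]


# Versions of the same object: Bob's RSVP changing over time.
rsvp = VersionedCell(max_versions=3)
rsvp.put(100, "yes")
rsvp.put(200, "tentative")
rsvp.put(300, "no")
current = rsvp.get()
history = rsvp.get(versions=3)

# Distinct events forced into one cell: two events in the same millisecond.
events = VersionedCell(max_versions=3)
events.put(100, "login")
events.put(100, "click")
surviving = events.get(versions=3)

print(current, history, surviving)
```

For the RSVP, the newest version is the current answer and the older ones are its history. For the events, the two writes share a timestamp, so only one survives, which is one practical way that storing different objects as versions of the same cell can lose data.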

Good design is critical... 

Just putting it out there.  ;-)




On Dec 5, 2013, at 9:50 PM, Vladimir Rodionov <vr...@carrieriq.com> wrote:

> Version is just a timestamp (event time) => naturally fits time-series (event) types of data.
> Besides this, events are immutable objects; if they are not, then they are not events.
> 
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
> 
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Thursday, December 05, 2013 5:10 PM
> To: user@hbase.apache.org
> Subject: Re: Practical Upper Limit on Number of Version Stored?
> 
> You really don't want to do this.
> It's not what the versioning was meant for and it has a couple of serious flaws.
> 
> The biggest flaw... what happens when you want to delete a version? ...
> 
> There are other options... depending on your use case and how you use the events.
> 
> Truly, using versioning for anything beyond versions of the same data is not a good idea.
> 
> On Dec 5, 2013, at 4:47 PM, Shawn Hermans <sh...@gmail.com> wrote:
> 
>> All,
>> I am working on an HBase application where we store user events in an HBase
>> table.  The row key is a user identifier and each column is an event
>> identifier.  Most users only have a handful of events (10 or less), but
>> some users have a few hundred thousand events or more and this causes
>> issues when an HBase client tries to retrieve all those events.
>> 
>> We are looking at different ways of limiting the number of events returned.
>> One idea is to not store each event using its own column qualifier, but
>> instead use HBase's versioning capability to store the last 100 to 200
>> events. It doesn't seem like we would run into issues with this approach,
>> but I want to see if anyone has had any practical experience in this area.
>> The advice given in http://hbase.apache.org/book/schema.versions.html is a
>> little ambiguous.
>> 
>> Thanks,
>> Shawn

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






RE: Practical Upper Limit on Number of Version Stored?

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Version is just a timestamp (event time) => naturally fits time-series (event) types of data.
Besides this, events are immutable objects; if they are not, then they are not events.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Michael Segel [michael_segel@hotmail.com]
Sent: Thursday, December 05, 2013 5:10 PM
To: user@hbase.apache.org
Subject: Re: Practical Upper Limit on Number of Version Stored?

You really don't want to do this.
It's not what the versioning was meant for and it has a couple of serious flaws.

The biggest flaw... what happens when you want to delete a version? ...

There are other options... depending on your use case and how you use the events.

Truly, using versioning for anything beyond versions of the same data is not a good idea.

On Dec 5, 2013, at 4:47 PM, Shawn Hermans <sh...@gmail.com> wrote:

> All,
> I am working on an HBase application where we store user events in an HBase
> table.  The row key is a user identifier and each column is an event
> identifier.  Most users only have a handful of events (10 or less), but
> some users have a few hundred thousand events or more and this causes
> issues when an HBase client tries to retrieve all those events.
>
> We are looking at different ways of limiting the number of events returned.
> One idea is to not store each event using its own column qualifier, but
> instead use HBase's versioning capability to store the last 100 to 200
> events. It doesn't seem like we would run into issues with this approach,
> but I want to see if anyone has had any practical experience in this area.
> The advice given in http://hbase.apache.org/book/schema.versions.html is a
> little ambiguous.
>
> Thanks,
> Shawn

Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

Re: Practical Upper Limit on Number of Version Stored?

Posted by Michael Segel <mi...@hotmail.com>.
You really don't want to do this. 
It's not what the versioning was meant for and it has a couple of serious flaws.

The biggest flaw... what happens when you want to delete a version? ...

There are other options... depending on your use case and how you use the events.

Truly, using versioning for anything beyond versions of the same data is not a good idea.

On Dec 5, 2013, at 4:47 PM, Shawn Hermans <sh...@gmail.com> wrote:

> All,
> I am working on an HBase application where we store user events in an HBase
> table.  The row key is a user identifier and each column is an event
> identifier.  Most users only have a handful of events (10 or less), but
> some users have a few hundred thousand events or more and this causes
> issues when an HBase client tries to retrieve all those events.
> 
> We are looking at different ways of limiting the number of events returned.
> One idea is to not store each event using its own column qualifier, but
> instead use HBase's versioning capability to store the last 100 to 200
> events. It doesn't seem like we would run into issues with this approach,
> but I want to see if anyone has had any practical experience in this area.
> The advice given in http://hbase.apache.org/book/schema.versions.html is a
> little ambiguous.
> 
> Thanks,
> Shawn

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com