Posted to user@hbase.apache.org by David Koch <og...@googlemail.com> on 2012/08/24 16:47:02 UTC

Using timestamps as "transaction ids" for idempotent counters.

Hello,

I use a table for counting stuff and want to do updates by pushing
increments rather than get -> add in application -> put.

To ensure idempotence (i.e. avoid over-counting) I thought about (mis-)using
a cell's timestamp as a kind of <transaction id>. This transaction id would
be some strictly increasing number defined by the application writing the
increments, so let's call it <external_tmst>. I am looking for a call like:

incrementColumnValue(<row>, <colFam>, <counter_name>, <inc_value>,
<external_tmst>) //normal signature is without last argument

which applies the <inc_value> ONLY IF <external_tmst> is larger than the
cell's most recent version's timestamp (== last transaction id). This way,
if the external application attempts to re-insert the same data multiple
times no change would take place.
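
The rule described above can be modelled in a few lines. This is only an
in-memory sketch of the intended semantics, not an HBase API: the class
IdempotentCounter, its Cell type and the String key are hypothetical names
made up for illustration, and a ConcurrentHashMap stands in for the table.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory model of the proposed semantics (hypothetical, not HBase code). */
final class IdempotentCounter {

    /** Counter value plus the transaction id of the last applied increment. */
    static final class Cell {
        final long value;
        final long lastTxId;

        Cell(long value, long lastTxId) {
            this.value = value;
            this.lastTxId = lastTxId;
        }
    }

    // Stand-in for the table: one Cell per counter key.
    private final ConcurrentMap<String, Cell> cells = new ConcurrentHashMap<>();

    /**
     * Applies incValue only if externalTmst is strictly larger than the
     * transaction id stored with the counter. Returns the counter value
     * after the call, whether or not the increment was applied.
     */
    long increment(String key, long incValue, long externalTmst) {
        // compute() runs the check and the update atomically per key.
        Cell result = cells.compute(key, (k, old) -> {
            if (old == null) {
                return new Cell(incValue, externalTmst); // first write
            }
            if (externalTmst <= old.lastTxId) {
                return old; // replayed or stale transaction id: no change
            }
            return new Cell(old.value + incValue, externalTmst);
        });
        return result.value;
    }
}
```

With this rule, sending (5, tmst=1) twice leaves the counter at 5, and a
following (3, tmst=2) brings it to 8. One consequence worth noting: an
increment that arrives late with a smaller <external_tmst> than one already
applied is dropped, so the writer has to deliver increments in order.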

My questions are:
1. Is this a good idea to begin with?
2. Does the HBase client offer this kind of functionality? If not, is it
planned, or can it be implemented?

It appears that coprocessors are able to handle this kind of logic, but I
think I will be stuck with 0.90.6 for a while. I also heard about HBaseHUT
(https://github.com/sematext/HBaseHUT), but I am not sure it addresses the
issue of having idempotent counters.

Thank you,

/David

Re: Using timestamps as "transaction ids" for idempotent counters.

Posted by J Mohamed Zahoor <jm...@gmail.com>.
Hi

One of the supported data types in HBase is "Counters".
HTable already has a method called "incrementColumnValue", which atomically
increments a column value.
The lock is acquired at the RS (RegionServer), so the operation may not look
atomic from the client side, but it is.

Have a look at http://hbase.apache.org/book/supported.datatypes.html

./Zahoor
