Posted to dev@phoenix.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2014/03/16 08:19:44 UTC

[jira] [Resolved] (PHOENIX-340) Support atomic increment

     [ https://issues.apache.org/jira/browse/PHOENIX-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid resolved PHOENIX-340.
----------------------------------

    Resolution: Fixed

Bulk resolve of closed issues imported from GitHub. This status was reached by first re-opening all closed imported issues and then resolving them in bulk.

> Support atomic increment
> ------------------------
>
>                 Key: PHOENIX-340
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-340
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Raymond Liu
>
> At present, if you want to update a specific column by adding an increment to its current value, you can do that with
>   " UPSERT INTO T1 (id, count) SELECT id, count+1 FROM T1 WHERE id = id1 "
> There are two problems here:
> 1. If no row with id = id1 exists, no row is inserted with a base value (say, the increment itself, 1).
> 2. It does not support concurrent updates well: the read and the write are not atomic, so multiple threads running it at the same time can produce incorrect results (sketched below).
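> A minimal sketch of problem 2, assuming a local Phoenix/HBase instance and a table T1(ID VARCHAR PRIMARY KEY, COUNT BIGINT) already seeded with row 'id1' (the JDBC URL and names here are illustrative): both threads read COUNT and write COUNT + 1, so when their reads overlap one increment can be lost.
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>
>     public class LostUpdateSketch {
>         // One read-then-write round trip; nothing makes the read and the
>         // write atomic with respect to other clients.
>         static void incrementOnce() throws Exception {
>             try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
>                 conn.createStatement().executeUpdate(
>                     "UPSERT INTO T1 (ID, COUNT) SELECT ID, COUNT + 1 FROM T1 WHERE ID = 'id1'");
>                 conn.commit();
>             }
>         }
>
>         public static void main(String[] args) throws Exception {
>             Runnable task = () -> {
>                 try { incrementOnce(); } catch (Exception e) { e.printStackTrace(); }
>             };
>             Thread t1 = new Thread(task);
>             Thread t2 = new Thread(task);
>             t1.start(); t2.start();
>             t1.join(); t2.join();
>             // Expected: COUNT increased by 2. Observed: sometimes only by 1.
>         }
>     }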
> HBase has HTable.increment, which does support atomic increments; the problem is how to surface it in Phoenix.
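> For reference, this is roughly what the underlying HBase call looks like (a sketch against the 0.94-era client API; the table name, the column family "0", which is Phoenix's default family, and the qualifier are illustrative). The addition is applied atomically on the region server, so concurrent callers never lose an update:
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class HBaseIncrementSketch {
>         public static void main(String[] args) throws Exception {
>             Configuration conf = HBaseConfiguration.create();
>             HTable table = new HTable(conf, "T1");
>             // Atomically adds 1 to the stored 8-byte big-endian long and
>             // returns the new value; the cell is created if it does not exist.
>             long newCount = table.incrementColumnValue(
>                 Bytes.toBytes("id1"),    // row key
>                 Bytes.toBytes("0"),      // column family
>                 Bytes.toBytes("COUNT"),  // column qualifier
>                 1L);                     // amount to add
>             System.out.println("count is now " + newCount);
>             table.close();
>         }
>     }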
> There are several ways to do this.
> Per -18: implement "create sequence". But this only works for global counters, and is not suitable for a counter embedded in each row, e.g. page visits, link counts, etc.
> Make UPSERT SELECT support atomic operations. This is the ideal solution, but it might add too much overhead to normal operations that have no atomicity requirement. Also, HBase only supports the LONG type for increments, so this won't work for every common data type and should therefore be limited in scope.
> Though we could invent a new DML statement, for an easy showcase of the idea UPSERT is the closest thing I can reuse. So I have made the following tweak to the existing UPSERT (adding an INCREASE keyword before VALUES to enable the increment feature), e.g.
>     UPSERT INTO TEST(ID, COUNT) INCREASE VALUES('foo',1);
> That reuses most of the UPSERT VALUES code path and does not introduce much extra overhead. When INCREASE is present in the statement, the values for the PRIMARY KEY columns still act as normal values for locating the row, while the values for non-primary-key columns act as increments.
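> A usage sketch of the proposed syntax (the INCREASE form is only what this patch adds, not standard Phoenix SQL; the connection URL and table are illustrative). The PK value 'foo' locates (or creates) the row, and each 1 is applied as an increment, so after the two statements COUNT for 'foo' should be its previous value plus 2 (or 2 if the row did not exist):
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>
>     public class IncreaseUsageSketch {
>         public static void main(String[] args) throws Exception {
>             try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
>                 conn.createStatement().executeUpdate(
>                     "UPSERT INTO TEST(ID, COUNT) INCREASE VALUES('foo', 1)");
>                 conn.createStatement().executeUpdate(
>                     "UPSERT INTO TEST(ID, COUNT) INCREASE VALUES('foo', 1)");
>                 conn.commit();  // increments are joined and applied at commit time
>             }
>         }
>     }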
> I have made an initial version here: SHA: 2466ee6a27d12b6c6bb29ba87ece95466e9df98a, with unit test code, for your reference on the usage and the issues mentioned below.
> Due to limitations of the current Phoenix code structure and framework, there are a few problems in this initial version:
> 1. Phoenix encodes LONG/INT etc. differently from HBase, i.e. it flips the sign bit. This leads to incompatible values when HBase ICV is used to set the initial value on a non-existent column (a sketch of this mismatch follows the discussion below).
> 2. UNSIGNED_LONG could be used without this initial-value problem, but then negative values are not supported: not only can you not store a negative value in the column, you also cannot pass a negative value to the UPSERT ... INCREASE VALUES statement, since it won't pass the grammar check.
> As for these two issues: even if you don't solve them here, as soon as you want to use increments (say, to implement CREATE SEQUENCE) you will have to find a way around them to make the data types compatible with HBase. So I am wondering whether we could create two types for each numeric data type: a RAW version that does not flip the sign bit, and a flipped version for use in PK columns. They could still share one type name in DML, say LONG, but when DDL is executed it would be mapped to the corresponding type and recorded in the metadata table. That way users would not need to know the difference, and the code could still handle them without extra logic, maybe even faster, since values in normal columns would no longer need to go through the sign-flip encoding/decoding.
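> A minimal sketch of the encoding mismatch behind issue 1, using only the HBase Bytes utility (no Phoenix classes; encodeFlipped/decodeFlipped just imitate the sign-bit flip). Incrementing an existing flipped value happens to come out right, because adding a delta commutes with flipping the sign bit, but the initial value that ICV writes for a missing cell is a plain long that the flipped decoding then misreads:
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class SignFlipSketch {
>         // Phoenix-style sortable encoding of a signed long: flip the sign bit.
>         static byte[] encodeFlipped(long v) {
>             byte[] b = Bytes.toBytes(v);
>             b[0] = (byte) (b[0] ^ 0x80);
>             return b;
>         }
>
>         static long decodeFlipped(byte[] b) {
>             byte[] copy = b.clone();
>             copy[0] = (byte) (copy[0] ^ 0x80);
>             return Bytes.toLong(copy);
>         }
>
>         public static void main(String[] args) {
>             // Existing cell: ICV adds 1 to the raw bytes of the flipped value.
>             byte[] stored = encodeFlipped(5L);
>             byte[] afterIcv = Bytes.toBytes(Bytes.toLong(stored) + 1);
>             System.out.println(decodeFlipped(afterIcv));          // 6, still consistent
>
>             // Missing cell: ICV initializes it with plain Bytes.toBytes(1L),
>             // which the flipped decoding turns into a huge negative number.
>             System.out.println(decodeFlipped(Bytes.toBytes(1L))); // -9223372036854775807
>         }
>     }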
> 3. The current mutation plan only accepts PUT/DELETE and executes them via HTable.batch, while HBase increments go through HTable.increment. The mutation join strategy also only handles simple replacement.
> To overcome this, quite a bit of fundamental code has to change. In my branch I enhance MutationState by changing the mutation value from byte[] to a MutationValue class that can hold both a byte[] for PUT/DELETE and a long for an increment. When joining mutations from multiple DML statements, a later PUT/DELETE overrides the previous mutation, while a later Increment does not override a PUT/DELETE: both are kept, and Increments on the same column accumulate. Upon commit, all PUT/DELETEs are still batched first, then the Increments are applied one by one.
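> A rough sketch of the merge rules described above (the class and method names here are illustrative, not the actual code in the branch): a column's pending state can hold a PUT/DELETE value, an accumulated increment, or both, and at commit time the PUT/DELETEs are batched before the increments are applied.
>     import java.util.HashMap;
>     import java.util.Map;
>
>     // Per-column pending state: a PUT/DELETE value, an accumulated
>     // increment delta, or both.
>     class MutationValue {
>         byte[] putValue;   // latest PUT/DELETE bytes (null if none)
>         Long increment;    // accumulated increment delta (null if none)
>     }
>
>     class ColumnMutations {
>         private final Map<String, MutationValue> pending = new HashMap<>();
>
>         // A later PUT/DELETE overrides everything recorded before it.
>         void joinPut(String column, byte[] value) {
>             MutationValue v = new MutationValue();
>             v.putValue = value;
>             pending.put(column, v);
>         }
>
>         // A later increment keeps any earlier PUT/DELETE and accumulates
>         // with earlier increments on the same column.
>         void joinIncrement(String column, long delta) {
>             MutationValue v = pending.get(column);
>             if (v == null) {
>                 v = new MutationValue();
>                 pending.put(column, v);
>             }
>             v.increment = (v.increment == null) ? delta : v.increment + delta;
>         }
>
>         MutationValue get(String column) {
>             return pending.get(column);
>         }
>
>         public static void main(String[] args) {
>             ColumnMutations m = new ColumnMutations();
>             m.joinIncrement("COUNT", 1);
>             m.joinIncrement("COUNT", 1);            // accumulates to 2
>             m.joinPut("NAME", new byte[] { 1 });
>             m.joinIncrement("NAME", 3);             // PUT kept, increment kept as well
>             System.out.println(m.get("COUNT").increment);         // 2
>             System.out.println(m.get("NAME").putValue != null);   // true
>             System.out.println(m.get("NAME").increment);          // 3
>         }
>     }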
> I am not sure whether there is a better solution, but this approach is the easiest one I could come up with that does not impact the whole framework too much.
> You can test both of the scenarios I mentioned above with the unit test cases.
> At present, since issues 1/2 are not fully addressed, some cases will fail (so I have commented them out). But once the data type solution I mentioned above is implemented, I believe this could work quite well.
> Any ideas?


