Posted to dev@phoenix.apache.org by William <yh...@163.com> on 2016/05/04 07:42:43 UTC

Support HBase Increment and CheckAndMutate API

Hi all,
    I am trying to support the HBase Increment and checkAndPut/checkAndDelete APIs in Phoenix, and I have already handled Increment by introducing a new INCREMENT statement.
Grammar:
  INCREMENT tableName (column defs) VALUES(....) where expressions


Because all of the increment operations can be stored in a single Increment object and committed via Table#batch(), that part was not hard.
But for checkAndPut/checkAndDelete there is no single class that holds both the condition and the values. In the existing write path, every method and data structure is
geared toward producing a List<Mutation>, so it is hard to introduce a new data type there, and I would have to create a new write path (a rough sketch of the two client-side calls follows).
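For reference, here is a minimal sketch of the two client-side calls involved, assuming the HBase 1.x Table API (the table name, column family, and byte encodings are illustrative only):

import java.util.Collections;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientApiSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("T"))) {
            byte[] row = Bytes.toBytes("row1");
            byte[] cf = Bytes.toBytes("0");

            // All increments for a row fit in one Increment object, which
            // Table#batch() accepts alongside ordinary mutations.
            Increment inc = new Increment(row);
            inc.addColumn(cf, Bytes.toBytes("COLA"), 10L);
            inc.addColumn(cf, Bytes.toBytes("COLB"), -20L);
            table.batch(Collections.singletonList(inc), new Object[1]);

            // checkAndPut carries the condition (row/family/qualifier/expected
            // value) together with the mutation, so it does not reduce to a
            // plain List<Mutation> the way the existing write path expects.
            Put put = new Put(row);
            put.addColumn(cf, Bytes.toBytes("COLA"), Bytes.toBytes("20"));
            boolean applied = table.checkAndPut(
                row, cf, Bytes.toBytes("COLB"), Bytes.toBytes("20"), put);
            System.out.println("applied: " + applied);
        }
    }
}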
Could someone offer some advice or suggest a solution?
Thanks.


William

Re:Re: Re: Support HBase Increment and CheckAndMutate API

Posted by William <yh...@163.com>.

Hi James, 
  Thanks a lot for your advice; I will give it a shot.


William



At 2016-05-04 23:34:19, "James Taylor" <ja...@apache.org> wrote:
>Hi William,
>Tephra is used in production for every CDAP customer[1], so it does have
>production usage. If your scenario is OLTP, then transactional tables in
>Phoenix is the way to go. I don't think having atomic increment and check
>and put add up to OLTP.
>
>I also don't think transactions would put a larger performance overhead
>than check and put operations as both are doing a read before write. With
>transactions you'd typically do a SELECT before an UPSERT while check and
>put does a get before the put. Transactions have an extra RPC, but this is
>amortized through batching. I'd really encourage you to try this path.
>
>Thanks,
>James
>
>[1] http://cask.co/products/cdap/
>
>On Wed, May 4, 2016 at 3:24 AM, William <yh...@163.com> wrote:
>
>> Hi James,
>>  Thanks for your advice. I have thought of implementing it in standard
>> SQL,  for example:
>>  a) UPDATE tablename SET colA = colA + 10, colB = colB - 20 where pk = 1;
>>  or
>>  b) UPSERT INTO tableName (colA, colB) values (colA + 10, colB - 20) where
>> pk = 1;
>>
>>
>> At first, I chose a), because UpsertStatement is not a FilterableStatement
>> so doesn't have a WHERE clause. So introducing a new UPDATE statement is
>> easier.
>> But considering checkAndMutate,
>>  UPDATE tablename SET colA= 20 where pk = 1 and colB = 20;
>>
>>
>> for Table#checkAndMutate(), we must provide a row key and a column as the
>> condition. So the WHERE clause must be point-look-up and has one other
>> column.
>> But we'll get a problem when doing this:
>> UPDATE tablename SET colA = colA + 10, colB = 5 where pk = 1 and colC = 30;
>> Unfortunately, hbase doesn't support checkAndIncrement() so we cannot use
>> increment and checkAndMutate in the same SQL.
>> It is difficult to explain to the user why we couldn't support this.
>>
>>
>> So i have to implement increment and checkAndMutate in separate statements.
>> It seems that the best solution is to implement all these things in a
>> coprocessor instead of calling Table#increment() or
>> Table#checkAndMutate(). But it costs too much to do this. Do you have a
>> better idea, James?
>>
>>
>> For Tephra, it does so many things to do a single transaction, and I
>> really don't think the impact to performance is small enough (though I
>> haven't tested yet).
>> My scenario is OLTP and i must guarantee that SQL works nearly as fast as
>> HBase native API. And most of my clients are users migrated from HBase, we
>> just provide a simple and easy way to make up the row key and rich data
>> types.
>> Moreover, I don't know whether Tephra is stable enough to be used in
>> production environment.
>>
>>
>> Above all, I cannot see other choices to do the job quickly and  reliably.
>>
>>
>> At 2016-05-04 14:51:59, "James Taylor" <ja...@apache.org> wrote:
>> >Hi William,
>> >I'd recommend looking at supporting these HBase features through the
>> >standard SQL merge statement where they compile down to these native calls
>> >when possible. Also, with our transaction support, you can already do an
>> >increment atomically and retry if a conflict occurs.
>> >Thanks,
>> >James
>> >
>> >On Tuesday, May 3, 2016, William <yh...@163.com> wrote:
>> >
>> >> Hi all,
>> >>     I am trying to support hbase Increment and
>> checkAndPut/checkAndDelete
>> >> in phoenix and have done with Increment by introducing a new statement
>> >> INCREMENT.
>> >> Grammar:
>> >>   INCREMENT tableName (column defs) VALUES(....) where expressions
>> >>
>> >>
>> >> Because all increment operations can be stored in a single Increment
>> >> object which can be committed by Table#batch(), it is not hard to do the
>> >> job.
>> >> But for checkAndPut/checkAndDelete, there isn't such a class that holds
>> >> the conditions and values at the same time.  And for the existing write
>> >> path, all methods and data structures are
>> >> used to generate List<Mutation>, it is hard to add a new data type to
>> the
>> >> write path. So I have to create a new write path.
>> >> May someone give me some advice or solutions?
>> >> Thanks.
>> >>
>> >>
>> >> William
>>

Re: Re: Support HBase Increment and CheckAndMutate API

Posted by James Taylor <ja...@apache.org>.
Hi William,
Tephra is used in production by every CDAP customer [1], so it does have
production usage. If your scenario is OLTP, then transactional tables in
Phoenix are the way to go. I don't think atomic increment and check-and-put
by themselves add up to OLTP.

I also don't think transactions would impose a larger performance overhead
than check-and-put operations, since both do a read before the write. With
transactions you'd typically do a SELECT before an UPSERT, while check-and-put
does a get before the put. Transactions add an extra RPC, but that cost is
amortized through batching. I'd really encourage you to try this path.
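As a rough illustration of that batching (the JDBC URL and table name below are placeholders, and the table is assumed to have been created with TRANSACTIONAL=true):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedCommitSketch {
    public static void main(String[] args) throws Exception {
        // "jdbc:phoenix:<zookeeper quorum>"; localhost is only a placeholder.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            conn.setAutoCommit(false); // buffer mutations on the client
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO MY_TXN_TABLE (PK, COLA) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setInt(2, i * 10);
                    ps.executeUpdate();
                }
            }
            // A single commit ships the buffered mutations, so the extra
            // transaction RPC is paid once per batch rather than per row.
            conn.commit();
        }
    }
}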

Thanks,
James

[1] http://cask.co/products/cdap/

On Wed, May 4, 2016 at 3:24 AM, William <yh...@163.com> wrote:

> Hi James,
>  Thanks for your advice. I have thought of implementing it in standard
> SQL,  for example:
>  a) UPDATE tablename SET colA = colA + 10, colB = colB - 20 where pk = 1;
>  or
>  b) UPSERT INTO tableName (colA, colB) values (colA + 10, colB - 20) where
> pk = 1;
>
>
> At first, I chose a), because UpsertStatement is not a FilterableStatement
> so doesn't have a WHERE clause. So introducing a new UPDATE statement is
> easier.
> But considering checkAndMutate,
>  UPDATE tablename SET colA= 20 where pk = 1 and colB = 20;
>
>
> for Table#checkAndMutate(), we must provide a row key and a column as the
> condition. So the WHERE clause must be point-look-up and has one other
> column.
> But we'll get a problem when doing this:
> UPDATE tablename SET colA = colA + 10, colB = 5 where pk = 1 and colC = 30;
> Unfortunately, hbase doesn't support checkAndIncrement() so we cannot use
> increment and checkAndMutate in the same SQL.
> It is difficult to explain to the user why we couldn't support this.
>
>
> So i have to implement increment and checkAndMutate in separate statements.
> It seems that the best solution is to implement all these things in a
> coprocessor instead of calling Table#increment() or
> Table#checkAndMutate(). But it costs too much to do this. Do you have a
> better idea, James?
>
>
> For Tephra, it does so many things to do a single transaction, and I
> really don't think the impact to performance is small enough (though I
> haven't tested yet).
> My scenario is OLTP and i must guarantee that SQL works nearly as fast as
> HBase native API. And most of my clients are users migrated from HBase, we
> just provide a simple and easy way to make up the row key and rich data
> types.
> Moreover, I don't know whether Tephra is stable enough to be used in
> production environment.
>
>
> Above all, I cannot see other choices to do the job quickly and  reliably.
>
>
> At 2016-05-04 14:51:59, "James Taylor" <ja...@apache.org> wrote:
> >Hi William,
> >I'd recommend looking at supporting these HBase features through the
> >standard SQL merge statement where they compile down to these native calls
> >when possible. Also, with our transaction support, you can already do an
> >increment atomically and retry if a conflict occurs.
> >Thanks,
> >James
> >
> >On Tuesday, May 3, 2016, William <yh...@163.com> wrote:
> >
> >> Hi all,
> >>     I am trying to support hbase Increment and
> checkAndPut/checkAndDelete
> >> in phoenix and have done with Increment by introducing a new statement
> >> INCREMENT.
> >> Grammar:
> >>   INCREMENT tableName (column defs) VALUES(....) where expressions
> >>
> >>
> >> Because all increment operations can be stored in a single Increment
> >> object which can be committed by Table#batch(), it is not hard to do the
> >> job.
> >> But for checkAndPut/checkAndDelete, there isn't such a class that holds
> >> the conditions and values at the same time.  And for the existing write
> >> path, all methods and data structures are
> >> used to generate List<Mutation>, it is hard to add a new data type to
> the
> >> write path. So I have to create a new write path.
> >> May someone give me some advice or solutions?
> >> Thanks.
> >>
> >>
> >> William
>

Re:Re: Support HBase Increment and CheckAndMutate API

Posted by William <yh...@163.com>.
Hi James, 
 Thanks for your advice. I have thought about implementing it in standard SQL, for example:
 a) UPDATE tablename SET colA = colA + 10, colB = colB - 20 where pk = 1;
 or 
 b) UPSERT INTO tableName (colA, colB) values (colA + 10, colB - 20) where pk = 1;


At first I chose a), because UpsertStatement is not a FilterableStatement and so has no WHERE clause, which makes introducing a new UPDATE statement easier.
But consider checkAndMutate:
 UPDATE tablename SET colA = 20 where pk = 1 and colB = 20;


For Table#checkAndMutate(), we must provide a row key and a single column as the condition. So the WHERE clause must be a point lookup and reference exactly one non-PK column.
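Roughly, that UPDATE would map to something like the following native call (the column family name and Bytes encodings are illustrative only; Phoenix's actual type encodings differ):

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutMappingSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("TABLENAME"))) {
            byte[] row = Bytes.toBytes(1L);  // WHERE pk = 1 (the point lookup)
            byte[] cf = Bytes.toBytes("0");

            // SET colA = 20
            Put put = new Put(row);
            put.addColumn(cf, Bytes.toBytes("COLA"), Bytes.toBytes(20L));

            // AND colB = 20: the single non-PK condition column
            boolean applied = table.checkAndPut(
                row, cf, Bytes.toBytes("COLB"), Bytes.toBytes(20L), put);
            System.out.println("applied: " + applied);
        }
    }
}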
But we'll get a problem when doing this:
UPDATE tablename SET colA = colA + 10, colB = 5 where pk = 1 and colC = 30;
Unfortunately, HBase doesn't support a checkAndIncrement() operation, so we cannot use increment and checkAndMutate in the same SQL statement.
It would be difficult to explain to users why we can't support this.


So I have to implement increment and checkAndMutate as separate statements.
It seems the best solution would be to implement all of this in a coprocessor instead of calling Table#increment() or Table#checkAndMutate(), but that costs too much to build. Do you have a better idea, James?


As for Tephra, it does a lot of work for a single transaction, and I really don't think the performance impact is small enough (though I haven't tested it yet).
My scenario is OLTP and I must guarantee that the SQL works nearly as fast as the native HBase API. Most of my clients are users migrating from HBase; we just provide a simple, easy way to compose the row key plus rich data types.
Moreover, I don't know whether Tephra is stable enough to be used in a production environment.


All in all, I don't see another way to do the job quickly and reliably.


At 2016-05-04 14:51:59, "James Taylor" <ja...@apache.org> wrote:
>Hi William,
>I'd recommend looking at supporting these HBase features through the
>standard SQL merge statement where they compile down to these native calls
>when possible. Also, with our transaction support, you can already do an
>increment atomically and retry if a conflict occurs.
>Thanks,
>James
>
>On Tuesday, May 3, 2016, William <yh...@163.com> wrote:
>
>> Hi all,
>>     I am trying to support hbase Increment and checkAndPut/checkAndDelete
>> in phoenix and have done with Increment by introducing a new statement
>> INCREMENT.
>> Grammar:
>>   INCREMENT tableName (column defs) VALUES(....) where expressions
>>
>>
>> Because all increment operations can be stored in a single Increment
>> object which can be committed by Table#batch(), it is not hard to do the
>> job.
>> But for checkAndPut/checkAndDelete, there isn't such a class that holds
>> the conditions and values at the same time.  And for the existing write
>> path, all methods and data structures are
>> used to generate List<Mutation>, it is hard to add a new data type to the
>> write path. So I have to create a new write path.
>> May someone give me some advice or solutions?
>> Thanks.
>>
>>
>> William

Re: Support HBase Increment and CheckAndMutate API

Posted by James Taylor <ja...@apache.org>.
Hi William,
I'd recommend supporting these HBase features through the standard SQL
MERGE statement, compiling it down to these native calls when possible.
Also, with our transaction support, you can already do an increment
atomically and retry if a conflict occurs.
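For example, a minimal sketch of an atomic increment over a transactional table, retrying on a write conflict (the JDBC URL, table, and column names are placeholders, and the table is assumed to have been created with TRANSACTIONAL=true):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AtomicIncrementSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            conn.setAutoCommit(false);
            boolean done = false;
            while (!done) {
                try {
                    long current = 0;
                    try (PreparedStatement sel = conn.prepareStatement(
                            "SELECT COLA FROM MY_TXN_TABLE WHERE PK = 1")) {
                        ResultSet rs = sel.executeQuery();
                        if (rs.next()) {
                            current = rs.getLong(1);
                        }
                    }
                    try (PreparedStatement ups = conn.prepareStatement(
                            "UPSERT INTO MY_TXN_TABLE (PK, COLA) VALUES (1, ?)")) {
                        ups.setLong(1, current + 10);
                        ups.executeUpdate();
                    }
                    conn.commit(); // fails if a conflicting write committed first
                    done = true;
                } catch (SQLException conflict) {
                    // A real implementation would check the error code and
                    // bound the number of retries; this sketch just retries.
                    conn.rollback();
                }
            }
        }
    }
}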
Thanks,
James

On Tuesday, May 3, 2016, William <yh...@163.com> wrote:

> Hi all,
>     I am trying to support hbase Increment and checkAndPut/checkAndDelete
> in phoenix and have done with Increment by introducing a new statement
> INCREMENT.
> Grammar:
>   INCREMENT tableName (column defs) VALUES(....) where expressions
>
>
> Because all increment operations can be stored in a single Increment
> object which can be committed by Table#batch(), it is not hard to do the
> job.
> But for checkAndPut/checkAndDelete, there isn't such a class that holds
> the conditions and values at the same time.  And for the existing write
> path, all methods and data structures are
> used to generate List<Mutation>, it is hard to add a new data type to the
> write path. So I have to create a new write path.
> May someone give me some advice or solutions?
> Thanks.
>
>
> William