You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Suraj Varma <sv...@gmail.com> on 2011/12/03 15:24:56 UTC

Question on Coprocessors and Atomicity

I'm looking at the preCheckAndPut / postCheckAndPut api with
coprocessors and I'm wondering ... are these pre/post checks done
_after_ taking the row lock or is the row lock only done within the
checkAndPut api.

I'm interested in seeing if we can implement something like:
(in pseudo sql)
update table-name
set column-name = new-value
where (column-value - new-value) > threshold-value

Basically ... I want to enhance the checkAndPut to not just compare
"values" ... but apply an arbitrary function on the value _atomically_
in the Put call. Multiple threads would be firing these mutations and
I'd like the threshold-value above to never be breached under any
circumstance.

Is there a solution that can be implemented either via checkAndPut or
using coprocessors preCheckAndPut? If not, would this be a useful
feature to build in HBase?

Thanks,
--Suraj

Re: Question on Coprocessors and Atomicity

Posted by Jesse Yates <je...@gmail.com>.
The row lock stuff was just if you are really concerned with making sure
the atomicity works (though I don't see any reason it wouldn't). Since all
the puts are going to be filtered through the coprocessor (the same
instance in fact), in order and the CP is going to block the Put from
completing (Because the CP is going to be doing the check in prePut), then
you can be sure you have atomic access to that put. Rows are locked on
update anyways, so the lock there is implicit - you would want commonly
want to use the row lock when doing changes to other rows to ensure
entity/referential integrity.

Keep in mind that you will always see a consistent view and since there is
only one CP running on one regionserver accesing that one row, you are
going to get the atomicity - the single thread is should never cause
conflicts with itself under normal operation. Failure scenarios could
theoretically be a little weird, if some puts fail for a given use case
that is dependent on that previous writes (but this is not yours).

Quick soln - write a CP to check the single row (blocking the put). If you
are looking at multiple rows, then you need to start worrying about complex
locking situations, but right now that should be fine.

Right soln (at least trying not to overthink it) - incorporate local table
access into constraints and then just impl a constraint.

-Jesse


On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com> wrote:

> Just so my question is clear ... everything I'm suggesting is in the
> context of a single row (not cross row / table). - so, yes, I'm
> guessing obtaining a RowLock on the region side during preCheckAndPut
> / postCheckAndPut would certainly work. Which was why I was asking
> whether the pre/postCheckAndPut obtains the row lock or whether the
> row lock is only obtained within checkAndPut.
>
> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
> that even work? i.e. can the same rowlock be inherited by the
> checkAndPut api within that thread's context? Or will preCheckAndPut
> have to release the lock so that checkAndPut can take it (which won't
> work for my case, as it has to be atomic between the preCheck and
> Put.)
>
> Thanks for pointing me to the Constraints functionality - I'll take a
> look at whether it could potentially work.
> --Suraj
>
> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
> wrote:
> > I think the feature you are looking for is a Constraint. Currently they
> are
> > being added to 0.94 in
> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
> > they are almost ready to be rolled in, and backporting to 0.92 is
> > definitely doable.
> >
> > However, Constraints aren't going to be quite flexible enough to
> > efficiently support what you are describing. For instance, with a
> > constraint, you are ideally just checking the put value against some
> simple
> > constraint (never over 10 or always an integer), but looking at the
> current
> > state of the table before allowing the put would currently require
> creating
> > a full blown connection to the local table through another HTable.
> >
> > In the short term, you could write a simple coprocessor to do this
> checking
> > and then move over to constraints (which are a simpler, more flexible,
> way
> > of doing this) when the necessary features have been added.
> >
> > It is worth discussing if it makes sense to have access to the local
> region
> > through a constraint, though that breaks the idea a little bit, it would
> > certainly be useful and not overly wasteful in terms of runtime.
> >
> > Supposing the feature would be added to talk to the local table, and
> since
> > the puts are going to be serialized on the regionserver (at least to that
> > single row you are trying to update), you will never get a situation
> where
> > the value added is over the threshold. If you were really worried about
> the
> > atomicity of the operation, then when doing the put, first get the
> RowLock,
> > then do the put and release the RowLock. However, that latter method is
> > going to be really slow, so should only be used as a stop gap if the
> > constraint doesn't work as expected, until a patch is made for
> constraints.
> >
> > Feel free to open up a ticket and link it to 4605 for adding the local
> > table access functionality, and we can discuss the de/merits of adding
> the
> > access.
> >
> > -Jesse
> >
> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com> wrote:
> >
> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
> >> coprocessors and I'm wondering ... are these pre/post checks done
> >> _after_ taking the row lock or is the row lock only done within the
> >> checkAndPut api.
> >>
> >> I'm interested in seeing if we can implement something like:
> >> (in pseudo sql)
> >> update table-name
> >> set column-name = new-value
> >> where (column-value - new-value) > threshold-value
> >>
> >> Basically ... I want to enhance the checkAndPut to not just compare
> >> "values" ... but apply an arbitrary function on the value _atomically_
> >> in the Put call. Multiple threads would be firing these mutations and
> >> I'd like the threshold-value above to never be breached under any
> >> circumstance.
> >>
> >> Is there a solution that can be implemented either via checkAndPut or
> >> using coprocessors preCheckAndPut? If not, would this be a useful
> >> feature to build in HBase?
> >>
> >> Thanks,
> >> --Suraj
> >>
> >
> >
> >
> > --
> > -------------------
> > Jesse Yates
> > 240-888-2200
> > @jesse_yates
>



-- 
-------------------
Jesse Yates
240-888-2200
@jesse_yates

Re: Question on Coprocessors and Atomicity

Posted by Suraj Varma <sv...@gmail.com>.
I understand - yes, if constraints are limited to local variables and
there is no "read & put" being done, we're fine. I think I
misunderstood the original intent of "constraint" to include read &
put and missed the "intercept and check whether the column values are
of the right data type / format" use case.

I think what you explained sounds very reasonable and we can defer the
check & put constraint to HBASE-4999

Thanks for the detailed clarification.
--Suraj

On Sun, Dec 11, 2011 at 10:26 PM, Jesse Yates <je...@gmail.com> wrote:
> Adding constraints was originally developed with the idea of checking
> incoming writes for validity, based on their internal properties. For
> instance, checking to make sure that a value is in the range 1-10, or that
> it is an integer, or not over a certain length (you get the idea).
>
> If the row lock is released between the time the coprocessor finishes
>> "preXXXX" checks and the core mutation method is invoked (as has been
>> discussed in this thread), how can the Constraint be ensured? If two
>> requests are being processed in parallel, there is every possibility
>> that both requests pass the "Constraint" check individually, but break
>> it together (e.g. even simple checks like column value == 10 would
>> break if two requests fire concurrently).
>>
>> So - I'm questioning whether a pure Coprocessor implementation alone
>> would be sufficient?
>>
>
> If you have a situation where are just doing the straight put (or
> checkAndPut) then the constraint will not affect the atomicity or
> consistency of writes to the table. Even if one of the puts finishes the
> constraint first, the single row will not be corrupted and the 'fastest'
> write will win. The underlying region takes care of breaking the tie and
> ensuring that writes get serialized to the table.
>
> Its true that the transactional nature will be an issue if you are doing a
> read on the table first. From HRegion.internalPut(), we can see that we
> first get a readlock and then, when doing the put actually do serialize the
> writes into the memstore, enforcing the consistency and atomicity of the
> update. However, there are no check that the prePut checking is going to
> enforce that atomicity. From internal put, we see:
>
> /* run pre put hook outside of lock to avoid deadlock */
>    if (coprocessorHost != null) {
>      if (coprocessorHost.prePut(put, walEdit, writeToWAL)) {
>        return;
>      }
>    }
>
> So yes, this doesn't ensure that we are going to get specific ordering or
> even a fully consistent view of the underlying data. However, as long as
> each constraint just uses local variables, each thread interacting with the
> constraint will be completely fine.
>
> Yeah, some documentation needs to be done on constraints as far as not
> using static fields or expecting multiple thread interaction. However, the
> checking ensures that the field passes - the underlying store takes care of
> ensuring atomicity and consistency.
>
> Take your example of column value == 10. Two different requests go into the
> HRegion. They each run the Equals10Constraint in a different thread. The
> constraint allows each to pass (lets assume here that they both do ==10).
> Then it will hit the remainder of HRegion.internalPut(), go through the
> updatesLock and then also get serialized in the call to
> HRegion.applyFamilyMapToMemstore(familyMap, null). So the atomicity of the
> writes are preserved. This also works for n concurrent writes and where not
> all writes pass the constraint.
>
>
>
>>
>> I think we'll need an approach that makes the constraint checking and
>> mutation to be _atomically_ achieved
>> a) either by taking a row lock and passing that into put / checkAndPut
>> b) referencing & checking the constraint directly from within the put
>> / checkAndPut methods (like we do with the comparator, for instance)
>>
>>
> However, if we wanted to do a read of the table, within the coprocessor, we
> will want to actually pass in the rowlock so we can ensure that we don't
> get a contention/atomicity issue. This could be pretty easily handled by
> setting the rowlock on the Put that is passed into prePut (which either
> resets that current rowlock or sets it to the found rowlock). Then the
> underlying constraint could make that accessible if we wanted to scan the
> region.
>
> I also just did a quick test to modifying TestSimpleRegionObserver to do a
> read from the region when doing the prePut() and came up working fine - no
> deadlock issues. Checked for doing bothPut and checkAndPut, and didn't come
> up with anything locking.
>
> Heavy concurrent use only will be slowed down by the reads, but nothing
> should be broken. We might be able to do some internal work on constraints
> and create a CheckAndWriteConstriant that will be used by makes it easier
> to check the value already present. Doing something like that would be best
> handled in HBASE-4999.
>
>
> Does that seem reasonable?
>
> -Jesse
>
>
>>
>>
>> On Fri, Dec 9, 2011 at 1:31 PM, Suraj Varma <sv...@gmail.com> wrote:
>> > Hi:
>> > I opened a jira ticket on this:
>> https://issues.apache.org/jira/browse/HBASE-4999
>> >
>> > I have linked to HBASE-4605 in the description to show related work on
>> > Constraints by Jesse.
>> >
>> > Thanks!
>> > --Suraj
>> >
>> > On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <yu...@gmail.com> wrote:
>> >> Currently ConstraintProcessor latches onto prePut() to perform
>> validation
>> >> check.
>> >>
>> >> From HRegion.doMiniBatchPut() where prePut() is called:
>> >>    /* Run coprocessor pre hook outside of locks to avoid deadlock */
>> >> So to make use of Constraint in Suraj's scenario, we have some
>> decisions to
>> >> make about various factors.
>> >>
>> >> Cheers
>> >>
>> >> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <sv...@gmail.com>
>> wrote:
>> >>
>> >>> Jesse:
>> >>> >> Quick soln - write a CP to check the single row (blocking the put).
>> >>>
>> >>> Yeah - given that I want this to be atomically done, I'm wondering if
>> >>> this would even work (because, I believe I'd need to unlock the row so
>> >>> that the checkAndMutate can take the lock - so, there is a brief
>> >>> window between where there is no lock being held - and some other
>> >>> thread could take that lock). One option would be to pass in a lock to
>> >>> checkAndMutate ... but that would increase the locking period and may
>> >>> have performance implications, I think.
>> >>>
>> >>> I see a lot of potential in the Constraints implementation - it would
>> >>> really open up CAS operations to do functional constraint checking,
>> >>> rather than just value comparisons.
>> >>>
>> >>> --Suraj
>> >>>
>> >>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <sv...@gmail.com>
>> wrote:
>> >>> > Thanks - I see that the lock is taken internal to checkAndMutate.
>> >>> >
>> >>> > I'm wondering whether it is a better idea to actually pass in a
>> >>> > Constraint (or even Constraints) as the checkAndMutate argument.
>> Right
>> >>> > now it is taking in an Comparator and a CompareOp for verification.
>> >>> > But, this could just be a special case of Constraint which is
>> >>> > evaluated within the lock.
>> >>> >
>> >>> > In other words, we could open up a richer Constraint checking api
>> >>> > where any "functional" Constraint check can be performed in the
>> >>> > checkAndPut operation.
>> >>> >
>> >>> > This would also not have the same performance impact of taking a
>> >>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
>> >>> > really (in my mind) implementing the compare-and-set more
>> generically.
>> >>> >
>> >>> > I also see the potential of passing in multiple constraints (say
>> >>> > upper/lower bounds in Increment/Decrement operations) etc.
>> >>> >
>> >>> > --Suraj
>> >>> >
>> >>> >
>> >>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
>> >>> >> From HRegionServer.checkAndPut():
>> >>> >>    if (region.getCoprocessorHost() != null) {
>> >>> >>      Boolean result = region.getCoprocessorHost()
>> >>> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
>> >>> comparator,
>> >>> >>          put);
>> >>> >> ...
>> >>> >>    boolean result = checkAndMutate(regionName, row, family,
>> qualifier,
>> >>> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
>> >>> >>      lock);
>> >>> >> We can see that the lock isn't taken for preCheckAndPut().
>> >>> >>
>> >>> >> To satisfy Suraj's requirement, I think a slight change to
>> >>> checkAndPut() is
>> >>> >> needed so that atomicity can be achieved across preCheckAndPut() and
>> >>> >> checkAndMutate().
>> >>> >>
>> >>> >> Cheers
>> >>> >>
>> >>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com>
>> >>> wrote:
>> >>> >>
>> >>> >>> Just so my question is clear ... everything I'm suggesting is in
>> the
>> >>> >>> context of a single row (not cross row / table). - so, yes, I'm
>> >>> >>> guessing obtaining a RowLock on the region side during
>> preCheckAndPut
>> >>> >>> / postCheckAndPut would certainly work. Which was why I was asking
>> >>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
>> >>> >>> row lock is only obtained within checkAndPut.
>> >>> >>>
>> >>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ...
>> will
>> >>> >>> that even work? i.e. can the same rowlock be inherited by the
>> >>> >>> checkAndPut api within that thread's context? Or will
>> preCheckAndPut
>> >>> >>> have to release the lock so that checkAndPut can take it (which
>> won't
>> >>> >>> work for my case, as it has to be atomic between the preCheck and
>> >>> >>> Put.)
>> >>> >>>
>> >>> >>> Thanks for pointing me to the Constraints functionality - I'll
>> take a
>> >>> >>> look at whether it could potentially work.
>> >>> >>> --Suraj
>> >>> >>>
>> >>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <
>> jesse.k.yates@gmail.com>
>> >>> >>> wrote:
>> >>> >>> > I think the feature you are looking for is a Constraint.
>> Currently
>> >>> they
>> >>> >>> are
>> >>> >>> > being added to 0.94 in
>> >>> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
>> >>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
>> >>> >>> > definitely doable.
>> >>> >>> >
>> >>> >>> > However, Constraints aren't going to be quite flexible enough to
>> >>> >>> > efficiently support what you are describing. For instance, with a
>> >>> >>> > constraint, you are ideally just checking the put value against
>> some
>> >>> >>> simple
>> >>> >>> > constraint (never over 10 or always an integer), but looking at
>> the
>> >>> >>> current
>> >>> >>> > state of the table before allowing the put would currently
>> require
>> >>> >>> creating
>> >>> >>> > a full blown connection to the local table through another
>> HTable.
>> >>> >>> >
>> >>> >>> > In the short term, you could write a simple coprocessor to do
>> this
>> >>> >>> checking
>> >>> >>> > and then move over to constraints (which are a simpler, more
>> >>> flexible,
>> >>> >>> way
>> >>> >>> > of doing this) when the necessary features have been added.
>> >>> >>> >
>> >>> >>> > It is worth discussing if it makes sense to have access to the
>> local
>> >>> >>> region
>> >>> >>> > through a constraint, though that breaks the idea a little bit,
>> it
>> >>> would
>> >>> >>> > certainly be useful and not overly wasteful in terms of runtime.
>> >>> >>> >
>> >>> >>> > Supposing the feature would be added to talk to the local table,
>> and
>> >>> >>> since
>> >>> >>> > the puts are going to be serialized on the regionserver (at
>> least to
>> >>> that
>> >>> >>> > single row you are trying to update), you will never get a
>> situation
>> >>> >>> where
>> >>> >>> > the value added is over the threshold. If you were really worried
>> >>> about
>> >>> >>> the
>> >>> >>> > atomicity of the operation, then when doing the put, first get
>> the
>> >>> >>> RowLock,
>> >>> >>> > then do the put and release the RowLock. However, that latter
>> method
>> >>> is
>> >>> >>> > going to be really slow, so should only be used as a stop gap if
>> the
>> >>> >>> > constraint doesn't work as expected, until a patch is made for
>> >>> >>> constraints.
>> >>> >>> >
>> >>> >>> > Feel free to open up a ticket and link it to 4605 for adding the
>> >>> local
>> >>> >>> > table access functionality, and we can discuss the de/merits of
>> >>> adding
>> >>> >>> the
>> >>> >>> > access.
>> >>> >>> >
>> >>> >>> > -Jesse
>> >>> >>> >
>> >>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <svarma.ng@gmail.com
>> >
>> >>> wrote:
>> >>> >>> >
>> >>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>> >>> >>> >> coprocessors and I'm wondering ... are these pre/post checks
>> done
>> >>> >>> >> _after_ taking the row lock or is the row lock only done within
>> the
>> >>> >>> >> checkAndPut api.
>> >>> >>> >>
>> >>> >>> >> I'm interested in seeing if we can implement something like:
>> >>> >>> >> (in pseudo sql)
>> >>> >>> >> update table-name
>> >>> >>> >> set column-name = new-value
>> >>> >>> >> where (column-value - new-value) > threshold-value
>> >>> >>> >>
>> >>> >>> >> Basically ... I want to enhance the checkAndPut to not just
>> compare
>> >>> >>> >> "values" ... but apply an arbitrary function on the value
>> >>> _atomically_
>> >>> >>> >> in the Put call. Multiple threads would be firing these
>> mutations
>> >>> and
>> >>> >>> >> I'd like the threshold-value above to never be breached under
>> any
>> >>> >>> >> circumstance.
>> >>> >>> >>
>> >>> >>> >> Is there a solution that can be implemented either via
>> checkAndPut
>> >>> or
>> >>> >>> >> using coprocessors preCheckAndPut? If not, would this be a
>> useful
>> >>> >>> >> feature to build in HBase?
>> >>> >>> >>
>> >>> >>> >> Thanks,
>> >>> >>> >> --Suraj
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > -------------------
>> >>> >>> > Jesse Yates
>> >>> >>> > 240-888-2200
>> >>> >>> > @jesse_yates
>> >>> >>>
>> >>>
>>
>
> --
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates

Re: Question on Coprocessors and Atomicity

Posted by Jesse Yates <je...@gmail.com>.
Adding constraints was originally developed with the idea of checking
incoming writes for validity, based on their internal properties. For
instance, checking to make sure that a value is in the range 1-10, or that
it is an integer, or not over a certain length (you get the idea).

If the row lock is released between the time the coprocessor finishes
> "preXXXX" checks and the core mutation method is invoked (as has been
> discussed in this thread), how can the Constraint be ensured? If two
> requests are being processed in parallel, there is every possibility
> that both requests pass the "Constraint" check individually, but break
> it together (e.g. even simple checks like column value == 10 would
> break if two requests fire concurrently).
>
> So - I'm questioning whether a pure Coprocessor implementation alone
> would be sufficient?
>

If you have a situation where are just doing the straight put (or
checkAndPut) then the constraint will not affect the atomicity or
consistency of writes to the table. Even if one of the puts finishes the
constraint first, the single row will not be corrupted and the 'fastest'
write will win. The underlying region takes care of breaking the tie and
ensuring that writes get serialized to the table.

Its true that the transactional nature will be an issue if you are doing a
read on the table first. From HRegion.internalPut(), we can see that we
first get a readlock and then, when doing the put actually do serialize the
writes into the memstore, enforcing the consistency and atomicity of the
update. However, there are no check that the prePut checking is going to
enforce that atomicity. From internal put, we see:

/* run pre put hook outside of lock to avoid deadlock */
    if (coprocessorHost != null) {
      if (coprocessorHost.prePut(put, walEdit, writeToWAL)) {
        return;
      }
    }

So yes, this doesn't ensure that we are going to get specific ordering or
even a fully consistent view of the underlying data. However, as long as
each constraint just uses local variables, each thread interacting with the
constraint will be completely fine.

Yeah, some documentation needs to be done on constraints as far as not
using static fields or expecting multiple thread interaction. However, the
checking ensures that the field passes - the underlying store takes care of
ensuring atomicity and consistency.

Take your example of column value == 10. Two different requests go into the
HRegion. They each run the Equals10Constraint in a different thread. The
constraint allows each to pass (lets assume here that they both do ==10).
Then it will hit the remainder of HRegion.internalPut(), go through the
updatesLock and then also get serialized in the call to
HRegion.applyFamilyMapToMemstore(familyMap, null). So the atomicity of the
writes are preserved. This also works for n concurrent writes and where not
all writes pass the constraint.



>
> I think we'll need an approach that makes the constraint checking and
> mutation to be _atomically_ achieved
> a) either by taking a row lock and passing that into put / checkAndPut
> b) referencing & checking the constraint directly from within the put
> / checkAndPut methods (like we do with the comparator, for instance)
>
>
However, if we wanted to do a read of the table, within the coprocessor, we
will want to actually pass in the rowlock so we can ensure that we don't
get a contention/atomicity issue. This could be pretty easily handled by
setting the rowlock on the Put that is passed into prePut (which either
resets that current rowlock or sets it to the found rowlock). Then the
underlying constraint could make that accessible if we wanted to scan the
region.

I also just did a quick test to modifying TestSimpleRegionObserver to do a
read from the region when doing the prePut() and came up working fine - no
deadlock issues. Checked for doing bothPut and checkAndPut, and didn't come
up with anything locking.

Heavy concurrent use only will be slowed down by the reads, but nothing
should be broken. We might be able to do some internal work on constraints
and create a CheckAndWriteConstriant that will be used by makes it easier
to check the value already present. Doing something like that would be best
handled in HBASE-4999.


Does that seem reasonable?

-Jesse


>
>
> On Fri, Dec 9, 2011 at 1:31 PM, Suraj Varma <sv...@gmail.com> wrote:
> > Hi:
> > I opened a jira ticket on this:
> https://issues.apache.org/jira/browse/HBASE-4999
> >
> > I have linked to HBASE-4605 in the description to show related work on
> > Constraints by Jesse.
> >
> > Thanks!
> > --Suraj
> >
> > On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <yu...@gmail.com> wrote:
> >> Currently ConstraintProcessor latches onto prePut() to perform
> validation
> >> check.
> >>
> >> From HRegion.doMiniBatchPut() where prePut() is called:
> >>    /* Run coprocessor pre hook outside of locks to avoid deadlock */
> >> So to make use of Constraint in Suraj's scenario, we have some
> decisions to
> >> make about various factors.
> >>
> >> Cheers
> >>
> >> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <sv...@gmail.com>
> wrote:
> >>
> >>> Jesse:
> >>> >> Quick soln - write a CP to check the single row (blocking the put).
> >>>
> >>> Yeah - given that I want this to be atomically done, I'm wondering if
> >>> this would even work (because, I believe I'd need to unlock the row so
> >>> that the checkAndMutate can take the lock - so, there is a brief
> >>> window between where there is no lock being held - and some other
> >>> thread could take that lock). One option would be to pass in a lock to
> >>> checkAndMutate ... but that would increase the locking period and may
> >>> have performance implications, I think.
> >>>
> >>> I see a lot of potential in the Constraints implementation - it would
> >>> really open up CAS operations to do functional constraint checking,
> >>> rather than just value comparisons.
> >>>
> >>> --Suraj
> >>>
> >>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <sv...@gmail.com>
> wrote:
> >>> > Thanks - I see that the lock is taken internal to checkAndMutate.
> >>> >
> >>> > I'm wondering whether it is a better idea to actually pass in a
> >>> > Constraint (or even Constraints) as the checkAndMutate argument.
> Right
> >>> > now it is taking in an Comparator and a CompareOp for verification.
> >>> > But, this could just be a special case of Constraint which is
> >>> > evaluated within the lock.
> >>> >
> >>> > In other words, we could open up a richer Constraint checking api
> >>> > where any "functional" Constraint check can be performed in the
> >>> > checkAndPut operation.
> >>> >
> >>> > This would also not have the same performance impact of taking a
> >>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
> >>> > really (in my mind) implementing the compare-and-set more
> generically.
> >>> >
> >>> > I also see the potential of passing in multiple constraints (say
> >>> > upper/lower bounds in Increment/Decrement operations) etc.
> >>> >
> >>> > --Suraj
> >>> >
> >>> >
> >>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
> >>> >> From HRegionServer.checkAndPut():
> >>> >>    if (region.getCoprocessorHost() != null) {
> >>> >>      Boolean result = region.getCoprocessorHost()
> >>> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
> >>> comparator,
> >>> >>          put);
> >>> >> ...
> >>> >>    boolean result = checkAndMutate(regionName, row, family,
> qualifier,
> >>> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
> >>> >>      lock);
> >>> >> We can see that the lock isn't taken for preCheckAndPut().
> >>> >>
> >>> >> To satisfy Suraj's requirement, I think a slight change to
> >>> checkAndPut() is
> >>> >> needed so that atomicity can be achieved across preCheckAndPut() and
> >>> >> checkAndMutate().
> >>> >>
> >>> >> Cheers
> >>> >>
> >>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com>
> >>> wrote:
> >>> >>
> >>> >>> Just so my question is clear ... everything I'm suggesting is in
> the
> >>> >>> context of a single row (not cross row / table). - so, yes, I'm
> >>> >>> guessing obtaining a RowLock on the region side during
> preCheckAndPut
> >>> >>> / postCheckAndPut would certainly work. Which was why I was asking
> >>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
> >>> >>> row lock is only obtained within checkAndPut.
> >>> >>>
> >>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ...
> will
> >>> >>> that even work? i.e. can the same rowlock be inherited by the
> >>> >>> checkAndPut api within that thread's context? Or will
> preCheckAndPut
> >>> >>> have to release the lock so that checkAndPut can take it (which
> won't
> >>> >>> work for my case, as it has to be atomic between the preCheck and
> >>> >>> Put.)
> >>> >>>
> >>> >>> Thanks for pointing me to the Constraints functionality - I'll
> take a
> >>> >>> look at whether it could potentially work.
> >>> >>> --Suraj
> >>> >>>
> >>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <
> jesse.k.yates@gmail.com>
> >>> >>> wrote:
> >>> >>> > I think the feature you are looking for is a Constraint.
> Currently
> >>> they
> >>> >>> are
> >>> >>> > being added to 0.94 in
> >>> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
> >>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
> >>> >>> > definitely doable.
> >>> >>> >
> >>> >>> > However, Constraints aren't going to be quite flexible enough to
> >>> >>> > efficiently support what you are describing. For instance, with a
> >>> >>> > constraint, you are ideally just checking the put value against
> some
> >>> >>> simple
> >>> >>> > constraint (never over 10 or always an integer), but looking at
> the
> >>> >>> current
> >>> >>> > state of the table before allowing the put would currently
> require
> >>> >>> creating
> >>> >>> > a full blown connection to the local table through another
> HTable.
> >>> >>> >
> >>> >>> > In the short term, you could write a simple coprocessor to do
> this
> >>> >>> checking
> >>> >>> > and then move over to constraints (which are a simpler, more
> >>> flexible,
> >>> >>> way
> >>> >>> > of doing this) when the necessary features have been added.
> >>> >>> >
> >>> >>> > It is worth discussing if it makes sense to have access to the
> local
> >>> >>> region
> >>> >>> > through a constraint, though that breaks the idea a little bit,
> it
> >>> would
> >>> >>> > certainly be useful and not overly wasteful in terms of runtime.
> >>> >>> >
> >>> >>> > Supposing the feature would be added to talk to the local table,
> and
> >>> >>> since
> >>> >>> > the puts are going to be serialized on the regionserver (at
> least to
> >>> that
> >>> >>> > single row you are trying to update), you will never get a
> situation
> >>> >>> where
> >>> >>> > the value added is over the threshold. If you were really worried
> >>> about
> >>> >>> the
> >>> >>> > atomicity of the operation, then when doing the put, first get
> the
> >>> >>> RowLock,
> >>> >>> > then do the put and release the RowLock. However, that latter
> method
> >>> is
> >>> >>> > going to be really slow, so should only be used as a stop gap if
> the
> >>> >>> > constraint doesn't work as expected, until a patch is made for
> >>> >>> constraints.
> >>> >>> >
> >>> >>> > Feel free to open up a ticket and link it to 4605 for adding the
> >>> local
> >>> >>> > table access functionality, and we can discuss the de/merits of
> >>> adding
> >>> >>> the
> >>> >>> > access.
> >>> >>> >
> >>> >>> > -Jesse
> >>> >>> >
> >>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <svarma.ng@gmail.com
> >
> >>> wrote:
> >>> >>> >
> >>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
> >>> >>> >> coprocessors and I'm wondering ... are these pre/post checks
> done
> >>> >>> >> _after_ taking the row lock or is the row lock only done within
> the
> >>> >>> >> checkAndPut api.
> >>> >>> >>
> >>> >>> >> I'm interested in seeing if we can implement something like:
> >>> >>> >> (in pseudo sql)
> >>> >>> >> update table-name
> >>> >>> >> set column-name = new-value
> >>> >>> >> where (column-value - new-value) > threshold-value
> >>> >>> >>
> >>> >>> >> Basically ... I want to enhance the checkAndPut to not just
> compare
> >>> >>> >> "values" ... but apply an arbitrary function on the value
> >>> _atomically_
> >>> >>> >> in the Put call. Multiple threads would be firing these
> mutations
> >>> and
> >>> >>> >> I'd like the threshold-value above to never be breached under
> any
> >>> >>> >> circumstance.
> >>> >>> >>
> >>> >>> >> Is there a solution that can be implemented either via
> checkAndPut
> >>> or
> >>> >>> >> using coprocessors preCheckAndPut? If not, would this be a
> useful
> >>> >>> >> feature to build in HBase?
> >>> >>> >>
> >>> >>> >> Thanks,
> >>> >>> >> --Suraj
> >>> >>> >>
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>> > --
> >>> >>> > -------------------
> >>> >>> > Jesse Yates
> >>> >>> > 240-888-2200
> >>> >>> > @jesse_yates
> >>> >>>
> >>>
>

-- 
-------------------
Jesse Yates
240-888-2200
@jesse_yates

Re: Question on Coprocessors and Atomicity

Posted by Suraj Varma <sv...@gmail.com>.
I've been following HBASE-4605 with interest and I'm going through the
patches. I don't want to take away from all the hard work that's gone
into it ...

The more I think of it, I'm wondering how the Constraint can be
enforced without enforcing atomicity.

>From the jira description, the intention of this feature is:
"Essentially, people would implement a 'Constraint' interface for
checking keys before they are put into a table. Puts that are valid
get written to the table, but if not people can will throw an
exception that gets propagated back to the client explaining why the
put was invalid."

If the row lock is released between the time the coprocessor finishes
"preXXXX" checks and the core mutation method is invoked (as has been
discussed in this thread), how can the Constraint be ensured? If two
requests are being processed in parallel, there is every possibility
that both requests pass the "Constraint" check individually, but break
it together (e.g. even simple checks like column value == 10 would
break if two requests fire concurrently).

So - I'm questioning whether a pure Coprocessor implementation alone
would be sufficient?

I think we'll need an approach that makes the constraint checking and
mutation to be _atomically_ achieved
a) either by taking a row lock and passing that into put / checkAndPut
b) referencing & checking the constraint directly from within the put
/ checkAndPut methods (like we do with the comparator, for instance)
under a row lock.

Without being able to atomically enforce the constraint, I'm wondering
if it is misleading to future users who may create a constraint that
may fail to be enforced under heavy concurrent use.

I know a lot of work has gone into the patches ... but I thought it
better to discuss this before rolling it out to the larger community
... :)

Thanks,
--Suraj


On Fri, Dec 9, 2011 at 1:31 PM, Suraj Varma <sv...@gmail.com> wrote:
> Hi:
> I opened a jira ticket on this: https://issues.apache.org/jira/browse/HBASE-4999
>
> I have linked to HBASE-4605 in the description to show related work on
> Constraints by Jesse.
>
> Thanks!
> --Suraj
>
> On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <yu...@gmail.com> wrote:
>> Currently ConstraintProcessor latches onto prePut() to perform validation
>> check.
>>
>> From HRegion.doMiniBatchPut() where prePut() is called:
>>    /* Run coprocessor pre hook outside of locks to avoid deadlock */
>> So to make use of Constraint in Suraj's scenario, we have some decisions to
>> make about various factors.
>>
>> Cheers
>>
>> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <sv...@gmail.com> wrote:
>>
>>> Jesse:
>>> >> Quick soln - write a CP to check the single row (blocking the put).
>>>
>>> Yeah - given that I want this to be atomically done, I'm wondering if
>>> this would even work (because, I believe I'd need to unlock the row so
>>> that the checkAndMutate can take the lock - so, there is a brief
>>> window between where there is no lock being held - and some other
>>> thread could take that lock). One option would be to pass in a lock to
>>> checkAndMutate ... but that would increase the locking period and may
>>> have performance implications, I think.
>>>
>>> I see a lot of potential in the Constraints implementation - it would
>>> really open up CAS operations to do functional constraint checking,
>>> rather than just value comparisons.
>>>
>>> --Suraj
>>>
>>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <sv...@gmail.com> wrote:
>>> > Thanks - I see that the lock is taken internal to checkAndMutate.
>>> >
>>> > I'm wondering whether it is a better idea to actually pass in a
>>> > Constraint (or even Constraints) as the checkAndMutate argument. Right
>>> > now it is taking in an Comparator and a CompareOp for verification.
>>> > But, this could just be a special case of Constraint which is
>>> > evaluated within the lock.
>>> >
>>> > In other words, we could open up a richer Constraint checking api
>>> > where any "functional" Constraint check can be performed in the
>>> > checkAndPut operation.
>>> >
>>> > This would also not have the same performance impact of taking a
>>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
>>> > really (in my mind) implementing the compare-and-set more generically.
>>> >
>>> > I also see the potential of passing in multiple constraints (say
>>> > upper/lower bounds in Increment/Decrement operations) etc.
>>> >
>>> > --Suraj
>>> >
>>> >
>>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
>>> >> From HRegionServer.checkAndPut():
>>> >>    if (region.getCoprocessorHost() != null) {
>>> >>      Boolean result = region.getCoprocessorHost()
>>> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
>>> comparator,
>>> >>          put);
>>> >> ...
>>> >>    boolean result = checkAndMutate(regionName, row, family, qualifier,
>>> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
>>> >>      lock);
>>> >> We can see that the lock isn't taken for preCheckAndPut().
>>> >>
>>> >> To satisfy Suraj's requirement, I think a slight change to
>>> checkAndPut() is
>>> >> needed so that atomicity can be achieved across preCheckAndPut() and
>>> >> checkAndMutate().
>>> >>
>>> >> Cheers
>>> >>
>>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com>
>>> wrote:
>>> >>
>>> >>> Just so my question is clear ... everything I'm suggesting is in the
>>> >>> context of a single row (not cross row / table). - so, yes, I'm
>>> >>> guessing obtaining a RowLock on the region side during preCheckAndPut
>>> >>> / postCheckAndPut would certainly work. Which was why I was asking
>>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
>>> >>> row lock is only obtained within checkAndPut.
>>> >>>
>>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
>>> >>> that even work? i.e. can the same rowlock be inherited by the
>>> >>> checkAndPut api within that thread's context? Or will preCheckAndPut
>>> >>> have to release the lock so that checkAndPut can take it (which won't
>>> >>> work for my case, as it has to be atomic between the preCheck and
>>> >>> Put.)
>>> >>>
>>> >>> Thanks for pointing me to the Constraints functionality - I'll take a
>>> >>> look at whether it could potentially work.
>>> >>> --Suraj
>>> >>>
>>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
>>> >>> wrote:
>>> >>> > I think the feature you are looking for is a Constraint. Currently
>>> they
>>> >>> are
>>> >>> > being added to 0.94 in
>>> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
>>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
>>> >>> > definitely doable.
>>> >>> >
>>> >>> > However, Constraints aren't going to be quite flexible enough to
>>> >>> > efficiently support what you are describing. For instance, with a
>>> >>> > constraint, you are ideally just checking the put value against some
>>> >>> simple
>>> >>> > constraint (never over 10 or always an integer), but looking at the
>>> >>> current
>>> >>> > state of the table before allowing the put would currently require
>>> >>> creating
>>> >>> > a full blown connection to the local table through another HTable.
>>> >>> >
>>> >>> > In the short term, you could write a simple coprocessor to do this
>>> >>> checking
>>> >>> > and then move over to constraints (which are a simpler, more
>>> flexible,
>>> >>> way
>>> >>> > of doing this) when the necessary features have been added.
>>> >>> >
>>> >>> > It is worth discussing if it makes sense to have access to the local
>>> >>> region
>>> >>> > through a constraint, though that breaks the idea a little bit, it
>>> would
>>> >>> > certainly be useful and not overly wasteful in terms of runtime.
>>> >>> >
>>> >>> > Supposing the feature would be added to talk to the local table, and
>>> >>> since
>>> >>> > the puts are going to be serialized on the regionserver (at least to
>>> that
>>> >>> > single row you are trying to update), you will never get a situation
>>> >>> where
>>> >>> > the value added is over the threshold. If you were really worried
>>> about
>>> >>> the
>>> >>> > atomicity of the operation, then when doing the put, first get the
>>> >>> RowLock,
>>> >>> > then do the put and release the RowLock. However, that latter method
>>> is
>>> >>> > going to be really slow, so should only be used as a stop gap if the
>>> >>> > constraint doesn't work as expected, until a patch is made for
>>> >>> constraints.
>>> >>> >
>>> >>> > Feel free to open up a ticket and link it to 4605 for adding the
>>> local
>>> >>> > table access functionality, and we can discuss the de/merits of
>>> adding
>>> >>> the
>>> >>> > access.
>>> >>> >
>>> >>> > -Jesse
>>> >>> >
>>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com>
>>> wrote:
>>> >>> >
>>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>>> >>> >> coprocessors and I'm wondering ... are these pre/post checks done
>>> >>> >> _after_ taking the row lock or is the row lock only done within the
>>> >>> >> checkAndPut api.
>>> >>> >>
>>> >>> >> I'm interested in seeing if we can implement something like:
>>> >>> >> (in pseudo sql)
>>> >>> >> update table-name
>>> >>> >> set column-name = new-value
>>> >>> >> where (column-value - new-value) > threshold-value
>>> >>> >>
>>> >>> >> Basically ... I want to enhance the checkAndPut to not just compare
>>> >>> >> "values" ... but apply an arbitrary function on the value
>>> _atomically_
>>> >>> >> in the Put call. Multiple threads would be firing these mutations
>>> and
>>> >>> >> I'd like the threshold-value above to never be breached under any
>>> >>> >> circumstance.
>>> >>> >>
>>> >>> >> Is there a solution that can be implemented either via checkAndPut
>>> or
>>> >>> >> using coprocessors preCheckAndPut? If not, would this be a useful
>>> >>> >> feature to build in HBase?
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> --Suraj
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > -------------------
>>> >>> > Jesse Yates
>>> >>> > 240-888-2200
>>> >>> > @jesse_yates
>>> >>>
>>>

Re: Question on Coprocessors and Atomicity

Posted by Suraj Varma <sv...@gmail.com>.
Hi:
I opened a jira ticket on this: https://issues.apache.org/jira/browse/HBASE-4999

I have linked to HBASE-4605 in the description to show related work on
Constraints by Jesse.

Thanks!
--Suraj

On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <yu...@gmail.com> wrote:
> Currently ConstraintProcessor latches onto prePut() to perform validation
> check.
>
> From HRegion.doMiniBatchPut() where prePut() is called:
>    /* Run coprocessor pre hook outside of locks to avoid deadlock */
> So to make use of Constraint in Suraj's scenario, we have some decisions to
> make about various factors.
>
> Cheers
>
> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <sv...@gmail.com> wrote:
>
>> Jesse:
>> >> Quick soln - write a CP to check the single row (blocking the put).
>>
>> Yeah - given that I want this to be atomically done, I'm wondering if
>> this would even work (because, I believe I'd need to unlock the row so
>> that the checkAndMutate can take the lock - so, there is a brief
>> window between where there is no lock being held - and some other
>> thread could take that lock). One option would be to pass in a lock to
>> checkAndMutate ... but that would increase the locking period and may
>> have performance implications, I think.
>>
>> I see a lot of potential in the Constraints implementation - it would
>> really open up CAS operations to do functional constraint checking,
>> rather than just value comparisons.
>>
>> --Suraj
>>
>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <sv...@gmail.com> wrote:
>> > Thanks - I see that the lock is taken internal to checkAndMutate.
>> >
>> > I'm wondering whether it is a better idea to actually pass in a
>> > Constraint (or even Constraints) as the checkAndMutate argument. Right
>> > now it is taking in an Comparator and a CompareOp for verification.
>> > But, this could just be a special case of Constraint which is
>> > evaluated within the lock.
>> >
>> > In other words, we could open up a richer Constraint checking api
>> > where any "functional" Constraint check can be performed in the
>> > checkAndPut operation.
>> >
>> > This would also not have the same performance impact of taking a
>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
>> > really (in my mind) implementing the compare-and-set more generically.
>> >
>> > I also see the potential of passing in multiple constraints (say
>> > upper/lower bounds in Increment/Decrement operations) etc.
>> >
>> > --Suraj
>> >
>> >
>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
>> >> From HRegionServer.checkAndPut():
>> >>    if (region.getCoprocessorHost() != null) {
>> >>      Boolean result = region.getCoprocessorHost()
>> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
>> comparator,
>> >>          put);
>> >> ...
>> >>    boolean result = checkAndMutate(regionName, row, family, qualifier,
>> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
>> >>      lock);
>> >> We can see that the lock isn't taken for preCheckAndPut().
>> >>
>> >> To satisfy Suraj's requirement, I think a slight change to
>> checkAndPut() is
>> >> needed so that atomicity can be achieved across preCheckAndPut() and
>> >> checkAndMutate().
>> >>
>> >> Cheers
>> >>
>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com>
>> wrote:
>> >>
>> >>> Just so my question is clear ... everything I'm suggesting is in the
>> >>> context of a single row (not cross row / table). - so, yes, I'm
>> >>> guessing obtaining a RowLock on the region side during preCheckAndPut
>> >>> / postCheckAndPut would certainly work. Which was why I was asking
>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
>> >>> row lock is only obtained within checkAndPut.
>> >>>
>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
>> >>> that even work? i.e. can the same rowlock be inherited by the
>> >>> checkAndPut api within that thread's context? Or will preCheckAndPut
>> >>> have to release the lock so that checkAndPut can take it (which won't
>> >>> work for my case, as it has to be atomic between the preCheck and
>> >>> Put.)
>> >>>
>> >>> Thanks for pointing me to the Constraints functionality - I'll take a
>> >>> look at whether it could potentially work.
>> >>> --Suraj
>> >>>
>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
>> >>> wrote:
>> >>> > I think the feature you are looking for is a Constraint. Currently
>> they
>> >>> are
>> >>> > being added to 0.94 in
>> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
>> >>> > definitely doable.
>> >>> >
>> >>> > However, Constraints aren't going to be quite flexible enough to
>> >>> > efficiently support what you are describing. For instance, with a
>> >>> > constraint, you are ideally just checking the put value against some
>> >>> simple
>> >>> > constraint (never over 10 or always an integer), but looking at the
>> >>> current
>> >>> > state of the table before allowing the put would currently require
>> >>> creating
>> >>> > a full blown connection to the local table through another HTable.
>> >>> >
>> >>> > In the short term, you could write a simple coprocessor to do this
>> >>> checking
>> >>> > and then move over to constraints (which are a simpler, more
>> flexible,
>> >>> way
>> >>> > of doing this) when the necessary features have been added.
>> >>> >
>> >>> > It is worth discussing if it makes sense to have access to the local
>> >>> region
>> >>> > through a constraint, though that breaks the idea a little bit, it
>> would
>> >>> > certainly be useful and not overly wasteful in terms of runtime.
>> >>> >
>> >>> > Supposing the feature would be added to talk to the local table, and
>> >>> since
>> >>> > the puts are going to be serialized on the regionserver (at least to
>> that
>> >>> > single row you are trying to update), you will never get a situation
>> >>> where
>> >>> > the value added is over the threshold. If you were really worried
>> about
>> >>> the
>> >>> > atomicity of the operation, then when doing the put, first get the
>> >>> RowLock,
>> >>> > then do the put and release the RowLock. However, that latter method
>> is
>> >>> > going to be really slow, so should only be used as a stop gap if the
>> >>> > constraint doesn't work as expected, until a patch is made for
>> >>> constraints.
>> >>> >
>> >>> > Feel free to open up a ticket and link it to 4605 for adding the
>> local
>> >>> > table access functionality, and we can discuss the de/merits of
>> adding
>> >>> the
>> >>> > access.
>> >>> >
>> >>> > -Jesse
>> >>> >
>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com>
>> wrote:
>> >>> >
>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>> >>> >> coprocessors and I'm wondering ... are these pre/post checks done
>> >>> >> _after_ taking the row lock or is the row lock only done within the
>> >>> >> checkAndPut api.
>> >>> >>
>> >>> >> I'm interested in seeing if we can implement something like:
>> >>> >> (in pseudo sql)
>> >>> >> update table-name
>> >>> >> set column-name = new-value
>> >>> >> where (column-value - new-value) > threshold-value
>> >>> >>
>> >>> >> Basically ... I want to enhance the checkAndPut to not just compare
>> >>> >> "values" ... but apply an arbitrary function on the value
>> _atomically_
>> >>> >> in the Put call. Multiple threads would be firing these mutations
>> and
>> >>> >> I'd like the threshold-value above to never be breached under any
>> >>> >> circumstance.
>> >>> >>
>> >>> >> Is there a solution that can be implemented either via checkAndPut
>> or
>> >>> >> using coprocessors preCheckAndPut? If not, would this be a useful
>> >>> >> feature to build in HBase?
>> >>> >>
>> >>> >> Thanks,
>> >>> >> --Suraj
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > -------------------
>> >>> > Jesse Yates
>> >>> > 240-888-2200
>> >>> > @jesse_yates
>> >>>
>>

Re: Question on Coprocessors and Atomicity

Posted by Ted Yu <yu...@gmail.com>.
Currently ConstraintProcessor latches onto prePut() to perform validation
check.

>From HRegion.doMiniBatchPut() where prePut() is called:
    /* Run coprocessor pre hook outside of locks to avoid deadlock */
So to make use of Constraint in Suraj's scenario, we have some decisions to
make about various factors.

Cheers

On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <sv...@gmail.com> wrote:

> Jesse:
> >> Quick soln - write a CP to check the single row (blocking the put).
>
> Yeah - given that I want this to be atomically done, I'm wondering if
> this would even work (because, I believe I'd need to unlock the row so
> that the checkAndMutate can take the lock - so, there is a brief
> window between where there is no lock being held - and some other
> thread could take that lock). One option would be to pass in a lock to
> checkAndMutate ... but that would increase the locking period and may
> have performance implications, I think.
>
> I see a lot of potential in the Constraints implementation - it would
> really open up CAS operations to do functional constraint checking,
> rather than just value comparisons.
>
> --Suraj
>
> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <sv...@gmail.com> wrote:
> > Thanks - I see that the lock is taken internal to checkAndMutate.
> >
> > I'm wondering whether it is a better idea to actually pass in a
> > Constraint (or even Constraints) as the checkAndMutate argument. Right
> > now it is taking in an Comparator and a CompareOp for verification.
> > But, this could just be a special case of Constraint which is
> > evaluated within the lock.
> >
> > In other words, we could open up a richer Constraint checking api
> > where any "functional" Constraint check can be performed in the
> > checkAndPut operation.
> >
> > This would also not have the same performance impact of taking a
> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
> > really (in my mind) implementing the compare-and-set more generically.
> >
> > I also see the potential of passing in multiple constraints (say
> > upper/lower bounds in Increment/Decrement operations) etc.
> >
> > --Suraj
> >
> >
> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
> >> From HRegionServer.checkAndPut():
> >>    if (region.getCoprocessorHost() != null) {
> >>      Boolean result = region.getCoprocessorHost()
> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
> comparator,
> >>          put);
> >> ...
> >>    boolean result = checkAndMutate(regionName, row, family, qualifier,
> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
> >>      lock);
> >> We can see that the lock isn't taken for preCheckAndPut().
> >>
> >> To satisfy Suraj's requirement, I think a slight change to
> checkAndPut() is
> >> needed so that atomicity can be achieved across preCheckAndPut() and
> >> checkAndMutate().
> >>
> >> Cheers
> >>
> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com>
> wrote:
> >>
> >>> Just so my question is clear ... everything I'm suggesting is in the
> >>> context of a single row (not cross row / table). - so, yes, I'm
> >>> guessing obtaining a RowLock on the region side during preCheckAndPut
> >>> / postCheckAndPut would certainly work. Which was why I was asking
> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
> >>> row lock is only obtained within checkAndPut.
> >>>
> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
> >>> that even work? i.e. can the same rowlock be inherited by the
> >>> checkAndPut api within that thread's context? Or will preCheckAndPut
> >>> have to release the lock so that checkAndPut can take it (which won't
> >>> work for my case, as it has to be atomic between the preCheck and
> >>> Put.)
> >>>
> >>> Thanks for pointing me to the Constraints functionality - I'll take a
> >>> look at whether it could potentially work.
> >>> --Suraj
> >>>
> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
> >>> wrote:
> >>> > I think the feature you are looking for is a Constraint. Currently
> they
> >>> are
> >>> > being added to 0.94 in
> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
> >>> > definitely doable.
> >>> >
> >>> > However, Constraints aren't going to be quite flexible enough to
> >>> > efficiently support what you are describing. For instance, with a
> >>> > constraint, you are ideally just checking the put value against some
> >>> simple
> >>> > constraint (never over 10 or always an integer), but looking at the
> >>> current
> >>> > state of the table before allowing the put would currently require
> >>> creating
> >>> > a full blown connection to the local table through another HTable.
> >>> >
> >>> > In the short term, you could write a simple coprocessor to do this
> >>> checking
> >>> > and then move over to constraints (which are a simpler, more
> flexible,
> >>> way
> >>> > of doing this) when the necessary features have been added.
> >>> >
> >>> > It is worth discussing if it makes sense to have access to the local
> >>> region
> >>> > through a constraint, though that breaks the idea a little bit, it
> would
> >>> > certainly be useful and not overly wasteful in terms of runtime.
> >>> >
> >>> > Supposing the feature would be added to talk to the local table, and
> >>> since
> >>> > the puts are going to be serialized on the regionserver (at least to
> that
> >>> > single row you are trying to update), you will never get a situation
> >>> where
> >>> > the value added is over the threshold. If you were really worried
> about
> >>> the
> >>> > atomicity of the operation, then when doing the put, first get the
> >>> RowLock,
> >>> > then do the put and release the RowLock. However, that latter method
> is
> >>> > going to be really slow, so should only be used as a stop gap if the
> >>> > constraint doesn't work as expected, until a patch is made for
> >>> constraints.
> >>> >
> >>> > Feel free to open up a ticket and link it to 4605 for adding the
> local
> >>> > table access functionality, and we can discuss the de/merits of
> adding
> >>> the
> >>> > access.
> >>> >
> >>> > -Jesse
> >>> >
> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com>
> wrote:
> >>> >
> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
> >>> >> coprocessors and I'm wondering ... are these pre/post checks done
> >>> >> _after_ taking the row lock or is the row lock only done within the
> >>> >> checkAndPut api.
> >>> >>
> >>> >> I'm interested in seeing if we can implement something like:
> >>> >> (in pseudo sql)
> >>> >> update table-name
> >>> >> set column-name = new-value
> >>> >> where (column-value - new-value) > threshold-value
> >>> >>
> >>> >> Basically ... I want to enhance the checkAndPut to not just compare
> >>> >> "values" ... but apply an arbitrary function on the value
> _atomically_
> >>> >> in the Put call. Multiple threads would be firing these mutations
> and
> >>> >> I'd like the threshold-value above to never be breached under any
> >>> >> circumstance.
> >>> >>
> >>> >> Is there a solution that can be implemented either via checkAndPut
> or
> >>> >> using coprocessors preCheckAndPut? If not, would this be a useful
> >>> >> feature to build in HBase?
> >>> >>
> >>> >> Thanks,
> >>> >> --Suraj
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > -------------------
> >>> > Jesse Yates
> >>> > 240-888-2200
> >>> > @jesse_yates
> >>>
>

Re: Question on Coprocessors and Atomicity

Posted by Suraj Varma <sv...@gmail.com>.
Jesse:
>> Quick soln - write a CP to check the single row (blocking the put).

Yeah - given that I want this to be atomically done, I'm wondering if
this would even work (because, I believe I'd need to unlock the row so
that the checkAndMutate can take the lock - so, there is a brief
window between where there is no lock being held - and some other
thread could take that lock). One option would be to pass in a lock to
checkAndMutate ... but that would increase the locking period and may
have performance implications, I think.

I see a lot of potential in the Constraints implementation - it would
really open up CAS operations to do functional constraint checking,
rather than just value comparisons.

--Suraj

On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <sv...@gmail.com> wrote:
> Thanks - I see that the lock is taken internal to checkAndMutate.
>
> I'm wondering whether it is a better idea to actually pass in a
> Constraint (or even Constraints) as the checkAndMutate argument. Right
> now it is taking in an Comparator and a CompareOp for verification.
> But, this could just be a special case of Constraint which is
> evaluated within the lock.
>
> In other words, we could open up a richer Constraint checking api
> where any "functional" Constraint check can be performed in the
> checkAndPut operation.
>
> This would also not have the same performance impact of taking a
> rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
> really (in my mind) implementing the compare-and-set more generically.
>
> I also see the potential of passing in multiple constraints (say
> upper/lower bounds in Increment/Decrement operations) etc.
>
> --Suraj
>
>
> On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
>> From HRegionServer.checkAndPut():
>>    if (region.getCoprocessorHost() != null) {
>>      Boolean result = region.getCoprocessorHost()
>>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL, comparator,
>>          put);
>> ...
>>    boolean result = checkAndMutate(regionName, row, family, qualifier,
>>      CompareOp.EQUAL, new BinaryComparator(value), put,
>>      lock);
>> We can see that the lock isn't taken for preCheckAndPut().
>>
>> To satisfy Suraj's requirement, I think a slight change to checkAndPut() is
>> needed so that atomicity can be achieved across preCheckAndPut() and
>> checkAndMutate().
>>
>> Cheers
>>
>> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com> wrote:
>>
>>> Just so my question is clear ... everything I'm suggesting is in the
>>> context of a single row (not cross row / table). - so, yes, I'm
>>> guessing obtaining a RowLock on the region side during preCheckAndPut
>>> / postCheckAndPut would certainly work. Which was why I was asking
>>> whether the pre/postCheckAndPut obtains the row lock or whether the
>>> row lock is only obtained within checkAndPut.
>>>
>>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
>>> that even work? i.e. can the same rowlock be inherited by the
>>> checkAndPut api within that thread's context? Or will preCheckAndPut
>>> have to release the lock so that checkAndPut can take it (which won't
>>> work for my case, as it has to be atomic between the preCheck and
>>> Put.)
>>>
>>> Thanks for pointing me to the Constraints functionality - I'll take a
>>> look at whether it could potentially work.
>>> --Suraj
>>>
>>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
>>> wrote:
>>> > I think the feature you are looking for is a Constraint. Currently they
>>> are
>>> > being added to 0.94 in
>>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
>>> > they are almost ready to be rolled in, and backporting to 0.92 is
>>> > definitely doable.
>>> >
>>> > However, Constraints aren't going to be quite flexible enough to
>>> > efficiently support what you are describing. For instance, with a
>>> > constraint, you are ideally just checking the put value against some
>>> simple
>>> > constraint (never over 10 or always an integer), but looking at the
>>> current
>>> > state of the table before allowing the put would currently require
>>> creating
>>> > a full blown connection to the local table through another HTable.
>>> >
>>> > In the short term, you could write a simple coprocessor to do this
>>> checking
>>> > and then move over to constraints (which are a simpler, more flexible,
>>> way
>>> > of doing this) when the necessary features have been added.
>>> >
>>> > It is worth discussing if it makes sense to have access to the local
>>> region
>>> > through a constraint, though that breaks the idea a little bit, it would
>>> > certainly be useful and not overly wasteful in terms of runtime.
>>> >
>>> > Supposing the feature would be added to talk to the local table, and
>>> since
>>> > the puts are going to be serialized on the regionserver (at least to that
>>> > single row you are trying to update), you will never get a situation
>>> where
>>> > the value added is over the threshold. If you were really worried about
>>> the
>>> > atomicity of the operation, then when doing the put, first get the
>>> RowLock,
>>> > then do the put and release the RowLock. However, that latter method is
>>> > going to be really slow, so should only be used as a stop gap if the
>>> > constraint doesn't work as expected, until a patch is made for
>>> constraints.
>>> >
>>> > Feel free to open up a ticket and link it to 4605 for adding the local
>>> > table access functionality, and we can discuss the de/merits of adding
>>> the
>>> > access.
>>> >
>>> > -Jesse
>>> >
>>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com> wrote:
>>> >
>>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>>> >> coprocessors and I'm wondering ... are these pre/post checks done
>>> >> _after_ taking the row lock or is the row lock only done within the
>>> >> checkAndPut api.
>>> >>
>>> >> I'm interested in seeing if we can implement something like:
>>> >> (in pseudo sql)
>>> >> update table-name
>>> >> set column-name = new-value
>>> >> where (column-value - new-value) > threshold-value
>>> >>
>>> >> Basically ... I want to enhance the checkAndPut to not just compare
>>> >> "values" ... but apply an arbitrary function on the value _atomically_
>>> >> in the Put call. Multiple threads would be firing these mutations and
>>> >> I'd like the threshold-value above to never be breached under any
>>> >> circumstance.
>>> >>
>>> >> Is there a solution that can be implemented either via checkAndPut or
>>> >> using coprocessors preCheckAndPut? If not, would this be a useful
>>> >> feature to build in HBase?
>>> >>
>>> >> Thanks,
>>> >> --Suraj
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > -------------------
>>> > Jesse Yates
>>> > 240-888-2200
>>> > @jesse_yates
>>>

Re: Question on Coprocessors and Atomicity

Posted by Suraj Varma <sv...@gmail.com>.
Thanks - I see that the lock is taken internal to checkAndMutate.

I'm wondering whether it is a better idea to actually pass in a
Constraint (or even Constraints) as the checkAndMutate argument. Right
now it is taking in an Comparator and a CompareOp for verification.
But, this could just be a special case of Constraint which is
evaluated within the lock.

In other words, we could open up a richer Constraint checking api
where any "functional" Constraint check can be performed in the
checkAndPut operation.

This would also not have the same performance impact of taking a
rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
really (in my mind) implementing the compare-and-set more generically.

I also see the potential of passing in multiple constraints (say
upper/lower bounds in Increment/Decrement operations) etc.

--Suraj


On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <yu...@gmail.com> wrote:
> From HRegionServer.checkAndPut():
>    if (region.getCoprocessorHost() != null) {
>      Boolean result = region.getCoprocessorHost()
>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL, comparator,
>          put);
> ...
>    boolean result = checkAndMutate(regionName, row, family, qualifier,
>      CompareOp.EQUAL, new BinaryComparator(value), put,
>      lock);
> We can see that the lock isn't taken for preCheckAndPut().
>
> To satisfy Suraj's requirement, I think a slight change to checkAndPut() is
> needed so that atomicity can be achieved across preCheckAndPut() and
> checkAndMutate().
>
> Cheers
>
> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com> wrote:
>
>> Just so my question is clear ... everything I'm suggesting is in the
>> context of a single row (not cross row / table). - so, yes, I'm
>> guessing obtaining a RowLock on the region side during preCheckAndPut
>> / postCheckAndPut would certainly work. Which was why I was asking
>> whether the pre/postCheckAndPut obtains the row lock or whether the
>> row lock is only obtained within checkAndPut.
>>
>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
>> that even work? i.e. can the same rowlock be inherited by the
>> checkAndPut api within that thread's context? Or will preCheckAndPut
>> have to release the lock so that checkAndPut can take it (which won't
>> work for my case, as it has to be atomic between the preCheck and
>> Put.)
>>
>> Thanks for pointing me to the Constraints functionality - I'll take a
>> look at whether it could potentially work.
>> --Suraj
>>
>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
>> wrote:
>> > I think the feature you are looking for is a Constraint. Currently they
>> are
>> > being added to 0.94 in
>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
>> > they are almost ready to be rolled in, and backporting to 0.92 is
>> > definitely doable.
>> >
>> > However, Constraints aren't going to be quite flexible enough to
>> > efficiently support what you are describing. For instance, with a
>> > constraint, you are ideally just checking the put value against some
>> simple
>> > constraint (never over 10 or always an integer), but looking at the
>> current
>> > state of the table before allowing the put would currently require
>> creating
>> > a full blown connection to the local table through another HTable.
>> >
>> > In the short term, you could write a simple coprocessor to do this
>> checking
>> > and then move over to constraints (which are a simpler, more flexible,
>> way
>> > of doing this) when the necessary features have been added.
>> >
>> > It is worth discussing if it makes sense to have access to the local
>> region
>> > through a constraint, though that breaks the idea a little bit, it would
>> > certainly be useful and not overly wasteful in terms of runtime.
>> >
>> > Supposing the feature would be added to talk to the local table, and
>> since
>> > the puts are going to be serialized on the regionserver (at least to that
>> > single row you are trying to update), you will never get a situation
>> where
>> > the value added is over the threshold. If you were really worried about
>> the
>> > atomicity of the operation, then when doing the put, first get the
>> RowLock,
>> > then do the put and release the RowLock. However, that latter method is
>> > going to be really slow, so should only be used as a stop gap if the
>> > constraint doesn't work as expected, until a patch is made for
>> constraints.
>> >
>> > Feel free to open up a ticket and link it to 4605 for adding the local
>> > table access functionality, and we can discuss the de/merits of adding
>> the
>> > access.
>> >
>> > -Jesse
>> >
>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com> wrote:
>> >
>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>> >> coprocessors and I'm wondering ... are these pre/post checks done
>> >> _after_ taking the row lock or is the row lock only done within the
>> >> checkAndPut api.
>> >>
>> >> I'm interested in seeing if we can implement something like:
>> >> (in pseudo sql)
>> >> update table-name
>> >> set column-name = new-value
>> >> where (column-value - new-value) > threshold-value
>> >>
>> >> Basically ... I want to enhance the checkAndPut to not just compare
>> >> "values" ... but apply an arbitrary function on the value _atomically_
>> >> in the Put call. Multiple threads would be firing these mutations and
>> >> I'd like the threshold-value above to never be breached under any
>> >> circumstance.
>> >>
>> >> Is there a solution that can be implemented either via checkAndPut or
>> >> using coprocessors preCheckAndPut? If not, would this be a useful
>> >> feature to build in HBase?
>> >>
>> >> Thanks,
>> >> --Suraj
>> >>
>> >
>> >
>> >
>> > --
>> > -------------------
>> > Jesse Yates
>> > 240-888-2200
>> > @jesse_yates
>>

Re: Question on Coprocessors and Atomicity

Posted by Ted Yu <yu...@gmail.com>.
>From HRegionServer.checkAndPut():
    if (region.getCoprocessorHost() != null) {
      Boolean result = region.getCoprocessorHost()
        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL, comparator,
          put);
...
    boolean result = checkAndMutate(regionName, row, family, qualifier,
      CompareOp.EQUAL, new BinaryComparator(value), put,
      lock);
We can see that the lock isn't taken for preCheckAndPut().

To satisfy Suraj's requirement, I think a slight change to checkAndPut() is
needed so that atomicity can be achieved across preCheckAndPut() and
checkAndMutate().

Cheers

On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <sv...@gmail.com> wrote:

> Just so my question is clear ... everything I'm suggesting is in the
> context of a single row (not cross row / table). - so, yes, I'm
> guessing obtaining a RowLock on the region side during preCheckAndPut
> / postCheckAndPut would certainly work. Which was why I was asking
> whether the pre/postCheckAndPut obtains the row lock or whether the
> row lock is only obtained within checkAndPut.
>
> Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
> that even work? i.e. can the same rowlock be inherited by the
> checkAndPut api within that thread's context? Or will preCheckAndPut
> have to release the lock so that checkAndPut can take it (which won't
> work for my case, as it has to be atomic between the preCheck and
> Put.)
>
> Thanks for pointing me to the Constraints functionality - I'll take a
> look at whether it could potentially work.
> --Suraj
>
> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com>
> wrote:
> > I think the feature you are looking for is a Constraint. Currently they
> are
> > being added to 0.94 in
> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
> > they are almost ready to be rolled in, and backporting to 0.92 is
> > definitely doable.
> >
> > However, Constraints aren't going to be quite flexible enough to
> > efficiently support what you are describing. For instance, with a
> > constraint, you are ideally just checking the put value against some
> simple
> > constraint (never over 10 or always an integer), but looking at the
> current
> > state of the table before allowing the put would currently require
> creating
> > a full blown connection to the local table through another HTable.
> >
> > In the short term, you could write a simple coprocessor to do this
> checking
> > and then move over to constraints (which are a simpler, more flexible,
> way
> > of doing this) when the necessary features have been added.
> >
> > It is worth discussing if it makes sense to have access to the local
> region
> > through a constraint, though that breaks the idea a little bit, it would
> > certainly be useful and not overly wasteful in terms of runtime.
> >
> > Supposing the feature would be added to talk to the local table, and
> since
> > the puts are going to be serialized on the regionserver (at least to that
> > single row you are trying to update), you will never get a situation
> where
> > the value added is over the threshold. If you were really worried about
> the
> > atomicity of the operation, then when doing the put, first get the
> RowLock,
> > then do the put and release the RowLock. However, that latter method is
> > going to be really slow, so should only be used as a stop gap if the
> > constraint doesn't work as expected, until a patch is made for
> constraints.
> >
> > Feel free to open up a ticket and link it to 4605 for adding the local
> > table access functionality, and we can discuss the de/merits of adding
> the
> > access.
> >
> > -Jesse
> >
> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com> wrote:
> >
> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
> >> coprocessors and I'm wondering ... are these pre/post checks done
> >> _after_ taking the row lock or is the row lock only done within the
> >> checkAndPut api.
> >>
> >> I'm interested in seeing if we can implement something like:
> >> (in pseudo sql)
> >> update table-name
> >> set column-name = new-value
> >> where (column-value - new-value) > threshold-value
> >>
> >> Basically ... I want to enhance the checkAndPut to not just compare
> >> "values" ... but apply an arbitrary function on the value _atomically_
> >> in the Put call. Multiple threads would be firing these mutations and
> >> I'd like the threshold-value above to never be breached under any
> >> circumstance.
> >>
> >> Is there a solution that can be implemented either via checkAndPut or
> >> using coprocessors preCheckAndPut? If not, would this be a useful
> >> feature to build in HBase?
> >>
> >> Thanks,
> >> --Suraj
> >>
> >
> >
> >
> > --
> > -------------------
> > Jesse Yates
> > 240-888-2200
> > @jesse_yates
>

Re: Question on Coprocessors and Atomicity

Posted by Suraj Varma <sv...@gmail.com>.
Just so my question is clear ... everything I'm suggesting is in the
context of a single row (not cross row / table). - so, yes, I'm
guessing obtaining a RowLock on the region side during preCheckAndPut
/ postCheckAndPut would certainly work. Which was why I was asking
whether the pre/postCheckAndPut obtains the row lock or whether the
row lock is only obtained within checkAndPut.

Let's say the coprocessor takes a rowlock in preCheckAndPut ... will
that even work? i.e. can the same rowlock be inherited by the
checkAndPut api within that thread's context? Or will preCheckAndPut
have to release the lock so that checkAndPut can take it (which won't
work for my case, as it has to be atomic between the preCheck and
Put.)

Thanks for pointing me to the Constraints functionality - I'll take a
look at whether it could potentially work.
--Suraj

On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <je...@gmail.com> wrote:
> I think the feature you are looking for is a Constraint. Currently they are
> being added to 0.94 in
> HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
> they are almost ready to be rolled in, and backporting to 0.92 is
> definitely doable.
>
> However, Constraints aren't going to be quite flexible enough to
> efficiently support what you are describing. For instance, with a
> constraint, you are ideally just checking the put value against some simple
> constraint (never over 10 or always an integer), but looking at the current
> state of the table before allowing the put would currently require creating
> a full blown connection to the local table through another HTable.
>
> In the short term, you could write a simple coprocessor to do this checking
> and then move over to constraints (which are a simpler, more flexible, way
> of doing this) when the necessary features have been added.
>
> It is worth discussing if it makes sense to have access to the local region
> through a constraint, though that breaks the idea a little bit, it would
> certainly be useful and not overly wasteful in terms of runtime.
>
> Supposing the feature would be added to talk to the local table, and since
> the puts are going to be serialized on the regionserver (at least to that
> single row you are trying to update), you will never get a situation where
> the value added is over the threshold. If you were really worried about the
> atomicity of the operation, then when doing the put, first get the RowLock,
> then do the put and release the RowLock. However, that latter method is
> going to be really slow, so should only be used as a stop gap if the
> constraint doesn't work as expected, until a patch is made for constraints.
>
> Feel free to open up a ticket and link it to 4605 for adding the local
> table access functionality, and we can discuss the de/merits of adding the
> access.
>
> -Jesse
>
> On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com> wrote:
>
>> I'm looking at the preCheckAndPut / postCheckAndPut api with
>> coprocessors and I'm wondering ... are these pre/post checks done
>> _after_ taking the row lock or is the row lock only done within the
>> checkAndPut api.
>>
>> I'm interested in seeing if we can implement something like:
>> (in pseudo sql)
>> update table-name
>> set column-name = new-value
>> where (column-value - new-value) > threshold-value
>>
>> Basically ... I want to enhance the checkAndPut to not just compare
>> "values" ... but apply an arbitrary function on the value _atomically_
>> in the Put call. Multiple threads would be firing these mutations and
>> I'd like the threshold-value above to never be breached under any
>> circumstance.
>>
>> Is there a solution that can be implemented either via checkAndPut or
>> using coprocessors preCheckAndPut? If not, would this be a useful
>> feature to build in HBase?
>>
>> Thanks,
>> --Suraj
>>
>
>
>
> --
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates

Re: Question on Coprocessors and Atomicity

Posted by Jesse Yates <je...@gmail.com>.
I think the feature you are looking for is a Constraint. Currently they are
being added to 0.94 in
HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
they are almost ready to be rolled in, and backporting to 0.92 is
definitely doable.

However, Constraints aren't going to be quite flexible enough to
efficiently support what you are describing. For instance, with a
constraint, you are ideally just checking the put value against some simple
constraint (never over 10 or always an integer), but looking at the current
state of the table before allowing the put would currently require creating
a full blown connection to the local table through another HTable.

In the short term, you could write a simple coprocessor to do this checking
and then move over to constraints (which are a simpler, more flexible, way
of doing this) when the necessary features have been added.

It is worth discussing if it makes sense to have access to the local region
through a constraint, though that breaks the idea a little bit, it would
certainly be useful and not overly wasteful in terms of runtime.

Supposing the feature would be added to talk to the local table, and since
the puts are going to be serialized on the regionserver (at least to that
single row you are trying to update), you will never get a situation where
the value added is over the threshold. If you were really worried about the
atomicity of the operation, then when doing the put, first get the RowLock,
then do the put and release the RowLock. However, that latter method is
going to be really slow, so should only be used as a stop gap if the
constraint doesn't work as expected, until a patch is made for constraints.

Feel free to open up a ticket and link it to 4605 for adding the local
table access functionality, and we can discuss the de/merits of adding the
access.

-Jesse

On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <sv...@gmail.com> wrote:

> I'm looking at the preCheckAndPut / postCheckAndPut api with
> coprocessors and I'm wondering ... are these pre/post checks done
> _after_ taking the row lock or is the row lock only done within the
> checkAndPut api.
>
> I'm interested in seeing if we can implement something like:
> (in pseudo sql)
> update table-name
> set column-name = new-value
> where (column-value - new-value) > threshold-value
>
> Basically ... I want to enhance the checkAndPut to not just compare
> "values" ... but apply an arbitrary function on the value _atomically_
> in the Put call. Multiple threads would be firing these mutations and
> I'd like the threshold-value above to never be breached under any
> circumstance.
>
> Is there a solution that can be implemented either via checkAndPut or
> using coprocessors preCheckAndPut? If not, would this be a useful
> feature to build in HBase?
>
> Thanks,
> --Suraj
>



-- 
-------------------
Jesse Yates
240-888-2200
@jesse_yates